We resent it when preliminary or ancillary work occupies so much of our attention that our real work, or what we consider to be our real work, recedes. No less a figure than Enrico Fermi, when confronting the explosion of particle discoveries in his day, is said to have complained, “If I could remember the names of these particles, I would have been a botanist.”

Something like Fermi’s frustration is felt by today’s omics scientists. Their passion is biology, or more specifically, uncovering new biology through the systematic exploration of the “omes.” However, this kind of exploration demands not just preliminary but ongoing work involving bioinformatics technology.

So, what are omics scientists to do if they wish to pursue their passion without necessarily becoming experts in bioinformatics? It’s a complicated question, given that the answer depends not just on an individual omics scientist’s enthusiasm for mastering bioinformatics technology, but also on factors such as budgetary constraints, research complexity, and data quantity.

Fortunately, omics scientists have options for balancing biology and bioinformatics. These options are sketched in this article, which presents expert commentary about different approaches to omics data analysis: open source software, commercial software, cloud-based platforms, and outsourcing solutions.

Unfortunately, there are no simple tradeoffs. For example, the cost-conscious may be attracted to in-house solutions based on open source software, which allow users to avoid licensing fees. Yet such solutions can be costly in unexpected ways. Considerable time and effort can be devoted to climbing the bioinformatics learning curve. Discoveries can be missed because of a reliance on software with outdated gene annotations. And scalability problems may arise as data volumes grow.

Those who are averse to investing in their own bioinformatics expertise may choose to outsource their analytical work to a core laboratory or a contract research organization. These options offer bioinformatics expertise on an as-needed basis, which can be cost-effective and time-efficient, but they may still require a degree of collaboration between omics scientists and data analysts.

Omics scientists, despite their best intentions, and whichever data analysis options they choose, may encounter “pay me now or pay me later” scenarios. (Or perhaps that “or” should be an “and.”) So, omics scientists may have no choice but to gather what bioinformatics wisdom they can while they persist in their work. But they don’t have to love data analysis, any more than Fermi loved particle names, to make valuable discoveries.

A sense of community

In 2013, scientists at the Centre for Genomic Regulation in Barcelona introduced Nextflow, an open source workflow orchestrator that is now being developed by Seqera, a Centre for Genomic Regulation spinoff. By leveraging and extending Nextflow, Seqera aims to help omics scientists simplify the design and deployment of data-intensive pipelines on any infrastructure.

Seqera’s CEO, Evan Floden, PhD, says that over the last decade, Nextflow has become the most widely used open source framework for bioinformatics pipelines. This accomplishment, he suggests, makes Seqera/Nextflow part of a broader movement, one in which “bioinformatics is key to unlocking the next gains of scientific innovation” and “every scientist is a bioinformatics scientist.”

In line with being part of a movement, Seqera/Nextflow fosters collaboration. “By enabling scientists to write, run, and share data pipelines, Nextflow makes it easy for all scientists to combine tools, scripts, and computational resources to analyze their large data sets and get results, faster than ever,” Floden says. “What makes Nextflow even more unique is the community around it. Over the years, we have built a 20,000-plus-strong community of scientists not only using our open source software but also building and curating an extensive, shareable library of community-validated pipelines for scientists to use across academia, research, and industry.”

Besides noting that Nextflow is used “all the way from undergraduate courses at universities to clinical trials at large pharma companies,” Floden points out that Nextflow can accommodate different levels of expertise. “With artificial intelligence at Seqera’s core, scientists—both with and without software engineering backgrounds—can now perform bioinformatics analysis and supercharge their science.”

Floden states that the majority of Seqera’s clients across biotech and pharma leverage multiomics. “Seqera’s platform, combined with Nextflow, is transforming multiomics research by offering unmatched flexibility, scalability, and automation,” he says. “Researchers can deploy and scale their genomics workloads across multi-cloud and on-premise environments, while Nextflow automates complex pipelines, significantly reducing manual effort and minimizing errors. This makes it easier to integrate diverse data types, such as genomics, transcriptomics, and proteomics, creating comprehensive workflows essential for areas like cancer research and personalized immunotherapy.”

A strategic outlook

Rami Mehio, vice president and head of global software and informatics at Illumina, emphasizes that omics scientists are confronted by data analysis challenges of vast scale and great complexity. The omics industry, he says, is “increasingly interested in insights from larger populations, various types of data, and layering together insights across multiomics.”

The observation is in line with points that were made last August at Illumina’s Strategy Update. For example, the company discussed how it would “reinvent the genome,” “unlock deeper biology,” and “turn data into insights” through innovations in sample-to-insight workflow solutions, the integration of multiomics workflows with interpretation and visualization tools, and a software/AI platform that integrates and analyzes large data cohorts and deciphers variants of unknown significance.

Following up on these points, Mehio cites a few of Illumina’s recent initiatives: “The DRAGEN [Dynamic Read Analysis for GENomics] software pipelines have been updated to support some of the latest emerging needs of our customers, like single-cell assays, proteomics, and the 5-base genome. For example, with the Fluent acquisition, we are scaling up the existing scRNA DRAGEN pipeline to support extremely fast processing of up to 1M cell experiments.”

Illumina is bullish about its DRAGEN software. “Soon, you will be able to load your sample directly on the sequencer and get all the insights that short- and long-read techniques have historically provided,” said Jacob Thaysen, PhD, Illumina’s CEO, at a Strategy Update. “Epigenetics will no longer require a separate workflow, it will become the fifth base in the genome, powered by chemistry and our DRAGEN engine.”

Mehio highlights Illumina’s work on the 5-base genome technology, which can give customers both variant and epigenetic information from a single library prep. He relates that Illumina has updated read mapping to maximize recovery of the methylation signal and redesigned variant calling to simultaneously perform SNV detection and report methylation status.

“In the expanded genome,” Mehio continues, “we are able to map and resolve haplotypes in segmental duplications, resolve and phase haplotypes in difficult regions, and call structural variants. The analysis is made available through DRAGEN ultrafast on-premises servers and is brought to the cloud at scale through our data platform Illumina Connected Analytics.”

Finally, Mehio reviews recent moves by Illumina in the tertiary analysis domain: “Illumina acquired EMEDGENE in 2021 for the GDT market and has closely integrated it with the Illumina connected software infrastructure and the DRAGEN latest germline variant calling pipeline. In 2022, we launched our Illumina Connected Insights for oncology applications. In 2023, we acquired Partek to support the discovery market to a wide range of omics and modalities.” Partek Flow supports the analysis and visualization of bulk sequencing data as well as single-cell and spatial data.

A broad portfolio

QIAGEN Digital Insights (QDI), the bioinformatics business of QIAGEN, offers software for tasks such as the normalization of next-generation sequencing and omics data, quality control, read mapping, and gene expression analysis. According to QDI, users no longer need to wait for a bioinformatician or computational expert to help them analyze their omics data.

“At QDI, we see a growing interest in analyses that integrate siloed omics data,” says Andrew Olson, director of marketing, QDI. “We offer comprehensive software and omics data solutions that help scientists discover the hidden potential of their data.

“Our solutions enable users across diverse fields—from basic research and drug discovery to clinical interpretation—to uncover complex relationships and biological insights from multiple omics sources thanks to high-quality human curated data and comprehensive analysis workflows. Our user-friendly platforms are designed to simplify complex multiomics analysis, making data-driven discoveries accessible to all scientists, regardless of bioinformatics expertise.”

The company’s portfolio includes the QIAGEN CLC Genomics Workbench Premium and Ingenuity Pathway Analysis, which combine advanced analytics with a user-friendly interface and are designed to make omics research accessible to all scientists. The company also offers specialized databases to explore biological relationships. These databases include the Human Gene Mutation Database Professional, the Catalogue of Somatic Mutations in Cancer, Human Somatic Mutation Database, QIAGEN Biomedical Knowledge Bases, OmicSoft Lands omics datasets, and the Pharmacogenomics Insights database.

A secure environment

DNAnexus provides the Precision Health Data Cloud, a cloud-based data analysis and management platform for DNA sequencing data. The company indicates that it has more than 45,000 registered users across 48 countries, and that it manages more than 105 petabytes of complex clinical genomic, proteomic, and other multiomic datasets.

In today’s rapidly evolving precision health landscape, multiomics has become the cornerstone of diagnostics, drug discovery, and therapeutic development, says Matt Newman, senior vice president and general manager of pharma and diagnostics at DNAnexus. “The integration of genomics, transcriptomics, proteomics, and other omics data has moved from being a ‘nice to have’ to a ‘need to have’ for driving actionable insights across the R&D pipeline.”

To address concerns about the security of sensitive data, DNAnexus provides a cloud-native trusted research environment (TRE). DNAnexus asserts that its TRE streamlines data integration, enables secure collaboration, and delivers IT-ready, turnkey GxP compliance, all while ensuring interoperability with existing systems. [Vertigo3d/iStock/Getty Images Plus]

Newman asserts that DNAnexus is well positioned to participate in the multiomics revolution. “[We offer] a robust and scalable platform designed specifically to manage the complex, large-scale data associated with different disciplines,” he points out. “[Our] platform features a suite of out-of-the-box applications and workflows to transform data to insights efficiently and securely.”

To address concerns about the security of sensitive data, DNAnexus provides a trusted research environment (TRE). TREs, which are also known as secure research environments, secure data environments, or data clean rooms, initially gained popularity in Europe, and they are now being adopted around the world as a secure way to share data.

“In our TRE,” Newman relates, “researchers can securely manage, analyze, and share multiomics data, integrating it with other modalities like clinical trial data or real-world data. Whether scientists are handling genomic sequencing from preclinical models or large proteomic datasets from patient cohorts, DNAnexus provides an analytical environment for teams—whether internal groups or external innovators—to harmonize, process, and derive insights that would be impossible to glean from a single omics layer. Our cloud-native platform ensures that data from multiple omics domains can be analyzed in an integrated fashion, fostering a holistic approach to understanding disease mechanisms and improving therapeutic targeting.”

In addition to managing data, DNAnexus supports AI/ML model development and deployment. According to Newman, the DNAnexus platform “provides access to major AI/ML frameworks, enabling data scientists and researchers to develop models that can detect patterns, predict outcomes, and accelerate the discovery of new targets and biomarkers.”

A team player

Bridge Informatics, a bioinformatics service provider (BSP) that specializes in providing custom professional services and R&D software products, has expertise in machine learning, bioinformatics, data infrastructure, data mining, and software and database engineering. The company indicates that it is dedicated to translating data into biological results for life science companies, which often face a data analytics gap due to limited in-house expertise, difficulty retaining talent, and steep bioinformatics learning curves.

To bridge this gap, Bridge Informatics forms collaborations with its clients. “These collaborations empower scientists—both bench scientists with little bioinformatics experience and bioinformatics experts—to extract meaningful insights from their data without needing to individually master every computational detail,” says Jessica Corrado, head of business development and commercial operations, Bridge Informatics. “This streamlines research processes and saves valuable time and resources.”

As a BSP, Bridge Informatics offers expertise across a broad spectrum of bioinformatics applications, including data mining, visualization, gene expression analysis, variant calling, and pathway analysis. “[We] often specialize in developing custom software applications and pipelines tailored to specific research needs,” Corrado points out. “For example, we can leverage cutting-edge tools, such as single-cell GPT (scGPT), a pioneering AI large language model for single-cell RNA sequencing, and we can address the growing demand for multiomics data visualization.”

According to Corrado, partnering with a BSP can ensure business continuity. “A BSP’s well-documented code and processes ensure smooth knowledge transfer, minimizing disruptions when team members depart,” she explains. “This approach provides consistent, reliable support, eliminating the risk of critical knowledge gaps that can occur when relying on a single in-house expert.

“While core laboratory facilities can be appealing for their comprehensive services, they may face longer timelines due to public funding constraints. In contrast, BSPs typically offer greater flexibility and responsiveness to meet the specific needs and timelines of their clients.”

Corrado maintains that collaborating with BSPs can help researchers in pharma and biotech companies “harness their genomics data, accelerate research timelines, and gain a competitive edge in genomic research.” Companies that partner with BSPs, she adds, can “overcome common bioinformatics challenges and focus on their core scientific objectives, pushing the boundaries of genomic discovery.”

The post Omics Data Analysis: A Range of Options appeared first on GEN – Genetic Engineering and Biotechnology News.
