Multiomics data have become an essential component of gene therapy development. These data illuminate the molecular mechanisms of disease in a specific patient population and show how a candidate therapeutic can treat that disease safely and effectively.
Yet as the volume of multiomics datasets has expanded, therapeutic successes have not kept pace. A major reason is the inability to analyze data and generate insights, which is usually a symptom of ineffective data management: as datasets and systems have grown, data has become scattered across locations and formats. Oftentimes, it can be challenging to untangle the mess of multiomics data and “see the forest for the trees.”
In the following blog, I discuss the current problems with multiomics in gene therapy and how we are building an artificial intelligence (AI) solution to address them.
The Current Challenges with Multiomics Data Analysis
Beyond the general challenges with data, the number of multiomics data integration and analysis methods has increased significantly, with little standardization or consensus on the “right” or “most accurate” methods. In addition, the gene therapy field is relatively new, making it difficult for many teams to create an infrastructure and a long-term strategy for their data and analytical methods.
More specifically, some of the current issues outlined by Tarazona et al. include:1
Distinct signal-to-noise ratios across ‘omics techniques
Multiomics analysis spans a wide variety of disparate ‘omics datasets, and the challenge grows as medicine and biology become inherently multimodal. When the techniques used to generate datasets have different detection powers, or cannot accommodate different formats, signal can be missed. For instance, proteomics techniques have low sensitivity to low-abundance proteins; transcriptomics techniques do not share this limitation, so searching for correlations between transcriptomics and proteomics datasets presents a challenge.
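To make the mismatch concrete, here is a minimal Python sketch (all data simulated, gene names hypothetical) of one common workaround: correlating transcript and protein abundances per gene while explicitly dropping samples where the proteomics assay fell below its detection floor.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical matched tables: rows = samples, columns = genes/proteins.
# In practice these would come from your own quantification pipelines.
rna = pd.DataFrame(
    np.random.lognormal(mean=2.0, sigma=1.0, size=(30, 5)),
    columns=["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"],
)
protein = rna * np.random.normal(1.0, 0.3, size=rna.shape)
# Simulate the proteomics detection floor: low-abundance values drop out.
protein = protein.mask(protein < protein.quantile(0.25), np.nan)

for gene in rna.columns:
    # Keep only samples where both measurements exist for this gene.
    paired = pd.concat([rna[gene], protein[gene]], axis=1).dropna()
    rho, p = spearmanr(paired.iloc[:, 0], paired.iloc[:, 1])
    print(f"{gene}: n={len(paired)} rho={rho:.2f} p={p:.3f}")
```

Rank correlation on the complete pairs sidesteps the censored values, but it also shrinks the effective sample size per gene, which is exactly the information loss described above.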
Limitations of computational models
Current multiomics models are largely limited to identifying biomarkers; while helpful, modeling complex signaling and gene regulatory pathways remains a future goal. Models need to become more interpretable rather than serving only end-point biomarker discovery, and there is still a need for computational methods that integrate the mathematical formulation of multilayered regulation with robust, interactive visualization solutions. Much work remains to improve current models, including taking fuller advantage of cloud computing, and AI will not solve every analytical problem. Even so, this is a ripe area for generalist biomedical AI models, which can integrate broader sets of encodings for the growing amounts of ‘omics data and thereby advance scientific discovery.
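As a toy illustration of that gap, the sketch below (scikit-learn, simulated data) runs a classic end-point biomarker screen. The sparse coefficients flag associated features but say nothing about the regulatory pathways connecting them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical multiomics feature matrix: 100 samples x 50 features
# (e.g., a mix of transcript and protein abundances), binary phenotype.
X = rng.normal(size=(100, 50))
y = (X[:, 3] - X[:, 17] + rng.normal(scale=0.5, size=100) > 0).astype(int)

# Sparse (L1) logistic regression: a standard end-point biomarker screen.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.3).fit(X, y)
hits = np.flatnonzero(model.coef_[0])
print("candidate biomarker features:", hits)
# The coefficients flag associated features but say nothing about the
# regulatory pathway connecting them (the interpretability gap above).
```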
Inaccessible data
Large-scale multiomics studies have constructed online portals where users can access the associated datasets and relevant metadata. But for smaller studies, which rely on publicly available repositories, linking different ‘omics sets is challenging, and assembling a sample-level, matched multiomics dataset is near impossible. There needs to be a more robust application of FAIR principles to multiomics data, along with standardization of the methods used for annotation and storage2. Robust data cataloging, even starting small, improves data findability. Secure and privileged access to data is paramount, but access controls must keep data accessible for the right purposes, to approved entities, to support scaled analysis. Labeling and annotating datasets to surface their commonalities enables dataset interoperability, especially across modalities; as tech and science experts alike have observed, AI can be particularly powerful in this realm, given its ability to extract insights from multimodal datasets and predict commonalities3,4. Finally, bringing these data and insights together into well-described packages that include metadata allows reuse across different settings.
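As a hypothetical example of what “starting small” with cataloging might look like, the following sketch defines a minimal metadata record. The field names are illustrative, not an established schema, but they cover the FAIR basics: a stable identifier, sample-level linkage, access level, and provenance.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class OmicsDatasetRecord:
    """Minimal catalog entry supporting FAIR-style findability and reuse."""
    dataset_id: str                    # stable, resolvable identifier
    modality: str                      # e.g., "RNA-seq", "proteomics"
    organism: str
    sample_ids: list = field(default_factory=list)  # enables sample-level matching
    file_uri: str = ""                 # where the data actually lives
    access_level: str = "restricted"   # controlled access, not inaccessibility
    provenance: dict = field(default_factory=dict)  # protocol, pipeline versions

# Hypothetical entry; identifiers and URIs are placeholders.
record = OmicsDatasetRecord(
    dataset_id="DS-0001",
    modality="RNA-seq",
    organism="Homo sapiens",
    sample_ids=["S01", "S02"],
    file_uri="s3://bucket/rnaseq/DS-0001.h5ad",
    provenance={"pipeline": "nf-core/rnaseq", "version": "3.12"},
)
print(json.dumps(asdict(record), indent=2))
```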
Scalability and performance issues
Multiomics data is rapidly expanding, and new techniques, such as single-cell sequencing, generate massive data files. Estimates suggest that 480 exabytes of data will be produced daily by 20255. This puts a major strain on data storage and demands significant computational power for analysis. Cloud computing relieves some of this strain, but more efficient software design and more powerful processing methods are needed as part of an integrated cloud-based service. Existing multiomics computational algorithms need to be rewritten for the era of cloud computing, and novel modeling techniques need to be developed that take advantage of the next generation of cloud computing being deployed today and in the future, designed primarily with AI in mind and validated with scientific rigor6.
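On the software-design point, even simple patterns help. Here is a minimal pandas sketch (the file name and layout are hypothetical) that streams a large expression matrix in chunks rather than loading it into memory all at once.

```python
import pandas as pd

# Stream a large expression matrix in chunks instead of loading it whole.
# "counts.csv" is a hypothetical file: rows = cells, columns = genes.
totals = None
n_rows = 0
for chunk in pd.read_csv("counts.csv", chunksize=100_000):
    numeric = chunk.select_dtypes("number")
    totals = numeric.sum() if totals is None else totals + numeric.sum()
    n_rows += len(numeric)

mean_expression = totals / n_rows  # per-gene mean without exhausting RAM
print(mean_expression.sort_values(ascending=False).head())
```

The same idea scales up naturally: chunked or streaming access is what lets cloud workers process files far larger than any single machine’s memory.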
Together, these challenges limit the potential and application of multiomics datasets and ultimately make it difficult to translate fundamental discoveries into clinical breakthroughs. In gene therapy, solving these problems would reduce data analysis time and cost. That benefit gets passed on to patients, lowering the cost of gene therapies, some of which are the most expensive the biopharmaceutical industry has ever produced. The intersection of the people (scientists, researchers, and engineers) developing these new techniques, the processes implementing the models, and the scalable platforms powering those processes will streamline our ability to deliver on the value of gene therapeutics for humanity.
Modernized Multiomics and AI for More Efficient Collaboration, Knowledge Transfer, and Discovery
To realize these benefits, teams using multiomics for gene therapy development need an AI solution that is both “off-the-shelf” and customizable.
Here’s how AI can help.
AI platforms will improve the overall data ecosystem and standardization
AI-powered algorithms can identify and rectify errors, inconsistencies, and missing values in multiomics datasets, enhancing data quality and ensuring standardized data across the ecosystem. AI can also aid in integrating and understanding diverse data sources, formats, and structures, facilitating data exchange and interoperability and furthering a unified, standardized data ecosystem. These processes can be automated, streamlining development and deployment through continuous monitoring, validation, and governance of machine learning models, making it easier and more efficient to keep data organized and information readily available. Furthermore, AI can enforce pre-defined rules, protocols, and regulations in the data environment, ensuring standardization.
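A minimal sketch of such automated checks, assuming a small tabular proteomics slice with illustrative column names; the rules and thresholds here are placeholders, not a validated QC standard.

```python
import pandas as pd
from sklearn.impute import KNNImputer

def validate(df: pd.DataFrame) -> list[str]:
    """Flag rule violations before data enters the shared ecosystem."""
    issues = []
    if df.isna().mean().max() > 0.3:
        issues.append("a column is >30% missing")
    if df.duplicated().any():
        issues.append("duplicate rows found")
    if (df.select_dtypes("number") < 0).any().any():
        issues.append("negative abundance values")
    return issues

# Toy data with a missing value in each column.
df = pd.DataFrame({"protein_a": [1.2, None, 0.8], "protein_b": [3.1, 2.9, None]})
print(validate(df) or "all checks passed")

# Rectify missing values with a learned imputer rather than dropping samples.
df[:] = KNNImputer(n_neighbors=2).fit_transform(df)
```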
AI streamlines the data environment for more intuitive analysis
AI creates a more streamlined and efficient data environment in several ways. It can automate data cleaning, transformation, and integration; by handling these time-consuming, computationally heavy tasks continuously, along with anomaly and outlier detection, AI frees scientists to focus on the analysis itself. This is at the heart of the machine learning operations (MLOps) concept, which combines ML system development and operations into a unified practice. Generative AI and natural language processing (NLP) make interacting with AI easier for those without computational expertise. Real-time, contextually relevant insights allow gene therapy researchers to make more rapid and informed decisions about pre-clinical and clinical candidates.
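To ground the MLOps idea, here is a minimal scikit-learn sketch (simulated data) in which cleaning, transformation, and the model live in one versionable pipeline, so identical preprocessing runs in development and deployment.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# One versionable object captures cleaning, transformation, and the model,
# so training and deployment apply identical steps (MLOps in miniature).
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
])

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
X[rng.random(X.shape) < 0.1] = np.nan   # simulate missing assay values
y = rng.integers(0, 2, size=60)

pipe.fit(X, y)
print("held-in accuracy:", pipe.score(X, y))
```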
AI connects research, clinical, and commercial worlds
AI can integrate and analyze vast amounts of data from diverse, multimodal sources, including pre-clinical studies, clinical trials, electronic health records, and commercial databases3. This comprehensive analysis helps identify patterns, correlations, and insights at a scale well beyond the human mind’s capacity, ones that would be challenging to discover using traditional methods alone. By bridging these gaps, AI can accelerate the clinical translation and commercialization of gene therapies in development.
Multiomics Technology and AI: The Future of Gene Therapy
The integration of multiomics technology with innovative AI and streamlined deployment techniques presents a remarkable opportunity for advancing the field of gene therapies7. These novel therapeutic approaches demand the swift extraction of critical genetic insights from experimental data and extensive patient datasets. This process involves unraveling nuanced variations and their significance, ultimately connecting them to phenotypic traits and disease predisposition.
The rapid development of innovative AI tools for biological settings is only just beginning. Consider, for instance, the advent of DeepVariant in 2017, a deep learning tool designed to navigate the intricacies of sequencing data, enabling precise variant calls even in traditionally challenging genomic regions8. Its follow-up, DeepConsensus, took the concept a step further, using gap-aware sequence transformers to correct sequencing errors and produce a highly accurate consensus sequence from the raw reads of a DNA sample9,10.
As we combine the power of AI methodologies with genomics research, we anticipate even more profound advancements. As AI continues to evolve and refine its capabilities, it promises to usher in a new era of personalized medicine, improved treatment efficacy, and a deeper understanding of various diseases.
AI Disclosure: The feature image was generated with the AI image tool MidJourney.
References
1. Tarazona S, Arzalluz-Luque A, Conesa A. Undisclosed, unmet and neglected challenges in multi-omics studies. Nature Computational Science. 2021;1:395–402.
2. FAIR Principles. GO-FAIR.org website. Accessed September 10, 2023.
3. Tu T, et al. Towards Generalist Biomedical AI. arXiv:2307.14334. Published July 26, 2023.
4. Topol E. As artificial intelligence goes multimodal, medical applications multiply. Science. September 15, 2023.
5. IDC Global DataSphere and StorageSphere Forecasts. IDC website. Published May 2022. Accessed September 10, 2023.
6. Validating Computational Models with Real World Biological Experiments. Form Bio Resource Center. Published August 2023. Accessed September 2023.
7. Nipko J. AI and Gene Therapy: The Next Frontier in Life Science Innovation. Form Bio Resource Center. Published July 2023. Accessed September 2023.
8. DeepVariant: Highly Accurate Genomes With Deep Neural Networks. Google Research blog. Published December 4, 2017. Accessed September 18, 2023.
9. DeepConsensus: Gap-Aware Sequence Transformers for Sequence Correction. bioRxiv. Posted August 31, 2021.
10. A new genome sequencing tool powered with our technology. Google The Keyword blog. Published October 26, 2022. Accessed September 18, 2023.