Arc Institute, Stanford University, and NVIDIA Announce Evo 2: The World’s Largest Open AI Model for Genomic Data

Evo 2 has been trained on an astounding 9 trillion nucleotides, the core building blocks of DNA and RNA.

On Wednesday, California-based nonprofit Arc Institute, in collaboration with Stanford University and NVIDIA, unveiled Evo 2, the largest publicly available AI model for genomic data. Designed to predict and generate genetic code spanning DNA, RNA, and proteins across all domains of life, Evo 2 is intended to accelerate biological research.

Evo 2 was trained on 9 trillion nucleotides, the core building blocks of DNA and RNA. Its creators emphasize the project's openness: the model parameters, training code, inference code, and the OpenGenome2 dataset are all publicly available. This transparency is intended to foster new discoveries about biological complexity.

A Leap Forward in Genomic Exploration

“Deploying a model like Evo 2 is akin to sending a powerful telescope to the farthest reaches of the universe,” said Dave Burke, CTO of Arc Institute. “While we can anticipate immense opportunities, the discoveries that lie ahead remain unknown.”

The model’s applications are wide-ranging, with potential use cases in biomolecular research, including predicting protein structures, identifying novel molecules for healthcare and industrial applications, and studying how gene mutations impact biological functions.

A Generative Approach to Genomics

"Evo 2 marks a major advancement in generative genomics," stated Patrick Hsu, cofounder of Arc Institute and assistant professor of bioengineering at UC Berkeley. "By enhancing our understanding of life’s fundamental building blocks, we open doors to groundbreaking solutions in healthcare and environmental science."

Evo 2 is available as an NVIDIA NIM microservice, which lets users generate customized biological sequences. Researchers can also fine-tune the model on proprietary datasets using the open-source NVIDIA BioNeMo Framework.

Faster, More Accessible Biological Design

Brian Hie, assistant professor of chemical engineering at Stanford University and an investigator at Arc Institute, explained: "Designing new biology has traditionally been a complex, time-consuming process. Evo 2 simplifies biological design, allowing researchers to create complex biological systems more efficiently and rapidly."

Empowering Research with Advanced Computational Resources

"NVIDIA accelerated the Evo 2 project by giving scientists access to 2,000 NVIDIA H100 GPUs via NVIDIA DGX Cloud on AWS. DGX Cloud provides short-term access to large compute clusters, giving researchers the flexibility to innovate. The fully managed AI platform includes NVIDIA BioNeMo, which features optimized software in the form of NVIDIA NIM microservices and NVIDIA BioNeMo Blueprints," the company said in a blog post.

Unlocking New Frontiers in Genetic Research

Evo 2 can process genetic sequences of up to 1 million tokens at a time, enabling in-depth analysis of long stretches of a genome. This long context window will allow researchers to explore how genetic sequences influence cell function, gene expression, and disease mechanisms.

"As human genes contain thousands of nucleotides, the ability to analyze such complex systems requires AI models capable of processing large segments of genetic data at once," said Hsu.
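
To make the 1-million-token figure concrete, here is a minimal sketch of how a sequence longer than that limit might be split into windows that each fit within it. It assumes one token per nucleotide, which is a simplification made for illustration; the article does not describe Evo 2's tokenizer. The function name `windows` and the `overlap` parameter are hypothetical and are not part of any Evo 2 or NVIDIA API.

```python
from typing import Iterator, Tuple

# Context length quoted in the article; treated here as one token per nucleotide (assumption).
MAX_TOKENS = 1_000_000


def windows(seq_len: int, max_tokens: int = MAX_TOKENS,
            overlap: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) windows that together cover seq_len tokens.

    Consecutive windows share `overlap` tokens so that features near a
    boundary are seen with some surrounding context (hypothetical design choice).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    start = 0
    while start < seq_len:
        end = min(start + max_tokens, seq_len)
        yield start, end
        if end == seq_len:
            break  # the final window reached the end of the sequence
        start += step


if __name__ == "__main__":
    # A made-up 2.5-million-nucleotide sequence needs three windows.
    for start, end in windows(2_500_000):
        print(start, end)
```

Under these assumptions, any sequence of 1 million nucleotides or fewer fits in a single window, which is the case the long context is meant to serve.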

Applications Across Healthcare, Agriculture, and More

In healthcare, Evo 2 can assist in drug discovery by helping researchers identify gene variants linked to specific diseases and design molecules that target them precisely. In one study, Evo 2 demonstrated 90% accuracy in predicting the impact of mutations in the BRCA1 gene, which is associated with breast cancer.
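
One common way to use a sequence model for mutation scoring is to compare the likelihood the model assigns to a reference sequence with the likelihood it assigns to the mutated version. The sketch below illustrates only that general idea. `toy_log_likelihood` is a hypothetical stand-in with fixed, made-up base probabilities, not Evo 2 or any part of its API, and the article does not say that this is the procedure used in the BRCA1 study.

```python
import math

# Hypothetical stand-in for a trained model: fixed per-base probabilities (made up).
TOY_BASE_PROBS = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}


def toy_log_likelihood(seq: str) -> float:
    """Log-likelihood of a sequence under the toy, position-independent model."""
    return sum(math.log(TOY_BASE_PROBS[base]) for base in seq)


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if alt not in TOY_BASE_PROBS:
        raise ValueError("alt must be one of A, C, G, T")
    if seq[pos] == alt:
        raise ValueError("alt base matches the reference base")
    return seq[:pos] + alt + seq[pos + 1:]


def delta_score(ref_seq: str, pos: int, alt: str, scorer=toy_log_likelihood) -> float:
    """Mutant log-likelihood minus reference log-likelihood.

    A strongly negative value means the scorer finds the mutated sequence
    less plausible than the reference.
    """
    return scorer(apply_snv(ref_seq, pos, alt)) - scorer(ref_seq)


if __name__ == "__main__":
    # Replace the first base of a short made-up sequence and print the score change.
    print(delta_score("ACGT", 0, "C"))
```

In practice the `scorer` argument would be replaced by a call to a real model; turning scores into a benign-versus-harmful label would also require a threshold or classifier calibrated on known variants, which this sketch does not attempt.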

The model’s potential also extends to agriculture, where it could help improve food security by deepening researchers' understanding of plant biology. Evo 2 could aid in the development of climate-resilient crops and biofuels, as well as proteins capable of breaking down plastic or oil.

About Arc Institute

Founded in 2021 with $650 million in funding, Arc Institute focuses on advancing long-term scientific research with dedicated lab space and multi-year funding. Researchers at Arc Institute work on groundbreaking projects in areas like cancer, immune dysfunction, and neurodegeneration.