Basecamp Research launches Trillion Gene Atlas to accelerate AI drug discovery
AUSTIN, Texas / SAN JOSE, California — 18 March 2026 — Basecamp Research has launched the Trillion Gene Atlas, a global initiative designed to significantly expand known genetic diversity and accelerate the use of artificial intelligence (AI) in drug discovery. Announced at SXSW and NVIDIA’s GTC conference, the project aims to collect genomic data from more than 100 million species worldwide and compress processes that traditionally take decades into a matter of years.
Developed in collaboration with Anthropic, Ultima Genomics and PacBio, and powered by NVIDIA’s computing infrastructure, the initiative reflects a broader shift in the life sciences: from incremental laboratory research towards large-scale, data-driven approaches to therapeutic design.
AI drug discovery faces a critical data bottleneck
Artificial intelligence has transformed sectors such as language processing and computer vision, largely due to the availability of vast and diverse datasets. In biology, however, data remains a limiting factor for AI-driven drug discovery.
Most current AI models in drug development rely heavily on overlapping public datasets. Basecamp Research states that a large proportion of sequence-based models are trained on databases containing fewer than 250 million genetic sequences. While widely used, these datasets represent only a small fraction of Earth’s biological diversity.
“Today’s biological AI models are trained on a narrow slice of life on Earth,” said Glen Gowers, co-founder and CEO of Basecamp Research. “The Trillion Gene Atlas expands the known genetic universe by orders of magnitude beyond what is in public databases.”
The company’s proprietary dataset, BaseData™, already contains more than 10 billion genes derived from newly catalogued species, according to Basecamp. This dataset underpins its EDEN models, released in January 2026, and serves as the foundation for the Atlas initiative.
From prediction to AI-designed therapeutics
The strategic ambition of the Trillion Gene Atlas is to move AI in biology beyond prediction and towards therapeutic design.
Traditional computational approaches have focused on analysing biological data or predicting molecular interactions. Basecamp’s EDEN models are intended to generate new therapeutic candidates directly from biological inputs.
According to the company, early laboratory validation suggests that EDEN can generate biologically active outputs, including antimicrobial peptides and gene-editing approaches. These results remain at a preclinical stage.
The company attributes this progress to what it describes as emerging “scaling laws” in biological AI—where model performance improves as datasets become larger and more diverse.
“Bigger models alone aren’t enough,” said Phil Lorenz, Chief Technology Officer at Basecamp Research. “EDEN showed that performance in biological AI follows much steeper scaling trajectories with higher quality and fully contextualised data.”
While similar scaling effects have been observed in other AI domains, the extent to which they translate into clinical outcomes in drug development remains to be demonstrated.
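As a rough illustration of what a "scaling law" means in this context, the sketch below fits a power-law curve to a handful of data points relating dataset size to model error. It is only a minimal example under stated assumptions: the power-law form, the function name `fit_power_law` and all the numbers are hypothetical and synthetic. None of them come from Basecamp Research or the EDEN models.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error ~ a * size**(-b) by ordinary least squares in log-log space.

    Returns (a, b). A larger b means error falls faster as data grows,
    i.e. a "steeper" scaling trajectory.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the best-fit line through the log-transformed points.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, illustrative data only (assumed values, not measured results):
# dataset sizes in number of sequences, and an invented error metric.
sizes = [1e6, 1e7, 1e8, 1e9]
errors = [2.0 * s ** -0.25 for s in sizes]

a, b = fit_power_law(sizes, errors)
print(f"fitted prefactor a = {a:.3f}, scaling exponent b = {b:.3f}")
```

In a real study the data points would be measured model errors at different training-set sizes, and the fitted exponent would indicate how much additional data is needed for a given improvement. Whether such curves hold for biological models, and whether lower error translates into better drug candidates, is the open question noted above.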
Biodiversity as a new foundation for drug discovery
Basecamp’s approach is based on the idea that evolution provides a vast, largely untapped source of biological solutions for AI systems.
Each species encodes genetic adaptations shaped over millions of years. By capturing this diversity at scale, AI models may identify patterns and mechanisms that are not visible in smaller datasets focused primarily on human biology.
This represents a shift from conventional drug discovery, which often concentrates on known targets and model organisms. Instead, Basecamp is positioning biodiversity as a foundational dataset for AI-powered therapeutic innovation.
To support this effort, the company has established a network of collaborators across 31 countries over the past six years. New partnerships in Chile, Argentina and Antarctica have been announced as part of the Atlas expansion.
Building a global genomics data pipeline
Collecting genomic data at this scale introduces both logistical and ethical considerations.
Basecamp has developed a distributed data collection model that combines portable DNA sequencing technologies with local scientific partnerships. This approach enables sampling in remote environments that are typically difficult to access.
The company states that its partnerships are structured around knowledge exchange, local capacity building and Access and Benefit-Sharing agreements aligned with emerging frameworks for Digital Sequence Information (DSI).
These considerations are increasingly important as global attention grows around the ownership and use of genetic resources, particularly in biodiversity-rich regions.
Scaling DNA sequencing and AI compute infrastructure
The Trillion Gene Atlas depends on advances in both DNA sequencing technology and AI computing infrastructure.
Basecamp is working with Ultima Genomics and PacBio to deliver high-throughput sequencing at scale. Ultima’s systems are designed to increase sequencing output while reducing cost, and PacBio provides long-read sequencing capable of preserving genomic context.
“Biology has been fundamentally data-starved compared to other fields,” said Gilad Almogy, CEO of Ultima Genomics. “We designed our systems to enable the massive datasets required for BioAI.”
On the computational side, NVIDIA’s infrastructure will be used to process large-scale genomic data. Basecamp expects that advances in accelerated computing will significantly reduce processing times compared to traditional approaches.
The company states that workflows which previously required decades of sequencing and analysis could be shortened substantially, although real-world timelines will depend on execution at scale.
Toward an end-to-end AI drug discovery workflow
The Trillion Gene Atlas is also part of a broader effort to integrate multiple stages of drug discovery into a unified AI-driven workflow.
Anthropic is collaborating with Basecamp to connect its Claude AI system with biological datasets and modelling tools. The aim is to support systems capable of interpreting complex biological data and assisting in therapeutic design.
This reflects a wider trend in AI development towards systems that combine reasoning with domain-specific data. However, such integrated workflows are still at an early stage of development.
How Basecamp fits into the wider AI drug discovery landscape
Basecamp’s initiative comes amid rapid expansion in the use of AI across pharmaceutical research.
Companies such as Recursion Pharmaceuticals are generating large proprietary datasets using automated experimental platforms. Others, including Insilico Medicine and Exscientia, focus on generative AI to design novel drug molecules. Meanwhile, advances in protein structure prediction, such as those enabled by DeepMind’s AlphaFold, have expanded the tools available for target discovery.
These approaches address different parts of the drug development pipeline. Basecamp’s strategy is distinct in its focus on expanding the underlying biological dataset—particularly through large-scale biodiversity sampling.
Opportunities and limitations of AI in drug development
The potential impact of the Trillion Gene Atlas lies in its ability to broaden the data foundation for AI in drug discovery.
In principle, more diverse datasets could support the discovery of new therapeutic targets and improve the design of drugs across a wider range of diseases.
However, significant challenges remain. Translating AI-generated drug candidates into approved therapies requires extensive clinical validation, which is time-consuming and costly. Regulatory frameworks for AI-designed medicines are still evolving, and the biological complexity of human disease continues to limit predictability.
There are also questions around data ownership and access. As proprietary datasets become more central to innovation, tensions between open science and commercial advantage are likely to increase.
A shift toward data-driven medicine
The launch of the Trillion Gene Atlas reflects a broader shift in the life sciences towards data-driven medicine and AI-enabled research.
As AI becomes more embedded in drug discovery, the availability and diversity of biological data are emerging as key factors in determining progress.
Basecamp’s initiative suggests that the next phase of innovation may depend as much on data generation and infrastructure as on advances in algorithms.
If realised, this approach could reshape how biological knowledge is produced and applied—moving towards a model where large-scale datasets and AI systems play a central role in designing new therapies.
For now, the Trillion Gene Atlas represents an ambitious attempt to expand the boundaries of biological data—and, in doing so, to redefine the foundations of AI-driven drug discovery.