A hybrid AI system beats AlphaFold at predicting complex protein structures

A team of researchers has built an artificial intelligence system that outperforms current methods for predicting protein structures, including DeepMind's AlphaFold programs. The system, called D-I-TASSER, predicts how proteins fold into their three-dimensional shapes, a fundamental puzzle in biology that affects drug discovery and disease research.

The work, published today in Nature Biotechnology, demonstrates the payoff of combining deep learning with traditional physics-based approaches. Yang Zhang at the University of Michigan led the research team that created a hybrid pipeline coupling multisource deep learning features with iterative threading assembly simulations.

"The dominant success of deep learning techniques on protein structure prediction has challenged the necessity and usefulness of traditional force field-based folding simulations," the researchers write. Their work answers this question: yes, when properly integrated with AI, these classical approaches remain valuable.

How it performs

In benchmark tests, D-I-TASSER showed clear performance gains. On a dataset of 500 challenging single-domain proteins, D-I-TASSER achieved an average template modeling (TM) score of 0.870, compared to 0.829 for AlphaFold2. The gap widens for multi-domain proteins, which make up most proteins in complex organisms. For 230 multi-domain proteins, D-I-TASSER achieved full-chain TM scores averaging 0.720, nearly 13% higher than AlphaFold2's 0.638.

The improvements were most dramatic for the hardest cases. For 148 difficult domains where at least one method struggled, D-I-TASSER's average score of 0.707 far exceeded AlphaFold2's 0.598. The hybrid approach's advantage is largest precisely where pure deep learning methods hit their limits.
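
For readers who want to see what a TM score measures, here is a minimal Python sketch under stated assumptions. It uses the standard TM-score formula with its length-dependent distance scale d0, but it takes the per-residue distances as given for one fixed superposition, whereas real tools search over superpositions to maximize the score. The function names and the 0.5 floor on d0 are choices made for this illustration, not part of D-I-TASSER.

```python
def tm_d0(target_length):
    """Distance scale d0 (angstroms) from the standard TM-score definition."""
    # The cube-root formula only makes sense for chains longer than 15
    # residues; the 0.5 floor for short chains is an assumption of this sketch.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM score for one fixed superposition.

    distances: distance in angstroms between each aligned residue pair
    (model vs. reference). Unaligned residues are simply left out, so they
    contribute nothing to the sum but still count in target_length.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def relative_gain(new, old):
    """Fractional improvement of one average score over another."""
    return (new - old) / old


# Relative gain for the multi-domain averages quoted above.
print(round(relative_gain(0.720, 0.638) * 100, 1))  # 12.9
```

A perfect model scores 1.0, and a model that places only half of the residues exactly scores 0.5, which is why averages in the 0.6 to 0.9 range represent large structural differences.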

How it works

D-I-TASSER integrates multiple methodologies. The system begins by using an enhanced version of DeepMSA2 to search through genomic and metagenomic databases, creating deep multiple sequence alignments that capture evolutionary relationships between proteins. It then employs several deep learning models, including components from AlphaFold2 itself, to predict spatial constraints like distances between amino acids and hydrogen bonding patterns.

But rather than directly outputting a structure as AlphaFold2 does, D-I-TASSER uses these predictions to guide Monte Carlo simulations that explore different protein conformations. This physics-based approach allows the system to better handle challenging cases and generate multiple plausible structures when proteins are flexible.
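
The idea of letting predicted restraints steer a Monte Carlo search can be illustrated with a toy example. The Python sketch below is not D-I-TASSER's force field or move set: it assumes a chain of free-floating 3D points, a hypothetical energy that penalizes squared deviations from target distances, and plain Metropolis acceptance. All names, parameters, and restraint values are invented for illustration.

```python
import math
import random


def restraint_energy(coords, restraints):
    """Sum of squared deviations between current and target distances."""
    return sum(
        (math.dist(coords[i], coords[j]) - target) ** 2
        for i, j, target in restraints
    )


def metropolis_fold(coords, restraints, steps=5000, temperature=0.5,
                    step_size=0.5, seed=0):
    """Toy Metropolis search; returns the lowest-energy conformation seen."""
    rng = random.Random(seed)
    current = [list(p) for p in coords]  # work on a copy of the input
    e_current = restraint_energy(current, restraints)
    best, e_best = [p[:] for p in current], e_current
    for _ in range(steps):
        # Propose a move: jitter one randomly chosen point.
        idx = rng.randrange(len(current))
        old = current[idx][:]
        current[idx] = [c + rng.uniform(-step_size, step_size) for c in old]
        e_new = restraint_energy(current, restraints)
        delta = e_new - e_current
        # Metropolis rule: always accept downhill moves, and accept uphill
        # moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            e_current = e_new
            if e_new < e_best:
                best, e_best = [p[:] for p in current], e_new
        else:
            current[idx] = old  # reject: restore the previous position
    return best, e_best


# Four points on a line, with made-up distance restraints (angstroms).
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
restraints = [(0, 1, 3.8), (1, 2, 3.8), (2, 3, 3.8), (0, 3, 5.0)]
model, energy = metropolis_fold(start, restraints)
print(restraint_energy(start, restraints), energy)
```

Because uphill moves are sometimes accepted, the search can escape local traps, and running it with different seeds yields different low-energy conformations, which loosely mirrors how a simulation can propose several plausible structures.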

The system incorporates several components from the Zhang Lab's suite of tools, including LOMETS3 for template identification and DeepPotential for contact prediction, and it builds upon the established I-TASSER pipeline that has been used in the field for over a decade.

Testing in competition

The real test came during CASP15 (Critical Assessment of protein Structure Prediction), the field's blind competition held in 2022. D-I-TASSER, competing as "UM-TBM," outperformed all 44 other server groups in both single-domain and multi-domain categories. Its cumulative z-scores were 2-fold and 16-fold higher than the public version of AlphaFold2 for domains and multi-domain targets, respectively.
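
To make the term concrete, a cumulative z-score can be sketched as follows. This Python example assumes one common convention: for each target, every group's score is standardized against all groups' scores on that target, negative z-scores are floored at zero, and the results are summed per group. The exact CASP15 assessment rules may differ, and the group names and scores below are made up.

```python
from statistics import mean, pstdev


def cumulative_z_scores(scores_by_target, floor=0.0):
    """Sum per-target z-scores for each group.

    scores_by_target: {target_id: {group_name: score}}
    floor: lower bound applied to each z-score before summing
           (an assumed convention, so one bad target cannot sink a group).
    """
    totals = {}
    for group_scores in scores_by_target.values():
        values = list(group_scores.values())
        mu = mean(values)
        sigma = pstdev(values)
        for group, score in group_scores.items():
            # If every group ties on a target, nobody gains or loses.
            z = 0.0 if sigma == 0 else (score - mu) / sigma
            totals[group] = totals.get(group, 0.0) + max(z, floor)
    return totals


# Hypothetical scores for three groups on two targets.
example = {
    "T1": {"hybrid": 0.9, "baseline": 0.7, "other": 0.5},
    "T2": {"hybrid": 0.8, "baseline": 0.8, "other": 0.8},
}
print(cumulative_z_scores(example))
```

A "fold" comparison like the one quoted above would then be the ratio of two groups' totals, which is why a group that is consistently above average can end up many times higher than one that hovers near the mean.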

The CASP experiments have long served as the proving ground for protein structure prediction methods, with previous competitions documenting the evolution from traditional approaches to the deep learning revolution sparked by AlphaFold's debut in CASP13 and dominance in CASP14.

Handling multi-domain proteins

D-I-TASSER tackles one of the field's persistent problems: multi-domain proteins. Two-thirds of prokaryotic proteins and four-fifths of eukaryotic proteins contain multiple domains, yet most current methods struggle to correctly predict how these domains orient relative to each other.

D-I-TASSER addresses this through a domain-splitting and reassembly module. The system first identifies domain boundaries using tools like ThreaDom and FUpred, then models each domain separately while maintaining information about how they should connect. This divide-and-conquer strategy proves especially powerful for large proteins that have historically challenged structure prediction methods.
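
The bookkeeping behind such a divide-and-conquer step can be sketched in a few lines of Python. This example assumes the domain boundaries are already available as a plain list of cut positions (it does not call ThreaDom or FUpred, whose interfaces are not described here), and it only tracks where each segment sits in the full chain. Orienting the domains relative to each other, the hard part, is left out. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainSegment:
    index: int      # order of the domain along the chain
    start: int      # 0-based, inclusive, in full-chain numbering
    end: int        # 0-based, exclusive
    sequence: str


def split_by_boundaries(sequence, boundaries):
    """Cut a sequence at the given positions, remembering the offsets."""
    inner = sorted({b for b in boundaries if 0 < b < len(sequence)})
    cuts = [0] + inner + [len(sequence)]
    return [
        DomainSegment(k, s, e, sequence[s:e])
        for k, (s, e) in enumerate(zip(cuts, cuts[1:]))
    ]


def reassemble(segments, per_domain_values):
    """Place per-residue results from each domain back in full-chain order."""
    full = [None] * segments[-1].end
    for seg, values in zip(segments, per_domain_values):
        if len(values) != len(seg.sequence):
            raise ValueError(f"domain {seg.index}: length mismatch")
        full[seg.start:seg.end] = values
    return full


# A made-up 10-residue sequence with assumed boundaries after residues 4 and 7.
segments = split_by_boundaries("MKTAYIAKQR", [4, 7])
print([s.sequence for s in segments])  # ['MKTA', 'YIA', 'KQR']
```

Keeping the full-chain offsets with each segment is what lets per-domain results be stitched back together without renumbering errors.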

What you can do with it

The team applied D-I-TASSER to model structures for 19,512 proteins from the human proteome. The system folded 73% of full-chain sequences and 81% of individual domains, providing structural information for proteins involved in human health and disease. These predictions complement the AlphaFold Protein Structure Database, which has made millions of predicted structures freely available to researchers worldwide.

The researchers have made D-I-TASSER freely available to the scientific community through their website. All benchmark datasets and a software package can be downloaded for academic use. The system can also annotate protein functions using the integrated COFACTOR tool, which predicts ligand binding sites, enzyme classifications, and gene ontology terms.

Current limitations

D-I-TASSER still struggles with "orphan proteins" that have very few evolutionary relatives, a limitation shared by all current methods that rely on sequence comparisons. Accurately predicting structures for intrinsically disordered proteins, which lack stable shapes, also remains challenging.

For those interested in the broader context of protein structure prediction, the Protein Data Bank serves as the central repository for experimentally determined structures, while resources like SCOP and CATH provide classification systems for understanding protein folds.

This work shows that physics-based approaches haven't become obsolete in the age of AI. By combining deep learning's pattern recognition capabilities with the principled exploration of physical simulations, D-I-TASSER points toward a future where hybrid methods unlock greater advances in structural biology.

The ability to accurately predict protein structures from sequence alone has long been considered biology's holy grail. With systems like D-I-TASSER pushing the boundaries, we're moving closer to a future where understanding protein function and designing new proteins for therapeutic purposes become routine rather than extraordinary.