ProteinIQ

OpenFold-3 (Beta)

Open-source AlphaFold3-based structure prediction for proteins, ligands, DNA, and RNA

What is OpenFold-3?

OpenFold-3 is an open-source reproduction of DeepMind's AlphaFold3, developed by the AlQuraishi Lab at Columbia University and the OpenFold Consortium. It predicts 3D structures of biomolecular complexes including proteins, DNA, RNA, and small molecule ligands.

A key advantage is its Apache 2.0 license. Unlike AlphaFold3, which restricts commercial use, OpenFold-3 is fully open for both academic and industry applications. The model was trained on over 300,000 experimentally determined structures plus 13 million synthetic structures curated by OpenFold.

OpenFold-3 achieves accuracy competitive with AlphaFold3, reaching parity on protein-nucleic acid benchmarks and monomeric RNA structures. For related tools, see Boltz-2 for binding affinity predictions, Chai-1 for general multi-modal prediction, ESMFold for fast single-sequence predictions, and Protenix for another AlphaFold3 reproduction.

How does OpenFold-3 work?

OpenFold-3 replicates the AlphaFold3 architecture, which represents a fundamental shift from AlphaFold2. Rather than predicting backbone frames and torsion angles, it operates directly on raw atom coordinates using a diffusion-based generative approach.

Diffusion-based structure generation

The model learns to reverse a diffusion process that gradually corrupts molecular structures with noise. During training, the network sees atomic coordinates at various noise levels, learning local stereochemistry from low-noise examples and large-scale structure from high-noise examples.

During inference, the model starts from random atomic coordinates and iteratively denoises them through multiple steps to produce a coherent 3D structure. This generative approach allows OpenFold-3 to capture conformational diversity and handle multi-component complexes naturally.
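
The inference loop described above can be sketched in a few lines. This is a toy illustration only, not OpenFold-3's sampler: `denoise_stub` is a hypothetical stand-in for the trained network that simply returns a fixed target, the noise schedule values are arbitrary, and the update is a plain deterministic Euler step. Real samplers typically also inject fresh noise at each step.

```python
import random

def denoise_stub(coords, sigma, target):
    """Hypothetical stand-in for the trained denoiser network.

    A real model predicts clean coordinates from the noisy ones plus
    sequence-derived features; this stub ignores its inputs and returns
    a fixed target so that the loop is runnable.
    """
    return [list(atom) for atom in target]

def sample_structure(target, sigmas, seed=0):
    """Deterministic Euler sampler over a decreasing noise schedule."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise at the highest noise level.
    coords = [[rng.gauss(0.0, sigmas[0]) for _ in range(3)] for _ in target]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        denoised = denoise_stub(coords, sigma, target)
        # Shrink the remaining offset from the denoised guess
        # by the ratio sigma_next / sigma.
        scale = (sigma_next - sigma) / sigma
        coords = [
            [x + scale * (x - d) for x, d in zip(atom, clean)]
            for atom, clean in zip(coords, denoised)
        ]
    return coords

# Three atoms on a line, 3.8 units apart (arbitrary toy target).
target = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
sigmas = [160.0, 40.0, 10.0, 2.5, 0.5, 0.0]  # arbitrary schedule ending at zero noise
final = sample_structure(target, sigmas)
```

Because the schedule ends at zero noise, the last step lands on the denoiser's output. With a learned denoiser, different starting noise can settle into different plausible conformations, which is where the sample-to-sample diversity comes from.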

Pairformer architecture

OpenFold-3 uses a Pairformer module instead of AlphaFold2's Evoformer. The Pairformer operates only on pair and single representations; the MSA representation is not retained through the main network. This 48-block module simplifies the architecture while maintaining accuracy.

The pair representation holds one feature vector for every pair of residues or atoms in the system, so its size grows quadratically with the size of the complex. This is where inter-molecular interactions (such as protein-ligand contacts or protein-protein interfaces) are learned and refined.
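
To give a feel for how the pair representation scales, here is a rough back-of-the-envelope estimate. The default of 128 channels is the pair width reported for AlphaFold3 and is assumed here to carry over to OpenFold-3; 4 bytes per value assumes float32. The function name is illustrative, and the figure covers a single copy of the tensor, not intermediate activations.

```python
def pair_representation_bytes(num_entries, channels=128, bytes_per_value=4):
    """Memory for one (N x N x channels) pair tensor.

    channels=128 and bytes_per_value=4 are assumptions, see the text above.
    """
    return num_entries * num_entries * channels * bytes_per_value

for n in (250, 500, 1000, 2000):
    mib = pair_representation_bytes(n) / 2**20
    print(f"{n:>5} entries: {mib:8.1f} MiB")
```

Doubling the number of residues or atoms quadruples the estimate, which is one reason large multi-chain complexes are the first to run into GPU memory limits.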

Multiple Sequence Alignment (MSA)

MSA provides evolutionary information by aligning your target sequence against thousands of homologous sequences from other organisms. Coevolution patterns reveal which residues are spatially close in 3D: if two positions consistently mutate together across evolution, they're likely in contact.
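
The coevolution idea can be illustrated with mutual information between alignment columns. This is a classical, simplified statistic shown for intuition only; OpenFold-3 learns from the alignment through its network rather than computing this score, and practical coevolution methods also correct for phylogenetic bias. The four-sequence alignment is made up.

```python
import math
from collections import Counter

def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

# Toy alignment: columns 1 and 2 always change together (K with L, R with I),
# while column 3 varies independently of column 1.
msa = [
    "AKLE",
    "AKLD",
    "ARIE",
    "ARID",
]
coupled = mutual_information(msa, 1, 2)
independent = mutual_information(msa, 1, 3)
```

In this toy case the coupled pair carries one full bit of shared information and the independent pair carries none, which is the kind of signal that hints at spatial contact.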

OpenFold-3 uses a simplified MSA module with just four blocks, processing alignments through pair-weighted averaging. The ColabFold server handles MSA generation automatically, searching sequence databases to find homologs.

MSA substantially improves prediction accuracy for natural proteins but adds 2-5 minutes to runtime. For synthetic or designed proteins lacking natural homologs, you can disable MSA to speed up predictions.

Cross-distillation for hallucination control

Diffusion models can sometimes "hallucinate" compact structures for regions that should be disordered. OpenFold-3 addresses this through cross-distillation, where training data includes predictions from AlphaFold-Multimer. This teaches the model to represent unstructured regions as extended loops rather than inventing false structure.

Inputs & settings

Molecule inputs

OpenFold-3 accepts multiple molecule types that you can combine into a single prediction job. Chain IDs (A, B, C...) are assigned automatically and displayed in the UI.

  • Protein: Upload PDB/CIF files, paste FASTA sequences, or fetch directly from RCSB using a PDB ID. Up to 10 chains.
  • Ligand (SMILES): Enter SMILES strings, upload SDF/MOL/MOL2 files, or fetch from PubChem by compound ID. Up to 10 ligands.
  • DNA: Paste sequences in FASTA format or upload structure files. Use standard nucleotides (A, T, G, C). Up to 10 chains.
  • RNA: Paste sequences in FASTA format or upload structure files. Use standard nucleotides (A, U, G, C). Up to 10 chains.
  • Ion (CCD code): Enter standard ion codes from the PDB Chemical Component Dictionary. Common codes: ZN (zinc), MG (magnesium), CA (calcium), FE (iron).
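
Before submitting a job, it can help to check sequences against the alphabets listed above. The sketch below is a local pre-check, not part of any OpenFold-3 or ProteinIQ API: the function names are hypothetical, the protein alphabet is assumed to be the 20 standard amino acids, and assigning chain IDs in input order is an assumption about how the UI behaves.

```python
import string

ALPHABETS = {
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),  # assumed: 20 standard amino acids
    "dna": set("ATGC"),
    "rna": set("AUGC"),
}

def validate_sequence(kind, sequence):
    """Return the set of characters not allowed for this molecule type."""
    return set(sequence.upper()) - ALPHABETS[kind]

def assign_chain_ids(molecules):
    """Assign chain IDs A, B, C... in input order (assumed ordering)."""
    return dict(zip(string.ascii_uppercase, molecules))

job = [("protein", "MKTAYIAKQR"), ("dna", "ATGCGT"), ("rna", "AUGGCU")]
for chain_id, (kind, seq) in assign_chain_ids(job).items():
    bad = validate_sequence(kind, seq)
    status = "ok" if not bad else f"invalid characters: {sorted(bad)}"
    print(chain_id, kind, status)
```

A common slip this catches is pasting an RNA sequence (with U) into a DNA field or the reverse.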

Prediction parameters

  • Model seeds: Number of different random seeds to use (1-5). Each seed produces independent structure samples through the diffusion process, allowing you to assess prediction variability.
  • Diffusion samples: Number of structures to generate per model seed (1-10). More samples capture conformational diversity but increase runtime.
  • Random seed: Fixed seed for reproducible predictions. Leave empty for a random seed each run.
  • Output format: Structure file format. CIF (recommended for full metadata) or PDB (legacy compatibility).
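
Since diffusion samples are generated per model seed, the number of structures in a job is the product of the two settings. A small helper (hypothetical name, limits taken from the ranges above) makes the arithmetic explicit:

```python
def total_structures(model_seeds, diffusion_samples):
    """Number of structures a job produces: seeds x samples per seed."""
    if not 1 <= model_seeds <= 5:
        raise ValueError("model_seeds must be between 1 and 5")
    if not 1 <= diffusion_samples <= 10:
        raise ValueError("diffusion_samples must be between 1 and 10")
    return model_seeds * diffusion_samples

print(total_structures(3, 5))   # 3 seeds x 5 samples each
print(total_structures(5, 10))  # largest job the settings allow
```

Runtime grows roughly with this count, so it is worth working out before queuing a large complex.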

Advanced options

  • Generate MSA: Enables automatic MSA generation via the ColabFold server. We recommend keeping this enabled for natural proteins since it significantly improves accuracy. Disable only for synthetic proteins lacking homologs or when you need faster screening.
  • Low memory mode: Reduces GPU memory usage for larger complexes at the cost of increased runtime. Enable this if you encounter memory errors with large proteins or multi-chain complexes.

Understanding the results

OpenFold-3 returns multiple ranked predictions, each with a structure file and associated confidence metrics. Higher confidence generally indicates a more reliable prediction, though low-confidence regions can still be informative, for example by marking segments that are likely disordered.

pLDDT (predicted Local Distance Difference Test)

This per-residue confidence metric estimates how accurately the local structure around each position is predicted. Values range from 0 to 100.

pLDDT   | Confidence | Interpretation
> 90    | Very high  | Backbone and sidechain positions reliable
70-90   | High       | Backbone prediction accurate
50-70   | Moderate   | Treat with caution, consider alternatives
< 50    | Low        | Likely disordered or prediction uncertain

Regions with pLDDT below 50 often represent intrinsically disordered regions rather than prediction failures. These regions may only adopt defined structures when bound to interaction partners.
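
The bands above translate directly into a small classifier, and a scan for consecutive low-scoring residues gives a quick list of candidate disordered segments. Function names are illustrative. How the boundary values 70 and 50 are assigned (here to the higher band) and the minimum stretch length of 3 are assumptions, not part of the tool.

```python
def plddt_band(score):
    """Map a pLDDT value (0-100) to the confidence bands in the table."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if score > 90:
        return "very high"
    if score >= 70:
        return "high"
    if score >= 50:
        return "moderate"
    return "low"

def low_confidence_stretches(scores, threshold=50.0, min_length=3):
    """Return inclusive 0-based (start, end) ranges of consecutive
    residues scoring below the threshold."""
    stretches = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                stretches.append((start, i - 1))
            start = None
    # Close a stretch that runs to the end of the chain.
    if start is not None and len(scores) - start >= min_length:
        stretches.append((start, len(scores) - 1))
    return stretches

scores = [95, 88, 45, 40, 30, 72, 49, 48]
bands = [plddt_band(s) for s in scores]
segments = low_confidence_stretches(scores)
```

Segments found this way are candidates for disorder, not proof of it; they are worth checking against what is known about the protein.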

pTM (predicted TM-score)

This score estimates overall structural accuracy, correlating with the TM-score you would obtain by comparing the prediction to the true structure. Values range from 0 to 1, with higher values indicating better global accuracy.

For single-chain proteins, pTM above 0.7 indicates a confident prediction. For multi-chain complexes, also consider ipTM.

ipTM (interface pTM)

This metric specifically evaluates the quality of inter-chain interfaces in multi-component complexes. It correlates strongly with interface accuracy across different molecule types: protein-protein, protein-nucleic acid, and protein-ligand.

Higher ipTM values indicate more reliable interface predictions. When predicting complexes, prioritize structures with high ipTM scores.

Structure files

Each prediction produces a structure file (CIF or PDB) containing atomic coordinates for all modeled molecules. The B-factor column contains per-residue pLDDT scores, which most molecular visualization tools can display as a color gradient. View your results in PDB Viewer or download for use in other software.
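
For PDB output, the per-residue scores can be read back from the B-factor column using the fixed-column layout of ATOM records. The sketch below is a minimal reader, assuming the file follows the standard PDB column positions and that one value per protein residue (taken from the CA atom) is enough; nucleic acids and ligands have no CA atom and are skipped. The `atom_line` helper only builds sample input for the demonstration.

```python
def plddt_per_residue(pdb_text):
    """Map (chain_id, residue_number) -> pLDDT read from the B-factor column."""
    scores = {}
    for line in pdb_text.splitlines():
        # Only polymer ATOM records; keep the alpha carbon of each residue.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain_id = line[21]                  # column 22
        residue_number = int(line[22:26])    # columns 23-26
        scores[(chain_id, residue_number)] = float(line[60:66])  # columns 61-66
    return scores

def atom_line(serial, name, res, chain, resseq, xyz, bfactor):
    """Build a sample ATOM record with standard PDB column positions."""
    x, y, z = xyz
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}")

pdb_text = "\n".join([
    atom_line(1, " N  ", "MET", "A", 1, (10.0, 12.5, 2.0), 92.5),
    atom_line(2, " CA ", "MET", "A", 1, (11.1, 13.2, 2.1), 92.5),
    atom_line(3, " CA ", "GLY", "A", 2, (14.6, 14.0, 3.3), 41.25),
])
scores = plddt_per_residue(pdb_text)
```

For CIF output, a dedicated mmCIF parser is the safer route, since CIF is not a fixed-column format.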

Best practices

Getting good predictions

Enable MSA for natural proteins. The evolutionary information from MSA substantially improves prediction accuracy. Only disable it when working with designed proteins that lack natural homologs.

Use multiple seeds for important predictions. Running with 2-3 seeds helps assess prediction confidence. If structures from different seeds are similar, the prediction is likely reliable. Substantial differences indicate uncertainty.
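
One way to quantify how similar two seeds' structures are, without first superimposing them, is a distance-matrix RMSD: compare every intra-structure atom-atom distance between the two models. This is a sketch of one possible comparison, not something the tool reports; the coordinates are made up, and because the measure depends only on internal distances it cannot tell a structure from its mirror image.

```python
import math
from itertools import combinations

def distance_rmsd(coords_a, coords_b):
    """Root-mean-square difference between the two structures'
    internal pairwise distances (superposition-free)."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equally sized sets of at least 2 atoms")
    squared = 0.0
    pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        squared += (d_a - d_b) ** 2
        pairs += 1
    return math.sqrt(squared / pairs)

# Toy CA traces: seed2 is seed1 rotated 90 degrees about z and shifted,
# seed3 is a genuinely different (extended) arrangement.
seed1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
seed2 = [(-y + 10.0, x + 5.0, z + 2.0) for x, y, z in seed1]
seed3 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

same_fold = distance_rmsd(seed1, seed2)
different_fold = distance_rmsd(seed1, seed3)
```

Values near zero across seeds suggest the model has converged on one answer; large values point to the uncertainty described above.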

Increase diffusion samples for flexible systems. Complexes with flexible binding modes or intrinsically disordered regions benefit from more diffusion samples to explore conformational space.

Check your SMILES stereochemistry. Ligand binding is stereospecific. Ensure your SMILES strings correctly represent the enantiomer you intend to model. An incorrect stereocenter will produce misleading results.
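
A quick text-level check can at least flag SMILES strings that carry no stereo annotation at all. The sketch below only counts the markers SMILES uses for stereochemistry (`@`/`@@` for tetrahedral centers, `/` and `\` for double-bond geometry). It cannot tell whether a molecule needs them or whether they are correct; that requires a cheminformatics toolkit and knowledge of the intended compound. Function names are illustrative.

```python
import re

def stereo_annotations(smiles):
    """Count explicit stereo markers in a SMILES string (text-level only)."""
    return {
        # "@" and "@@" each mark one tetrahedral center.
        "tetrahedral_centers": len(re.findall(r"@+", smiles)),
        # "/" and "\" mark bond directions around double bonds.
        "bond_direction_marks": smiles.count("/") + smiles.count("\\"),
    }

def has_explicit_stereo(smiles):
    return any(stereo_annotations(smiles).values())

for smiles in ("C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O", "CC(N)C(=O)O"):
    print(smiles, has_explicit_stereo(smiles))
```

The first two strings are the two enantiomers of alanine; the third leaves the stereocenter unspecified, which is the case worth catching before a run.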

Choosing the right settings

Use case                 | Seeds | Diffusion samples | MSA | Notes
Quick screening          | 1     | 3-5               | Off | Fast, lower accuracy
Standard prediction      | 1     | 5                 | On  | Good balance for most cases
High-confidence          | 2-3   | 5-10              | On  | Use for important targets
Conformational diversity | 3-5   | 10                | On  | Explores structural space

Interpreting results wisely

Examine multiple predictions. Don't rely solely on the top-ranked structure. Comparing the top 2-3 predictions reveals which features are consistently predicted versus variable.

Check confidence metrics together. A structure with high pLDDT but low ipTM may have accurate individual chains but an unreliable interface. Consider all metrics when evaluating complex predictions.
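
Reading the metrics together can be expressed as a simple triage rule. The sketch below is illustrative only: the pLDDT cutoff of 70 comes from the "High" band and the pTM cutoff of 0.7 from the single-chain guidance, but the ipTM cutoff of 0.7 is an assumption, and none of these thresholds is an official acceptance criterion.

```python
def assess_complex(mean_plddt, ptm, iptm,
                   plddt_cutoff=70.0, ptm_cutoff=0.7, iptm_cutoff=0.7):
    """Combine local, global, and interface confidence into one label.

    iptm_cutoff is an assumed value; adjust it for your own targets.
    """
    local_ok = mean_plddt >= plddt_cutoff
    global_ok = ptm >= ptm_cutoff
    interface_ok = iptm >= iptm_cutoff
    if local_ok and global_ok and interface_ok:
        return "confident"
    if local_ok and not interface_ok:
        return "chains plausible, interface unreliable"
    return "low confidence, inspect manually"

print(assess_complex(mean_plddt=88.0, ptm=0.82, iptm=0.35))
```

The middle case is the one the paragraph above warns about: each chain looks well folded, yet the way the chains pack against each other should not be trusted.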

Low confidence doesn't always mean wrong. Intrinsically disordered regions show low pLDDT even when the model is right that they lack a stable structure. Cross-reference with known biology to interpret these regions.

When to use OpenFold-3 vs. other tools

Feature             | OpenFold-3 | Boltz-2    | Chai-1   | ESMFold
Proteins            | Yes        | Yes        | Yes      | Yes
Ligands             | Yes        | Yes        | Yes      | No
DNA/RNA             | Yes        | Yes        | Yes      | No
MSA support         | Yes        | Yes        | Yes      | No
Affinity prediction | No         | Yes        | No       | No
Speed               | Moderate   | Moderate   | Moderate | Very fast
License             | Apache 2.0 | Apache 2.0 | Academic | MIT

Which tool should you choose?

  • Use OpenFold-3 when you need an open-source AlphaFold3 implementation with full commercial rights, or when working with protein-nucleic acid complexes where it achieves parity with AlphaFold3.
  • Use Boltz-2 when you need binding affinity predictions alongside structure prediction, or want the most comprehensive feature set for drug discovery workflows.
  • Use Chai-1 when you're predicting multi-component complexes and don't need affinity predictions. It offers similar capabilities with a different underlying architecture.
  • Use ESMFold when speed is critical and you only need protein structure prediction. It predicts in seconds without MSA, trading some accuracy for throughput.
  • Use Protenix as another AlphaFold3 reproduction if you want to compare predictions or need a ByteDance-developed alternative.

Limitations

OpenFold-3 is a preview release, and the model is under active development. While it achieves strong benchmark performance, it may handle some edge cases less well than AlphaFold3.

The model works best for well-folded proteins with evolutionary signatures. Predictions for intrinsically disordered proteins, membrane proteins, or sequences with unusual amino acid compositions may be less reliable.

OpenFold-3 predicts where and how a ligand binds but does not estimate binding strength. Use Boltz-2 if you need quantitative affinity predictions.

