
Open-source structure prediction for proteins, nucleic acids, and ligands
RosettaFold3 (RF3) is an open-source neural network for predicting the 3D structures of biomolecular assemblies. Unlike earlier protein-only methods, RF3 models complete biological systems containing proteins, nucleic acids, small molecules, metals, and covalent modifications in a single unified framework.
The model extends the original RoseTTAFold architecture to handle arbitrary molecular assemblies. Given sequences, chemical structures, and connectivity information, it predicts atomic coordinates along with confidence estimates that help identify reliable predictions.
RF3 narrows the performance gap between closed-source AlphaFold3 and existing open-source implementations, with particularly strong performance on chirality prediction for small molecules. For simpler protein-only predictions without ligands, ESMFold offers faster single-sequence predictions, while Boltz-2 and Chai-1 provide alternative multi-modal approaches with different strengths.
RF3 uses a three-track architecture where information flows bidirectionally between sequence, pairwise, and structural representations:
1D track (sequence): Encodes amino acid residues, nucleic acid bases, and individual atoms. The model includes 46 element types to represent atoms commonly found in the PDB, beyond the standard 20 amino acids and 8 nucleic acid bases.
2D track (pairwise): Captures relationships between components, including bond types (single, double, triple, aromatic) and inter-residue interactions. This track enables the model to reason about molecular connectivity.
3D track (structure): Represents stereochemistry and 3D coordinates. The model starts from a "disconnected gas" of residues and atoms, progressively assembling them into physically plausible structures through successive network blocks.
RF3 was trained on protein-small molecule, protein-metal, and covalently modified protein complexes from the Protein Data Bank. Two versions are available: one trained on structures released before September 2021 and another trained on structures deposited before January 2024.
The model uses a hybrid representation: residue-level frames for amino acids and nucleic acids, combined with atomic coordinates for small molecules and modifications. Each atom receives a local coordinate frame based on neighboring atoms, enabling the Frame Aligned Point Error (FAPE) loss to supervise both local and global structure.
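The frame-based loss described above can be illustrated with a minimal pure-Python sketch of FAPE. It assumes each frame is a pair `(R, t)` with `R` a 3x3 rotation matrix whose columns are the frame axes and `t` the frame origin. The 10 Å clamp and 10 Å scale follow the AlphaFold2 convention and are an assumption here, not a confirmed RF3 setting. The function names are illustrative only.

```python
import math

def to_local(frame, point):
    """Express `point` in the local coordinates of `frame`: R^T (point - t)."""
    rot, trans = frame
    d = [point[k] - trans[k] for k in range(3)]
    # Component k of R^T d is column k of R dotted with d.
    return [sum(rot[r][k] * d[r] for r in range(3)) for k in range(3)]

def fape(pred_frames, pred_points, true_frames, true_points,
         clamp=10.0, scale=10.0):
    """Clamped mean local-frame point error, divided by `scale` (assumed 10 A)."""
    total, count = 0.0, 0
    for f_pred, f_true in zip(pred_frames, true_frames):
        for x_pred, x_true in zip(pred_points, true_points):
            err = math.dist(to_local(f_pred, x_pred), to_local(f_true, x_true))
            total += min(err, clamp)
            count += 1
    return total / (scale * count)

# Reference structure: one identity frame at the origin and two points.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
true_frames = [(identity, [0.0, 0.0, 0.0])]
true_points = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]

# Prediction: the same structure rotated 90 degrees about z and shifted by (5, 5, 5).
rot_z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
pred_frames = [(rot_z90, [5.0, 5.0, 5.0])]
pred_points = [[5.0, 6.0, 5.0], [3.0, 5.0, 5.0]]

# A rigidly moved copy incurs no loss because errors are measured in local frames.
rigid_copy_loss = fape(pred_frames, pred_points, true_frames, true_points)

# Displacing one predicted point by 3 A along z yields a nonzero loss.
shifted_points = [[5.0, 6.0, 5.0], [3.0, 5.0, 8.0]]
shifted_loss = fape(pred_frames, shifted_points, true_frames, true_points)
```

Because every error is measured inside a local frame, the loss is invariant to global rotation and translation of the prediction, which is why it can supervise local geometry and global arrangement at once.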
The model incorporates a denoising diffusion process that iteratively refines coordinates. Starting from Gaussian noise, the network updates atomic positions and orientations at each step and progressively converges on a predicted structure. More diffusion steps generally improve quality but increase runtime.
Provide protein sequences in FASTA format or upload PDB/CIF structure files. You can add up to 10 protein chains for complex predictions. When predicting multi-chain assemblies, the model uses paired MSAs to infer inter-chain interactions.
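Before submitting a multi-chain job, it can help to check the FASTA input locally. The following is a minimal sketch that parses FASTA text and enforces the 10-chain limit mentioned above. The function names and the example sequences are illustrative and not part of any RF3 interface.

```python
MAX_PROTEIN_CHAINS = 10  # limit stated in the text above

def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def check_chain_count(records, limit=MAX_PROTEIN_CHAINS):
    """Raise ValueError if there are no chains or more chains than `limit`."""
    if not records:
        raise ValueError("no chains found")
    if len(records) > limit:
        raise ValueError(f"{len(records)} chains given, limit is {limit}")
    return records

# Made-up two-chain example; sequences wrapped over several lines are joined.
example = """>chain_A
MKTAYIAKQR
QISFVKSHFS
>chain_B
GSHMLEDP
"""
chains = check_chain_count(parse_fasta(example))
```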
Add small molecules as SMILES strings or SDF files. The model predicts ligand binding poses within protein binding sites. For best results with protein-ligand complexes, ensure the ligand is chemically reasonable and the protein contains an appropriate binding site.
Upload a multiple sequence alignment (MSA) in .a3m or Stockholm format to improve prediction accuracy. MSAs provide evolutionary information that helps the model identify conserved structural features and interaction sites.
For most predictions, RF3 can generate MSAs automatically. Providing a pre-computed MSA is useful when you have domain-specific alignments or want reproducible results with a specific evolutionary context.
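When supplying a pre-computed alignment, a quick sanity check is to confirm that every row aligns to the query. The sketch below reads a3m text under the usual a3m convention that lowercase letters mark insertions relative to the query, so dropping them leaves rows of equal length. It does not handle Stockholm files. The helper names and the toy alignment are illustrative.

```python
def read_a3m(text):
    """Return (headers, rows), with insertion states removed from each row."""
    headers, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            headers.append(line[1:])
            rows.append("")
        elif not rows:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Lowercase letters (and '.') are insertions relative to the query.
            rows[-1] += "".join(c for c in line if not c.islower() and c != ".")
    return headers, rows

def check_alignment(rows):
    """Raise ValueError unless every row matches the query (first row) length."""
    if not rows:
        raise ValueError("empty alignment")
    width = len(rows[0])
    for index, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"row {index} has length {len(row)}, expected {width}")
    return width

# Toy alignment: hit1 carries a two-residue insertion, hit2 has two gaps.
example = """>query
MKTAYIAK
>hit1
MKTgaAYLAK
>hit2
M-TAYIA-
"""
headers, rows = read_a3m(example)
query_length = check_alignment(rows)
depth = len(rows)
```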
Add common cofactors and metal ions using Chemical Component Dictionary (CCD) codes. This is simpler than providing full SMILES for well-characterized molecules.
Common CCD codes include:
MG (magnesium), ZN (zinc), CA (calcium), FE (iron), MN (manganese)
ATP, NAD, FAD, HEM (heme), SAM (S-adenosylmethionine)
NAG (N-acetylglucosamine), MAN (mannose), GAL (galactose)
Enter multiple codes separated by commas: MG, ZN, ATP
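A comma-separated code field like the one above can be normalized with a few lines of Python. This sketch only checks that each entry looks like a CCD identifier (one to five uppercase letters or digits, which is an assumption about acceptable input rather than a lookup against the dictionary itself). The function name is hypothetical.

```python
import re

# Assumption: a plausible CCD identifier is 1-5 uppercase alphanumeric characters.
CCD_PATTERN = re.compile(r"^[A-Z0-9]{1,5}$")

def parse_ccd_codes(field):
    """Split a comma-separated field into upper-cased CCD codes."""
    codes = []
    for raw in field.split(","):
        code = raw.strip().upper()
        if not code:
            continue  # tolerate stray commas and trailing separators
        if not CCD_PATTERN.match(code):
            raise ValueError(f"not a plausible CCD code: {raw!r}")
        codes.append(code)  # repeats are kept, e.g. two MG ions
    return codes

codes = parse_ccd_codes("MG, ZN, ATP")
```

Passing this check does not guarantee that a code exists in the dictionary; it only catches formatting mistakes such as charge suffixes or full chemical names.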
Model version: Choose between Latest, Preprint, or Benchmark weights. We recommend Latest for most use cases as it incorporates improvements over the original publication.
Number of models: Generate multiple structure predictions (1-10). Each model samples from the diffusion process independently, providing ensemble diversity. More models help identify consensus predictions but increase computation time.
Recycles: Internal refinement iterations within the network trunk. Higher values (10-20) can improve accuracy for difficult targets where the initial prediction may need correction, at the cost of longer runtime.
Diffusion steps: Controls the denoising trajectory length. The default of 200 steps balances quality and speed. Reducing to 50 steps approximately doubles speed with minimal quality loss for well-defined targets.
Random seed: Fix for reproducibility across runs. Different seeds explore different regions of the structural ensemble.
Early stopping threshold: pLDDT threshold for terminating diffusion early. Values of 0.5-0.7 (given as a fraction, whereas reported pLDDT scores use a 0-100 scale) accelerate predictions when the model reaches confident solutions. Lower thresholds allow more refinement for challenging cases.
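The settings described above can be collected in one place and checked before a run. The sketch below is a hypothetical container: the class name, field names, and validation rules are illustrative, not an RF3 API. The ranges come from the text (1-10 models, a default of 200 diffusion steps, three weight choices), and the early stopping threshold is assumed to be a fraction between 0 and 1.

```python
from dataclasses import dataclass
from typing import Optional

MODEL_VERSIONS = ("Latest", "Preprint", "Benchmark")

@dataclass
class PredictionSettings:
    num_models: int                            # 1-10 per the text
    recycles: int                              # no default is stated in the text
    model_version: str = "Latest"              # recommended for most use cases
    diffusion_steps: int = 200                 # default stated in the text
    seed: Optional[int] = None                 # fix for reproducibility
    early_stop_plddt: Optional[float] = None   # assumed to be a 0-1 fraction

    def validate(self):
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if self.model_version not in MODEL_VERSIONS:
            problems.append(f"unknown model version: {self.model_version}")
        if not 1 <= self.num_models <= 10:
            problems.append("num_models must be between 1 and 10")
        if self.recycles < 1:
            problems.append("recycles must be at least 1")
        if self.diffusion_steps < 1:
            problems.append("diffusion_steps must be at least 1")
        if self.early_stop_plddt is not None and not 0.0 < self.early_stop_plddt <= 1.0:
            problems.append("early_stop_plddt must be in (0, 1]")
        return problems

# A faster configuration for a well-defined target, using values from the text.
fast = PredictionSettings(num_models=5, recycles=10, diffusion_steps=50, seed=42)
fast_problems = fast.validate()
```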
The predicted Local Distance Difference Test (pLDDT) estimates per-residue accuracy on a 0-100 scale. Scores are encoded in the B-factor column of output CIF files:
| pLDDT Range | Interpretation |
|---|---|
| > 90 | Very high confidence, likely accurate |
| 70-90 | Confident prediction |
| 50-70 | Low confidence, structure may be incorrect |
| < 50 | Very low confidence, likely disordered or wrong |
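The table above maps directly onto a small classifier. This sketch assumes the per-residue scores have already been read from the B-factor column. The table does not say which band the boundary values 50, 70, and 90 belong to, so their placement here is an assumption. The band labels and function names are illustrative.

```python
from collections import Counter

BANDS = ("very high", "confident", "low", "very low")

def plddt_band(score):
    """Map a pLDDT score on the 0-100 scale to a confidence band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT out of range: {score}")
    if score > 90:
        return "very high"
    if score >= 70:   # assumption: 70 and 90 both count as "confident"
        return "confident"
    if score >= 50:   # assumption: 50 counts as "low", not "very low"
        return "low"
    return "very low"

def summarize(scores):
    """Return the fraction of residues falling in each band."""
    if not scores:
        raise ValueError("no scores given")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts.get(band, 0) / len(scores) for band in BANDS}

# Made-up scores for four residues, one per band.
summary = summarize([95.0, 80.0, 60.0, 40.0])
```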
The Predicted Aligned Error (PAE) matrix estimates the positional error of each residue relative to every other residue. For protein-ligand complexes, the pae_inter metric summarizes interface quality:
pae_inter < 10: High-quality dock, ligand pose is likely correct
pae_inter > 10: Lower confidence, verify the binding mode independently
Each prediction generates multiple CIF structure files, one per diffusion sample. Files are named with the seed and sample number (e.g., model_seed-42_sample-0.cif). Confidence metrics including pLDDT, PAE, and ranking scores are provided for detailed analysis.
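When sorting through several samples, the file naming pattern and the pae_inter cutoff can be combined in a short script. The sketch below assumes file names follow exactly the example pattern shown above and that a pae_inter value is already available for each sample. The values in the example are made up for illustration, and a score of exactly 10, which the text does not cover, is treated as lower confidence.

```python
import re

# Pattern taken from the example file name model_seed-42_sample-0.cif.
NAME_RE = re.compile(r"^model_seed-(\d+)_sample-(\d+)\.cif$")

def parse_output_name(filename):
    """Return (seed, sample) for a matching file name, else None."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def pose_verdict(pae_inter, cutoff=10.0):
    """Apply the pae_inter rule of thumb; exactly `cutoff` counts as lower confidence."""
    if pae_inter < cutoff:
        return "likely correct"
    return "verify independently"

# Illustrative, made-up pae_inter values for three samples from one seed.
samples = {
    "model_seed-42_sample-0.cif": 12.5,
    "model_seed-42_sample-1.cif": 4.2,
    "model_seed-42_sample-2.cif": 8.9,
}

# Lower pae_inter is better, so the best sample has the minimum value.
best_name = min(samples, key=samples.get)
best_seed, best_sample = parse_output_name(best_name)
verdicts = {name: pose_verdict(value) for name, value in samples.items()}
```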
RF3 excels at predicting structures for:
For pure protein-protein docking without small molecules, consider ColabDock or LightDock. For ligand-focused binding affinity predictions, Boltz-2 provides quantitative IC50 estimates alongside structure.
RF3 has reduced accuracy compared to specialized methods in certain scenarios:
The model produces useful error estimates, so always check pLDDT and pae_inter before trusting predictions.