ProteinIQ

AlphaFold2 (Beta)

MSA-based protein structure prediction with AlphaFold2 via ColabFold

What is AlphaFold2?

AlphaFold2 is the protein structure prediction model that revolutionized structural biology when DeepMind released it in 2021. By analyzing evolutionary relationships encoded in multiple sequence alignments (MSA), AlphaFold2 predicts 3D protein structures with accuracy rivaling experimental methods.

Our implementation uses ColabFold, which replaces AlphaFold2's database search with MMseqs2 to generate MSAs up to 90x faster than the original pipeline. You get the same high-accuracy predictions without needing local sequence databases or specialized hardware.

AlphaFold2 supports both single-chain (monomer) and multi-chain (multimer) predictions. For complexes involving ligands, DNA, or RNA, consider newer models such as Boltz-2 or Chai-1, which extend beyond protein-only prediction.

How does AlphaFold2 work?

AlphaFold2 extracts coevolutionary information from MSAs—when two residue positions consistently change together across evolution, they're likely in spatial contact. The model transforms this evolutionary signal into accurate 3D coordinates.
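AlphaFold2 learns this signal inside the network rather than computing an explicit statistic. Still, mutual information between two alignment columns is a classic way to measure coevolution and shows the idea. The sketch below makes simplifying assumptions (no sequence weighting, no average-product correction), and `column_mutual_information` is a hypothetical helper, not part of the AlphaFold2 pipeline.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    Rows with a gap ('-') in either column are skipped. No sequence
    weighting or APC correction is applied (simplifications).
    """
    pairs = [(seq[i], seq[j]) for seq in msa if seq[i] != "-" and seq[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_i = Counter(a for a, _ in pairs)
    count_j = Counter(b for _, b in pairs)
    count_ij = Counter(pairs)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # Sum of p(a,b) * log(p(a,b) / (p(a) * p(b))) over observed residue pairs.
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Toy alignment: columns 1 and 3 co-vary (K pairs with E, R with D),
# column 0 is fully conserved, column 2 varies independently of column 3.
msa = ["AKLE", "AKVE", "ARLD", "ARVD"]
print(column_mutual_information(msa, 1, 3))  # ~0.693 (ln 2): strong coupling
print(column_mutual_information(msa, 0, 3))  # 0.0: conserved column carries no signal
```

A conserved column has zero mutual information with everything, which is why coevolution methods need alignments where positions actually vary.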

Evoformer: learning from evolution

The core innovation is the Evoformer module, a neural network that processes both sequence alignments and pairwise residue relationships simultaneously. Through 48 repeated blocks, information flows between these two representations, building increasingly refined structural features.

The MSA representation captures which residues are evolutionarily constrained and how they co-vary. The pair representation accumulates spatial relationship predictions between every residue pair.
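One concrete channel of this information flow is the "outer product mean," which updates the pair representation from the MSA representation: for each residue pair (i, j), it averages the outer product of the two residues' features over all sequences. The sketch below uses pure Python and assumes the learned layer normalization and linear projections of the real module are omitted. In AlphaFold2 the result is passed through a further linear layer before being added to the pair representation.

```python
def outer_product_mean(a, b):
    """Pair features from two per-sequence, per-residue feature arrays.

    a, b: nested lists shaped [num_seqs][num_res][channels] (channel counts
    may differ between a and b). Returns out[i][j], a flat list of length
    channels_a * channels_b: the mean over sequences s of outer(a[s][i], b[s][j]).
    """
    num_seqs, num_res = len(a), len(a[0])
    ca, cb = len(a[0][0]), len(b[0][0])
    out = [[[0.0] * (ca * cb) for _ in range(num_res)] for _ in range(num_res)]
    for s in range(num_seqs):
        for i in range(num_res):
            for j in range(num_res):
                for x in range(ca):
                    for y in range(cb):
                        # (x, y) entry of the outer product, averaged over sequences.
                        out[i][j][x * cb + y] += a[s][i][x] * b[s][j][y] / num_seqs
    return out


# 2 sequences, 2 residues, 1 channel: out[0][1] = mean(1*2, 3*4) = 7.0
features = [[[1.0], [2.0]], [[3.0], [4.0]]]
pair = outer_product_mean(features, features)
```

Because every sequence contributes to every (i, j) entry, residue pairs whose features co-vary across the alignment end up with distinctive pair features.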

Structure module: coordinates from features

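After Evoformer processing, the structure module converts the learned features into 3D atomic coordinates. It predicts a backbone frame for each residue (a rotation and translation that fix the residue's orientation and position), together with side-chain torsion angles from which the side-chain atoms are placed.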
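A backbone frame is a rigid transformation: a rotation matrix R and a translation t. Atoms with known coordinates in a residue's local frame are placed in global space as x_global = R x_local + t, and mapped back with the inverse. The sketch below shows only this geometric step, not Invariant Point Attention or any learned component. The helper names are hypothetical, and the example rotation and points are illustrative rather than real backbone geometry.

```python
def apply_frame(rotation, translation, local_point):
    """Local -> global: x_global = R @ x_local + t.

    rotation is a 3x3 rotation matrix given as a list of rows.
    """
    return tuple(
        sum(rotation[r][k] * local_point[k] for k in range(3)) + translation[r]
        for r in range(3)
    )


def invert_frame(rotation, translation, global_point):
    """Global -> local: x_local = R^T @ (x_global - t); valid because R^-1 = R^T."""
    shifted = [global_point[k] - translation[k] for k in range(3)]
    return tuple(sum(rotation[k][r] * shifted[k] for k in range(3)) for r in range(3))


# Illustrative frame: 90-degree rotation about z, then a shift of 1 along x.
R = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
t = (1.0, 0.0, 0.0)
placed = apply_frame(R, t, (1.0, 0.0, 0.0))
```

Predicting frames rather than raw coordinates lets the model express each residue's pose independently of where the whole protein sits in space.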

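Prediction also includes recycling: the outputs of one pass (the pair representation, the first row of the MSA representation, and the predicted atom positions) are fed back into the network for another round of refinement. Additional recycles often improve the structure, particularly in challenging regions where the initial prediction is uncertain.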
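The recycling loop can be sketched as follows. `run_network` is a hypothetical stand-in for one forward pass, assumed here to return a list of (x, y, z) coordinates. The optional tolerance-based early stop is an assumption added for illustration, and the toy network simply moves halfway toward a fixed target on each pass.

```python
import math


def rms_change(prev, curr):
    """Root-mean-square displacement between two equally sized coordinate lists."""
    total = sum(sum((p - c) ** 2 for p, c in zip(pp, cc)) for pp, cc in zip(prev, curr))
    return math.sqrt(total / len(prev))


def predict_with_recycling(run_network, features, num_recycles=3, tolerance=None):
    """One initial pass, then up to num_recycles passes fed with the previous output.

    Returns (final_coordinates, number_of_passes). If tolerance is set, stops
    once the RMS change between consecutive passes falls below it.
    """
    outputs = run_network(features, None)  # first pass has no previous outputs
    passes = 1
    for _ in range(num_recycles):
        new_outputs = run_network(features, outputs)
        passes += 1
        change = rms_change(outputs, new_outputs)
        outputs = new_outputs
        if tolerance is not None and change < tolerance:
            break
    return outputs, passes


def toy_network(target, previous):
    """Hypothetical network: starts at the origin, then halves the distance to target."""
    if previous is None:
        return [(0.0, 0.0, 0.0) for _ in target]
    return [tuple(p + (t - p) / 2 for p, t in zip(pp, tt)) for pp, tt in zip(previous, target)]


target = [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
coords, n_passes = predict_with_recycling(toy_network, target, num_recycles=3)
```

Each extra recycle costs one full forward pass, which is why raising the recycle count increases runtime roughly linearly.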

Multimer predictions

AlphaFold2-Multimer extends the base model to protein complexes. It uses paired MSAs that capture inter-chain coevolution, helping the model predict how chains interact. Our implementation automatically detects when your input contains multiple chains and switches to multimer mode.
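Paired MSAs are typically built by matching homologs of each chain that come from the same organism, so that each combined row plausibly represents one interacting pair. The sketch below is a simplified illustration of that idea. It assumes each hit carries a species label and a similarity score and keeps the best hit per species for each chain. `pair_msas_by_species` is a hypothetical helper, and real pipelines use more elaborate rules.

```python
def pair_msas_by_species(chain_hits):
    """Pair homologs across chains by species.

    chain_hits: one list per chain of (species_id, aligned_sequence, score).
    For each species present in every chain, keep the best-scoring hit per
    chain and return one paired row (a list of per-chain sequences).
    """
    best_per_chain = []
    for hits in chain_hits:
        best = {}
        for species, seq, score in hits:
            if species not in best or score > best[species][1]:
                best[species] = (seq, score)
        best_per_chain.append(best)
    # Only species represented in every chain can form a paired row.
    shared = set(best_per_chain[0])
    for best in best_per_chain[1:]:
        shared &= set(best)
    return [[best[sp][0] for best in best_per_chain] for sp in sorted(shared)]


chain_a = [("human", "AAA", 0.9), ("yeast", "AAC", 0.5), ("yeast", "AAD", 0.7)]
chain_b = [("yeast", "BBB", 0.8), ("mouse", "BBC", 0.6)]
paired = pair_msas_by_species([chain_a, chain_b])
```

Sequences from species missing in any chain cannot be paired this way; they can still contribute through unpaired, chain-specific alignments.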

Inputs & settings

Protein sequence input

Provide protein sequences as text (raw sequence or FASTA format), by file upload (.fasta, .fa, .txt, .a3m), or by fetching a PDB ID from RCSB.

For multimer predictions, add multiple chains as separate inputs. Chain IDs are assigned automatically (A, B, C...). You can predict up to 10 chains in a single job.
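Chain handling for multi-record input can be sketched as follows. `parse_chains` is a hypothetical helper, not ProteinIQ's actual parser, and the accepted residue alphabet is an assumption. The A, B, C... chain IDs and the 10-chain limit come from the description above.

```python
import string

MAX_CHAINS = 10  # per-job chain limit described above
# Accepted residue letters: the 20 standard amino acids plus X (an assumption).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_chains(text):
    """Parse raw or FASTA text into {chain_id: sequence}, assigning A, B, C..."""
    records, header, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None or parts:  # close the previous record
                records.append("".join(parts))
            header, parts = line[1:].strip(), []
        else:
            parts.append(line.upper())
    if header is not None or parts:
        records.append("".join(parts))

    if not records:
        raise ValueError("no sequence found")
    if len(records) > MAX_CHAINS:
        raise ValueError(f"{len(records)} chains exceeds the limit of {MAX_CHAINS}")

    chains = {}
    for chain_id, seq in zip(string.ascii_uppercase, records):
        invalid = set(seq) - VALID_RESIDUES
        if not seq or invalid:
            raise ValueError(f"chain {chain_id}: empty or invalid residues {sorted(invalid)}")
        chains[chain_id] = seq
    return chains
```

Raw sequences without a header are treated as a single chain, and multi-line FASTA records are joined before validation.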

Prediction parameters

  • Number of recycles: Refinement iterations where outputs feed back through the network. Higher values improve accuracy for difficult targets but increase runtime. Start with 3; increase to 6–12 for challenging predictions.
  • Number of models: AlphaFold2 includes 5 trained model variants with slightly different configurations (for example, among the single-chain models only models 1 and 2 use template features). Running all 5 gives a more diverse set of predictions to rank; running fewer shortens runtime.
  • Random seeds: Different seeds introduce variability, which can help the model escape local minima for difficult targets. Multiple seeds increase structural diversity in the output ensemble.
  • Model type: Auto selects the appropriate variant based on your input—ptm for single chains, multimer v3 for complexes. Force a specific type only if you have a reason (e.g., benchmarking).
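These settings multiply. The sketch below assumes that every model runs with every seed, so each (model, seed) combination yields one ranked prediction, and that each prediction costs one initial forward pass plus one per recycle. The helper names and zero-based seed indices are hypothetical.

```python
from itertools import product


def planned_predictions(num_models=5, num_seeds=1):
    """List (model_number, seed_index) pairs, one per prediction."""
    if not 1 <= num_models <= 5:
        raise ValueError("AlphaFold2 provides 5 model variants")
    if num_seeds < 1:
        raise ValueError("need at least one seed")
    return list(product(range(1, num_models + 1), range(num_seeds)))


def network_passes(num_predictions, num_recycles=3):
    """Upper bound on forward passes: 1 initial pass + num_recycles per prediction."""
    return num_predictions * (1 + num_recycles)


jobs = planned_predictions(num_models=5, num_seeds=2)
print(len(jobs), network_passes(len(jobs), num_recycles=3))  # 10 40
```

Doubling seeds or recycles roughly doubles compute, so it is worth increasing them only for targets where the default settings give low confidence.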

MSA options

MSA quality is often the most important factor for prediction accuracy. Deeper alignments with more diverse homologous sequences generally produce better predictions.
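Alignment depth is often summarized as the effective number of sequences (Neff), which down-weights near-duplicates so that 100 copies of one sequence do not count as 100 homologs. The sketch below uses one common definition: each row is weighted by the inverse size of its 80%-identity neighborhood. Thresholds and identity definitions vary between tools, so treat absolute values as tool-specific. Both helpers are hypothetical.

```python
def sequence_identity(a, b):
    """Fraction of identical residues among columns where both rows are non-gap."""
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    return sum(x == y for x, y in aligned) / len(aligned)


def effective_sequence_count(msa, identity_threshold=0.8):
    """Neff: sum over rows of 1 / (1 + number of other rows at >= threshold identity)."""
    neff = 0.0
    for i, a in enumerate(msa):
        neighbors = 1 + sum(
            1
            for j, b in enumerate(msa)
            if j != i and sequence_identity(a, b) >= identity_threshold
        )
        neff += 1.0 / neighbors
    return neff


# Two identical rows count once in total; the unrelated row counts fully.
print(effective_sequence_count(["ACDE", "ACDE", "WWWW"]))  # 2.0
```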

  • MSA mode: Controls which sequence databases are searched. UniRef+Environmental searches the broadest set, including metagenomic sequences; use this for best results. UniRef only is faster but may miss distant homologs. No MSA disables sequence search entirely, so AlphaFold2 sees only the query sequence. Unlike ESMFold, which is designed for single-sequence input, AlphaFold2 usually loses substantial accuracy on natural proteins in this mode.
  • Use templates: Queries PDB for known structures of homologous proteins. Templates can improve accuracy when close homologs exist but may be ignored if the MSA signal is strong. Enable for proteins with well-characterized structural families.
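If you upload a precomputed alignment (.a3m), it helps to know how the format encodes columns. In A3M, uppercase letters and '-' are aligned to the query, while lowercase letters are insertions that must be dropped to recover equal-length rows. The minimal reader below assumes standard A3M conventions. `read_a3m` is a hypothetical helper, not ProteinIQ's parser. The number of rows it returns is the raw MSA depth; with No MSA, the alignment would contain only the query row.

```python
def read_a3m(text):
    """Parse A3M text into (headers, aligned_rows).

    Lowercase letters (and '.') are insertions relative to the query and are
    removed so every row has the query's length; '-' is kept as a deletion.
    Lines starting with '#' are skipped.
    """
    headers, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            headers.append(line[1:])
            rows.append("")
        elif not rows:
            raise ValueError("sequence data before the first '>' header")
        else:
            rows[-1] += "".join(ch for ch in line if not (ch.islower() or ch == "."))
    if any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows differ in length after removing insertions")
    return headers, rows


headers, rows = read_a3m(">query\nMKTAY\n>hit1\nMKtaTA-\n")
```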

Advanced options

  • AMBER relaxation: Runs energy minimization using OpenMM/AMBER force fields. This improves side-chain geometry and resolves minor clashes, producing more physically realistic structures. Significantly increases runtime—enable only when side-chain accuracy matters.
  • Structures to relax: How many top-ranked predictions to relax. Relaxing only the best 1–2 saves time while still refining your most promising results.
  • Pair mode (multimer): Controls the MSA pairing strategy for complexes. Unpaired+Paired combines chain-specific and cross-chain coevolution signals. Paired only restricts the alignment to rows in which homologs of every chain could be matched (for example, from the same organism). Unpaired only treats each chain's alignment independently, with no cross-chain pairing.
  • Random seed: Set a specific value for reproducibility. Leave empty for the default.
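One way to picture the three pair modes is by how they assemble rows of the combined alignment. The sketch below is a simplified illustration under two assumptions: paired rows are already matched across chains, and unpaired rows are padded with gaps in the other chains' columns (a common layout). The mode strings mirror the option names above but belong to this sketch, and `assemble_complex_msa` is hypothetical.

```python
def assemble_complex_msa(chain_lengths, paired_rows, unpaired_rows, mode="unpaired+paired"):
    """Build full-length MSA rows for a complex under one of three pair modes.

    chain_lengths: aligned length of each chain.
    paired_rows: rows matched across chains, each a list of per-chain segments.
    unpaired_rows: per chain, a list of that chain's own aligned segments.
    The query row is omitted for brevity.
    """
    if mode not in ("unpaired+paired", "paired", "unpaired"):
        raise ValueError(f"unknown mode: {mode}")
    rows = []
    if mode in ("unpaired+paired", "paired"):
        rows.extend("".join(segments) for segments in paired_rows)
    if mode in ("unpaired+paired", "unpaired"):
        for chain_index, segments in enumerate(unpaired_rows):
            for segment in segments:
                # Fill every other chain's columns with gaps.
                parts = ["-" * n for n in chain_lengths]
                parts[chain_index] = segment
                rows.append("".join(parts))
    return rows


lengths = [2, 3]
paired = [["AB", "CDE"]]
unpaired = [["AA"], ["CCC"]]
both = assemble_complex_msa(lengths, paired, unpaired)
```

Only the paired rows place homologs of both chains side by side, which is where inter-chain coevolution can be read; gap-padded rows carry chain-specific signal only.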

Understanding the results

AlphaFold2 generates multiple ranked predictions, each with confidence metrics that help you assess reliability.

pLDDT (per-residue confidence)

The predicted Local Distance Difference Test (pLDDT) score, on a 0–100 scale, estimates how accurately each residue is positioned. Per-residue values are stored in the B-factor column of the output PDB files.

pLDDT    Confidence   Interpretation
> 90     Very high    High confidence in backbone and side chains
70–90    Good         Backbone likely correct; side chains may vary
50–70    Low          Structural uncertainty; interpret with caution
< 50     Very low     Often indicates disorder or flexibility

Residues with pLDDT below 50 frequently represent intrinsically disordered regions (IDRs) that lack fixed structure rather than prediction failures.
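Because pLDDT sits in the B-factor column, it can be read directly from the fixed-width PDB columns. The sketch below takes one value per residue from the CA atom and maps it to the bands in the table above. Boundary handling (exactly 90 or 70 counts as Good, exactly 50 as Low) is an assumption, insertion codes are ignored, and both helpers are hypothetical.

```python
def plddt_from_pdb(pdb_text):
    """Per-residue pLDDT from the B-factor column of CA atoms.

    Fixed PDB columns (1-based): atom name 13-16, chain ID 22, residue
    number 23-26, B-factor 61-66. Python slices below are 0-based.
    Returns {(chain_id, residue_number): plddt}.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to the confidence bands in the table above."""
    if plddt > 90:
        return "Very high"
    if plddt >= 70:
        return "Good"
    if plddt >= 50:
        return "Low"
    return "Very low"
```

Pass the full text of an output PDB file to `plddt_from_pdb`; runs of consecutive "Very low" residues are good candidates for disordered regions.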

pTM and ipTM (global confidence)

The predicted Template Modeling score (pTM) estimates overall structural accuracy, ranging from 0 to 1. Values above 0.7 typically indicate reliable predictions.
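pTM is the network's estimate of the TM-score. For reference, the TM-score for one superposition of a model onto the true structure averages a distance-dependent term over residues, using a length-dependent scale d0. The reported TM-score is the maximum over superpositions. pTM computes an expected version of this from the predicted error distribution instead of real distances. The sketch below implements the standard formula. Clipping the length at 19 residues so that d0 stays positive is an assumption that mirrors common implementations.

```python
def tm_d0(length):
    """TM-score distance scale d0 in Angstrom: 1.24 * (L - 15)^(1/3) - 1.8."""
    n = max(length, 19)  # clip so d0 stays positive for short chains (assumption)
    return 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8


def tm_score_for_superposition(distances, length=None):
    """Mean over residues of 1 / (1 + (d_i / d0)^2) for one fixed superposition.

    distances: per-residue deviations d_i (Angstrom) between model and reference.
    length: normalizing length (defaults to the number of distances).
    """
    n = length if length is not None else len(distances)
    d0 = tm_d0(n)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / n
```

A residue exactly d0 away from its reference position contributes 0.5, and a perfect match contributes 1.0, so scores near 1 mean most residues lie well within d0.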

For multimers, the interface pTM (ipTM) specifically assesses inter-chain interface quality. The combined ranking score weights both metrics: ranking_score = 0.8 × ipTM + 0.2 × pTM. Combined scores above 0.62 suggest reliable complex predictions.
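The ranking formula and threshold above can be applied to a set of predictions as follows. The record format (dicts with `name`, `iptm`, `ptm` keys) and the helper names are hypothetical. The sketch returns the top three rather than only the best model, and flags each one against the 0.62 threshold.

```python
def ranking_score(iptm, ptm):
    """Combined multimer confidence: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm


def top_predictions(predictions, k=3, threshold=0.62):
    """Return (name, score, above_threshold) for the k best predictions."""
    scored = [(p["name"], ranking_score(p["iptm"], p["ptm"])) for p in predictions]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, score, score > threshold) for name, score in scored[:k]]


predictions = [
    {"name": "model_1", "iptm": 0.50, "ptm": 0.80},
    {"name": "model_2", "iptm": 0.70, "ptm": 0.80},
    {"name": "model_3", "iptm": 0.90, "ptm": 0.85},
    {"name": "model_4", "iptm": 0.20, "ptm": 0.40},
]
best = top_predictions(predictions)
```

Because ipTM carries 80% of the weight, a model with a confident fold but a poorly defined interface will still rank low.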

PAE (predicted aligned error)

The Predicted Aligned Error matrix shows expected position errors between all residue pairs. Low PAE values (blue in visualizations) indicate confident relative positioning; high values (red) suggest uncertainty.

PAE is particularly useful for:

  • Identifying well-structured domains (low intra-domain PAE)
  • Assessing interface confidence in multimers (low inter-chain PAE)
  • Detecting flexible interdomain linkers (high PAE between domains)
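These checks reduce to averaging blocks of the PAE matrix. Entry pae[i][j] is the expected error (in Å) at residue j when the predicted and true structures are aligned on residue i, so the matrix is not symmetric. The sketch below averages both off-diagonal blocks for an interface estimate. The helper names and the tiny example matrix are hypothetical.

```python
def chain_ranges(chain_lengths):
    """Residue index ranges for consecutive chains, e.g. [2, 1] -> [range(0, 2), range(2, 3)]."""
    ranges, start = [], 0
    for n in chain_lengths:
        ranges.append(range(start, start + n))
        start += n
    return ranges


def mean_pae(pae, rows, cols):
    """Mean of pae[i][j] over i in rows (aligned-on residues) and j in cols."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def interchain_pae(pae, chain_lengths, a, b):
    """Mean PAE between chains a and b, averaging both off-diagonal blocks."""
    ranges = chain_ranges(chain_lengths)
    return (mean_pae(pae, ranges[a], ranges[b]) + mean_pae(pae, ranges[b], ranges[a])) / 2


# Illustrative 3-residue complex: chain A = residues 0-1, chain B = residue 2.
pae = [
    [0.0, 1.0, 10.0],
    [1.0, 0.0, 12.0],
    [20.0, 22.0, 0.0],
]
```

Low intra-chain values combined with high inter-chain values usually mean each chain is folded confidently but their relative placement is uncertain.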

Model ranking

AlphaFold2 ranks predictions by confidence scores. We recommend examining the top 2–3 structures rather than only the #1 result—especially for flexible proteins or novel folds where ranking may be uncertain.

When to use AlphaFold2 vs. other tools

Feature             AlphaFold2              ESMFold                Boltz-2     Chai-1
Speed               Moderate (MSA search)   Very fast (~seconds)   Moderate    Moderate
MSA required        Yes (recommended)       No                     Optional    Optional
Accuracy            Excellent (~0.96 TM)    Good (~0.95 TM)        Excellent   Excellent
Multimer support    Yes                     Yes                    Yes         Yes
Ligand prediction   No                      No                     Yes         Yes
DNA/RNA support     No                      No                     Yes         Yes

Which tool should you choose?

Use AlphaFold2 when you need the established gold standard for protein-only structure prediction, particularly for natural proteins with clear evolutionary homologs.

Use ESMFold when speed matters most or you're working with designed proteins that lack evolutionary relatives. ESMFold skips MSA generation entirely.

Use Boltz-2 or Chai-1 when you need to predict complexes involving small molecules, DNA, or RNA. Boltz-2 can also provide binding affinity estimates alongside structure predictions.

Use OpenFold 3 or Protenix when you want open-source implementations of AlphaFold3-like capabilities for multi-molecule complexes.

Limitations

AlphaFold2 requires evolutionarily related sequences for optimal performance. Predictions for de novo designed proteins or sequences without detectable homologs will have lower confidence.

The model predicts static structures and may not capture conformational changes, allosteric states, or dynamics. Regions with pLDDT below 50 often represent genuine flexibility rather than prediction errors.

AlphaFold2 cannot predict ligand binding, post-translational modifications, or the effects of small molecules on structure. For these applications, use newer models like Boltz-2.