
MSA-based protein structure prediction with AlphaFold2 via ColabFold
AlphaFold2 is the protein structure prediction model that revolutionized structural biology when DeepMind released it in 2021. By analyzing evolutionary relationships encoded in multiple sequence alignments (MSAs), AlphaFold2 predicts 3D protein structures with accuracy rivaling experimental methods.
Our implementation uses ColabFold, which replaces AlphaFold2's database search with MMseqs2 to generate MSAs up to 90x faster than the original pipeline. You get the same high-accuracy predictions without needing local sequence databases or specialized hardware.
AlphaFold2 supports both single-chain (monomer) and multi-chain (multimer) predictions. For complexes involving ligands, DNA, or RNA, consider newer models such as Boltz-2 or Chai-1, which extend beyond protein-only predictions.
AlphaFold2 extracts coevolutionary information from MSAs—when two residue positions consistently change together across evolution, they're likely in spatial contact. The model transforms this evolutionary signal into accurate 3D coordinates.
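The coevolution signal described above can be illustrated with a classical statistic. The sketch below computes the mutual information between two alignment columns in a toy MSA. This is not how AlphaFold2 works internally (the network learns these patterns rather than computing an explicit statistic), and the function name `column_mutual_information` and the toy alignment are assumptions made for illustration only.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Return the mutual information (in bits) between MSA columns i and j.

    `msa` is a list of equal-length aligned sequences. High values mean the
    two positions tend to change together across the alignment.
    """
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)

    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (hypothetical): columns 1 and 3 always change together,
# while columns 0 and 1 vary independently of each other.
toy_msa = ["AKLE", "AKLE", "ARLD", "ARLD", "GKME", "GRMD"]

covarying = column_mutual_information(toy_msa, 1, 3)
independent = column_mutual_information(toy_msa, 0, 1)
```

In this toy case the covarying pair scores far higher than the independent pair, which is the kind of signal a deep MSA makes available to the model.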
The core innovation is the Evoformer module, a neural network that processes both sequence alignments and pairwise residue relationships simultaneously. Through 48 repeated blocks, information flows between these two representations, building increasingly refined structural features.
The MSA representation captures which residues are evolutionarily constrained and how they co-vary. The pair representation accumulates spatial relationship predictions between every residue pair.
After Evoformer processing, the structure module converts learned features into 3D atomic coordinates. It predicts backbone frames (position and orientation for each residue) and refines side-chain positions.
The prediction process includes recycling—feeding outputs back through the network for iterative refinement. Each recycle step improves the structure, particularly for challenging regions where initial predictions may be uncertain.
AlphaFold2-Multimer extends the base model to protein complexes. It uses paired MSAs that capture inter-chain coevolution, helping the model predict how chains interact. The model automatically detects if your input contains multiple chains and switches to multimer mode.
Provide protein sequences via text input (raw sequence or FASTA format), file upload (.fasta, .fa, .txt, .a3m), or PDB ID fetch from RCSB.
For multimer predictions, add multiple chains as separate inputs. Chain IDs are assigned automatically (A, B, C...). You can predict up to 10 chains in a single job.
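As a rough sketch of the input handling described above, the following code parses raw or FASTA-formatted text into chains and assigns chain IDs alphabetically. The names `parse_chains` and `MAX_CHAINS` are hypothetical and do not correspond to any actual ColabFold or platform API; the 10-chain limit is taken from the text above.

```python
import string

MAX_CHAINS = 10  # per-job limit stated in the text; the name is hypothetical


def parse_chains(text):
    """Parse a raw sequence or FASTA text into {chain_id: sequence}.

    Each FASTA record becomes one chain; a raw sequence without a header
    line is treated as a single chain. Chain IDs are assigned A, B, C, ...
    """
    sequences, current = [], []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record, if any.
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        sequences.append("".join(current))

    if not sequences:
        raise ValueError("no sequence found in input")
    if len(sequences) > MAX_CHAINS:
        raise ValueError(f"at most {MAX_CHAINS} chains are supported per job")
    return dict(zip(string.ascii_uppercase, sequences))


chains = parse_chains(">heavy\nMKTLLV\nAAG\n>light\nGSHMQ")
is_multimer = len(chains) > 1  # more than one chain implies multimer mode
```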
3; increase to 6–12 for challenging predictions.
Auto selects the appropriate variant based on your input: ptm for single chains, multimer v3 for complexes. Force a specific type only if you have a reason (e.g., benchmarking).
MSA quality is the single most important factor for prediction accuracy. Deeper alignments with more homologous sequences generally produce better predictions.
UniRef+Environmental searches the broadest set, including metagenomic sequences; use this for best results. UniRef only is faster but may miss distant homologs. No MSA disables sequence search entirely, making AlphaFold2 behave like a single-sequence model (similar to ESMFold).
Unpaired+Paired combines chain-specific and cross-chain coevolution signals. Paired only restricts the alignment to sequences where both chains appear together. Unpaired only treats chains independently.
AlphaFold2 generates multiple ranked predictions, each with confidence metrics that help you assess reliability.
The predicted Local Distance Difference Test (pLDDT) estimates how accurate each residue's position is, stored in the B-factor column of output PDB files.
| pLDDT | Confidence | Interpretation |
|---|---|---|
| > 90 | Very high | High confidence in backbone and side chains |
| 70–90 | Good | Backbone likely correct; side chains may vary |
| 50–70 | Low | Structural uncertainty; interpret with caution |
| < 50 | Very low | Often indicates disorder or flexibility |
Residues with pLDDT below 50 frequently represent intrinsically disordered regions (IDRs) that lack fixed structure rather than prediction failures.
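Because pLDDT is stored in the B-factor column, it can be read back from an output PDB file with plain string slicing. The sketch below assumes standard fixed-width PDB `ATOM` records and takes one value per residue from the C-alpha atom. The helper names are hypothetical, and the handling of exact boundary values (for example, a score of exactly 90) is an assumption, since the table above does not specify it.

```python
def plddt_per_residue(pdb_text):
    """Return {(chain_id, residue_number): pLDDT} read from CA atoms.

    Relies on the fixed-width PDB layout: atom name in columns 13-16,
    chain ID in column 22, residue number in columns 23-26, and the
    B-factor (here, pLDDT) in columns 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to the confidence labels in the table above."""
    if plddt > 90:
        return "very high"
    if plddt >= 70:
        return "good"
    if plddt >= 50:
        return "low"
    return "very low"


def likely_disordered(scores):
    """List residues with pLDDT below 50, which often indicate disorder."""
    return sorted(key for key, value in scores.items() if value < 50)
```

To use it, read the PDB file into a string (for example with `open(path).read()`) and pass that string to `plddt_per_residue`.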
The predicted Template Modeling score (pTM) estimates overall structural accuracy, ranging from 0 to 1. Values above 0.7 typically indicate reliable predictions.
For multimers, the interface pTM (ipTM) specifically assesses inter-chain interface quality. The combined ranking score weights both metrics: ranking_score = 0.8 × ipTM + 0.2 × pTM. Combined scores above 0.62 suggest reliable complex predictions.
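The ranking formula above is simple enough to check by hand. The following sketch applies it to a set of candidate models and sorts them. The dictionary layout and the function names are assumptions for illustration, not the format of any actual output file.

```python
def ranking_score(iptm, ptm):
    """Combined multimer score: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm


def rank_models(models):
    """Sort models (dicts with 'name', 'iptm', 'ptm') from best to worst."""
    return sorted(
        models,
        key=lambda m: ranking_score(m["iptm"], m["ptm"]),
        reverse=True,
    )


# Hypothetical scores for three predictions of the same complex.
candidates = [
    {"name": "model_1", "iptm": 0.55, "ptm": 0.80},
    {"name": "model_2", "iptm": 0.70, "ptm": 0.75},
    {"name": "model_3", "iptm": 0.40, "ptm": 0.90},
]
ranked = rank_models(candidates)
best = ranked[0]["name"]
```

Because ipTM carries four times the weight of pTM, a model with a well-predicted interface can outrank one with a higher overall pTM, as `model_2` does relative to `model_3` in this example.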
The Predicted Aligned Error matrix shows expected position errors between all residue pairs. Low PAE values (blue in visualizations) indicate confident relative positioning; high values (red) suggest uncertainty.
PAE is particularly useful for:
AlphaFold2 ranks predictions by confidence scores. We recommend examining the top 2–3 structures rather than only the #1 result—especially for flexible proteins or novel folds where ranking may be uncertain.
Use AlphaFold2 when you need the established gold standard for protein-only structure prediction, particularly for natural proteins with clear evolutionary homologs.
Use ESMFold when speed matters most or you're working with designed proteins that lack evolutionary relatives. ESMFold skips MSA generation entirely.
Use Boltz-2 or Chai-1 when you need to predict complexes involving small molecules, DNA, or RNA, or when you want binding affinity estimates alongside structure predictions.
Use OpenFold 3 or Protenix when you want open-source implementations of AlphaFold3-like capabilities for multi-molecule complexes.
AlphaFold2 requires evolutionarily related sequences for optimal performance. Predictions for de novo designed proteins or sequences without detectable homologs will have lower confidence.
The model predicts static structures and may not capture conformational changes, allosteric states, or dynamics. Regions with pLDDT below 50 often represent genuine flexibility rather than prediction errors.
AlphaFold2 cannot predict ligand binding, post-translational modifications, or the effects of small molecules on structure. For these applications, use newer models like Boltz-2.