What is AlphaFlow?
AlphaFlow is a generative AI model for predicting protein conformational ensembles — diverse sets of 3D structures representing the different shapes a protein can adopt. Developed by Bowen Jing, Bonnie Berger, and Tommi Jaakkola at MIT, AlphaFlow repurposes high-accuracy structure predictors like AlphaFold and ESMFold by fine-tuning them with a flow matching objective to generate multiple conformations rather than a single static prediction.
Proteins are not rigid molecules. They flex, breathe, and transition between conformational states to perform biological functions such as enzyme catalysis, signal transduction, and molecular recognition. Traditional structure prediction tools like AlphaFold2 produce a single structure, which may not capture this inherent flexibility. AlphaFlow addresses this limitation by generating conformational ensembles that reveal protein dynamics, flexible regions, and alternative binding states — information critical for understanding protein function and designing therapeutics.
How to use AlphaFlow online
ProteinIQ provides a web-based interface for running AlphaFlow without command-line installation or MSA generation. Upload a protein sequence, select a model variant, and receive multiple 3D structures representing the conformational ensemble.
Inputs
| Input | Description |
|---|---|
| Protein sequence | The amino acid sequence to model. Upload a FASTA file, enter a raw sequence, or fetch from RCSB using a PDB ID (e.g., 1UBQ). Maximum sequence length depends on available compute resources. |
| Job name | Optional identifier to organize results when running multiple ensemble generations. |
Settings
Model settings
| Setting | Description |
|---|---|
| Model variant | The AlphaFlow model to use. ESMFlow MD (default, recommended) generates molecular dynamics-like ensembles from single sequences. ESMFlow PDB models experimental ensemble diversity. AlphaFlow MD and AlphaFlow PDB use MSA-based predictions (auto-generated) for potentially higher accuracy at increased runtime. |
| Number of samples | Number of conformations to generate (1–50, default 10). More samples provide better ensemble coverage but increase runtime linearly. |
Advanced settings
| Setting | Description |
|---|---|
| Inference steps | Number of denoising steps during generation (5–50, default 10). More steps improve sample quality at the cost of longer runtime. |
| Diversity (tmax) | Controls the diversity of generated structures. 1.0 (default) produces maximum diversity. 0.75 provides balanced output. 0.5 generates more conservative, structurally similar conformations. |
Results
The output includes multiple PDB structures displayed in an interactive 3D viewer. Each structure represents a distinct conformation from the predicted ensemble.
Interpreting ensembles
- Flexible regions — Areas where structures differ significantly indicate intrinsically flexible parts of the protein
- Core stability — Regions that remain consistent across samples represent the stable structural core
- Conformational states — Distinct clusters of similar structures may represent different functional states
- RMSD analysis — Root-mean-square deviation between ensemble members quantifies structural diversity
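The ensemble metrics listed above can be computed directly from coordinates. The following is a minimal sketch, assuming the ensemble has already been loaded as a NumPy array of shape (models, residues, 3), for example one Cα atom per residue; the function names and the synthetic demo data are illustrative and not part of AlphaFlow or ProteinIQ. It superimposes structures with the Kabsch algorithm, then reports pairwise RMSD (overall diversity) and per-residue RMSF (flexible versus stable regions).

```python
import numpy as np

def kabsch_align(mobile, target):
    """Rigidly superimpose mobile (N, 3) onto target (N, 3); return the aligned copy."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ tgt_c)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rot.T + target.mean(axis=0)

def rmsd(a, b):
    """Root-mean-square deviation between two already superimposed (N, 3) arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def pairwise_rmsd(ensemble):
    """Symmetric (M, M) matrix of RMSD values after optimal superposition."""
    m = len(ensemble)
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            aligned = kabsch_align(ensemble[j], ensemble[i])
            out[i, j] = out[j, i] = rmsd(aligned, ensemble[i])
    return out

def rmsf(ensemble):
    """Per-residue fluctuation around the mean structure, after aligning to model 0."""
    aligned = np.stack([kabsch_align(x, ensemble[0]) for x in ensemble])
    mean = aligned.mean(axis=0)
    return np.sqrt(((aligned - mean) ** 2).sum(axis=2).mean(axis=0))

# Synthetic demo: 8 models of a 20-residue chain with a rigid core
# (residues 0-14) and a noisy tail (residues 15-19).
rng = np.random.default_rng(0)
base = rng.normal(scale=5.0, size=(20, 3))
noise = np.zeros((8, 20, 3))
noise[:, 15:] = rng.normal(scale=2.0, size=(8, 5, 3))
ensemble = base + noise

flex = rmsf(ensemble)
dist = pairwise_rmsd(ensemble)
print("mean core RMSF:", flex[:15].mean())
print("mean tail RMSF:", flex[15:].mean())
print("mean pairwise RMSD:", dist[np.triu_indices(8, k=1)].mean())
```

Residues with high RMSF correspond to the flexible regions, and the pairwise RMSD matrix can be passed to a clustering routine to look for distinct conformational states.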
How does AlphaFlow work?
AlphaFlow transforms deterministic structure predictors into generative models using flow matching, a technique for learning continuous transformations between probability distributions.
Flow matching framework
The method learns to transform random noise into protein structures through a continuous denoising process. During training, structures are interpolated as:
$$x_t = (1 - t)\,x_0 + t\,x_1$$
where $x_0$ is sampled from a harmonic prior (random noise) and $x_1$ is the ground-truth structure. The parameter $t \in [0, 1]$ represents progress along this interpolation: $t = 0$ is pure noise and $t = 1$ is the clean structure. The model learns to predict the denoised structure at each step using the Frame Aligned Point Error (FAPE) loss, which is appropriate for SE(3)-invariant protein representations.
At inference time, the model iteratively refines a noisy starting structure through multiple denoising steps, generating a new sample with each run.
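The interpolation and the iterative refinement described above can be illustrated with a toy example. This is a sketch under stated assumptions, not AlphaFlow's implementation: the prior is a simplified Gaussian random walk along the chain standing in for the harmonic prior, the `denoiser` argument is a hypothetical placeholder for the fine-tuned network, and the structural alignment between steps is omitted. Each step moves the current structure part of the way toward the denoiser's prediction of the clean structure.

```python
import numpy as np

def sample_chain_prior(n_res, rng, step_std=3.8 / np.sqrt(3)):
    """Simplified stand-in for the harmonic prior: a centered Gaussian random walk.

    step_std is an illustrative choice that gives neighbouring residues a
    root-mean-square separation of about 3.8 angstroms.
    """
    steps = rng.normal(scale=step_std, size=(n_res, 3))
    coords = np.cumsum(steps, axis=0)
    return coords - coords.mean(axis=0)

def interpolate(x0, x1, t):
    """Training-time interpolation: t = 0 gives the noise x0, t = 1 gives the data x1."""
    return (1.0 - t) * x0 + t * x1

def sample(denoiser, x0, n_steps=10):
    """Iteratively refine x0 by stepping toward the denoiser's clean-structure estimate.

    `denoiser(x, t)` is a hypothetical callable returning a predicted clean
    structure with the same shape as x.
    """
    x = x0
    for n in range(n_steps):
        t, s = n / n_steps, (n + 1) / n_steps
        x1_hat = denoiser(x, t)
        # Move from time t to time s along the straight line toward x1_hat.
        x = (s - t) / (1.0 - t) * x1_hat + (1.0 - s) / (1.0 - t) * x
    return x

# Demo with an "oracle" denoiser that always returns a fixed target structure.
rng = np.random.default_rng(0)
target = rng.normal(scale=5.0, size=(8, 3))
x0 = sample_chain_prior(8, rng)
result = sample(lambda x, t: target, x0, n_steps=10)
print("max deviation from target:", np.abs(result - target).max())
```

With a perfect denoiser the final step lands exactly on the prediction, so the loop recovers the target. With a learned denoiser, different draws of the starting noise lead to different final structures, which is what produces an ensemble.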
Model variants
AlphaFlow provides four model variants combining two base architectures with two training datasets:
Base architectures
| Architecture | Description |
|---|---|
| AlphaFlow | Built on AlphaFold, uses multiple sequence alignments (MSAs) for evolutionary context. Higher accuracy, but requires MSA generation, which typically adds minutes of preprocessing. |
| ESMFlow | Built on ESMFold, uses single sequences only. Faster inference with no MSA required, suitable for proteins lacking homologs. |
Training datasets
| Dataset | Description |
|---|---|
| PDB | Trained on roughly 1.28M training examples drawn from the Protein Data Bank. Models experimental conformational diversity from X-ray crystallography and cryo-EM: different crystal forms, ligand-bound states, and pH conditions. |
| MD | Trained on molecular dynamics trajectories from the ATLAS dataset, simulated at 300 K. Models thermal fluctuations and dynamic motions near room temperature. Captures flexibility, transient contacts, and solvent exposure patterns. |
The MD-trained models are particularly useful for studying protein dynamics, while PDB-trained models better represent experimentally observed conformational diversity.
Input embedding modifications
AlphaFlow adds a custom input module before the folding trunk that:
- Accepts noisy beta-carbon coordinates as supplementary input
- Computes inter-residue distance histograms (39 bins from 3.25–50.75 Å)
- Encodes the time step using Gaussian Fourier features
- Processes through triangle attention operations
This allows the deterministic predictor to function as a denoising model with minimal architectural changes.
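Two of the input features listed above, the distance histogram and the time embedding, can be sketched in a few lines of NumPy. This is an illustrative sketch rather than AlphaFlow's code: the binning follows the convention used for AlphaFold template distograms (39 lower edges spaced evenly from 3.25 to 50.75 Å, with the last bin open-ended), and the embedding size, frequency scale, and seed are hypothetical values chosen for the example.

```python
import numpy as np

def distogram(coords, min_bin=3.25, max_bin=50.75, n_bins=39):
    """One-hot inter-residue distance histogram of shape (N, N, n_bins).

    Bin edges are compared on squared distances. Pairs closer than min_bin
    (including each residue with itself) fall into no bin.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = (diff ** 2).sum(axis=-1, keepdims=True)   # (N, N, 1)
    lower = np.linspace(min_bin, max_bin, n_bins) ** 2
    upper = np.append(lower[1:], 1e8)              # last bin is open-ended
    return ((d2 > lower) & (d2 < upper)).astype(np.float32)

def gaussian_fourier_features(t, dim=64, scale=16.0, seed=0):
    """Embed a scalar time t as sin/cos of fixed random Gaussian frequencies.

    dim, scale, and seed are illustrative assumptions; in a trained model the
    frequencies would be fixed once and stored with the weights.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.normal(scale=scale, size=dim // 2)
    angles = 2.0 * np.pi * freqs * t
    return np.concatenate([np.sin(angles), np.cos(angles)])

# Demo: three points on a line, 10 and 100 angstroms from the first one.
coords = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
dg = distogram(coords)
feat = gaussian_fourier_features(0.3)
print("distogram shape:", dg.shape)
print("bin index for the 10 angstrom pair:", int(dg[0, 1].argmax()))
print("time embedding shape:", feat.shape)
```

The one-hot distogram gives the folding trunk a pairwise view of the noisy structure, while the time embedding tells it how much noise remains.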
Limitations
- Timescale constraints — Cannot capture slow conformational changes beyond the timescales present in the training MD simulations (on the order of 100 ns per ATLAS trajectory)
- Single-chain focus — Primarily designed for monomeric proteins; multimer ensembles are not directly supported
- Sequence length — Computational cost scales with sequence length; very long proteins (>1000 residues) may require significant runtime
- Backbone-only prediction — The model predicts Cβ distributions; all-atom coordinates are filled in via a single forward pass without iterative refinement
- MSA dependency — AlphaFlow variants require MSA generation, which adds preprocessing time compared to ESMFlow
Related tools
- AlphaFold2 — Single-structure prediction with MSA. Use when a single high-confidence structure is sufficient.
- ESMFold — Fast single-sequence structure prediction. ESMFlow's base architecture.
- OpenMM — Physics-based molecular dynamics simulations. More accurate for specific timescales but orders of magnitude slower.
- Boltz-2 — Alternative structure prediction supporting proteins, nucleic acids, and ligands.
Applications
- Protein dynamics — Identify flexible loops, hinge regions, and mobile domains without running expensive MD simulations
- Cryptic pocket discovery — Find binding sites that only appear in certain conformational states, revealing druggable pockets hidden in static structures
- Ensemble docking — Generate receptor conformations for docking against multiple protein states, improving virtual screening hit rates
- Allosteric analysis — Study correlated motions and communication pathways between distant sites
- Structure validation — Assess whether a predicted structure represents a single state or exists within a conformational ensemble
- MD proxy — Rapidly generate ensemble properties that converge faster than replicate MD trajectories for certain observables
