Human Ubiquitin (76 aa) - Flexible regions
AlphaFlow is a generative AI model for predicting protein conformational ensembles — diverse sets of 3D structures representing the different shapes a protein can adopt. Developed by Bowen Jing, Bonnie Berger, and Tommi Jaakkola at MIT, AlphaFlow repurposes high-accuracy structure predictors like AlphaFold and ESMFold by fine-tuning them with a flow matching objective to generate multiple conformations rather than a single static prediction.
Proteins are not rigid molecules. They flex, breathe, and transition between conformational states to perform biological functions such as enzyme catalysis, signal transduction, and molecular recognition. Traditional structure prediction tools like AlphaFold2 produce a single structure, which may not capture this inherent flexibility. AlphaFlow addresses this limitation by generating conformational ensembles that reveal protein dynamics, flexible regions, and alternative binding states — information critical for understanding protein function and designing therapeutics.
ProteinIQ provides a web-based interface for running AlphaFlow without command-line installation or MSA generation. Upload a protein sequence, select a model variant, and receive multiple 3D structures representing the conformational ensemble.
| Input | Description |
|---|---|
| Protein sequence | The amino acid sequence to model. Upload a FASTA file, enter a raw sequence, or fetch from RCSB using a PDB ID (e.g., 1UBQ). Maximum sequence length depends on available compute resources. |
| Job name | Optional identifier to organize results when running multiple ensemble generations. |
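
It can help to check a sequence locally before submitting it. The following is a minimal sketch of a FASTA reader: the function name `read_fasta_sequence`, the example header, and the restriction to the 20 standard amino-acid letters are assumptions made for illustration, not ProteinIQ's actual validation rules.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only


def read_fasta_sequence(text: str) -> str:
    """Return the first sequence found in FASTA-formatted or raw text."""
    seq_parts = []
    seen_header = False
    for raw_line in text.strip().splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header or seq_parts:
                break  # a second record starts here; keep only the first
            seen_header = True
            continue
        seq_parts.append(line)

    seq = "".join(seq_parts).upper()
    if not seq:
        raise ValueError("no sequence found")
    invalid = sorted(set(seq) - STANDARD_AA)
    if invalid:
        raise ValueError(f"non-standard residue letters: {invalid}")
    return seq


# Human ubiquitin (76 residues), wrapped over two lines as FASTA allows.
UBIQUITIN_FASTA = """>ubiquitin_example
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ
QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
"""

if __name__ == "__main__":
    sequence = read_fasta_sequence(UBIQUITIN_FASTA)
    print(len(sequence), sequence[:10])
```
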
| Setting | Description |
|---|---|
| Model variant | The AlphaFlow model to use. ESMFlow MD (default, recommended) generates molecular dynamics-like ensembles from single sequences. ESMFlow PDB models experimental ensemble diversity. AlphaFlow MD and AlphaFlow PDB use MSA-based predictions (auto-generated) for potentially higher accuracy at increased runtime. |
| Number of samples | Number of conformations to generate (1–50, default 10). More samples provide better ensemble coverage but increase runtime linearly. |
| Setting | Description |
|---|---|
| Inference steps | Number of denoising steps during generation (5–50, default 10). More steps improve sample quality at the cost of longer runtime. |
| Diversity (tmax) | Controls the diversity of generated structures. 1.0 (default) produces maximum diversity. 0.75 provides balanced output. 0.5 generates more conservative, structurally similar conformations. |
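
The settings above can be collected into a small validated record. This is only an illustrative sketch: the class `EnsembleSettings` and its field names are hypothetical and do not correspond to a ProteinIQ API, the accepted `tmax` range of (0, 1] is an assumption (the tables list only 1.0, 0.75, and 0.5), and the cost estimate assumes runtime scales with samples times steps, which is a simplification of the statements above.

```python
from dataclasses import dataclass

MODEL_VARIANTS = ("ESMFlow MD", "ESMFlow PDB", "AlphaFlow MD", "AlphaFlow PDB")


@dataclass(frozen=True)
class EnsembleSettings:
    """Hypothetical container mirroring the settings tables and their defaults."""

    model_variant: str = "ESMFlow MD"
    num_samples: int = 10       # documented range: 1 to 50
    inference_steps: int = 10   # documented range: 5 to 50
    tmax: float = 1.0           # assumed valid range: (0, 1]

    def __post_init__(self) -> None:
        if self.model_variant not in MODEL_VARIANTS:
            raise ValueError(f"unknown model variant: {self.model_variant!r}")
        if not 1 <= self.num_samples <= 50:
            raise ValueError("num_samples must be between 1 and 50")
        if not 5 <= self.inference_steps <= 50:
            raise ValueError("inference_steps must be between 5 and 50")
        if not 0.0 < self.tmax <= 1.0:
            raise ValueError("tmax must be in (0, 1]")

    def relative_cost(self) -> float:
        """Rough cost relative to the defaults (10 samples x 10 steps).

        Assumes each sample costs one network evaluation per inference step.
        """
        return (self.num_samples * self.inference_steps) / 100


if __name__ == "__main__":
    print(EnsembleSettings().relative_cost())
    print(EnsembleSettings(num_samples=50, inference_steps=20).relative_cost())
```

Under this simplified model, raising both the sample count and the step count multiplies their effects, so it is usually worth increasing one at a time.
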
The output includes multiple PDB structures displayed in an interactive 3D viewer. Each structure represents a distinct conformation from the predicted ensemble.
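
One common way to locate flexible regions in a downloaded ensemble is the per-residue root-mean-square fluctuation (RMSF) of the C-alpha atoms. The sketch below assumes each conformation is available as the text of a single-model PDB file with identical residue ordering; the function names are illustrative, and superimposing every model onto the first one (rather than onto an iteratively refined mean structure) is a simplification.

```python
import numpy as np


def ca_coords_from_pdb(pdb_text: str) -> np.ndarray:
    """Extract C-alpha coordinates, shape (n_residues, 3), from PDB text."""
    coords = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, x/y/z in 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                [float(line[30:38]), float(line[38:46]), float(line[46:54])]
            )
    return np.array(coords, dtype=float)


def kabsch_align(mobile: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Rigidly superimpose `mobile` onto `ref` (both of shape (n, 3))."""
    mobile_centered = mobile - mobile.mean(axis=0)
    ref_centered = ref - ref.mean(axis=0)
    u, _, vt = np.linalg.svd(mobile_centered.T @ ref_centered)
    # Flip one axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mobile_centered @ rotation.T + ref.mean(axis=0)


def per_residue_rmsf(models) -> np.ndarray:
    """RMSF per residue across models after alignment to the first model."""
    ref = np.asarray(models[0], dtype=float)
    aligned = np.stack(
        [kabsch_align(np.asarray(m, dtype=float), ref) for m in models]
    )
    mean_structure = aligned.mean(axis=0)
    sq_dev = ((aligned - mean_structure) ** 2).sum(axis=-1)
    return np.sqrt(sq_dev.mean(axis=0))
```

Residues with the largest RMSF values are the ones whose positions vary most across the ensemble. For example, `per_residue_rmsf([ca_coords_from_pdb(t) for t in pdb_texts])` returns one value per residue, where `pdb_texts` is a hypothetical list of file contents.
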
AlphaFlow transforms deterministic structure predictors into generative models using flow matching, a technique for learning continuous transformations between probability distributions.
The method learns to transform random noise into protein structures through a continuous denoising process. During training, structures are interpolated as:

$$x_t = (1 - t)\,x_0 + t\,x_1$$

where $x_0$ is sampled from a harmonic prior (random noise) and $x_1$ is the ground-truth structure. The parameter $t \in [0, 1]$ represents progress along this interpolation: $t = 0$ is pure noise and $t = 1$ is the clean structure. The model learns to predict the denoised structure at each step using the Frame Aligned Point Error (FAPE) loss, which is appropriate for SE(3)-invariant protein representations.
At inference time, the model iteratively refines a noisy starting structure through multiple denoising steps, generating a new sample with each run.
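
The training interpolation and the iterative refinement loop can be sketched on toy coordinates. This is a simplified illustration, not AlphaFlow's implementation: it assumes a linear interpolation in which `t = 0` is pure noise and `t = 1` is the clean structure, uses a plain Gaussian in place of the harmonic prior, omits any structural alignment between steps, and replaces the neural network with a stand-in `denoiser` callable. All names are hypothetical.

```python
import numpy as np


def interpolate(x0: np.ndarray, x1: np.ndarray, t: float) -> np.ndarray:
    """Noisy training input: straight-line mix of noise x0 and data x1."""
    return (1.0 - t) * x0 + t * x1


def sample(denoiser, x0: np.ndarray, num_steps: int = 10) -> np.ndarray:
    """Refine noise x0 into a structure using repeated denoiser predictions."""
    times = np.linspace(0.0, 1.0, num_steps + 1)
    x = x0
    for t_now, t_next in zip(times[:-1], times[1:]):
        x1_hat = denoiser(x, t_now)  # predicted clean structure
        # Step along the line from the current x toward the prediction.
        # On the last step the fraction equals 1, so x becomes x1_hat.
        fraction = (t_next - t_now) / (1.0 - t_now)
        x = x + fraction * (x1_hat - x)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(76, 3))   # toy "ground truth" coordinates
    noise = rng.normal(size=(76, 3))    # Gaussian stand-in for the prior

    def oracle(x, t):
        # Stand-in for a trained network: always predicts the target.
        return target

    final = sample(oracle, noise, num_steps=10)
    print(np.allclose(final, target))
```

With a trained network in place of `oracle`, each call to `sample` with fresh noise would yield a different conformation, which is how repeated runs build up an ensemble.
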
AlphaFlow provides four model variants combining two base architectures with two training datasets:
| Architecture | Description |
|---|---|
| AlphaFlow | Built on AlphaFold, uses multiple sequence alignments (MSAs) for evolutionary context. Higher accuracy but requires MSA generation (~minutes). |
| ESMFlow | Built on ESMFold, uses single sequences only. Faster inference with no MSA required, suitable for proteins lacking homologs. |
| Dataset | Description |
|---|---|
| PDB | Trained on 1.28M structures from the Protein Data Bank. Models experimental conformational diversity from X-ray crystallography and cryo-EM — different crystal forms, ligand-bound states, and pH conditions. |
| MD | Trained on molecular dynamics trajectories from the ATLAS dataset at 300K. Models thermal fluctuations and dynamic motions at physiological temperature. Captures flexibility, transient contacts, and solvent exposure patterns. |
The MD-trained models are particularly useful for studying protein dynamics, while PDB-trained models better represent experimentally observed conformational diversity.
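AlphaFlow adds a custom input module before the folding trunk that embeds the noisy input structure, together with the current time step, and feeds the result into the trunk.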
This allows the deterministic predictor to function as a denoising model with minimal architectural changes.