AlphaFlow

AI-powered protein conformational ensemble generation

What is AlphaFlow?

AlphaFlow is a generative AI model for predicting protein conformational ensembles — diverse sets of 3D structures representing the different shapes a protein can adopt. Developed by Bowen Jing, Bonnie Berger, and Tommi Jaakkola at MIT, AlphaFlow repurposes high-accuracy structure predictors like AlphaFold and ESMFold by fine-tuning them with a flow matching objective to generate multiple conformations rather than a single static prediction.

Proteins are not rigid molecules. They flex, breathe, and transition between conformational states to perform biological functions such as enzyme catalysis, signal transduction, and molecular recognition. Traditional structure prediction tools like AlphaFold2 produce a single structure, which may not capture this inherent flexibility. AlphaFlow addresses this limitation by generating conformational ensembles that reveal protein dynamics, flexible regions, and alternative binding states — information critical for understanding protein function and designing therapeutics.

How to use AlphaFlow online

ProteinIQ provides a web-based interface for running AlphaFlow without installing command-line tools or generating MSAs yourself. Upload a protein sequence, select a model variant, and receive multiple 3D structures representing the conformational ensemble.

Inputs

Input	Description
`Protein sequence`	The amino acid sequence to model. Upload a FASTA file, enter a raw sequence, or fetch from RCSB using a PDB ID (e.g., `1UBQ`). Maximum sequence length depends on available compute resources.
`Job name`	Optional identifier to organize results when running multiple ensemble generations.
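For local preprocessing before upload, a minimal Python sketch of the sequence input might look like the following. It assumes that the 20 canonical one-letter amino acid codes are the accepted alphabet (the service's actual validation rules are not documented here). It also uses the public RCSB FASTA download URL `https://www.rcsb.org/fasta/entry/<PDB ID>`. The function names are hypothetical and not part of ProteinIQ.

```python
import urllib.request

# Assumption: the 20 canonical one-letter amino acid codes form the accepted alphabet.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> dict[str, str]:
    """Return {header: sequence}; a bare sequence without a '>' header is keyed 'query'."""
    records: dict[str, str] = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None or chunks:
                records[header or "query"] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None or chunks:
        records[header or "query"] = "".join(chunks)
    return records


def check_sequence(seq: str) -> str:
    """Raise ValueError if the sequence contains non-canonical residue codes."""
    bad = sorted(set(seq) - CANONICAL)
    if bad:
        raise ValueError(f"non-canonical residue codes: {bad}")
    return seq


def fetch_rcsb_fasta(pdb_id: str) -> str:
    """Download the FASTA for a PDB entry (e.g. '1UBQ') from RCSB; needs network access."""
    url = f"https://www.rcsb.org/fasta/entry/{pdb_id.upper()}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

For example, `parse_fasta(fetch_rcsb_fasta("1UBQ"))` should return a single record. Entries with several distinct chains yield several records, and the chain of interest can be picked from these before submission.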

Settings

Model settings

Setting	Description
`Model variant`	The AlphaFlow model to use. `ESMFlow MD` (default, recommended) generates molecular dynamics-like ensembles from a single sequence. `ESMFlow PDB` models the conformational diversity seen in experimental structures. `AlphaFlow MD` and `AlphaFlow PDB` use automatically generated MSAs, which can improve accuracy at the cost of longer runtime.
`Number of samples`	Number of conformations to generate (1–50, default 10). More samples provide better ensemble coverage but increase runtime linearly.

Advanced settings

Setting	Description
`Inference steps`	Number of denoising steps during generation (5–50, default 10). More steps improve sample quality at the cost of longer runtime.
`Diversity (tmax)`	Controls the diversity of generated structures. `1.0` (default) produces maximum diversity. `0.75` provides balanced output. `0.5` generates more conservative, structurally similar conformations.

Results

The output includes multiple PDB structures displayed in an interactive 3D viewer. Each structure represents a distinct conformation from the predicted ensemble.
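To analyze the ensemble outside the viewer, you can extract the Cα coordinates from the downloaded PDB files. The sketch below handles either one file with `MODEL`/`ENDMDL` records or a file holding a single conformation. It reads x, y and z from the fixed PDB coordinate columns and keeps only the first alternate location of each atom. `read_ca_models` is a hypothetical helper, not part of ProteinIQ or AlphaFlow.

```python
import numpy as np


def read_ca_models(pdb_text: str) -> list[np.ndarray]:
    """Return one (N, 3) array of C-alpha coordinates per model in a PDB-format string."""
    models: list[np.ndarray] = []
    current: list[list[float]] = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            models.append(np.array(current))
            current = []
        elif record == "ATOM" and line[12:16].strip() == "CA":
            # Keep only the first alternate location (blank or 'A').
            if line[16:17] not in ("", " ", "A"):
                continue
            # Fixed PDB columns 31-38, 39-46 and 47-54 hold x, y and z.
            current.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
    if current:  # single-model file without MODEL/ENDMDL records
        models.append(np.array(current))
    return models
```

With one file per conformation, calling `read_ca_models(text)[0]` on each file's contents gives the same list of arrays.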

Interpreting ensembles

Flexible regions — Areas where structures differ significantly indicate intrinsically flexible parts of the protein
Core stability — Regions that remain consistent across samples represent the stable structural core
Conformational states — Distinct clusters of similar structures may represent different functional states
RMSD analysis — Root-mean-square deviation between ensemble members quantifies structural diversity
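The RMSD and flexibility criteria above can be computed with a few lines of NumPy. This sketch superimposes structures with the Kabsch algorithm and builds a pairwise RMSD matrix, which can serve as input for clustering into conformational states. It also computes per-residue RMSF after iteratively aligning all members to the ensemble mean. High-RMSF residues correspond to flexible regions and low-RMSF residues to the stable core. It assumes that each conformation is an (N, 3) array of Cα coordinates with the same residue order. The function names are illustrative.

```python
import numpy as np


def kabsch_align(mobile: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Optimally rotate and translate mobile (N, 3) onto target (N, 3)."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ (target - tgt_mean))
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rot.T + tgt_mean


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def pairwise_rmsd(ensemble: list[np.ndarray]) -> np.ndarray:
    """Symmetric matrix of RMSD after superposition, e.g. as input for clustering."""
    n = len(ensemble)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = rmsd(kabsch_align(ensemble[i], ensemble[j]), ensemble[j])
    return out


def per_residue_rmsf(ensemble: list[np.ndarray], n_iter: int = 3) -> np.ndarray:
    """RMSF per residue after iteratively aligning all members to the ensemble mean."""
    reference = ensemble[0]
    for _ in range(n_iter):
        aligned = np.stack([kabsch_align(x, reference) for x in ensemble])
        reference = aligned.mean(axis=0)
    return np.sqrt(((aligned - reference) ** 2).sum(axis=2).mean(axis=0))
```

Residues whose RMSF is well above the ensemble median are candidates for flexible loops or hinges. The RMSD matrix can be passed to a hierarchical clustering routine to look for distinct states.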

How does AlphaFlow work?

AlphaFlow transforms deterministic structure predictors into generative models using flow matching, a technique for learning continuous transformations between probability distributions.

Flow matching framework

The method learns to transform random noise into protein structures through a continuous denoising process. During training, structures are interpolated as:

$x_t = (1-t) \cdot x_0 + t \cdot x_1$

where $x_0$ is sampled from a harmonic prior (Gaussian noise in which sequence-adjacent residues are kept close together) and $x_1$ is the ground-truth structure. The parameter $t \in [0,1]$ represents progress along this interpolation: $t=0$ is pure noise and $t=1$ is the clean structure. At each step the model predicts the denoised structure $x_1$. It is trained with the Frame Aligned Point Error (FAPE) loss, which is invariant to global rotations and translations (SE(3)-invariant) and is therefore suited to protein coordinates.
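The training-time interpolation can be sketched in NumPy. This assumes a harmonic prior whose density is proportional to $\exp(-\tfrac{\alpha}{2}\sum_i \lVert x_i - x_{i+1}\rVert^2)$ over sequence-adjacent residues. The stiffness `alpha` is a hypothetical value chosen so that the root-mean-square distance between adjacent residues is about 3.8 Å. AlphaFlow's actual prior parameters, and any alignment of the noise to the structure before interpolation, are not reproduced here.

```python
import numpy as np


def harmonic_prior(n_res: int, alpha: float = 3.0 / 3.8**2, rng=None) -> np.ndarray:
    """Sample centered (n_res, 3) coordinates with density
    proportional to exp(-alpha / 2 * sum_i ||x_i - x_{i+1}||^2).

    Hypothetical alpha: each bond vector then has variance 1/alpha per axis,
    so the RMS distance between adjacent residues is about 3.8 Å.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Graph Laplacian of a linear chain: the precision matrix of the prior.
    lap = np.zeros((n_res, n_res))
    idx = np.arange(n_res - 1)
    lap[idx, idx] += 1.0
    lap[idx + 1, idx + 1] += 1.0
    lap[idx, idx + 1] -= 1.0
    lap[idx + 1, idx] -= 1.0
    evals, evecs = np.linalg.eigh(alpha * lap)
    scale = np.zeros(n_res)
    # Skip the zero eigenvalue (rigid translation) so the sample is centered.
    scale[1:] = 1.0 / np.sqrt(evals[1:])
    z = rng.standard_normal((n_res, 3))
    return evecs @ (scale[:, None] * z)


def interpolate(x0: np.ndarray, x1: np.ndarray, t: float) -> np.ndarray:
    """x_t = (1 - t) * x0 + t * x1: t = 0 is noise, t = 1 is the clean structure."""
    return (1.0 - t) * x0 + t * x1
```

During training, `x1` would be the Cβ coordinates of a training structure and `t` a randomly drawn time in [0, 1]. The network receives `interpolate(x0, x1, t)` and is asked to recover `x1`.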

At inference time, the model iteratively refines a noisy starting structure through multiple denoising steps, generating a new sample with each run.
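The inference loop can be sketched as an Euler integration of the learned flow. Because the interpolation is linear, a prediction $\hat{x}_1$ of the clean structure implies the velocity $(\hat{x}_1 - x_t)/(1 - t)$. Each step therefore moves the current structure part of the way toward the prediction. `denoiser` is a hypothetical stand-in for the fine-tuned AlphaFold or ESMFold network, and the uniform time schedule is an assumption. `steps` plays the role of the `Inference steps` setting.

```python
import numpy as np


def sample(denoiser, x0: np.ndarray, steps: int = 10) -> np.ndarray:
    """Integrate the flow from t = 0 (prior noise x0) to t = 1 (structure).

    denoiser(x_t, t) is a hypothetical stand-in for the fine-tuned network;
    it returns its estimate of the clean structure x1.
    """
    schedule = np.linspace(0.0, 1.0, steps + 1)
    x = x0
    for t, t_next in zip(schedule[:-1], schedule[1:]):
        x1_hat = denoiser(x, float(t))
        # Under x_t = (1 - t) x0 + t x1, the velocity is (x1 - x_t) / (1 - t).
        x = x + (t_next - t) / (1.0 - t) * (x1_hat - x)
    return x
```

Each call with a fresh `x0` drawn from the prior produces a new conformation, which is why runtime grows linearly with the number of samples. On the final step the update lands exactly on the network's last prediction.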

Model variants

AlphaFlow provides four model variants combining two base architectures with two training datasets:

Base architectures

Architecture	Description
AlphaFlow	Built on AlphaFold, uses multiple sequence alignments (MSAs) for evolutionary context. Higher accuracy, but MSA generation adds preprocessing time (typically minutes).
ESMFlow	Built on ESMFold, uses single sequences only. Faster inference with no MSA required, suitable for proteins lacking homologs.

Training datasets

Dataset	Description
PDB	Trained on 1.28M structures from the Protein Data Bank. Models experimental conformational diversity from X-ray crystallography and cryo-EM — different crystal forms, ligand-bound states, and pH conditions.
MD	Trained on molecular dynamics trajectories from the ATLAS dataset, simulated at 300 K. Models thermal fluctuations and dynamic motions near physiological temperature. Captures flexibility, transient contacts, and solvent exposure patterns.

The MD-trained models are particularly useful for studying protein dynamics, while PDB-trained models better represent experimentally observed conformational diversity.

Input embedding modifications

AlphaFlow adds a custom input module before the folding trunk that:

Accepts noisy beta-carbon coordinates as supplementary input
Computes inter-residue distance histograms (39 bins from 3.25–50.75 Å)
Encodes the time step $t$ using Gaussian Fourier features
Processes through triangle attention operations

This allows the deterministic predictor to function as a denoising model with minimal architectural changes.
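The first two input features can be sketched as follows. The binning assumes 39 lower bin edges spaced evenly from 3.25 to 50.75 Å (1.25 Å apart), with an open-ended last bin, following AlphaFold's template distogram convention. Distances below 3.25 Å, including each residue's distance to itself, fall in no bin. The Fourier feature width `dim`, the frequency `scale` and the fixed random seed are hypothetical.

```python
import numpy as np


def distance_histogram(coords: np.ndarray, n_bins: int = 39,
                       first: float = 3.25, last: float = 50.75) -> np.ndarray:
    """One-hot (N, N, n_bins) histogram of pairwise distances between (N, 3) coordinates."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    lower = np.linspace(first, last, n_bins)  # lower bin edges, 1.25 Å apart by default
    upper = np.append(lower[1:], np.inf)      # last bin is open-ended
    d = dist[..., None]
    return ((d > lower) & (d < upper)).astype(np.float32)


def gaussian_fourier_features(t: float, dim: int = 64, scale: float = 1.0,
                              seed: int = 0) -> np.ndarray:
    """Embed a scalar time t as [sin(2*pi*w*t), cos(2*pi*w*t)] with fixed random w."""
    rng = np.random.default_rng(seed)  # fixed seed keeps w identical across calls
    w = rng.normal(0.0, scale, dim // 2)
    proj = 2.0 * np.pi * w * t
    return np.concatenate([np.sin(proj), np.cos(proj)])
```

In the model described above, the histogram is further processed by triangle attention and the time embedding is passed into the network. Neither of these steps is sketched here.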

Limitations

Timescale constraints — Cannot capture slow conformational changes beyond the timescales sampled by the training MD simulations (ATLAS provides three 100 ns trajectories per protein)
Single-chain focus — Primarily designed for monomeric proteins; multimer ensembles are not directly supported
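Sequence length — Computational cost grows roughly cubically with sequence length because of the triangle operations in the folding trunk, so doubling the length increases that cost about eightfold; very long proteins (>1000 residues) may require significant runtime
Cβ-level generation — The flow operates on Cβ coordinates; all-atom coordinates are filled in by the structure module in a single forward pass without iterative refinement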
MSA dependency — AlphaFlow variants require MSA generation, which adds preprocessing time compared to ESMFlow

Related tools

AlphaFold2 — Single-structure prediction with MSAs. Use when a single high-confidence structure is sufficient.
ESMFold — Fast single-sequence structure prediction. ESMFlow's base architecture.
OpenMM — Physics-based molecular dynamics simulation. Physically grounded at the timescales it simulates, but orders of magnitude slower.
Boltz-2 — Alternative structure prediction supporting proteins, nucleic acids, and ligands.

Applications

Protein dynamics — Identify flexible loops, hinge regions, and mobile domains without running expensive MD simulations
Cryptic pocket discovery — Find binding sites that only appear in certain conformational states, revealing druggable pockets hidden in static structures
Ensemble docking — Generate receptor conformations for docking against multiple protein states, improving virtual screening hit rates
Allosteric analysis — Study correlated motions and communication pathways between distant sites
Structure validation — Assess whether a predicted structure represents a single state or exists within a conformational ensemble
MD proxy — Rapidly generate ensemble properties that converge faster than replicate MD trajectories for certain observables
