MDGen

Generate molecular dynamics trajectories from protein structures.

What is MDGen?

MDGen generates molecular dynamics trajectories using generative AI rather than physics-based simulation. Given a single protein structure, it produces a sequence of conformations representing how the protein might move over time. It is 10–1000× faster than traditional MD while preserving key dynamic properties.

The model learns from molecular dynamics simulation data to capture realistic protein motions. Unlike physics-based simulators that integrate equations of motion at femtosecond timesteps, MDGen directly generates trajectory frames, making it practical to explore conformational ensembles in seconds rather than days.

How MDGen works

MDGen frames trajectory generation as a conditional generative modeling problem. The model is trained on molecular dynamics simulation data. It learns to generate plausible time evolutions conditioned on a subset of known frames, such as the first frame of a trajectory.

Architecture

The system uses a Scalable Interpolant Transformer (SiT) as its flow-based generative backbone. This avoids the computationally expensive residue-pair and frame-based architectures common in protein structure prediction. To handle long trajectories, MDGen incorporates the Hyena long-context architecture, enabling scaling to trajectories of 100,000+ frames.
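A few lines of code can illustrate the flow-based objective behind SiT. This is a generic sketch, not MDGen's code. It assumes the simple linear interpolant between a noise sample x0 and a data sample x1, although SiT supports several interpolant choices. The `velocity_fn` argument is a hypothetical stand-in for the trained transformer.

```python
import numpy as np


def interpolant_training_pair(x0, x1, t):
    """Build one flow-matching training example.

    x0: noise sample, x1: data sample (e.g. flattened frame features), t in [0, 1].
    With the linear interpolant x_t = (1 - t) * x0 + t * x1, the regression
    target for the network is the constant velocity dx_t/dt = x1 - x0.
    """
    x_t = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0
    return x_t, target_velocity


def euler_sample(velocity_fn, x0, n_steps=100):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 (noise) toward t = 1 (data)."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Toy check: if the "data distribution" is a single point x1, the ideal
# velocity field is (x1 - x) / (1 - t), and Euler integration lands on x1.
rng = np.random.default_rng(0)
x1 = np.array([1.0, -2.0, 0.5])
x0 = rng.standard_normal(3)
sample = euler_sample(lambda x, t: (x1 - x) / (1.0 - t), x0, n_steps=50)
print(np.allclose(sample, x1))  # True
```

In a real model, a learned velocity field replaces the oracle in the toy check. For a conditional model such as MDGen, the known frames would also be passed to the network.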

Proteins are represented in the atom14 format (14 atoms per residue) and converted to SE(3) rigid frames (translation + rotation) plus torsion angles. This representation captures both backbone geometry and sidechain conformations.
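The frame-and-torsion representation can be sketched in a few lines. This sketch assumes the common AlphaFold-style convention: the origin sits at CA, and the axes are built from the N, CA, and C positions by Gram-Schmidt orthogonalization. MDGen's exact axis convention and its choice of torsions are not specified here, so treat the function names as hypothetical.

```python
import numpy as np


def rigid_from_backbone(n, ca, c):
    """Return (rotation, translation) for one residue's SE(3) frame.

    Gram-Schmidt construction: the first axis points from CA to C, the second
    is the CA->N direction made orthogonal to it, and the third is their cross
    product. The translation is the CA position.
    """
    n, ca, c = (np.asarray(p, dtype=float) for p in (n, ca, c))
    e1 = c - ca
    e1 = e1 / np.linalg.norm(e1)
    u2 = (n - ca) - np.dot(n - ca, e1) * e1
    e2 = u2 / np.linalg.norm(u2)
    e3 = np.cross(e1, e2)
    rotation = np.stack([e1, e2, e3], axis=1)  # columns are the frame axes
    return rotation, ca


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four atoms (e.g. a sidechain chi angle)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))
```

Storing the frame for each residue plus a handful of torsion angles is enough to rebuild the atom14 coordinates from ideal geometry. This makes the representation much more compact than raw Cartesian coordinates.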

Training data

MDGen provides checkpoints trained on different datasets:

  • Tetrapeptides: Explicit and implicit solvent simulations of 4-residue peptides, used for method validation
  • ATLAS: The ATLAS dataset of protein monomer simulations, preprocessed to 400 picosecond intervals, enabling generation for full proteins

Supported tasks

The generative approach enables multiple tasks through different conditioning strategies:

  • Forward simulation: Generate a trajectory from an initial structure
  • Transition path sampling: Given start and end states, sample plausible connecting paths
  • Trajectory upsampling: Increase the temporal resolution of an existing trajectory
  • Inpainting: Generate the dynamics of part of the molecule conditioned on the motion of fixed regions
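These tasks can be pictured as different masks over a frames × residues grid. Positions marked True are given, and the model generates the rest. The function below is a hypothetical illustration of that idea. Its name, its arguments, and the `stride` used for upsampling are assumptions, not MDGen's interface.

```python
import numpy as np


def conditioning_mask(task, n_frames, n_residues, stride=10, fixed_residues=()):
    """Boolean mask of shape (n_frames, n_residues); True marks known positions.

    Hypothetical illustration: everything marked False is what the model
    would be asked to generate for the given task.
    """
    mask = np.zeros((n_frames, n_residues), dtype=bool)
    if task == "forward":
        mask[0] = True  # only the initial structure is known
    elif task == "transition_path":
        mask[0] = True  # start state
        mask[-1] = True  # end state
    elif task == "upsampling":
        mask[::stride] = True  # a coarse trajectory; fill in frames in between
    elif task == "inpainting":
        mask[:, list(fixed_residues)] = True  # motion of these residues is given
    else:
        raise ValueError(f"unknown task: {task!r}")
    return mask


m = conditioning_mask("upsampling", n_frames=100, n_residues=4, stride=10)
print(m.sum())  # 40: frames 0, 10, ..., 90 are known for all 4 residues
```

One model can serve all four tasks because the tasks differ only in which entries are known. The network and its training objective stay the same.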

How to use MDGen online

ProteinIQ runs MDGen on GPU infrastructure with pre-loaded model weights. You can generate and view trajectories from the browser without a local installation.

Input

  • Protein structure: A protein-only PDB file, mmCIF file, or PDB ID (e.g., 1AKI). Only standard amino acid residues are accepted, in a single chain of up to 1000 residues.
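You can check a local file against these constraints before uploading. This is a hypothetical pre-flight check, not ProteinIQ's actual validator. It makes several simplifying assumptions:

  • It only reads fixed-column ATOM/HETATM records from a PDB file, so mmCIF is not supported.
  • It treats any HETATM record as non-protein.
  • It uses the 20 canonical residue names as the definition of "standard".

```python
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def check_pdb_input(lines, max_residues=1000):
    """Return a sorted list of problems; an empty list means the input looks acceptable.

    Hypothetical check that reads PDB-format ATOM/HETATM records by column position.
    """
    problems = set()
    chains = set()
    residues = set()
    for line in lines:
        record = line[:6].strip()
        if record == "HETATM":
            problems.add(f"non-protein group {line[17:20].strip()}")
            continue
        if record != "ATOM":
            continue
        res_name = line[17:20].strip()  # columns 18-20
        chain_id = line[21:22]  # column 22
        if res_name not in STANDARD_RESIDUES:
            problems.add(f"non-standard residue {res_name}")
        chains.add(chain_id)
        # A residue is identified by chain, sequence number, and insertion code.
        residues.add((chain_id, line[22:26].strip(), line[26:27]))
    if len(chains) != 1:
        problems.add(f"expected exactly one chain, found {len(chains)}")
    if len(residues) > max_residues:
        problems.add(f"{len(residues)} residues exceeds the limit of {max_residues}")
    return sorted(problems)
```

A typical call is `check_pdb_input(open("input.pdb").read().splitlines())`, where the file name is a placeholder. Removing waters and ligands, for example with a structure editor, usually clears the HETATM complaints.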

Settings

Trajectory parameters

  • Frames per rollout: Number of frames generated per rollout (50–1000, default 250). The ATLAS model is trained on 250-frame trajectories sampled at 400 ps intervals (100 ns in total).
  • Rollouts: Number of autoregressive rollouts (1–10, default 1). Each rollout continues the trajectory from where the previous one ended, so total trajectory length grows in proportion to the number of rollouts.
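The two settings determine the nominal simulated time through simple arithmetic. This sketch makes two assumptions: the ATLAS model's 400 ps frame spacing applies, and rollouts are concatenated without dropping a shared boundary frame. How ProteinIQ joins rollouts is not documented here. Because of the timescale caveat under Limitations, treat the nanosecond figure as nominal.

```python
def trajectory_length(frames_per_rollout=250, rollouts=1, frame_interval_ps=400.0):
    """Return (total_frames, nominal_time_ns) for a run.

    Assumes rollouts are joined end to end with no overlapping frame removed,
    and that consecutive frames are frame_interval_ps apart (400 ps for ATLAS).
    """
    if not 50 <= frames_per_rollout <= 1000:
        raise ValueError("frames_per_rollout must be between 50 and 1000")
    if not 1 <= rollouts <= 10:
        raise ValueError("rollouts must be between 1 and 10")
    total_frames = frames_per_rollout * rollouts
    nominal_time_ns = total_frames * frame_interval_ps / 1000.0  # ps -> ns
    return total_frames, nominal_time_ns


print(trajectory_length())                 # (250, 100.0)
print(trajectory_length(250, rollouts=4))  # (1000, 400.0)
```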

Output

Each run produces the following outputs. You can play back the trajectory in the integrated 3D viewer:

  • Topology PDB: Reference structure with atom connectivity information
  • Trajectory XTC: Compressed trajectory file containing all frames
  • RMSD metrics: Average and maximum backbone deviation from the starting structure

The viewer supports playback controls, frame-by-frame navigation, and structure alignment.
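You can reproduce the RMSD metrics from the downloaded files. This minimal sketch assumes backbone coordinates are already loaded as a NumPy array. It superposes each frame onto the first with the Kabsch algorithm, then reports the mean and maximum RMSD over the frames after the first. Whether ProteinIQ superposes frames in this way, or includes the starting frame in its average, is an assumption.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid an improper rotation (reflection)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsd_summary(frames):
    """Mean and maximum RMSD of frames[1:] against frames[0].

    frames: (n_frames, n_atoms, 3) backbone coordinates, in nm if read from XTC.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    values = [kabsch_rmsd(frame, frames[0]) for frame in frames[1:]]
    return float(np.mean(values)), float(np.max(values))
```

MDTraj provides the same calculation for the downloaded Topology PDB and Trajectory XTC. Load the trajectory with `traj = mdtraj.load("trajectory.xtc", top="topology.pdb")`, then call `mdtraj.rmsd(traj, traj, 0, atom_indices=traj.topology.select("backbone"))` to get per-frame RMSD in nm. The file names are placeholders.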

When to use MDGen vs traditional MD

MDGen excels at rapid conformational exploration when physical accuracy is less critical than speed:

  • Quick conformational screening: MDGen offers fast sampling across multiple proteins; traditional MD is computationally prohibitive.
  • Qualitative dynamics exploration: MDGen gives reasonable ensemble diversity; traditional MD is preferable when higher accuracy is needed.
  • Large-scale studies: MDGen is practical for hundreds of proteins; traditional MD is resource-intensive.
  • Binding site flexibility: MDGen provides a rapid estimate of accessible conformations; traditional MD is preferable when detailed energetics are needed.

For applications requiring accurate free energy estimates, specific timescale information, or force field validation, physics-based MD remains the appropriate choice.

Limitations

MDGen is designed for research exploration and has several constraints:

  • Protein size: The ATLAS model is trained on proteins of 38–2128 residues. This service accepts inputs of up to 1000 residues to stay within the compute limits of a single GPU.
  • Physical accuracy: Generated trajectories approximate, but do not exactly reproduce, true molecular dynamics
  • Timescales: The model captures conformational diversity but not absolute timescale information
  • Ligands and cofactors: Currently supports protein-only trajectories; bound ligands are not handled

Interpreting results

RMSD values

Root-mean-square deviation (RMSD) measures how far the structure moves from the starting conformation. Values are given in nanometers (1 nm = 10 Å):

  • < 0.1 nm: Minimal backbone motion, local fluctuations only
  • 0.1–0.3 nm: Moderate conformational change, typical for stable proteins
  • 0.3–0.5 nm: Significant rearrangement, such as loop movements or domain shifts
  • > 0.5 nm: Large-scale conformational change
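The bands above are in nanometers. Many structure tools report RMSD in Å, so divide those values by 10 before comparing them. This small helper applies the bands. The boundary handling is an arbitrary choice in this sketch; for example, it counts exactly 0.3 nm as moderate.

```python
def describe_rmsd(rmsd_nm):
    """Map an RMSD value in nm to the interpretation bands listed above."""
    if rmsd_nm < 0.1:
        return "minimal backbone motion"
    if rmsd_nm <= 0.3:
        return "moderate conformational change"
    if rmsd_nm <= 0.5:
        return "significant rearrangement"
    return "large-scale conformational change"


print(describe_rmsd(2.5 / 10))  # 2.5 Å -> 0.25 nm -> "moderate conformational change"
```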

Trajectory quality

Evaluate generated trajectories by checking:

  • Continuity: Frames should show smooth transitions without sudden jumps
  • Physical plausibility: There should be no steric clashes, and bond lengths should remain reasonable
  • Diversity: Multiple conformational states should be sampled
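You can rough out the first two checks using C-alpha coordinates alone. This sketch makes the following assumptions:

  • Coordinates are in nm (as in XTC files) for one chain in sequence order.
  • The 0.38 nm CA-CA distance stands in for bond geometry, so cis peptides (about 0.29 nm) would be flagged.
  • Every cutoff is an illustrative default rather than a validated threshold.

The sketch does not cover diversity, which is better judged from the RMSD metrics or by clustering.

```python
import numpy as np


def quality_report(ca, ideal_bond=0.38, bond_tol=0.05, clash_cutoff=0.3, jump_cutoff=0.5):
    """Crude sanity checks on C-alpha coordinates of shape (n_frames, n_residues, 3), in nm.

    All cutoffs are illustrative defaults, not validated thresholds.
    """
    ca = np.asarray(ca, dtype=float)
    # Bond-length proxy: consecutive CA-CA distances (about 0.38 nm for trans peptides).
    bonds = np.linalg.norm(np.diff(ca, axis=1), axis=-1)
    bad_bond_frames = np.any(np.abs(bonds - ideal_bond) > bond_tol, axis=1)
    # Clash proxy: CA pairs at least two residues apart that come too close.
    i, j = np.triu_indices(ca.shape[1], k=2)
    pair_dist = np.linalg.norm(ca[:, i] - ca[:, j], axis=-1)
    clash_frames = np.any(pair_dist < clash_cutoff, axis=1)
    # Continuity: largest per-residue displacement between consecutive frames,
    # after removing each frame's centroid (overall rotation is not removed).
    centered = ca - ca.mean(axis=1, keepdims=True)
    steps = np.linalg.norm(np.diff(centered, axis=0), axis=-1).max(axis=1)
    return {
        "frames_with_bad_bonds": np.flatnonzero(bad_bond_frames).tolist(),
        "frames_with_clashes": np.flatnonzero(clash_frames).tolist(),
        "jumps_after_frame": np.flatnonzero(steps > jump_cutoff).tolist(),
    }
```

Global rotation between frames can inflate the continuity measure. If that is a concern, superpose the frames onto a reference first, for example with a Kabsch fit.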

Related tools

ABodyBuilder3

ABodyBuilder3 predicts antibody variable-domain structures from paired heavy and light chain sequences. It returns a PDB structure and, for the pLDDT checkpoint, per-residue confidence values.

AlphaFlow

Generate protein conformational ensembles with ESMFlow, the single-sequence AlphaFlow model family. Produces multiple diverse structures showing protein flexibility and dynamics.

AlphaFold2

AlphaFold2 via ColabFold for high-accuracy protein structure prediction. Uses the MMseqs2 API for MSA generation, so no local databases are required. Supports monomer and multimer prediction.

Boltz-2

Boltz-2 is a biomolecular foundation model for structure and binding affinity prediction. Supports proteins, ligands, DNA, and RNA in multi-component complexes. Automatically scales GPU resources for large complexes. Predicts binding affinity with accuracy approaching free energy perturbation (FEP) while running about 1000× faster.

Chai-1

Chai-1 is a multi-modal foundation model for molecular structure prediction. Predicts 3D structures for proteins, ligands, DNA, RNA, and multi-component complexes with high accuracy.

ESMfold

ESMfold is a fast, single-sequence protein structure predictor from Meta AI. Predicts 3D protein structures directly from amino acid sequences without requiring multiple sequence alignments (MSA), making it significantly faster than AlphaFold while automatically scaling GPU resources for larger proteins.

ESMFold2

ESMFold2 predicts protein structures and multi-chain protein complexes from amino acid sequences using Biohub protein language models. The first ProteinIQ release focuses on sequence-based protein folding with confidence metrics, native mmCIF structures, and optional PAE and pair-chain iPTM outputs.

ImmuneBuilder

ImmuneBuilder predicts 3D structures of immune receptor proteins including antibodies, nanobodies, and T-cell receptors. It uses ABodyBuilder2, NanoBodyBuilder2, and TCRBuilder2/TCRBuilder2+ to generate structures with per-residue error estimates and optional ensemble artifacts.

IntelliFold 2

Controllable biomolecular structure prediction model for proteins, ligands, DNA, RNA, and multi-component complexes. IntelliFold 2 supports fast v2-Flash inference, optional MSA generation, and ranked confidence outputs.

LMI4Boltz

LMI4Boltz is a low-memory fork of Boltz for biomolecular structure and binding affinity prediction. It preserves Boltz inference behavior while reducing VRAM use with in-place pair updates, CPU offload, reduced precision pair representation, and aggressive chunking.