MDGen

AI-powered molecular dynamics trajectory generation

What is MDGen?

MDGen generates molecular dynamics trajectories using generative AI rather than physics-based simulation. Given a single protein structure, it produces a sequence of conformations representing how the protein might move over time, with reported speedups of 10–1000× over traditional MD while preserving key dynamic properties.

The model learns from molecular dynamics simulation data to capture realistic protein motions. Unlike physics-based simulators that integrate equations of motion at femtosecond timesteps, MDGen directly generates trajectory frames, making it practical to explore conformational ensembles in seconds rather than days.

How MDGen works

MDGen frames trajectory generation as a conditional generative modeling problem. The model is trained on molecular dynamics simulation data and learns to generate plausible time evolutions by conditioning on one or more given frames of the trajectory (for example, the starting structure).

Architecture

The system uses a Scalable Interpolant Transformer (SiT) as its flow-based generative backbone. This avoids the computationally expensive residue-pair and frame-based architectures common in protein structure prediction. To handle long trajectories, MDGen incorporates the Hyena long-context architecture, enabling scaling to trajectories of 100,000+ frames.

Proteins are represented in the atom14 format (14 atoms per residue) and converted to SE(3) rigid frames (translation + rotation) plus torsion angles. This representation captures both backbone geometry and sidechain conformations.
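
As a rough illustration of the rigid-frame idea, the sketch below builds a rotation and a translation for one residue from its N, CA, and C atoms using Gram-Schmidt orthogonalization. This is a construction commonly used in structure-prediction code and is not necessarily MDGen's exact implementation; the function name `backbone_frame` and the axis convention (x-axis along CA to C) are assumptions made for this example.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def backbone_frame(n, ca, c):
    """Return (rotation, translation) for one residue.

    rotation is a tuple of three orthonormal axis vectors;
    translation is the CA position. Hypothetical helper, not MDGen code.
    """
    e1 = normalize(sub(c, ca))                        # x-axis: CA -> C
    v2 = sub(n, ca)                                   # CA -> N
    e2 = normalize(sub(v2, scale(e1, dot(e1, v2))))   # remove the e1 component
    e3 = cross(e1, e2)                                # completes a right-handed basis
    return (e1, e2, e3), ca
```

The torsion angles mentioned above would then place the sidechain atoms relative to this frame; that step is omitted here.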

Training data

MDGen provides checkpoints trained on different datasets:

Tetrapeptides: Explicit and implicit solvent simulations of 4-residue peptides, used for method validation
ATLAS: The ATLAS dataset of protein monomer simulations, preprocessed to 400 picosecond intervals, enabling generation for full proteins

Supported tasks

The generative approach enables multiple tasks through different conditioning strategies:

Task	Description
Forward simulation	Generate trajectory from an initial structure
Transition path sampling	Given start and end states, sample plausible connecting paths
Trajectory upsampling	Increase temporal resolution of existing trajectories
Inpainting	Generate partial molecular dynamics conditioned on fixed regions
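
One way to picture the conditioning strategies in the table is as a mask over frames: some frames are given to the model and the rest are generated. The sketch below is only a conceptual illustration, not MDGen's code; the function name, the task strings, and the `stride` parameter are assumptions. Inpainting is left out because, per the table, it conditions on fixed regions of the molecule rather than on whole frames.

```python
def conditioning_mask(task, n_frames, stride=2):
    """Return a list of booleans: True where a frame is given, False where it is generated.

    Hypothetical helper for illustration only.
    """
    if n_frames < 2:
        raise ValueError("need at least two frames")
    if task == "forward_simulation":
        # Only the initial structure is known.
        return [i == 0 for i in range(n_frames)]
    if task == "transition_path_sampling":
        # Start and end states are known; the path between them is sampled.
        return [i in (0, n_frames - 1) for i in range(n_frames)]
    if task == "trajectory_upsampling":
        # Every stride-th frame comes from the existing coarse trajectory.
        return [i % stride == 0 for i in range(n_frames)]
    raise ValueError(f"unknown task: {task}")
```

For example, `conditioning_mask("transition_path_sampling", 4)` marks only the first and last of the four frames as given.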

How to use MDGen online

ProteinIQ hosts MDGen on GPU infrastructure with pre-loaded model weights, generating trajectories directly in the browser.

Input

Input	Description
`Protein Structure`	PDB file, mmCIF file, or PDB ID (e.g., `1AKI`). Maximum 1,000 residues.

Settings

Trajectory parameters

Setting	Description
`Number of frames`	Trajectory length (10–100, default 50). More frames capture longer timescale dynamics but increase computation.
`Sampling temperature`	Diversity control (0.1–2.0, default 1.0). Lower = more conservative motions, higher = more exploration.

Advanced options

Setting	Description
`Frame stride`	Save every Nth frame (1–10, default 1). Higher values reduce output size.
`Random seed`	Fixed seed for reproducibility. Leave empty for random sampling each run.
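
The ranges and defaults listed above can be checked before submitting a job. The following is a minimal sketch with hypothetical names (`validate_settings` and its parameters are not part of any ProteinIQ or MDGen API). It also assumes that a frame stride of N keeps frames 0, N, 2N, and so on, which the settings description implies but does not state.

```python
def validate_settings(num_frames=50, temperature=1.0, frame_stride=1, seed=None):
    """Check settings against the documented ranges and list the frames that would be saved."""
    if not 10 <= num_frames <= 100:
        raise ValueError("Number of frames must be between 10 and 100")
    if not 0.1 <= temperature <= 2.0:
        raise ValueError("Sampling temperature must be between 0.1 and 2.0")
    if not 1 <= frame_stride <= 10:
        raise ValueError("Frame stride must be between 1 and 10")
    if seed is not None and not isinstance(seed, int):
        raise ValueError("Random seed must be an integer or left empty")
    # Assumption: striding starts at frame 0 and keeps every Nth frame.
    saved = list(range(0, num_frames, frame_stride))
    return {
        "num_frames": num_frames,
        "temperature": temperature,
        "frame_stride": frame_stride,
        "seed": seed,
        "saved_frame_indices": saved,
    }
```

With the default 50 frames and a stride of 10, this would keep five frames, so a high stride on a short trajectory leaves little to inspect.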

Output

MDGen produces a trajectory viewable in the integrated 3D viewer:

Output	Description
Topology PDB	Reference structure with atom connectivity information
Trajectory XTC	Compressed trajectory file containing all frames
RMSD metrics	Average and maximum backbone deviation from the starting structure

The viewer supports playback controls, frame-by-frame navigation, and structure alignment.

When to use MDGen vs traditional MD

MDGen excels at rapid conformational exploration when physical accuracy is less critical than speed:

Use case	MDGen	Traditional MD
Quick conformational screening	Fast sampling across multiple proteins	Computationally prohibitive at this scale
Qualitative dynamics exploration	Reasonable ensemble diversity	Preferable when higher accuracy is needed
Large-scale studies	Practical for hundreds of proteins	Resource-intensive
Binding site flexibility	Rapid estimate of accessible conformations	Preferable when detailed energetics are needed

For applications requiring accurate free energy estimates, specific timescale information, or force field validation, physics-based MD remains the appropriate choice.

Limitations

MDGen is designed for research exploration and has several constraints:

Protein size: Best results for proteins under 256 residues (ATLAS training limit); larger proteins may produce less reliable dynamics
Physical accuracy: Generated trajectories approximate, but do not exactly reproduce, true molecular dynamics
Timescales: The model captures conformational diversity but not absolute timescale information
Ligands and cofactors: Currently supports protein-only trajectories; bound ligands are not handled

Interpreting results

RMSD values

Root-mean-square deviation (RMSD) measures how far the backbone moves from the starting conformation. Values are given in nanometres (0.1 nm = 1 Å):

RMSD (nm)	Interpretation
< 0.1	Minimal backbone motion, local fluctuations only
0.1–0.3	Moderate conformational change, typical for stable proteins
0.3–0.5	Significant rearrangement, loop movements or domain shifts
> 0.5	Large-scale conformational change
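
For N atoms, RMSD is the square root of the mean squared distance between matching atoms in the reference and the current frame. The sketch below computes it for coordinates in nanometres and maps the result onto the bands in the table. It assumes the two structures are already superposed (no alignment is performed), and the handling of values that fall exactly on a band boundary is a choice made for this example; the function names are hypothetical.

```python
import math

def rmsd_nm(ref, frame):
    """RMSD between two equal-length lists of (x, y, z) coordinates in nm.

    No superposition is performed; align the structures first if needed.
    """
    if not ref or len(ref) != len(frame):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for p, q in zip(ref, frame) for a, b in zip(p, q))
    return math.sqrt(squared / len(ref))

def interpret_rmsd(value_nm):
    """Map an RMSD in nm onto the interpretation bands from the table."""
    if value_nm < 0.1:
        return "minimal backbone motion"
    if value_nm < 0.3:
        return "moderate conformational change"
    if value_nm <= 0.5:
        return "significant rearrangement"
    return "large-scale conformational change"
```

Applying `rmsd_nm` to each frame against the first frame and taking the mean and maximum would give numbers comparable to the reported RMSD metrics, provided the same atom selection and alignment are used.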

Trajectory quality

Evaluate generated trajectories by checking:

Continuity: Frames should show smooth transitions without sudden jumps
Physical plausibility: There should be no steric clashes, and bond lengths should remain reasonable
Diversity: Multiple conformational states should be sampled
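
Two of these checks are easy to automate. The sketch below flags frames where any atom jumps further than a threshold between consecutive frames, and flags consecutive CA atoms whose spacing deviates from the roughly 0.38 nm expected for a trans peptide bond. The thresholds and function names are assumptions chosen for illustration, and a trajectory is taken to be a list of frames, each a list of (x, y, z) tuples in nm.

```python
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def find_jumps(traj, max_step_nm=0.5):
    """Return frame indices i where any atom moves more than max_step_nm from frame i-1 to i."""
    jumps = []
    for i in range(1, len(traj)):
        largest = max(dist(p, q) for p, q in zip(traj[i - 1], traj[i]))
        if largest > max_step_nm:
            jumps.append(i)
    return jumps

def bad_ca_spacing(ca_coords, ideal_nm=0.38, tol_nm=0.05):
    """Return indices i where the CA(i)-CA(i+1) distance deviates from ideal_nm by more than tol_nm."""
    return [
        i for i in range(len(ca_coords) - 1)
        if abs(dist(ca_coords[i], ca_coords[i + 1]) - ideal_nm) > tol_nm
    ]
```

Cis peptide bonds have a shorter CA-CA distance (about 0.3 nm) and would be flagged by the second check, so flagged residues are worth inspecting rather than rejecting outright.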

Related tools

OpenMM: Physics-based molecular dynamics with explicit solvent and force field control
AlphaFlow: Flow-matching ensemble generation from protein sequences
PDB Viewer: Structure visualization without trajectory playback
RMSD Calculator: Quantitative structural comparison between conformations
