
MDGen
AI-powered molecular dynamics trajectory generation
MDGen generates molecular dynamics trajectories using generative AI rather than physics-based simulation. Given a single protein structure, it produces a sequence of conformations representing how the protein might move over time, achieving speedups of 10–1000× compared to traditional MD while preserving key dynamic properties.
The model learns from molecular dynamics simulation data to capture realistic protein motions. Unlike physics-based simulators that integrate equations of motion at femtosecond timesteps, MDGen directly generates trajectory frames, making it practical to explore conformational ensembles in seconds rather than days.
MDGen frames trajectory generation as a conditional generative modeling problem: given one or more known trajectory frames, the model generates the remaining frames so that the full sequence forms a plausible time evolution.
The system uses a Scalable Interpolant Transformer (SiT) as its flow-based generative backbone. This avoids the computationally expensive residue-pair and frame-based architectures common in protein structure prediction. To handle long trajectories, MDGen incorporates the Hyena long-context architecture, enabling scaling to trajectories of 100,000+ frames.
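To make the idea of a flow-based sampler concrete, here is a toy sketch rather than MDGen's actual model. It integrates a velocity field from noise toward a target with Euler steps. In MDGen the velocity would come from the trained SiT network; here it is replaced by a closed-form field for a one-dimensional target `mu`, and the names `target_velocity` and `sample` are illustrative assumptions.

```python
import random


def target_velocity(x, t, mu):
    """Velocity along the straight-line path x_t = (1 - t) * x0 + t * mu.

    Stand-in for a learned network: a real model predicts this from data.
    """
    return (mu - x) / (1.0 - t)


def sample(mu, num_steps=10, seed=0):
    """Move one noise sample to the target by Euler integration over t in [0, 1)."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the division above is safe
        x += dt * target_velocity(x, t, mu)
    return x


if __name__ == "__main__":
    print(sample(mu=3.0))
```

With a learned velocity field the same loop applies, but each `x` would be a full trajectory tensor rather than a single number.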
Proteins are represented in the atom14 format (up to 14 heavy-atom slots per residue) and converted to SE(3) rigid frames (a translation plus a rotation per residue) together with torsion angles. This representation captures both backbone geometry and sidechain conformations.
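As an illustration of the rigid-frame idea, the sketch below builds one residue's frame from its N, CA and C backbone atoms using Gram-Schmidt orthogonalization. The axis convention (first axis along CA→C, origin at CA) is an assumption borrowed from common structure-prediction code and may differ from what MDGen uses. Torsion angles are not computed here.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _scale(a, s):
    return [x * s for x in a]


def _normalize(a):
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def backbone_frame(n, ca, c):
    """Return (rotation, translation) for one residue.

    rotation is three orthonormal axis vectors [e1, e2, e3];
    translation is the CA position. The convention is an assumption.
    """
    e1 = _normalize(_sub(c, ca))                          # first axis: CA -> C
    v2 = _sub(n, ca)
    e2 = _normalize(_sub(v2, _scale(e1, _dot(v2, e1))))   # remove the e1 component
    e3 = _cross(e1, e2)                                   # right-handed third axis
    return [e1, e2, e3], list(ca)


if __name__ == "__main__":
    rot, trans = backbone_frame([0.5, 3.4, 3.0], [1.0, 2.0, 3.0], [2.5, 2.0, 3.0])
    print(rot, trans)
```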
MDGen provides checkpoints trained on different datasets:
The generative approach enables multiple tasks through different conditioning strategies:
| Task | Description |
|---|---|
| Forward simulation | Generate trajectory from an initial structure |
| Transition path sampling | Given start and end states, sample plausible connecting paths |
| Trajectory upsampling | Increase temporal resolution of existing trajectories |
| Inpainting | Generate partial molecular dynamics conditioned on fixed regions |
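One way to picture the frame-level tasks is as different masks over the trajectory, marking which frames are given and which are generated. The sketch below is an illustration under that assumption, not MDGen's code. The task names and the `conditioning_mask` function are hypothetical, and inpainting is left out because it conditions on regions of the molecule rather than on whole frames.

```python
def conditioning_mask(task, num_frames, stride=1):
    """Return a list of booleans: True where a frame is supplied as input."""
    if task == "forward_simulation":
        # Only the initial structure is known.
        return [i == 0 for i in range(num_frames)]
    if task == "transition_path":
        # Start and end states are known; the path between them is sampled.
        return [i in (0, num_frames - 1) for i in range(num_frames)]
    if task == "upsampling":
        # Every `stride`-th frame comes from the existing coarse trajectory.
        return [i % stride == 0 for i in range(num_frames)]
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for task in ("forward_simulation", "transition_path", "upsampling"):
        print(task, conditioning_mask(task, num_frames=5, stride=2))
```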
ProteinIQ hosts MDGen on GPU infrastructure with pre-loaded model weights, so trajectories can be generated and viewed directly from the browser.
| Input | Description |
|---|---|
| Protein Structure | PDB file, mmCIF file, or PDB ID (e.g., 1AKI). Maximum 1,000 residues. |
| Setting | Description |
|---|---|
| Number of frames | Trajectory length (10–100, default 50). More frames capture longer timescale dynamics but increase computation. |
| Sampling temperature | Diversity control (0.1–2.0, default 1.0). Lower = more conservative motions, higher = more exploration. |
| Setting | Description |
|---|---|
| Frame stride | Save every Nth frame (1–10, default 1). Higher values reduce output size. |
| Random seed | Fixed seed for reproducibility. Leave empty for random sampling each run. |
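The ranges and defaults in the settings tables can be captured in a small validation sketch. The `TrajectorySettings` class and its field names are hypothetical and do not represent a ProteinIQ API. That striding starts at frame 0 is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrajectorySettings:
    """Hypothetical container mirroring the documented ranges and defaults."""
    num_frames: int = 50          # 10-100
    temperature: float = 1.0      # 0.1-2.0
    frame_stride: int = 1         # 1-10
    seed: Optional[int] = None    # None means a fresh random sample each run

    def __post_init__(self):
        if not 10 <= self.num_frames <= 100:
            raise ValueError("num_frames must be between 10 and 100")
        if not 0.1 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.1 and 2.0")
        if not 1 <= self.frame_stride <= 10:
            raise ValueError("frame_stride must be between 1 and 10")

    def saved_frame_indices(self):
        """Indices kept when saving every frame_stride-th frame (assumed to start at 0)."""
        return list(range(0, self.num_frames, self.frame_stride))


if __name__ == "__main__":
    settings = TrajectorySettings(frame_stride=5, seed=42)
    print(len(settings.saved_frame_indices()), "frames saved")
```

Under these assumptions, a 50-frame run with a stride of 5 writes 10 frames to the output trajectory.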
MDGen produces a trajectory viewable in the integrated 3D viewer:
| Output | Description |
|---|---|
| Topology PDB | Reference structure with atom connectivity information |
| Trajectory XTC | Compressed trajectory file containing all frames |
| RMSD metrics | Average and maximum backbone deviation from the starting structure |
The viewer supports playback controls, frame-by-frame navigation, and structure alignment.
MDGen excels at rapid conformational exploration when physical accuracy is less critical than speed:
| Use case | MDGen | Traditional MD |
|---|---|---|
| Quick conformational screening | Fast sampling across multiple proteins | Computationally prohibitive |
| Qualitative dynamics exploration | Reasonable ensemble diversity | Preferred when higher accuracy is needed |
| Large-scale studies | Practical for hundreds of proteins | Resource-intensive |
| Binding site flexibility | Rapid estimate of accessible conformations | Preferred when detailed energetics are needed |
For applications requiring accurate free energy estimates, specific timescale information, or force field validation, physics-based MD remains the appropriate choice.
MDGen is designed for research exploration and has several constraints:
Root-mean-square deviation (RMSD) measures how far the structure has moved from the starting conformation:
| RMSD (nm) | Interpretation |
|---|---|
| < 0.1 | Minimal backbone motion, local fluctuations only |
| 0.1–0.3 | Moderate conformational change, typical for stable proteins |
| 0.3–0.5 | Significant rearrangement, loop movements or domain shifts |
| > 0.5 | Large-scale conformational change |
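The table can be applied programmatically. The sketch below computes RMSD between two coordinate sets in nm, maps a value to the table's categories, and summarizes a list of per-frame values as average and maximum. It assumes the frames are already superimposed on the reference. How exact boundary values such as 0.3 nm are binned is also an assumption, since the table does not specify it, and the function names are illustrative.

```python
import math


def rmsd_nm(ref, frame):
    """RMSD between two equally sized lists of (x, y, z) coordinates in nm.

    Assumes `frame` has already been aligned to `ref`.
    """
    if not ref or len(ref) != len(frame):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((a - b) ** 2 for p, q in zip(ref, frame) for a, b in zip(p, q))
    return math.sqrt(total / len(ref))


def interpret_rmsd(value_nm):
    """Map an RMSD value to the table's categories (boundary handling assumed)."""
    if value_nm < 0.1:
        return "minimal"
    if value_nm < 0.3:
        return "moderate"
    if value_nm <= 0.5:
        return "significant"
    return "large-scale"


def summarize(rmsds):
    """Return (average, maximum) over per-frame RMSD values."""
    return sum(rmsds) / len(rmsds), max(rmsds)


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    frame = [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)]
    value = rmsd_nm(ref, frame)
    print(round(value, 3), interpret_rmsd(value))
```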
Evaluate generated trajectories by checking: