Molecular dynamics (MD) trajectory analysis is the process of extracting quantitative structural and dynamic information from time-resolved atomic coordinate data produced by MD simulations. After a simulation generates a trajectory, the raw coordinate frames must be post-processed to derive meaningful observables: structural deviations, per-residue flexibility, compactness, hydrogen-bonding patterns, secondary structure evolution, and collective motions. These observables bridge the gap between a trajectory file and biological insight, revealing how a protein folds, unfolds, binds ligands, or transitions between conformational states.
ProteinIQ's MD Trajectory Analysis tool is powered by MDAnalysis, a Python library that reads trajectory data into NumPy arrays for efficient numerical analysis. The tool accepts topology and trajectory files from all major MD engines, including GROMACS, AMBER, CHARMM, NAMD, and OpenMM, and computes up to 24 distinct analyses in a single job.
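RMSD (root mean square deviation) measures how far a set of atoms has moved from a reference configuration at each time step. For $N$ atoms with positions $\mathbf{r}_i(t)$ and reference positions $\mathbf{r}_i^{\mathrm{ref}}$, it is defined as:

$$\mathrm{RMSD}(t) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert \mathbf{r}_i(t) - \mathbf{r}_i^{\mathrm{ref}} \right\rVert^2}$$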
A plateau in the RMSD time series indicates that the simulation has reached a stable structural state relative to the reference. Rising or oscillating RMSD suggests ongoing conformational change.
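The RMSD calculation described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the tool's or MDAnalysis's implementation: the function name `rmsd` and the toy coordinates are hypothetical, and the frame is assumed to be already superimposed on the reference.

```python
import math

def rmsd(coords, ref):
    """RMSD between two equally sized lists of (x, y, z) tuples.

    No superposition is performed; the frame is assumed to be
    already aligned to the reference.
    """
    if len(coords) != len(ref) or not coords:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared_sum = sum(math.dist(p, q) ** 2 for p, q in zip(coords, ref))
    return math.sqrt(squared_sum / len(coords))

# Toy three-atom system in angstrom (illustrative values only).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 2.0)]

print(round(rmsd(frame, reference), 3))
```

Applying the same function to every frame of a trajectory yields the RMSD time series whose plateau or drift is discussed above.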
RMSF (root mean square fluctuation) quantifies the time-averaged positional fluctuation of each atom or residue, providing a per-residue flexibility profile rather than a global time series. High RMSF values identify loop regions and flexible termini; low values correspond to the structured core.
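As a rough sketch of the per-atom fluctuation just described, the following pure-Python function averages each atom's position over all frames and then measures the root mean square deviation from that average. The name `rmsf` and the two-atom toy trajectory are assumptions made for illustration; frames are assumed to be pre-aligned.

```python
import math

def rmsf(frames):
    """Per-atom RMSF; `frames` is a list of frames, each a list of (x, y, z) tuples."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = tuple(sum(f[i][k] for f in frames) / n_frames for k in range(3))
        # Mean squared deviation of atom i from its average position.
        mean_sq = sum(math.dist(f[i], mean) ** 2 for f in frames) / n_frames
        result.append(math.sqrt(mean_sq))
    return result

# Toy trajectory: atom 0 is rigid, atom 1 oscillates along y (angstrom).
frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 2.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, -2.0, 0.0)],
]
flexibility = rmsf(frames)
print(flexibility)
```

The rigid atom reports zero fluctuation while the oscillating atom reports a nonzero value, which is the contrast between core and loop residues in a real profile.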
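Radius of gyration ($R_g$) measures the mass-weighted spatial extent of a protein, acting as a proxy for compactness:

$$R_g = \sqrt{\frac{\sum_{i} m_i \left\lVert \mathbf{r}_i - \mathbf{r}_{\mathrm{COM}} \right\rVert^2}{\sum_{i} m_i}}$$

where $m_i$ is the mass of atom $i$ and $\mathbf{r}_{\mathrm{COM}}$ is the center of mass.
A decrease in $R_g$ over time indicates compaction (e.g., folding), while an increase suggests unfolding or swelling.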
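A minimal sketch of the mass-weighted radius of gyration, assuming coordinates in angstrom and masses in atomic mass units. The function name `radius_of_gyration` and both toy conformations are hypothetical.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for a list of (x, y, z) tuples."""
    total_mass = sum(masses)
    # Center of mass of the selection.
    com = tuple(
        sum(m * p[k] for m, p in zip(masses, coords)) / total_mass
        for k in range(3)
    )
    weighted = sum(m * math.dist(p, com) ** 2 for m, p in zip(masses, coords))
    return math.sqrt(weighted / total_mass)

masses = [12.0, 12.0]
compact = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (12.0, 0.0, 0.0)]

rg_compact = radius_of_gyration(compact, masses)
rg_extended = radius_of_gyration(extended, masses)
print(rg_compact, rg_extended)
```

The extended arrangement gives the larger value, mirroring the unfolding or swelling signal described above.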
Hydrogen bonds are identified using geometric criteria: a donor-acceptor distance below a cutoff (typically 3.0 angstrom) and a donor-hydrogen-acceptor angle above a threshold (typically 150 degrees). Tracking the number and lifetime of hydrogen bonds across frames reveals structural stability and the persistence of specific intramolecular contacts.
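The geometric test can be sketched as a small predicate. This is an illustration of the two criteria named above, with the 3.0 angstrom and 150 degree cutoffs taken from the text; the function names and the example coordinates are assumptions.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = tuple(a[k] - b[k] for k in range(3))
    v2 = tuple(c[k] - b[k] for k in range(3))
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def is_hydrogen_bond(donor, hydrogen, acceptor, d_cutoff=3.0, angle_cutoff=150.0):
    """True if donor-acceptor distance and donor-hydrogen-acceptor angle both pass."""
    close_enough = math.dist(donor, acceptor) < d_cutoff
    linear_enough = angle_deg(donor, hydrogen, acceptor) > angle_cutoff
    return close_enough and linear_enough

donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
linear_hit = is_hydrogen_bond(donor, hydrogen, (2.9, 0.0, 0.0))   # short and linear
bent_miss = is_hydrogen_bond(donor, hydrogen, (1.0, 1.9, 0.0))    # short but bent (90 degrees)
far_miss = is_hydrogen_bond(donor, hydrogen, (3.5, 0.0, 0.0))     # linear but too far
print(linear_hit, bent_miss, far_miss)
```

Counting the triplets that pass this test in each frame gives the per-frame hydrogen bond count.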
Salt bridge analysis monitors electrostatic interactions between oppositely charged residues (e.g., Asp/Glu carboxylates paired with Arg guanidinium or Lys ammonium groups). Contact analysis counts the number of atom pairs within a specified distance cutoff, capturing the formation and dissolution of interfaces.
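Contact counting reduces to a distance test over atom pairs. The sketch below uses a brute-force loop over two groups; the name `count_contacts` and the 4.5 angstrom default cutoff are assumptions chosen for illustration, not values documented for the tool.

```python
import math
from itertools import product

def count_contacts(group_a, group_b, cutoff=4.5):
    """Number of (a, b) atom pairs closer than `cutoff` (angstrom)."""
    return sum(1 for p, q in product(group_a, group_b) if math.dist(p, q) < cutoff)

# Toy interface: two atoms per group, positions in angstrom.
group_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
group_b = [(3.0, 0.0, 0.0), (12.0, 0.0, 0.0)]

n_contacts = count_contacts(group_a, group_b)
print(n_contacts)
```

Evaluating this count frame by frame shows an interface forming (count rising) or dissolving (count falling). The all-pairs loop scales quadratically, so production tools typically use neighbor searches instead.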
Principal component analysis (PCA) diagonalizes the covariance matrix of atomic displacements to extract the dominant collective motions, often referred to as essential dynamics. The first few principal components typically capture large-amplitude motions such as domain opening, hinge bending, and loop rearrangements, while higher-index components mostly describe small-amplitude, localized fluctuations.
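To illustrate the idea, the sketch below builds a covariance matrix from flattened frame coordinates and extracts only the leading component. Note the substitution: power iteration is used here as a dependency-free stand-in for a full eigendecomposition, which is what PCA normally performs. All names and the two-coordinate toy data are hypothetical.

```python
import math

def covariance_matrix(samples):
    """Sample covariance; each row is one frame's flattened coordinates."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(dim)]
    centered = [[s[j] - mean[j] for j in range(dim)] for s in samples]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dim)]
        for a in range(dim)
    ]

def leading_component(cov, iterations=200):
    """Largest eigenvalue and its unit eigenvector via power iteration."""
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)
    )
    return eigenvalue, v

# Hypothetical two-coordinate "trajectory": large motion along x, small along y.
frames = [(-2.0, 0.1), (-1.0, -0.1), (0.0, 0.0), (1.0, 0.1), (2.0, -0.1)]
cov = covariance_matrix(frames)
variance, pc1 = leading_component(cov)
fraction = variance / (cov[0][0] + cov[1][1])
print(pc1, fraction)
```

Here the first component points almost entirely along x and accounts for nearly all of the variance, which is the toy analogue of a single dominant collective motion.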
Conformational clustering groups trajectory frames by structural similarity, identifying distinct conformational states sampled during the simulation. This is valuable for understanding populations and transitions between metastable states.
Dynamic cross-correlation maps reveal correlated and anticorrelated motions between residue pairs, highlighting allosteric communication pathways and rigid-body domain motions.
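One entry of such a map can be sketched as the normalized covariance of two atoms' displacement vectors, $C_{ij} = \langle \Delta\mathbf{r}_i \cdot \Delta\mathbf{r}_j \rangle / \sqrt{\langle \Delta\mathbf{r}_i^2 \rangle \langle \Delta\mathbf{r}_j^2 \rangle}$. The function names and the three-frame toy tracks are assumptions; frames are assumed to be pre-aligned.

```python
import math

def displacements(track):
    """Displacement of each frame's position from the time-averaged position."""
    n = len(track)
    mean = tuple(sum(p[k] for p in track) / n for k in range(3))
    return [tuple(p[k] - mean[k] for k in range(3)) for p in track]

def cross_correlation(track_i, track_j):
    """Normalized cross-correlation of two atoms' displacements (range -1 to 1)."""
    di, dj = displacements(track_i), displacements(track_j)
    n = len(di)
    dot_ij = sum(sum(a * b for a, b in zip(u, v)) for u, v in zip(di, dj)) / n
    var_i = sum(sum(a * a for a in u) for u in di) / n
    var_j = sum(sum(b * b for b in v) for v in dj) / n
    return dot_ij / math.sqrt(var_i * var_j)

atom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
atom_b = [(5.0, 0.0, 0.0), (5.5, 0.0, 0.0), (6.0, 0.0, 0.0)]  # moves with atom_a
atom_c = [(5.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.0, 0.0, 0.0)]  # moves against atom_a

correlated = cross_correlation(atom_a, atom_b)
anticorrelated = cross_correlation(atom_a, atom_c)
print(correlated, anticorrelated)
```

Filling a residue-by-residue matrix with these values produces the map; blocks near +1 move together and blocks near -1 move in opposition.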
Mean square displacement (MSD) characterizes the translational diffusion of molecules or groups of atoms. The slope of MSD versus time is proportional to the diffusion coefficient via the Einstein relation.
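In three dimensions the Einstein relation reads $\mathrm{MSD}(\tau) \approx 6D\tau$ at long times, so $D$ can be estimated as the fitted slope divided by 6. The sketch below assumes a single particle, a frame spacing of 1 ps, and a toy path chosen so that the MSD is exactly linear in the lag; all names and values are hypothetical.

```python
import math

def msd(track, lag):
    """Mean square displacement of one particle at a given lag (in frames)."""
    n_origins = len(track) - lag
    total = sum(math.dist(track[t + lag], track[t]) ** 2 for t in range(n_origins))
    return total / n_origins

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = sum((x - mx) ** 2 for x in xs)
    return numerator / denominator

# Toy path with unit steps along x, y, then z (angstrom).
track = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
dt_ps = 1.0  # assumed time between saved frames

lags = [1, 2, 3]
times = [lag * dt_ps for lag in lags]
msd_values = [msd(track, lag) for lag in lags]

# Einstein relation in 3D: D = slope / 6, here in angstrom^2 per ps.
diffusion = slope(times, msd_values) / 6.0
print(msd_values, diffusion)
```

With real data the fit should be restricted to the linear regime of the MSD curve, since short lags are often ballistic and long lags are poorly sampled.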
DSSP assigns secondary structure (helix, sheet, coil) to each residue at every frame, producing a residue-by-time map that visualizes structural transitions such as helix unwinding or beta-sheet formation.
Ramachandran analysis tracks backbone dihedral angles ($\phi$, $\psi$) over time, while chi angle analysis monitors side-chain rotamer states. Helix geometry analysis extracts parameters like rise per residue and twist angle for helical segments.
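Each backbone or side-chain torsion is a dihedral angle over four consecutive atoms; for residue $i$, $\phi$ uses C(i-1), N, CA, C and $\psi$ uses N, CA, C, N(i+1). A minimal sketch of the calculation follows, using the atan2 form so the result is signed. The helper names and test points are assumptions, and the convention used here places cis at 0 degrees and trans at 180 degrees.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
cis = dihedral_deg(p0, p1, p2, (1.0, 1.0, 0.0))
trans = dihedral_deg(p0, p1, p2, (-1.0, 1.0, 0.0))
perpendicular = dihedral_deg(p0, p1, p2, (0.0, 1.0, 1.0))
print(cis, trans, perpendicular)
```

Evaluating this for the $\phi$ and $\psi$ atom quartets of every residue in every frame gives the points plotted in a Ramachandran diagram.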
SASA (solvent accessible surface area) measures the surface area exposed to solvent at each frame. Changes in SASA can indicate burial of hydrophobic residues upon folding or exposure upon unfolding.
Radial distribution function (RDF) describes how particle density varies as a function of distance from a reference atom, providing information about solvation shell structure. Density profiles project atom distributions along an axis, useful for membrane systems.
ProteinIQ provides browser-based MD trajectory analysis without requiring local installation of MDAnalysis or Python. Upload topology and trajectory files, select analyses, and download spreadsheet results.
| Input | Description | Accepted formats | Max size |
|---|---|---|---|
| Topology/Structure file | Defines atom names, residue identities, and connectivity. Required for interpreting trajectory coordinates. | .pdb, .gro, .psf, .prmtop, .top | 50 MB |
| Trajectory file | Contains atomic coordinates for each saved frame of the simulation. | .xtc, .trr, .dcd, .nc, .tng | 500 MB |
The topology and trajectory files must correspond to the same system. Common pairings include .gro + .xtc (GROMACS), .prmtop + .nc (AMBER), and .psf + .dcd (CHARMM/NAMD).
| Setting | Description | Default |
|---|---|---|
| Analyses to run | Checkbox selection of which analyses to perform. Includes structural metrics (RMSD, RMSF, Rg, end-to-end distance, shape parameters, moment of inertia), interaction analyses (hydrogen bonds, salt bridges, contacts, distance tracking, H-bond lifetime), conformational analyses (PCA, clustering, MSD, dynamic cross-correlation, persistence length), secondary structure analyses (DSSP, Ramachandran, chi angles, helix geometry), and environment analyses (SASA, RDF, density profiles, water dynamics). | RMSD, RMSF, Radius of gyration |
| Atom selection | MDAnalysis selection string specifying which atoms to include. Supports boolean operators and keywords such as protein, backbone, name CA, resid 1:100, and resname LYS. | protein and name CA |
| Selection string | Atoms selected |
|---|---|
| protein | All atoms belonging to standard amino acid residues |
| protein and name CA | Alpha-carbon atoms only (one per residue) |
| backbone | Backbone heavy atoms: N, CA, C, O |
| resid 1:100 | All atoms in residues 1 through 100 |
| resname LYS ARG | All atoms in lysine and arginine residues |
| protein and not name H* | All non-hydrogen protein atoms |
Results are returned as a downloadable spreadsheet. The specific columns depend on which analyses are selected.
| Analysis | Output columns | Description |
|---|---|---|
| RMSD | Frame, Time, RMSD | RMSD value at each frame in angstrom |
| RMSF | Residue, RMSF | Per-residue fluctuation in angstrom |
| Radius of gyration | Frame, Time, Rg | $R_g$ at each frame in angstrom |
A rapidly rising RMSD that plateaus within the first few nanoseconds generally indicates that the protein has relaxed from its starting configuration into a stable state. RMSD values of 1-3 angstrom for C-alpha atoms are typical for stable globular proteins. Values exceeding 5 angstrom often indicate large-scale conformational rearrangements, partial unfolding, or that the simulation has not yet converged.
Because RMSD is degenerate (different conformations can produce the same RMSD relative to a reference), it should be interpreted alongside other metrics. A stable RMSD paired with increasing $R_g$ could indicate rigid-body domain separation without internal structural change.
RMSF values can be related to crystallographic B-factors through the isotropic relation $B = \frac{8\pi^2}{3}\,\mathrm{RMSF}^2$, which allows comparison with experimental data. Residues with RMSF above 2-3 angstrom are highly flexible and often correspond to loop regions, termini, or intrinsically disordered segments. Residues in the hydrophobic core and secondary structure elements typically show RMSF below 1 angstrom.
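Comparison with crystallographic data is usually made through the isotropic relation $B = \frac{8\pi^2}{3}\,\mathrm{RMSF}^2$. The short sketch below converts in both directions; the function names are hypothetical, and the relation assumes isotropic, harmonic fluctuations, so agreement with experimental B-factors is expected to be qualitative at best.

```python
import math

B_PREFACTOR = 8.0 * math.pi ** 2 / 3.0  # about 26.3

def rmsf_to_bfactor(rmsf):
    """Isotropic B-factor (angstrom^2) implied by an RMSF (angstrom)."""
    return B_PREFACTOR * rmsf ** 2

def bfactor_to_rmsf(bfactor):
    """RMSF (angstrom) implied by an isotropic B-factor (angstrom^2)."""
    return math.sqrt(bfactor / B_PREFACTOR)

core_b = rmsf_to_bfactor(1.0)   # a residue at the 1 angstrom threshold mentioned above
loop_b = rmsf_to_bfactor(2.5)   # a flexible loop residue
print(round(core_b, 1), round(loop_b, 1))
```

Because the relation is quadratic, a residue with 2.5 times the RMSF corresponds to a B-factor more than six times larger.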
For a folded, globular protein, $R_g$ remains relatively constant throughout the simulation. A systematic decrease in $R_g$ may indicate compaction or folding, while an increase suggests unfolding. For intrinsically disordered proteins, $R_g$ fluctuations are expected and their distribution provides information about the ensemble of conformations sampled.
The first two or three principal components typically capture 50-80% of the total variance in well-behaved simulations. Projection of the trajectory onto these components reveals the dominant conformational motions. If frames cluster into distinct groups in PC space, the protein visits multiple conformational substates. Overlap between clusters suggests conformational interconversion on the simulation timescale.
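The fraction of variance captured by each component follows directly from the eigenvalues of the covariance matrix. The sketch below computes per-component and cumulative fractions; the function name and the eigenvalues are made up for illustration and do not come from any real simulation.

```python
from itertools import accumulate

def explained_variance(eigenvalues):
    """Per-component and cumulative variance fractions, largest component first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    ratios = [value / total for value in ordered]
    return ratios, list(accumulate(ratios))

# Hypothetical eigenvalues (angstrom^2) for five components.
eigenvalues = [5.0, 2.0, 1.0, 1.0, 1.0]
ratios, cumulative = explained_variance(eigenvalues)
print(ratios, cumulative)
```

In this made-up case the first two components together account for 70% of the variance, which falls inside the range quoted above.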
A stable intramolecular hydrogen bond network correlates with structural integrity. A sudden drop in hydrogen bond count often precedes unfolding events. Salt bridge persistence above 50-60% of trajectory frames suggests a structurally important electrostatic interaction; transient salt bridges (below 20%) may contribute to conformational flexibility without being essential for stability.
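Persistence (occupancy) is simply the fraction of frames in which the interaction is present. The sketch below applies that to a per-frame distance series and labels the result using the thresholds from the paragraph above. The 4.0 angstrom cutoff, the function names, and the distance values are assumptions for illustration.

```python
def occupancy(distances, cutoff=4.0):
    """Fraction of frames in which the pair distance is below `cutoff` (angstrom)."""
    return sum(1 for d in distances if d < cutoff) / len(distances)

def classify(fraction, persistent_above=0.5, transient_below=0.2):
    """Label an occupancy using the thresholds quoted in the text."""
    if fraction > persistent_above:
        return "persistent"
    if fraction < transient_below:
        return "transient"
    return "intermediate"

# Hypothetical per-frame distances between an Asp carboxylate and a Lys ammonium group.
distances = [3.1, 3.4, 5.2, 3.0, 6.1, 3.3, 3.5, 3.2, 7.0, 3.6]
fraction = occupancy(distances)
label = classify(fraction)
print(fraction, label)
```

The same occupancy calculation applies to individual hydrogen bonds once a per-frame present/absent series is available.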

| Setting | Description | Default |
|---|---|---|
| Reference frame | The frame used as the reference structure for RMSD and alignment. Options: First frame, Last frame, or Average structure. | First frame |
| Frame step | Analyze every nth frame. Increasing this value reduces computation time for long trajectories at the cost of temporal resolution. | 1 |
| Align trajectory | Whether to superimpose each frame onto the reference before computing RMSD. Alignment removes rigid-body translation and rotation, isolating internal structural changes. | Enabled |

| Analysis | Output columns | Description |
|---|---|---|
| End-to-end distance | Frame, Time, Distance | Distance between first and last residue |
| Hydrogen bonds | Frame, Time, Count | Number of hydrogen bonds per frame |
| PCA | Frame, PC1, PC2, ... | Projection onto principal components |
| DSSP | Residue, Frame, SS | Secondary structure assignment per residue per frame |
| SASA | Frame, Time, SASA | Solvent accessible surface area per frame |