ProteinIQ

PDBFixer

An OpenMM-based tool for fixing problems in PDB (Protein Data Bank) files, including missing atoms, missing residues, and improper formatting.

What is PDB Fixer?

PDB Fixer is an open-source tool developed as part of the OpenMM molecular simulation toolkit. It automatically repairs Protein Data Bank (PDB) files, fixing structural problems that prevent successful molecular dynamics simulations.

Experimental protein structures from X-ray crystallography or cryo-EM frequently contain issues: hydrogen atoms aren't resolved, side chains or entire loops may be missing due to flexibility, modified residues need conversion, and crystallization artifacts (buffers, salts, duplicate chains) clutter the file. These problems cause simulation software like GROMACS, Amber, or OpenMM to fail.

PDB Fixer addresses eight primary structural problems:

  • Missing hydrogens: Adds all hydrogen atoms absent from X-ray structures
  • Missing heavy atoms: Completes side chain atoms in regions with poor electron density
  • Missing terminal atoms: Adds capping atoms at chain ends
  • Missing residues: Builds complete loops in disordered regions
  • Nonstandard residues: Converts modified amino acids (e.g., selenomethionine) to standard equivalents
  • Heterogens: Removes or keeps ligands, ions, and water molecules
  • Alternate locations: Selects single conformations when multiple exist
  • Solvent/membrane: Constructs water boxes or lipid bilayers for explicit solvent simulations

After fixing your structure, use PDB Viewer to visually inspect the results or PDB to FASTA to extract sequences.

How does PDB Fixer work?

PDB Fixer operates as a stepwise pipeline that analyzes molecular topology and atomic coordinates. It uses residue templates from the PDB Chemical Component Dictionary to identify what atoms should exist and add any that are missing.
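The stepwise pipeline can be sketched against the PDBFixer Python API. This is a minimal sketch, not ProteinIQ's actual implementation: the method names are PDBFixer's own, but the `run_pipeline` and `fix_file` helpers, their arguments, and the file paths are assumptions made for illustration.

```python
def run_pipeline(fixer, ph=7.0, keep_water=True, add_residues=False):
    """Apply PDBFixer's repair steps in their usual order.

    `fixer` is expected to be a pdbfixer.PDBFixer instance. This helper is
    hypothetical; it only calls methods that PDBFixer itself provides.
    """
    fixer.findMissingResidues()            # fills fixer.missingResidues from SEQRES
    if not add_residues:
        fixer.missingResidues = {}         # skip loop building, keep atom-level fixes
    fixer.findNonstandardResidues()
    fixer.replaceNonstandardResidues()
    fixer.removeHeterogens(keepWater=keep_water)
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()                # heavy atoms, plus residues if any are listed
    fixer.addMissingHydrogens(ph)
    return fixer


def fix_file(in_path, out_path, **options):
    """Load a PDB file, repair it, and write the result (needs pdbfixer and openmm)."""
    from pdbfixer import PDBFixer          # imported lazily so run_pipeline stays importable
    from openmm.app import PDBFile

    fixer = run_pipeline(PDBFixer(filename=in_path), **options)
    with open(out_path, "w") as handle:
        PDBFile.writeFile(fixer.topology, fixer.positions, handle)
```

Calling `fix_file("input.pdb", "fixed.pdb", ph=7.0)` would run the whole sequence on a local file; both paths are placeholders.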

Template-based correction

For each residue, PDB Fixer compares the atoms present against the expected template. Missing atoms are placed using ideal bond lengths, angles, and dihedral values from the template library. This ensures chemically reasonable starting positions.
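The comparison step amounts to a set difference between the template and the atoms actually present. The sketch below assumes a tiny, hand-written `TEMPLATES` table of heavy-atom names; the real templates are far more complete and also carry geometry, so this only illustrates the bookkeeping.

```python
# Hypothetical, heavily abbreviated templates: heavy-atom names per residue type.
TEMPLATES = {
    "GLY": {"N", "CA", "C", "O"},
    "ALA": {"N", "CA", "C", "O", "CB"},
    "SER": {"N", "CA", "C", "O", "CB", "OG"},
}


def missing_atoms(residue_name, atoms_present):
    """Return template atom names absent from the residue, sorted for stable output."""
    template = TEMPLATES[residue_name]
    return sorted(template - set(atoms_present))


# A serine whose side chain was not resolved in the electron density.
print(missing_atoms("SER", ["N", "CA", "C", "O"]))
```

In the PDBFixer library itself, the equivalent information is available after `findMissingAtoms()` in the `missingAtoms` and `missingTerminals` attributes.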

Energy minimization

After adding atoms, the tool runs brief energy minimization using OpenMM's force fields. This resolves steric clashes between newly added atoms and the existing structure while preserving experimentally determined coordinates as much as possible.

Protonation states

When adding hydrogens, PDB Fixer assigns protonation states based on the specified pH. Titratable residues (histidine, aspartate, glutamate, lysine) adopt appropriate charge states. Special cases like disulfide bonds and histidine tautomers are handled automatically.
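To see why the pH setting matters, the Henderson-Hasselbalch relation gives the fraction of a titratable group that carries its proton at a given pH. The pKa values below are approximate textbook figures for isolated side chains, assumed here for illustration; they are not read from PDB Fixer or OpenMM, and real pKa values shift with the protein environment.

```python
# Approximate model side-chain pKa values (assumed textbook figures).
MODEL_PKA = {"ASP": 3.9, "GLU": 4.3, "HIS": 6.0, "LYS": 10.5}


def protonated_fraction(pka, ph):
    """Henderson-Hasselbalch: fraction of the group that holds its titratable proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def dominant_state(residue_name, ph):
    """Pick the majority form of a residue at the given pH."""
    fraction = protonated_fraction(MODEL_PKA[residue_name], ph)
    return "protonated" if fraction >= 0.5 else "deprotonated"


for name in MODEL_PKA:
    print(name, dominant_state(name, 7.0))
```

Under these assumed values, a histidine sits close enough to neutral pH that small changes in the pH setting flip its dominant state, which is why it deserves a manual check in active sites.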

Loop modeling

Missing residues are reconstructed from residue templates and then refined by minimization. The algorithm uses SEQRES records in the PDB file to identify which residues should exist, then builds them with idealized geometry before refining with energy minimization.
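The first half of that process, reading which residues should exist, can be sketched with the fixed-column layout of SEQRES records (chain ID in column 12, residue names from column 20). The `parse_seqres` helper is hypothetical and ignores everything except SEQRES lines.

```python
def parse_seqres(pdb_text):
    """Collect the full residue sequence per chain from SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]                      # column 12 holds the chain ID
            names = line[19:].split()                # residue names start at column 20
            chains.setdefault(chain_id, []).extend(names)
    return chains


example = "\n".join([
    "SEQRES   1 A    5  GLY ILE VAL GLU GLN",
    "SEQRES   1 B    3  PHE VAL ASN",
])
print(parse_seqres(example))
```

Comparing these sequences with the residues that actually have coordinates reveals the gaps that loop building would need to fill.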

Inputs & settings

Input requirements

Upload a PDB file (.pdb or .ent) or fetch directly from RCSB using a PDB ID. Maximum file size is 50 MB.

Core processing

  • pH value: Sets protonation states for titratable groups when adding hydrogens. This setting only applies when "Add missing hydrogens" is enabled.
  • Heterogens: Controls handling of non-protein molecules. Keep all preserves ligands, ions, and water from the original structure. Keep only water removes ligands and ions but preserves crystallographic waters. Remove all strips everything except the protein chains.
  • Remove chains: Comma-separated chain IDs to remove (e.g., B, C, D). Useful for extracting a single chain from a multi-chain complex.
  • Apply mutations: Introduce point mutations during processing. Format: CHAIN:ORIGINAL-POSITION-NEW (e.g., A:ALA-57-GLY, B:VAL-123-ILE). This lets you study mutant variants without manual editing.
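The chain list and mutation format above are simple enough to validate before processing. The parser below is a sketch of how such input could be checked; the function names and the regular expression are assumptions, not ProteinIQ's code.

```python
import re

# CHAIN:ORIGINAL-POSITION-NEW, e.g. A:ALA-57-GLY
_MUTATION = re.compile(r"^(\w):([A-Z]{3})-(-?\d+)-([A-Z]{3})$")


def parse_mutations(spec):
    """Group 'A:ALA-57-GLY, B:VAL-123-ILE' by chain as {'A': ['ALA-57-GLY'], ...}."""
    by_chain = {}
    for item in spec.split(","):
        match = _MUTATION.match(item.strip())
        if match is None:
            raise ValueError(f"bad mutation: {item.strip()!r}")
        chain, old, position, new = match.groups()
        by_chain.setdefault(chain, []).append(f"{old}-{position}-{new}")
    return by_chain


def parse_chain_ids(spec):
    """Split 'B, C, D' into ['B', 'C', 'D'], ignoring empty entries."""
    return [part.strip() for part in spec.split(",") if part.strip()]
```

With the PDBFixer library, each chain's list could then be passed to `fixer.applyMutations(mutations, chain_id)`, and the chain IDs to `fixer.removeChains(chainIds=...)`.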

Add options

These settings control what structural elements PDB Fixer adds to your structure.

  • Add missing residues: Builds entire missing loops from SEQRES records. Enable this when you need complete chains for simulation. Processing is slow for structures with large gaps (loops of 10+ residues).
  • Add missing heavy atoms: Completes truncated side chains. We recommend keeping this enabled for simulation preparation.
  • Add missing hydrogens: Adds hydrogen atoms at the specified pH. Required for most MD simulations since X-ray structures lack hydrogens.
  • Replace nonstandard residues: Converts modified amino acids (selenomethionine, phosphoserine, etc.) to their standard equivalents. Enable for standard force field compatibility; disable to preserve post-translational modifications.

Solvent box options

Adding explicit water creates a simulation-ready system. The protein is surrounded by a periodic box of water molecules (cubic, dodecahedral, or octahedral), with counterions added for charge neutralization.

  • Add solvent box: Enables water box construction. Significantly increases processing time and output file size.
  • Cation/Anion: Ion types for charge neutralization. Na+/Cl- is standard for most simulations. Choose K+ for systems where potassium is physiologically relevant.
  • Ionic strength: Molar concentration of added salt. 0.15 M matches physiological conditions. Increase for high-salt studies (halophiles, aggregation).
  • Box geometry: Shape of the periodic boundary box. Cubic is standard and compatible with all software. Dodecahedron and Octahedron keep the same minimum distance between periodic images while cutting box volume, and with it water count, by roughly 29% and 23% respectively, giving faster simulations with equivalent accuracy.
  • Box sizing: Choose between automatic padding (distance from protein to box edges) or explicit X/Y/Z dimensions in nanometers.
  • Water model: Select the water model for solvent. TIP3P is the standard choice for most force fields. TIP4P-Ew provides improved density and diffusion properties. SPC/E is popular for GROMACS workflows.
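The effect of box geometry follows from the cell volumes. For the same distance d between periodic images, a rhombic dodecahedron has volume (√2/2)·d³ and a truncated octahedron (4√3/9)·d³, compared with d³ for a cube. The helper names below are illustrative.

```python
import math

# Volume of each periodic cell as a multiple of d**3, where d is the distance
# between periodic images (the edge length in the cubic case).
VOLUME_FACTOR = {
    "cube": 1.0,
    "dodecahedron": math.sqrt(2) / 2,       # rhombic dodecahedron, about 0.707
    "octahedron": 4 * math.sqrt(3) / 9,     # truncated octahedron, about 0.770
}


def box_volume_nm3(shape, image_distance_nm):
    """Cell volume in cubic nanometers for a given image distance."""
    return VOLUME_FACTOR[shape] * image_distance_nm ** 3


def saving_vs_cube(shape):
    """Fraction of the cubic volume saved by choosing this shape."""
    return 1.0 - VOLUME_FACTOR[shape]


for shape in VOLUME_FACTOR:
    print(f"{shape}: {saving_vs_cube(shape):.0%} smaller than a cube")
```

The saving in water molecules is slightly larger than the saving in volume, because the protein occupies the same space in every box.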

Membrane options

Membrane systems embed the protein in a lipid bilayer for studying membrane proteins (GPCRs, ion channels, transporters).

  • Add lipid membrane: Constructs a membrane system. Cannot be combined with solvent box—the membrane system includes water and ions automatically.
  • Lipid type: Composition of the bilayer. POPC (palmitoyl-oleoyl-phosphatidylcholine) is the most common choice for general membrane protein studies. POPE is preferred for bacterial membranes.
  • Membrane center Z: Position of the bilayer center along the Z-axis in nanometers. Set to 0.0 for automatic centering. Adjust when you need the protein positioned asymmetrically in the membrane.
  • Minimum padding: Distance from the protein to the box edges in nanometers.

Advanced options

  • Force field: Select the molecular mechanics force field for atom placement and minimization. AMBER14 is recommended for most use cases. CHARMM36 is preferred if you plan to run simulations with CHARMM-compatible software.
  • Random seed: Set a specific value for reproducible structure generation. Use 0 for random initialization. Fixed seeds ensure identical output when reprocessing the same structure with the same settings.
  • Custom box vectors: Define triclinic box vectors manually instead of using standard box shapes. Specify vectors A, B, and C as comma-separated X,Y,Z components in nanometers.
  • Download templates: Comma-separated residue codes to download from PDB Chemical Component Dictionary (e.g., ATP, GTP, HEM). Use this for non-standard residues not included in the default template library.
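Custom box vectors are easy to get wrong, because OpenMM only accepts triclinic vectors in reduced form: A along x, B in the xy-plane, positive diagonal components, and off-diagonal components no larger than half the corresponding diagonal. The check below is a sketch of those conditions; the function names and the tolerance are assumptions.

```python
def parse_vector(text):
    """Turn a comma-separated 'X,Y,Z' string (nanometers) into a tuple of floats."""
    x, y, z = (float(part) for part in text.split(","))
    return (x, y, z)


def is_reduced_form(a, b, c, tol=1e-9):
    """Check the reduced-form conditions OpenMM places on periodic box vectors."""
    # A = (ax, 0, 0), B = (bx, by, 0), C = (cx, cy, cz)
    shape_ok = abs(a[1]) <= tol and abs(a[2]) <= tol and abs(b[2]) <= tol
    positive = a[0] > 0 and b[1] > 0 and c[2] > 0
    reduced = (
        a[0] >= 2 * abs(b[0]) - tol
        and a[0] >= 2 * abs(c[0]) - tol
        and b[1] >= 2 * abs(c[1]) - tol
    )
    return shape_ok and positive and reduced


a, b, c = (parse_vector(v) for v in ("5,0,0", "0,5,0", "0,0,5"))
print(is_reduced_form(a, b, c))
```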

Understanding the results

PDB Fixer outputs a corrected PDB file ready for molecular dynamics simulation. The results summary shows what modifications were applied:

Metric              Description
Atoms               Total atom count in the output structure
Residues            Number of residues (may increase if loops were added)
Chains              Number of protein chains retained
Processing applied  List of fixes performed (e.g., "Added 2,847 hydrogens, replaced 3 nonstandard residues")
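The atom, residue, and chain counts can be cross-checked from the output file itself. The sketch below reads the fixed-column ATOM and HETATM records; it is a hypothetical helper that assumes a single-model file and counts every chain it sees, including chains that hold only water or ligands.

```python
def summarize_pdb(pdb_text):
    """Count atoms, residues, and chains from ATOM/HETATM records."""
    atoms = 0
    residues = set()
    chains = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms += 1
            chain_id = line[21]                       # column 22
            residues.add((chain_id, line[22:27]))     # residue number + insertion code
            chains.add(chain_id)
    return {"atoms": atoms, "residues": len(residues), "chains": len(chains)}
```

Running it on the input and the output file shows at a glance how many atoms and residues the fixes added.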

Validating the output

We recommend visually inspecting the fixed structure before simulation:

  1. Check that added loops adopt reasonable conformations (no severe clashes)
  2. Verify that important ligands weren't accidentally removed
  3. For membrane systems, confirm the protein spans the bilayer correctly
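The clash check in step 1 can be partly automated with a distance scan. The sketch below is a deliberately naive O(n²) version with an assumed 2.0 Å cutoff; covalently bonded pairs sit closer than that, so they must be supplied in `bonded` to avoid false positives.

```python
import math
from itertools import combinations


def find_clashes(coords, cutoff=2.0, bonded=frozenset()):
    """Return index pairs (i, j), i < j, of non-bonded atoms closer than `cutoff`.

    `coords` is a list of (x, y, z) tuples in the same units as `cutoff`
    (ångströms in a PDB file). `bonded` holds (i, j) pairs to ignore.
    """
    clashes = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if (i, j) not in bonded and math.dist(a, b) < cutoff:
            clashes.append((i, j))
    return clashes
```

For whole proteins a cell list or k-d tree would scale better, but the quadratic scan is adequate for checking a single rebuilt loop.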

If adding missing residues produces unrealistic loop conformations, consider running energy minimization or brief MD equilibration to refine the structure. Use PDB Viewer to inspect your results directly in the browser.

Best practices

Start with default settings. For standard simulation preparation, enable "Add missing heavy atoms" and "Add missing hydrogens" while keeping heterogens. This handles most common issues without aggressive modification.

Don't add missing residues unless needed. Loop modeling for large gaps (10+ residues) produces approximate conformations that require substantial equilibration. If the missing region isn't relevant to your study, leave it missing.
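When working with the PDBFixer library directly, a middle path is to build short loops and skip long ones. After `findMissingResidues()`, the `missingResidues` attribute is a dictionary keyed by (chain index, residue index) whose values are lists of residue names, so it can be filtered before atoms are added. The helper name and the threshold are assumptions.

```python
def drop_long_gaps(missing_residues, large_gap=10):
    """Keep only gaps shorter than `large_gap` residues.

    `missing_residues` follows the layout of PDBFixer's `missingResidues`
    attribute: {(chain_index, residue_index): [residue names]}.
    """
    return {
        key: names
        for key, names in missing_residues.items()
        if len(names) < large_gap
    }


gaps = {(0, 5): ["GLY", "SER", "GLY"], (0, 40): ["ALA"] * 12}
print(drop_long_gaps(gaps))
```

The filtered dictionary would be assigned back to `fixer.missingResidues` before calling `addMissingAtoms()`.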

Remove heterogens thoughtfully. Crystallographic waters at binding sites can be important. If studying ligand binding, consider keeping all heterogens and manually curating the result.

Use appropriate ionic strength. Physiological conditions typically use 0.15 M NaCl. Zero ionic strength (pure water) can destabilize proteins. Higher concentrations (0.5–1.0 M) are appropriate for halophilic proteins or aggregation studies.
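A quick estimate shows what these concentrations mean in ion counts: the number of salt pairs is concentration × volume × Avogadro's number, with 1 nm³ equal to 10⁻²⁴ L. This is a back-of-the-envelope sketch; the solvation code may count ions differently (for example relative to the number of water molecules), and neutralizing counterions come on top.

```python
AVOGADRO = 6.02214076e23      # particles per mole
LITRES_PER_NM3 = 1e-24        # 1 nm^3 expressed in litres


def salt_pairs(ionic_strength_molar, solvent_volume_nm3):
    """Estimate cation/anion pairs for a 1:1 salt at the given concentration."""
    pairs = ionic_strength_molar * solvent_volume_nm3 * LITRES_PER_NM3 * AVOGADRO
    return round(pairs)


# An 8 nm cubic box (512 nm^3) at physiological ionic strength.
print(salt_pairs(0.15, 512.0))
```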

Choose the right box geometry. Dodecahedral boxes have about 29% less volume than cubic boxes with the same minimum padding distance, which reduces water count and computational cost accordingly. Use cubic only when your simulation software requires it.

Match water model to force field. If using AMBER14, stick with TIP3P. If using CHARMM36, TIP3P is also the safest choice, since the force field was parameterized with a TIP3P variant; SPC/E is not a standard pairing for it. Mixing incompatible water models with force fields leads to incorrect thermodynamic properties.

Common workflows

Structure preparation for docking

Fix the structure with default settings, then use AutoDock Vina or DiffDock for molecular docking studies. For docking, you typically want to remove water. Note that "Remove all" heterogens also strips crystallographic ligands, so keep a copy of the original file if you need the ligand pose for reference.

MD simulation preparation

Enable all "Add" options, leaving "Add missing residues" off unless you need complete loops. Add a solvent box with appropriate ionic strength. The output is ready for equilibration in OpenMM, GROMACS, or AMBER.

Predicted structure cleanup

Structures from ESMFold, Boltz-2, or OpenFold 3 may need protonation or format adjustments. PDB Fixer can add hydrogens and standardize the structure for downstream analysis.


Based on: Eastman, P., et al. (2017). OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Computational Biology, 13(7), e1005659. https://doi.org/10.1371/journal.pcbi.1005659