PDB Fixer is an open-source tool developed as part of the OpenMM molecular simulation toolkit. It automatically repairs Protein Data Bank files in both PDB and mmCIF/PDBx formats, fixing structural problems that prevent successful molecular dynamics simulations.
ProteinIQ's web implementation now supports batch structure repair: a single job can process up to 10 uploaded or fetched structures with one shared settings block, then return one fixed PDB per input together with a batch summary.
Experimental protein structures from X-ray crystallography or cryo-EM frequently contain issues: hydrogen atoms aren't resolved, side chains or entire loops may be missing due to flexibility, modified residues need conversion, and crystallization artifacts (buffers, salts, duplicate chains) clutter the file. These problems cause simulation software like GROMACS, Amber, or OpenMM to fail.
PDB Fixer addresses eight primary structural problems:
After fixing your structure, use PDB Viewer to visually inspect the results or PDB to FASTA to extract sequences.
PDB Fixer operates as a stepwise pipeline that analyzes molecular topology and atomic coordinates. It uses residue templates from the PDB Chemical Component Dictionary to identify what atoms should exist and add any that are missing.
For each residue, PDB Fixer compares the atoms present against the expected template. Missing atoms are placed using ideal bond lengths, angles, and dihedral values from the template library. This ensures chemically reasonable starting positions.
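The template comparison described above can be pictured as a set difference between expected and observed atom names. The following is a minimal sketch, not PDB Fixer's actual code: the `TEMPLATES` table and `missing_atoms` helper are hypothetical, and only two residues with their standard heavy-atom names are included.

```python
# Hypothetical mini template library: standard PDB heavy-atom names for two residues.
TEMPLATES = {
    "GLY": {"N", "CA", "C", "O"},
    "SER": {"N", "CA", "C", "O", "CB", "OG"},
}


def missing_atoms(residue_name, atoms_present):
    """Return the template atom names that are absent from the residue, sorted."""
    expected = TEMPLATES[residue_name]
    return sorted(expected - set(atoms_present))


# A serine resolved only at the backbone is missing its side-chain atoms.
print(missing_atoms("SER", ["N", "CA", "C", "O"]))  # ['CB', 'OG']
```

Placing the missing atoms with ideal geometry is a separate step that this sketch does not attempt.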
After adding atoms, the tool runs brief energy minimization using OpenMM's force fields. This resolves steric clashes between newly added atoms and the existing structure while preserving experimentally determined coordinates as much as possible.
When adding hydrogens, PDB Fixer assigns protonation states based on the specified pH. Titratable residues (histidine, aspartate, glutamate, lysine) adopt appropriate charge states. Special cases like disulfide bonds and histidine tautomers are handled automatically.
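To see why the pH setting matters, the Henderson-Hasselbalch relation gives the fraction of a titratable group that is protonated at a given pH. The sketch below is illustrative only: the pKa values are approximate textbook numbers for isolated side chains, not values taken from PDB Fixer, and the tool itself may apply a simpler rule when choosing a single protonation state per residue.

```python
# Approximate textbook side-chain pKa values (assumptions, not PDB Fixer internals).
MODEL_PKA = {"ASP": 3.9, "GLU": 4.3, "HIS": 6.0, "LYS": 10.5}


def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch: fraction of the group carrying its titratable proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def dominant_state(residue_name, ph):
    """Return 'protonated' or 'deprotonated', whichever form is in the majority."""
    frac = fraction_protonated(ph, MODEL_PKA[residue_name])
    return "protonated" if frac >= 0.5 else "deprotonated"


for name in sorted(MODEL_PKA):
    frac = fraction_protonated(7.0, MODEL_PKA[name])
    print(f"{name}: {frac:.4f} protonated at pH 7.0 -> {dominant_state(name, 7.0)}")
```

Residues in unusual environments can have strongly shifted pKa values, which is why a per-residue check is sometimes needed on top of a single global pH.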
Structures exceeding 10,000 atoms are automatically routed to GPU-accelerated processing (NVIDIA T4 with CUDA-enabled OpenMM). Energy minimization during atom placement is the main computational bottleneck, and GPU offloading reduces processing time significantly for large complexes. Smaller structures are processed on CPU, where overhead is negligible and startup is faster. The routing is transparent — no configuration is needed.
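The routing rule above amounts to a single threshold check on the atom count. As a rough sketch (the function names and the way atoms are counted are assumptions, not ProteinIQ's implementation), it could look like this for PDB-format text:

```python
GPU_ATOM_THRESHOLD = 10_000  # from the text: structures exceeding this go to GPU


def count_atoms(pdb_text):
    """Count ATOM and HETATM records in PDB-format text."""
    return sum(
        1 for line in pdb_text.splitlines() if line.startswith(("ATOM", "HETATM"))
    )


def choose_backend(atom_count):
    """Return 'gpu' for structures above the threshold, otherwise 'cpu'."""
    return "gpu" if atom_count > GPU_ATOM_THRESHOLD else "cpu"


sample = "ATOM      1  N   MET A   1\nHETATM    2  O   HOH A 101\nEND\n"
print(count_atoms(sample), choose_backend(count_atoms(sample)))
```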
Missing residues are reconstructed using fragment-based loop building followed by minimization. The algorithm uses SEQRES records in the PDB file to identify which residues should exist, then builds them with idealized geometry before refining with energy minimization.
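The gap-identification step can be illustrated with a simplified sketch. It assumes the observed residues have already been mapped onto 1-based SEQRES positions; a real implementation has to align the two sequences first. The `find_gaps` helper is hypothetical and does not build any coordinates.

```python
def find_gaps(seqres, observed_positions):
    """List missing runs as (start, end, residue_names) using 1-based SEQRES positions.

    seqres: residue names in SEQRES order.
    observed_positions: set of SEQRES positions that have coordinates.
    """
    gaps = []
    run_start = None
    for pos in range(1, len(seqres) + 1):
        if pos not in observed_positions:
            if run_start is None:
                run_start = pos  # a new missing run begins here
        elif run_start is not None:
            gaps.append((run_start, pos - 1, seqres[run_start - 1:pos - 1]))
            run_start = None
    if run_start is not None:  # the chain ends inside a missing run
        gaps.append((run_start, len(seqres), seqres[run_start - 1:]))
    return gaps


chain = ["MET", "ALA", "GLY", "SER", "LYS", "LEU"]
print(find_gaps(chain, {1, 2, 5}))
```

Each reported run corresponds to a stretch that loop building would need to fill, including unresolved termini.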
Upload one or more structure files or enter one or more RCSB PDB IDs (e.g., 1HSG for HIV-1 protease). Both PDB (.pdb, .ent) and mmCIF/PDBx (.cif, .mmcif, .pdbx) formats are accepted, up to 50 MB per file. The current ProteinIQ interface accepts up to 10 structures per job and applies the same repair settings to every entry in the batch.
Review the default settings under "Core processing." For most simulation preparation, keep heterogens set to Keep all to preserve ligands and ions. If you need to remove specific chains from a multi-chain complex, list their IDs in "Remove chains" (e.g., B, C).
Under "Add options," the defaults enable "Add missing heavy atoms" and "Add missing hydrogens." These are essential for MD simulations. Enable "Add missing residues" only if you need complete loops—this significantly increases processing time for structures with large gaps.
For explicit solvent MD simulations, enable "Add solvent box." Configure the ionic strength (typically 0.15 M for physiological conditions) and select your preferred box geometry. Dodecahedral boxes reduce computational cost compared to cubic boxes.
Click Fix Structure to start processing. Once complete, download the fixed PDB file directly or view it using the integrated PDB Viewer.
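For readers who prefer scripting, roughly the same steps can be run with the open-source PDBFixer Python library. This is a minimal sketch, not ProteinIQ's backend code: it assumes `pdbfixer` and `openmm` are installed, the `fix_structure` wrapper and its parameter names are hypothetical, and the 1.0 nm padding is an arbitrary illustrative choice.

```python
def fix_structure(path_in, path_out, ph=7.0, add_missing_residues=False,
                  strip_heterogens=False, solvate=False):
    """Repair a PDB file and write the result (sketch of a typical workflow)."""
    # Imported here so this module still loads where PDBFixer is not installed.
    from pdbfixer import PDBFixer
    from openmm import unit
    from openmm.app import PDBFile

    fixer = PDBFixer(filename=path_in)

    # Missing residues must be located before missing atoms are searched for.
    fixer.findMissingResidues()
    if not add_missing_residues:
        fixer.missingResidues = {}  # skip loop building

    fixer.findNonstandardResidues()
    fixer.replaceNonstandardResidues()

    if strip_heterogens:
        fixer.removeHeterogens(keepWater=True)  # drop ligands and ions, keep waters

    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    fixer.addMissingHydrogens(ph)

    if solvate:
        fixer.addSolvent(padding=1.0 * unit.nanometer,
                         ionicStrength=0.15 * unit.molar)

    with open(path_out, "w") as handle:
        PDBFile.writeFile(fixer.topology, fixer.positions, handle, keepIds=True)
```

A call such as `fix_structure("input.pdb", "fixed.pdb")` would correspond to the default web settings of adding heavy atoms and hydrogens while keeping heterogens.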
Batch mode is designed for independent structure cleanup rather than conformational ensembles. Each input structure is processed separately, and the output list preserves the original input order.
When names are available, ProteinIQ keeps the original file names or RCSB identifiers for each repaired structure. If the incoming label is only a placeholder such as PDB 1, the platform attempts to recover a more useful name from the uploaded file name, the fetched identifier, or the PDB/mmCIF content itself.
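The name-recovery fallback can be sketched as a short chain of checks. Everything here is an assumption made for illustration: the placeholder pattern, the order of the fallbacks, and the `recover_name` and `id_from_content` helpers are hypothetical. The only format facts relied on are that a PDB `HEADER` record stores the ID code in columns 63 to 66 and that an mmCIF file opens with a `data_` block name.

```python
import re
from pathlib import PurePath

PLACEHOLDER = re.compile(r"^PDB \d+$")  # assumed shape of placeholder labels


def id_from_content(text):
    """Try to read an identifier from PDB HEADER or mmCIF data_ lines."""
    for line in text.splitlines():
        if line.startswith("HEADER") and len(line) >= 66:
            code = line[62:66].strip()  # columns 63-66 hold the ID code
            if code:
                return code
        if line.startswith("data_") and len(line) > 5:
            return line[5:].strip()
    return None


def recover_name(label, file_name=None, fetched_id=None, content=""):
    """Keep a real label; otherwise fall back to file name, fetched ID, then content."""
    if label and not PLACEHOLDER.match(label):
        return label
    if file_name:
        return PurePath(file_name).stem
    if fetched_id:
        return fetched_id.upper()
    return id_from_content(content) or label


print(recover_name("PDB 1", file_name="uploads/4hhb_clean.pdb"))
print(recover_name("PDB 2", fetched_id="1hsg"))
```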
Many researchers prepare structures manually using tools like PyMOL, Chimera, or Swiss-PdbViewer. PDB Fixer offers several advantages for routine preparation tasks.
| Feature | PDB Fixer | Manual preparation |
|---|---|---|
| Input formats | PDB and mmCIF/PDBx | Varies by software |
| Missing atoms | Automatic detection and placement | Requires scripting or plugin |
| Hydrogens | pH-dependent protonation states | Often uniform or default states |
| Loop building | Automated with minimization | Requires homology modeling tools |
| Large structures | GPU-accelerated (>10k atoms) | CPU-only in most tools |
| Reproducibility | Deterministic with fixed seed | Depends on operator |
| Processing time | Minutes (GPU for large structures) | Hours for complex cases |
| Learning curve | Minimal | Requires software expertise |
PDB Fixer is ideal for batch processing and routine simulation setup. For complex cases requiring manual intervention (e.g., unusual ligands, specific protonation states, or membrane positioning), use PDB Fixer as a starting point and refine manually.
Upload a PDB file (.pdb, .ent) or mmCIF/PDBx file (.cif, .mmcif, .pdbx), or fetch directly from RCSB using a PDB ID. Maximum file size is 50 MB per file, and up to 10 structures can be included in one job. mmCIF content is detected automatically by looking for data_ headers or _atom_site. tokens.
| Input mode | Description |
|---|---|
| Upload | Accepts up to 10 structure files in PDB or mmCIF/PDBx format under one input card. |
| RCSB fetch | Accepts multiple PDB identifiers in one batch submission and repairs each entry separately. |
| Batch settings | All structures in the same job use one shared settings configuration. |
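The automatic mmCIF detection mentioned above, which looks for `data_` headers or `_atom_site.` tokens, can be approximated in a few lines. This is a sketch of the idea under the assumption that the tokens are checked at the start of each line; the `looks_like_mmcif` name is hypothetical.

```python
def looks_like_mmcif(text):
    """Return True if the text contains a data_ header or an _atom_site. token."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("data_") or stripped.startswith("_atom_site."):
            return True
    return False


print(looks_like_mmcif("data_1HSG\n#\n_entry.id 1HSG\n"))
print(looks_like_mmcif("HEADER    HYDROLASE\nATOM      1  N   PRO A   1\n"))
```

Content that does not match would be treated as PDB format.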
Keep all preserves ligands, ions, and water from the original structure. Keep only water removes ligands and ions but preserves crystallographic waters. Remove all strips everything except the protein chains.

Remove chains takes a comma-separated list of chain IDs (e.g., B, C, D). Useful for extracting a single chain from a multi-chain complex.

Mutations are written as CHAIN:ORIGINAL-POSITION-NEW (e.g., A:ALA-57-GLY, B:VAL-123-ILE). This lets you study mutant variants without manual editing.

These settings control what structural elements PDB Fixer adds to your structure.
Adding explicit water creates a simulation-ready system. The protein is surrounded by a rectangular or truncated box of water molecules with counterions for charge neutralization.
Na+/Cl- is standard for most simulations. Choose K+ for systems where potassium is physiologically relevant.

0.15 M matches physiological conditions. Increase for high-salt studies (halophiles, aggregation).

Cubic is standard and compatible with all software. Dodecahedron and Octahedron reduce water count by ~30% while maintaining minimum distance from protein to box edge, which gives faster simulations with equivalent accuracy.

TIP3P is the standard choice for most force fields. TIP4P-Ew provides improved density and diffusion properties. SPC/E is popular for GROMACS workflows.

Membrane systems embed the protein in a lipid bilayer for studying membrane proteins (GPCRs, ion channels, transporters).
POPC (palmitoyl-oleoyl-phosphatidylcholine) is the most common choice for general membrane protein studies. POPE is preferred for bacterial membranes.

Set the membrane center to 0.0 for automatic centering. Adjust when you need the protein positioned asymmetrically in the membrane.

AMBER14 is recommended for most use cases. CHARMM36 is preferred if you plan to run simulations with CHARMM-compatible software.

Set the random seed to 0 for random initialization. Fixed seeds ensure identical output when reprocessing the same structure with the same settings.

Download templates takes the names of residues to fetch from the PDB Chemical Component Dictionary (e.g., ATP, GTP, HEM). Use this for non-standard residues not included in the default template library.

PDB Fixer outputs corrected PDB files ready for molecular dynamics simulation. In batch jobs, ProteinIQ emits one primary repaired PDB for each successful input and an additional summary CSV describing the whole run.
| Metric | Description |
|---|---|
| Atoms | Total atom count in the output structure |
| Residues | Number of residues (may increase if loops were added) |
| Chains | Number of protein chains retained |
| Processing applied | List of fixes performed (e.g., "Added 2,847 hydrogens, replaced 3 nonstandard residues") |
ProteinIQ presents batch PDB Fixer outputs as a collection of structures rather than docking poses. The viewer labels each repaired model by its source name, defaults to showing one structure at a time, and keeps the ligand-specific grouping controls hidden because PDB Fixer is not generating alternate ligand poses.
The secondary batch summary records the source name, success or failure status, output filename, atom and residue counts, applied processing steps, and any error message for structures that could not be repaired. This is useful when a mixed batch contains both successful and failed entries.
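A summary with the fields listed above could be written with Python's standard `csv` module. The column names, the `write_summary` helper, and the example rows are hypothetical and purely illustrative; the actual ProteinIQ CSV may use different headers.

```python
import csv
import io

# Hypothetical column names mirroring the fields described in the text.
FIELDS = ["source_name", "status", "output_file", "atoms", "residues",
          "processing_applied", "error"]


def write_summary(rows, handle):
    """Write one CSV row per input structure, leaving absent fields blank."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({field: row.get(field, "") for field in FIELDS})


# Made-up example rows for a mixed batch (values are not real results).
example_rows = [
    {"source_name": "structure_a", "status": "success",
     "output_file": "structure_a_fixed.pdb", "atoms": 3120, "residues": 198,
     "processing_applied": "added hydrogens"},
    {"source_name": "structure_b", "status": "failed",
     "error": "could not parse input"},
]

buffer = io.StringIO()
write_summary(example_rows, buffer)
print(buffer.getvalue())
```

Keeping failed entries in the same table, with an empty output file and a populated error column, makes a mixed batch easy to filter afterwards.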
We recommend visually inspecting the fixed structure before simulation:
If adding missing residues produces unrealistic loop conformations, consider running energy minimization or brief MD equilibration to refine the structure. Use PDB Viewer to inspect your results directly in the browser or MolProbity for detailed geometry validation.
Start with default settings. For standard simulation preparation, enable "Add missing heavy atoms" and "Add missing hydrogens" while keeping heterogens. This handles most common issues without aggressive modification.
Don't add missing residues unless needed. Loop modeling for large gaps (10+ residues) produces approximate conformations that require substantial equilibration. If the missing region isn't relevant to your study, leave it missing.
Remove heterogens thoughtfully. Crystallographic waters at binding sites can be important. If studying ligand binding, consider keeping all heterogens and manually curating the result.
Use appropriate ionic strength. Physiological conditions typically use 0.15 M NaCl. Zero ionic strength (pure water) can destabilize proteins. Higher concentrations (0.5–1.0 M) are appropriate for halophilic proteins or aggregation studies.
Choose the right box geometry. Dodecahedral boxes reduce computational cost by ~30% compared to cubic boxes with the same minimum padding distance. Use cubic only when your simulation software requires it.
Match water model to force field. If using AMBER14, stick with TIP3P. If using CHARMM36, TIP3P or SPC/E both work well. Mixing incompatible water models with force fields leads to incorrect thermodynamic properties.
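The box-geometry and ionic-strength tips above can be checked with a little arithmetic. The sketch below uses the standard geometric volume factors for a rhombic dodecahedron and a truncated octahedron relative to a cube with the same periodic image distance, and estimates ion pairs from concentration times volume. The helper names are hypothetical, and simulation packages may count ions differently (for example, from the number of water molecules), so treat the numbers as rough estimates.

```python
import math

AVOGADRO = 6.02214076e23   # particles per mole
LITERS_PER_NM3 = 1e-24     # 1 nm^3 expressed in liters

# Volume relative to a cube with the same periodic image distance d.
BOX_VOLUME_FACTOR = {
    "cubic": 1.0,
    "dodecahedron": math.sqrt(2) / 2,    # rhombic dodecahedron, about 0.71
    "octahedron": 4 * math.sqrt(3) / 9,  # truncated octahedron, about 0.77
}


def box_volume_nm3(shape, image_distance_nm):
    """Volume of a periodic box with the given image distance, in nm^3."""
    return BOX_VOLUME_FACTOR[shape] * image_distance_nm ** 3


def ion_pairs(volume_nm3, molar):
    """Estimated number of salt ion pairs for a concentration in mol/L."""
    return round(molar * AVOGADRO * volume_nm3 * LITERS_PER_NM3)


for shape in BOX_VOLUME_FACTOR:
    volume = box_volume_nm3(shape, 8.0)
    saving = 100 * (1 - BOX_VOLUME_FACTOR[shape])
    print(f"{shape}: {volume:.0f} nm^3, {saving:.0f}% smaller than cubic, "
          f"{ion_pairs(volume, 0.15)} ion pairs at 0.15 M")
```

Counterions needed to neutralize the protein's net charge come on top of this salt estimate.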
Fix the structure with default settings, then use AutoDock Vina or DiffDock for molecular docking studies. For docking, you typically want to remove waters. Note that Remove all also strips crystallographic ligands, so keep a copy of the original structure if you need the bound ligand as a reference.
Enable "Add missing heavy atoms" and "Add missing hydrogens," and enable "Add missing residues" only if you need complete loops. Add a solvent box with appropriate ionic strength. The output is ready for equilibration in OpenMM, GROMACS, or AMBER.
Structures from ESMFold, Boltz-2, or OpenFold 3 may need protonation or format adjustments. PDB Fixer can add hydrogens and standardize the structure for downstream analysis.
Yes. ProteinIQ provides PDB Fixer online at no cost with a free account. The underlying OpenMM PDBFixer library is open-source software licensed under MIT/LGPL.
Processing time depends on structure size and selected options. Small proteins (under 5,000 atoms) with default settings complete in under a minute. Structures larger than 10,000 atoms are automatically routed to GPU hardware, which accelerates the energy minimization step. Adding missing residues or solvent boxes increases processing time regardless of hardware — expect several minutes for structures with extensive loop building or large solvent boxes.
PDB Fixer builds loops using idealized geometry followed by brief energy minimization. This provides reasonable starting conformations, but long loops (10+ residues) often need additional refinement. Run extended energy minimization or a short MD equilibration to relax the structure.
PDB Fixer adds missing atoms and prepares structures for simulation. PROPKA predicts pKa values for titratable residues without modifying the structure. Use PROPKA first to check whether any residues have pKa values shifted close to your target pH, then run PDB Fixer at that pH to add hydrogens and review the protonation states of the flagged residues.
Yes, with appropriate settings. Enable "Add solvent box" to create a solvated system with counterions. The output PDB includes periodic box vectors and is ready for equilibration in OpenMM, GROMACS, or AMBER.
Check your "Heterogens" setting. If set to Remove all, all non-protein molecules including ligands are deleted. Use Keep all to preserve ligands and crystallographic waters.
Enable "Replace nonstandard residues" to convert modified residues (selenomethionine, phosphoserine, etc.) to their standard equivalents. If you need to keep modifications, disable this option and ensure the residue templates are available—use "Download templates" to fetch specific residue definitions from the PDB Chemical Component Dictionary.
AMBER14 works well for most protein simulations and is the recommended default. Use CHARMM36 if you plan to run simulations with CHARMM-compatible software or if you're studying nucleic acids or specific lipid systems where CHARMM parameters are preferred.
Yes. PDB Fixer processes all chains in the input structure. Use "Remove chains" to exclude specific chains from the output if you only need certain parts of a complex.
Enable "Add lipid membrane" instead of "Add solvent box." Select the appropriate lipid type (POPC for general use, POPE for bacterial membranes) and position the membrane center appropriately for your protein's transmembrane domain.