ProteinIQ

MolProbity

Validate protein structure quality with all-atom contact analysis and geometry checks.

What is MolProbity?

MolProbity is a comprehensive structure validation tool that assesses protein and nucleic acid quality through all-atom geometry analysis. Rather than just checking if atoms fit the electron density map, MolProbity examines whether atoms are physically positioned correctly relative to each other—detecting steric clashes, backbone geometry problems, and sidechain conformational errors that even high-resolution crystallography can miss.

Structure validation is essential before using a protein model for downstream analysis or design work. A structure with poor geometry may look reasonable in electron density but contain atomic overlaps or distortions that lead to incorrect functional predictions. MolProbity combines multiple quality metrics into a single, interpretable framework that catches both obvious errors and subtle geometric inconsistencies.

MolProbity produces detailed validation reports with flagged residues, quality scores at multiple resolutions, and automated feedback about structure reliability. This makes it invaluable during structure refinement and for establishing confidence in a model, whether it was predicted by AlphaFold2 or Boltz-2 or determined experimentally.

How does MolProbity work?

MolProbity performs several independent geometric analyses that together provide a comprehensive view of structure quality. Each metric targets different types of structural problems.

All-atom clash detection

The clashscore counts serious steric overlaps, in which the van der Waals spheres of two nonbonded atoms overlap by 0.4 Å or more. These aren't minor geometric strains but actual atomic collisions that signal local fitting errors or mistakes in atomic coordinate assignment.

MolProbity counts all such clashes and normalizes the total to clashes per 1,000 atoms, making scores comparable across different protein sizes. A clashscore of 0 is excellent; clashscores above 10 indicate problematic regions. Well-built structures typically score well below 5.
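The normalization described above is simple arithmetic. Here is a minimal sketch, assuming the number of serious clashes and the total atom count are already known; the function name is illustrative and not part of MolProbity.

```python
def clashscore(n_clashes: int, n_atoms: int) -> float:
    """Return the number of serious clashes per 1,000 atoms.

    Hypothetical helper: it only performs the normalization and
    does not detect clashes itself.
    """
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    if n_clashes < 0:
        raise ValueError("n_clashes cannot be negative")
    return 1000.0 * n_clashes / n_atoms


# Example: 12 serious clashes in a 2,400-atom model
print(clashscore(12, 2400))
```

Because the count is scaled by model size, a small domain and a large complex can be compared on the same scale.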

Ramachandran backbone analysis

The backbone of a protein is defined by three dihedral angles: phi (φ), psi (ψ), and omega (ω). Omega is nearly always 180° due to partial double-bond character in the peptide bond. Phi and psi, however, are free to rotate, but only certain combinations avoid atomic clashes.
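To make the angles concrete, the sketch below computes a signed dihedral from four atom positions using the standard atan2 formulation. It is a generic geometry routine, not MolProbity's own code. By the usual convention, φ is measured over C(i-1), N, Cα, C and ψ over N, Cα, C, N(i+1); the coordinates in the example are toy values chosen to give a clean result.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3D points.

    0 degrees is cis, +/-180 degrees is trans. Assumes the central
    bond p1-p2 has non-zero length.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not a real residue): a quarter turn about the z axis
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```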

MolProbity evaluates each residue's φ/ψ angles against a modern reference distribution derived from nearly a million high-quality reference residues. Residues are classified as:

  • Favored: ~98% of observed angles in well-built structures. These conformations are energetically favorable and sterically optimal.
  • Allowed: ~2% of observed angles. Sterically permissible but less common.
  • Outlier: Roughly 0.05% of reference residues in the general case. Outside expected regions, flagging either modeling errors or unusual but genuine functional conformations.

The tool applies residue-specific distributions because glycine (no sidechain) has much broader allowed regions, while proline (cyclic sidechain) is severely restricted to φ ≈ -60°.

Rotamer sidechain assessment

Sidechains adopt discrete rotameric conformations where chi (χ) dihedral angles cluster around staggered orientations (±60°, 180°). These rotamers are energetically favorable because they minimize clash between the sidechain and backbone.
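As a rough illustration of the staggered wells, the sketch below assigns a single χ angle to the nearest of +60°, 180°, or -60°, using the p/t/m letters common in rotamer naming. This is only a one-dimensional binning under that assumption; MolProbity's rotamer evaluation scores all χ angles of a residue together against reference distributions, which this does not attempt.

```python
def nearest_staggered(chi_deg: float):
    """Return (label, distance) for the closest staggered well.

    Labels follow the common p/t/m convention:
    p = +60, t = 180, m = -60 degrees. Illustrative helper only.
    """
    wells = {"p": 60.0, "t": 180.0, "m": -60.0}

    def circ_dist(a: float, b: float) -> float:
        # Shortest angular distance on a 360-degree circle
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    label = min(wells, key=lambda k: circ_dist(chi_deg, wells[k]))
    return label, circ_dist(chi_deg, wells[label])


for chi in (65.0, -175.0, -70.0, 120.0):
    print(chi, nearest_staggered(chi))
```

An angle such as 120° sits midway between two wells (an eclipsed orientation), which is the kind of value that tends to show up in outlier rotamers.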

MolProbity classifies each sidechain rotamer as:

  • Favored: 98% of observed rotamers in quality structures. Standard conformations well-supported by geometry.
  • Allowed: Intermediate conformations, less common but sterically feasible.
  • Outlier: 0.3% or fewer residues. These flag either genuine functional conformations or sidechain placement errors during structure building.

Non-glycine, non-alanine residues are evaluated; glycine has no sidechain and alanine's sidechain is too small to adopt rotamers.

C-beta and backbone geometry

MolProbity checks for C-beta deviations, where the measured C-beta position deviates from ideal geometry predicted from backbone (N, Cα, C) coordinates. Large deviations suggest problems with local backbone geometry or sidechain modeling.

For lower-resolution structures (2.5–4.0 Å), the tool also applies CaBLAM analysis, which evaluates backbone geometry and secondary structure likelihood from the Cα virtual dihedral angles. This provides useful validation feedback even when full-atom detail is uncertain.

MolProbity Score

MolProbity combines clashscore, Ramachandran favored percentage, and rotamer outlier percentage into a single number expressed on the same scale as crystallographic resolution: it approximates the resolution at which that level of quality would be average. Lower scores indicate better geometry, and the shared scale lets you compare your structure against others of similar resolution.
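The combination can be sketched with the log-weighted formula reported in the MolProbity literature. The coefficients below are reproduced from that published formula as best understood here and should be checked against the MolProbity documentation before being relied on; the function and parameter names are illustrative.

```python
import math

def molprobity_score(clashscore: float,
                     rotamer_outlier_pct: float,
                     rama_favored_pct: float) -> float:
    """Approximate MolProbity score (lower is better).

    Assumed formula (verify against the MolProbity documentation):
      0.426 * ln(1 + clashscore)
    + 0.33  * ln(1 + max(0, rotamer_outlier_pct - 1))
    + 0.25  * ln(1 + max(0, (100 - rama_favored_pct) - 2))
    + 0.5
    """
    rama_not_favored = 100.0 - rama_favored_pct
    return (0.426 * math.log(1.0 + clashscore)
            + 0.33 * math.log(1.0 + max(0.0, rotamer_outlier_pct - 1.0))
            + 0.25 * math.log(1.0 + max(0.0, rama_not_favored - 2.0))
            + 0.5)


# A clean model versus a rougher one (made-up inputs for illustration)
print(round(molprobity_score(2.0, 0.5, 98.5), 2))
print(round(molprobity_score(15.0, 4.0, 93.0), 2))
```

The logarithms damp the effect of very large raw values, so a single bad metric raises the score without swamping the other two.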

Input requirements

MolProbity accepts protein and nucleic acid structures in:

  • PDB files (.pdb): The standard text format for protein coordinates
  • mmCIF files (.cif): The newer crystallographic format, recommended for modern structures
  • RCSB PDB IDs: Enter a PDB code directly (e.g., 1UBQ) to automatically fetch the structure from the Protein Data Bank
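If you prefer to download the file yourself and upload it, the sketch below builds a download URL on RCSB's public file server and fetches the text. How this tool retrieves structures internally is not documented here, so treat the helper names and the ID check (a classic four-character ID) as assumptions.

```python
import re
import urllib.request

def rcsb_download_url(pdb_id: str, fmt: str = "pdb") -> str:
    """Build a download URL for a classic 4-character PDB ID.

    Illustrative helper; fmt is "pdb" or "cif".
    """
    pdb_id = pdb_id.strip().upper()
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError("fmt must be 'pdb' or 'cif'")
    return f"https://files.rcsb.org/download/{pdb_id}.{fmt}"

def fetch_structure(pdb_id: str, fmt: str = "pdb", timeout: float = 30.0) -> str:
    """Download the structure file and return its contents as text."""
    url = rcsb_download_url(pdb_id, fmt)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(rcsb_download_url("1ubq"))
# text = fetch_structure("1UBQ")  # requires network access
```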

Your structure must contain valid atomic coordinates. For proteins, all backbone atoms (N, Cα, C, O) should be present for reliable analysis. Structures with missing atoms in the backbone will have reduced validation detail.

Understanding the results

MolProbity returns a spreadsheet with individual metrics for your structure, each with a quality assessment.

Quality status interpretation

Each metric includes a status flag indicating severity:

  • Green/Good: Expected for well-built structures. Your structure meets quality standards for this metric.
  • Yellow/Suboptimal: Minor problems that don't prevent use but suggest local issues. Inspect flagged residues.
  • Red/Problem: Significant deviations requiring attention. These may indicate genuine errors or unusual functional conformations.

Key metrics to examine

Clashscore: Look for values below 5 for high-quality structures, below 10 for acceptable ones. Clashes above 10 per thousand atoms suggest systematic problems in specific regions. If clashes are concentrated in ligand-binding sites or flexible loops, they may reflect real conformational ensembles rather than errors.
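The clashscore bands above can be written as a small lookup. This is a sketch of the guidance in this section, not the tool's own status logic; how values of exactly 5 or 10 are treated is an assumption, and the function name is illustrative.

```python
def clashscore_band(score: float) -> str:
    """Map a clashscore to the bands described in the text.

    Assumption: boundary values (exactly 5 or 10) fall into the
    worse of the two neighbouring bands.
    """
    if score < 0:
        raise ValueError("clashscore cannot be negative")
    if score < 5:
        return "high quality"
    if score < 10:
        return "acceptable"
    return "problematic"


for s in (1.8, 7.5, 14.2):
    print(s, clashscore_band(s))
```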

Ramachandran outliers: Up to 0.5% outliers is excellent, 0.5–1% is normal, and above 2% suggests problems; values in between call for a closer look at the flagged residues. However, functionally important residues (especially in active sites) sometimes adopt strained conformations for catalysis or substrate binding and so appear as outliers.

Rotamer outliers: The interpretation is similar: 0–0.5% is excellent and up to 1% is typical. Outliers in hydrophobic cores likely represent errors; outliers at protein surfaces or binding sites may be functionally important.

C-beta deviations: Deviations of 0.25 Å or more are rare in well-built structures and typically indicate local backbone strain or atomic position errors.

MolProbity Score: Compare against structures at similar resolution. Lower is better: a score close to your structure's resolution indicates typical quality for that resolution, while a score well above it points to geometry worse than expected.

Outlier lists

MolProbity flags specific residues with problems—clashes, Ramachandran outliers, rotamer outliers, and C-beta deviations. Always examine these residues visually in your structure viewer. Some indicate genuine errors requiring correction; others reflect catalytic residues or binding sites under conformational stress.

Use cases and limitations

MolProbity works exceptionally well for crystal structures at 1.5–3.0 Å resolution where atomic positions are well-determined. For cryo-EM structures or low-resolution models, interpretation requires more caution.

Best uses: Validating crystal structures, assessing quality of predicted structures from AlphaFold2 or Boltz-2, identifying problematic regions during refinement, comparing multiple conformational states.

Limitations: At very low resolution (worse than 4 Å), geometric metrics become less informative because ambiguity in the electron density can itself cause apparent clashes or deviations. For designed proteins or synthetic sequences with no evolutionary precedent, some metrics may be overly strict.

Interpreting predicted structures

When validating structures predicted by AlphaFold2, Boltz-2, or other ML models, expect slightly different patterns than experimental structures. Predicted models often show fewer rotamer outliers (models are smoother) but may have unusual backbone angles in flexible regions. Use MolProbity's detailed feedback to identify regions you should trust versus regions requiring additional analysis.

For comprehensive protein analysis, use our Protein Parameters calculator to examine molecular weight, isoelectric point, stability indices, and other sequence-based metrics alongside MolProbity's structure-based validation.

For detailed backbone angle analysis, Ramachandran Plot provides an interactive visualization of phi/psi distributions specific to your structure.

For structure improvement, PDB Fixer can correct common structural errors and add missing atoms detected by MolProbity.

To visualize your structure and examine flagged residues in 3D, use PDB Viewer.