
Integrative protein-protein docking guided by experimental restraints
HADDOCK3 (High Ambiguity Driven protein-protein DOCKing) is an integrative modeling platform for predicting the structure of biomolecular complexes. Unlike traditional docking methods that rely solely on shape complementarity and energy minimization, HADDOCK3 incorporates experimental or bioinformatic data as restraints to guide the docking process.
This data-driven approach makes HADDOCK3 particularly powerful when you have partial information about binding interfaces from sources like mutagenesis experiments, NMR chemical shift perturbations, cross-linking mass spectrometry, or bioinformatic predictions. The platform translates this uncertain information into ambiguous interaction restraints (AIRs) that improve docking accuracy without requiring precise structural details.
HADDOCK3 represents a complete redesign of the HADDOCK platform with a modular architecture. The workflow consists of sequential stages—rigid body docking, semi-flexible refinement, energy minimization, and clustering—each optimized to progressively refine the complex structure while maintaining agreement with experimental data.
The core innovation in HADDOCK3 is the use of Ambiguous Interaction Restraints. An AIR defines a set of possible interactions between residues on the two proteins without specifying exact atomic contacts.
Active residues are amino acids known to be directly involved in the binding interface. These residues are restrained to form contacts with the partner protein, incurring an energy penalty if they do not end up at the interface.
Passive residues are surface-accessible amino acids near active residues that may contribute to binding. They're allowed but not required to participate in the interface, providing flexibility in the docking search.
An AIR is an ambiguous distance restraint between any atom of an active residue on protein A and any atom of any active or passive residue on protein B. This ambiguity accommodates the uncertainty inherent in experimental data while still guiding the search toward biologically relevant conformations.
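As a rough illustration of the definition above, the sketch below lists which residues on protein B each active residue on protein A may contact, and combines candidate atom-atom distances into a single effective distance using the inverse sixth-power sum from the original HADDOCK paper (Dominguez et al., 2003). The function names and residue numbers are hypothetical and are not part of the HADDOCK3 API.

```python
def air_partners(active_a, active_b, passive_b):
    """Map each active residue on A to the residues on B it may contact."""
    # Both active and passive residues on B are valid partners.
    targets = sorted(set(active_b) | set(passive_b))
    return {res: targets for res in active_a}


def effective_distance(distances):
    """Combine candidate atom-atom distances (in angstroms) into one value.

    Uses d_eff = (sum of d^-6)^(-1/6), so the shortest distance dominates
    and a single close contact is enough to bring d_eff down.
    """
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


# Hypothetical residue numbers, for illustration only.
partners = air_partners(active_a=[38, 40], active_b=[12], passive_b=[10, 14])
print(partners)
print(effective_distance([3.0, 12.0, 20.0]))
```

Because the sum is dominated by the shortest distance, the effective distance is always slightly below the closest pair, which is why one good contact is enough to satisfy an ambiguous restraint.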
When interface information is uncertain, HADDOCK3 can randomly remove a fraction of the restraints in each docking trial. Each trial then has to satisfy only a different subset of the experimental data, which broadens conformational sampling.
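The idea of random restraint removal can be sketched as follows. This is a minimal illustration and not HADDOCK3's internal code: the restraint pairs, the `fraction` parameter, and the per-trial seeding are all assumptions made for the example.

```python
import random


def drop_restraints(restraints, fraction, rng):
    """Return a sorted copy of `restraints` with roughly `fraction` removed."""
    n_drop = int(round(fraction * len(restraints)))
    n_keep = len(restraints) - n_drop
    # Sample without replacement so no restraint is kept twice.
    return sorted(rng.sample(restraints, n_keep))


# Hypothetical (residue on A, residue on B) pairs.
restraints = [(38, 12), (40, 12), (45, 14), (47, 10)]
for trial in range(3):
    rng = random.Random(trial)  # a different seed per docking trial
    print(trial, drop_restraints(restraints, fraction=0.5, rng=rng))
```

Each trial keeps a different random half of the list, so a single wrong restraint cannot bias every model.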
If artificial or incorrect restraints are included, structures that satisfy them tend to score unfavorably and are filtered out during scoring and clustering.
HADDOCK3 uses a multi-stage protocol to generate and refine protein complexes:
Stage 1 - Rigid Body Docking: The proteins are randomized in orientation and subjected to rigid body energy minimization. The sampling parameter controls how many models are generated at this stage—typically 50-200 structures.
Stage 2 - Semi-Flexible Refinement: Top-scoring rigid body models undergo simulated annealing in torsion angle space. Side chains and backbone atoms near the interface are allowed to move, optimizing local contacts while maintaining global geometry.
Stage 3 - Final Refinement: Selected models are refined in explicit solvent using Cartesian coordinates. This stage includes full energy minimization with electrostatics, van der Waals forces, and desolvation terms.
Stage 4 - Clustering: The refined structures are clustered based on interface RMSD. Representative models from each cluster are ranked by HADDOCK score to identify the most likely binding modes.
HADDOCK3 evaluates complexes using a weighted combination of energy terms:
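HADDOCK score = 1.0 · E_vdw + 0.2 · E_elec + 1.0 · E_desol + 0.1 · E_air
Where E_vdw is the van der Waals energy, E_elec the electrostatic energy, E_desol the desolvation energy, and E_air the AIR violation energy.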
Lower HADDOCK scores indicate better-predicted complexes. The heavy weighting on van der Waals and desolvation terms reflects their importance in protein-protein recognition, while the reduced weight on electrostatics accounts for its longer-range nature and potential artifacts.
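The weighted sum can be written out in a few lines. The weights below are the ones quoted above; the energy values for the example model are made up for illustration and do not come from a real docking run.

```python
# Weights as stated in the scoring formula above.
WEIGHTS = {"vdw": 1.0, "elec": 0.2, "desol": 1.0, "air": 0.1}


def haddock_score(energies, weights=WEIGHTS):
    """Weighted sum of the energy terms; lower is better."""
    return sum(weights[term] * energies[term] for term in weights)


# Hypothetical energy terms for one model.
model = {"vdw": -45.0, "elec": -250.0, "desol": 5.0, "air": 30.0}
print(haddock_score(model))
```

Note how the large electrostatic term contributes only a fifth of its raw value, while the AIR term is scaled down tenfold.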
HADDOCK3 requires two protein structures in PDB format. You can upload structures or fetch them directly from the RCSB PDB database using four-letter codes.
Structures should be prepared beforehand: remove water molecules, heteroatoms not part of the protein, and alternate conformations. Use our PDB Fixer tool if your structures contain common issues like missing atoms or incorrect protonation states.
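If you prefer to clean a file by hand, the preparation steps above can be approximated with a simple filter over fixed-width PDB records. This is a minimal sketch and not the PDB Fixer tool: it assumes standard PDB column layout, keeps only the blank or "A" alternate location, and drops every HETATM record, which would also remove modified residues stored as HETATM.

```python
def clean_pdb_lines(lines):
    """Keep protein ATOM records, dropping waters, HETATM and extra altlocs."""
    cleaned = []
    for line in lines:
        record = line[:6].strip()
        if record == "ATOM" and len(line) > 16:
            altloc = line[16]  # column 17 of the PDB record
            if altloc not in (" ", "A"):
                continue  # skip alternate conformations other than the first
            # Blank the altloc column so only one conformer is visible.
            cleaned.append(line[:16] + " " + line[17:])
        elif record in ("TER", "END"):
            cleaned.append(line)
        # HETATM records (including HOH waters) and all other records are dropped.
    return cleaned


example = [
    "ATOM      1  CA AALA A   1",
    "ATOM      2  CA BALA A   1",
    "HETATM    3  O   HOH A 101",
    "END",
]
cleaned = clean_pdb_lines(example)
print("\n".join(cleaned))
```

Keeping the "A" location is a simplification; choosing the highest-occupancy conformer would be a more careful rule.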
Number of models: Controls how many structures are generated during the initial rigid body docking stage. Higher values (100-200) increase the chance of sampling the correct binding mode but require more computational time. Start with 50 models for exploratory runs, then increase to 100+ when you have reliable restraints.
Top models to return: Specifies how many final refined complexes to output after clustering and ranking. The top 10 models typically capture the main binding modes, but you may want 20-50 if exploring diverse conformational states.
Defining active and passive residues significantly improves docking accuracy compared to ab initio (unrestrained) docking.
Active residues: Enter residue numbers known to be critical for binding from mutagenesis data, cross-linking experiments, or NMR chemical shift perturbations. Use comma-separated values (38,40,45) or ranges (38-45). These residues will be strongly constrained to form contacts with the partner protein.
Passive residues: Include surface-exposed residues adjacent to active residues. These contribute to the binding interface but are not individually essential. HADDOCK3 allows these residues to participate in contacts without penalizing their absence.
If you don't know the interface residues, leave these fields empty. HADDOCK3 will perform ab initio docking using random AIRs, generating restraints automatically during the search. This mode is less accurate but useful for completely unknown interfaces.
For both receptor and ligand restraints, enter residues as:
15,16,17,48,51
15-20,48-52
10-15,23,40-45,67
Residue numbering must match the PDB file exactly, including chain information if multiple chains are present.
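To check a residue list before submitting it, the accepted syntax can be expanded with a short parser. This is a sketch of one reasonable interpretation of the format, not the platform's own parser; `parse_residues` is a hypothetical helper, and it does not handle negative residue numbers or insertion codes.

```python
def parse_residues(spec):
    """Expand a string like '10-15,23,40-45' into a sorted list of integers."""
    residues = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty input and trailing commas
        if "-" in token:
            start, end = (int(part) for part in token.split("-", 1))
            if start > end:
                raise ValueError(f"range start exceeds end: {token!r}")
            residues.update(range(start, end + 1))
        else:
            residues.add(int(token))
    return sorted(residues)


print(parse_residues("10-15,23,40-45,67"))
```

Expanding ranges this way makes it easy to compare the list against the residue numbers actually present in the PDB file.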
The primary metric is the HADDOCK score, which combines van der Waals energy, electrostatics, desolvation, and AIR violation terms. Lower (more negative) scores indicate better complexes.
Typical HADDOCK scores for good complexes range from -100 to -200, but absolute values depend on complex size and composition. Focus on relative ranking rather than absolute numbers—the top-ranked model is most likely to represent the native binding mode.
HADDOCK3 groups similar structures into clusters based on interface RMSD. Clusters with high populations and low average scores indicate robust binding modes supported by multiple independent docking trials.
If restraints are reliable, the largest cluster usually contains near-native structures. When exploring unknown interfaces (ab initio mode), examine the top 2-3 clusters as multiple binding modes may be plausible.
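When comparing clusters by hand, it helps to tabulate population and average score side by side. The sketch below ranks clusters by the mean score of their best few members; the `top_n=4` default follows a common HADDOCK convention but is an assumption here, as are the cluster names and scores.

```python
from statistics import mean


def rank_clusters(clusters, top_n=4):
    """Rank clusters by the mean score of their `top_n` best models.

    `clusters` maps a cluster id to a list of HADDOCK scores.
    Returns (cluster id, population, mean of best scores), best first.
    """
    summary = []
    for cluster_id, scores in clusters.items():
        best = sorted(scores)[:top_n]  # lower scores are better
        summary.append((cluster_id, len(scores), mean(best)))
    return sorted(summary, key=lambda row: row[2])


# Hypothetical scores for three clusters.
clusters = {
    "c1": [-120.0, -110.0, -100.0, -90.0, -50.0],
    "c2": [-130.0, -60.0],
    "c3": [-80.0, -79.0, -78.0, -77.0],
}
for cluster_id, size, avg in rank_clusters(clusters):
    print(cluster_id, size, avg)
```

Averaging only the best members keeps a large cluster from being penalized for a few poor models, while the population column still shows how often each binding mode was sampled.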
Always examine the 3D structures of top-ranked models using the integrated viewer. Check that:
Visualize multiple models from the top cluster to assess structural variability at the interface. Low RMSD between cluster members indicates a well-defined binding mode.
HADDOCK3 operates in two modes depending on whether you provide interface restraints.
Data-driven docking uses active and passive residues to guide the search. This mode achieves higher accuracy (success rates >70% in benchmarks) when restraints are correct, and converges faster by limiting the conformational search space. We recommend this approach whenever you have experimental evidence for interface residues.
Ab initio docking generates random AIRs automatically when no restraints are provided. HADDOCK3 creates ambiguous restraints on-the-fly during rigid body docking, sampling a wide range of possible interfaces. Success rates are lower (~30-40%) but this mode is useful for completely unknown complexes or validating experimental predictions.
HADDOCK3 excels at problems where you have partial experimental information:
NMR-based complex modeling: Chemical shift perturbations identify residues affected by binding. Use these as active residues to generate accurate complex structures even with limited signal.
Mutagenesis-guided docking: Mutations that disrupt binding localize the interface. Map these to active residues and use surrounding surface residues as passive restraints.
Cross-linking mass spectrometry integration: Distance constraints from cross-linked residue pairs can be incorporated as additional restraints, further refining the docking search.
Antibody-antigen docking: Use predicted or experimentally determined CDR loops as active residues to dock antibodies onto antigens with improved accuracy.
HADDOCK3 assumes both proteins stay close to their input conformations during docking. The semi-flexible refinement stage allows local adjustments, but large conformational changes upon binding may not be captured.
The method performs best with soluble, globular proteins. Membrane proteins, intrinsically disordered regions, or highly flexible domains may require specialized protocols.
Computational time scales with the number of models generated. Generating 200 rigid body models with full refinement can take 1-2 hours, so plan accordingly for large-scale studies.
For small molecule-protein docking, use AutoDock Vina or SMINA, which are optimized for ligand binding. For deep learning-based approaches, GNINA provides CNN-enhanced scoring, while DiffDock uses diffusion models for ligand pose prediction.
LightDock offers an alternative protein-protein docking approach using swarm intelligence optimization. For predicting protein structures before docking, consider ESMFold, Chai-1, or Boltz-2.
Giulini, M. et al. (2025). "HADDOCK3: A modular and versatile platform for integrative modelling of biomolecular complexes." Journal of Chemical Information and Modeling. doi: 10.1021/acs.jcim.5c00969
Dominguez, C., Boelens, R., & Bonvin, A. M. (2003). "HADDOCK: A protein-protein docking approach based on biochemical or biophysical information." Journal of the American Chemical Society, 125(7), 1731-1737. doi: 10.1021/ja026939x
HADDOCK3 User Manual - Official documentation from the Bonvin Lab
HADDOCK Scoring Function - Detailed explanation of energy terms