
Physics-based molecular docking for predicting protein-ligand binding modes with binding affinity scores
AutoDock Vina is one of the most widely used molecular docking programs in computational drug discovery. It predicts how small molecules bind to protein targets and estimates their binding affinity in kcal/mol.
Vina runs approximately 100 times faster than AutoDock 4 while also improving binding-mode prediction accuracy. The approach combines a machine learning-optimized scoring function with an efficient global search algorithm, which has made it a common default choice for structure-based virtual screening.
For ligand preparation and analysis, consider using Lipinski's Rule of Five to assess drug-likeness or ADMET-AI for comprehensive pharmacokinetic predictions before docking.
The scoring function is empirical. The authors describe their approach as "more of 'machine learning' than directly physics-based in its nature," justified by empirical performance rather than theoretical assumptions.
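The scoring function evaluates protein-ligand interactions through several components: two Gaussian steric attraction terms, a repulsion term, a hydrophobic term, and a hydrogen-bonding term.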
The predicted binding affinity (kcal/mol) is calculated from the intermolecular portion of the lowest-scoring conformation, combined with a torsional penalty based on the number of rotatable bonds.
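The way the torsional penalty enters the final score can be sketched as below. This assumes the functional form and weight published for the original Vina scoring function (intermolecular energy divided by `1 + w * N_rot`, with `w = 0.0585`); the function name is illustrative, and other scoring functions may use different weights.

```python
def predicted_affinity(e_inter: float, n_rot: int, w: float = 0.0585) -> float:
    """Scale an intermolecular energy (kcal/mol) by a rotatable-bond penalty.

    Assumed form: e_inter / (1 + w * n_rot), where n_rot is the number of
    rotatable bonds and w is the torsional weight (0.0585 assumed here).
    """
    if n_rot < 0:
        raise ValueError("n_rot must be non-negative")
    return e_inter / (1.0 + w * n_rot)
```

With this form, a ligand with more rotatable bonds receives a less negative (weaker) predicted affinity for the same intermolecular energy.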
Vina implements Iterated Local Search with the BFGS (Broyden-Fletcher-Goldfarb-Shanno) quasi-Newton method for local optimization. Unlike earlier genetic algorithm approaches, BFGS uses scoring function gradients to efficiently navigate the conformational landscape.
The global search uses random mutations of position, orientation, and torsion angles, with a Metropolis acceptance criterion to balance exploration and exploitation. Multiple independent runs (controlled by exhaustiveness) start from randomized positions to improve coverage of the search space.
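The mutate, optimize, accept-or-reject loop described above can be illustrated on a toy problem. This is a minimal sketch, not Vina's implementation: the 1D `energy` function stands in for a docking score, plain gradient descent stands in for BFGS, and the temperature, step sizes, and run counts are arbitrary assumptions.

```python
import math
import random


def energy(x: float) -> float:
    """Toy 1D landscape with several local minima (stand-in for a docking score)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(x: float, step: float = 0.01, iters: int = 300) -> float:
    """Gradient descent with a numerical gradient (Vina itself uses BFGS)."""
    h = 1e-5
    for _ in range(iters):
        grad = (energy(x + h) - energy(x - h)) / (2.0 * h)
        x -= step * grad
    return x


def metropolis_accept(e_old: float, e_new: float, temperature: float,
                      rng: random.Random) -> bool:
    """Always accept improvements; accept worse states with Boltzmann probability."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / temperature)


def iterated_local_search(seed: int, steps: int = 100, temperature: float = 0.5):
    """One independent run: random start, then mutate -> optimize -> accept/reject."""
    rng = random.Random(seed)
    x = local_minimize(rng.uniform(-5.0, 5.0))
    best_x, best_e = x, energy(x)
    for _ in range(steps):
        candidate = local_minimize(x + rng.gauss(0.0, 1.0))  # random mutation
        if metropolis_accept(energy(x), energy(candidate), temperature, rng):
            x = candidate
            if energy(x) < best_e:
                best_x, best_e = x, energy(x)
    return best_x, best_e


# Several independent runs, analogous in spirit to the exhaustiveness setting.
results = [iterated_local_search(seed) for seed in range(8)]
best_x, best_e = min(results, key=lambda r: r[1])
```

Keeping the best state seen in each run, then taking the best across runs, mirrors how independent runs improve coverage of the search space.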
Vina supports parallel execution on multi-core processors. Independent docking runs are distributed across available CPU cores, with benchmarks showing 7.25x speedup on 8-core systems compared to single-threaded execution.
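To put the 7.25x figure in context, the short calculation below converts it into a parallel efficiency. The second function additionally assumes Amdahl's law applies, which is a simplifying model and not something the benchmark itself states.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / cores


def amdahl_parallel_fraction(speedup: float, cores: int) -> float:
    """Solve S = 1 / ((1 - p) + p / n) for the parallel fraction p (requires n > 1)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


efficiency = parallel_efficiency(7.25, 8)        # 0.90625, i.e. about 91% of ideal
fraction = amdahl_parallel_fraction(7.25, 8)     # implied parallel fraction under Amdahl's law
```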
Protein (Receptor): PDB file or RCSB PDB ID. The protein should be properly protonated with missing residues fixed. Use PDB Fixer for automated preparation.
Ligand: SMILES string (simplest for small molecules), SDF, MOL, or PDBQT file. For peptides and complex molecules (>150 atoms), PDBQT format is recommended as it bypasses automatic conversion.
AutoDock Vina 1.2 supports three scoring functions optimized for different use cases:
The original Vina scoring function offers the best balance of speed and accuracy for most applications. We recommend this for general-purpose docking and virtual screening.
Vinardo (Vina RaDii Optimized) was trained on curated datasets using a novel approach that optimizes docking performance rather than just binding affinity correlation. It removes the long-range attraction term and doubles the contribution of hydrophobic interactions compared to Vina.
Use Vinardo when you need improved ranking of compounds in virtual screening campaigns.
The classical AutoDock4 force field is better suited for metalloproteins and systems with metal coordination. It uses a more physics-based approach compared to the empirical Vina function.
Use AutoDock4 when docking to zinc-containing proteins, heme groups, or other metalloenzymes.
The search space defines a 3D box where the ligand can bind.
Auto mode calculates a box covering the entire protein surface. This works well when you don't know the binding site but increases computation time.
Manual mode lets you specify exact center coordinates and dimensions. We recommend keeping the box under 30×30×30 Å unless you also increase exhaustiveness proportionally.
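As an illustration of manual mode, the sketch below writes the box parameters as configuration text and raises exhaustiveness when the box exceeds 30×30×30 Å. The key names follow AutoDock Vina's configuration file format as we understand it; the function name and the linear volume-scaling rule are assumptions, since the guidance above says only "proportionally".

```python
import math


def make_vina_config(center, size, exhaustiveness=8, volume_limit=27000.0):
    """Build Vina-style config text for a search box.

    center, size: (x, y, z) tuples in angstroms.
    volume_limit: 30 * 30 * 30 cubic angstroms; above this, exhaustiveness is
    scaled by volume / volume_limit (an assumed heuristic, rounded up).
    """
    cx, cy, cz = (float(v) for v in center)
    sx, sy, sz = (float(v) for v in size)
    if min(sx, sy, sz) <= 0:
        raise ValueError("box dimensions must be positive")
    volume = sx * sy * sz
    if volume > volume_limit:
        exhaustiveness = math.ceil(exhaustiveness * volume / volume_limit)
    lines = [
        f"center_x = {cx}", f"center_y = {cy}", f"center_z = {cz}",
        f"size_x = {sx}", f"size_y = {sy}", f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
    ]
    return "\n".join(lines) + "\n"
```

For example, `make_vina_config((10.0, 12.5, -3.0), (40, 30, 30))` describes a 36,000 Å³ box, so the default exhaustiveness of 8 is raised accordingly.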
Flexible docking allows specified protein residues to move during docking, modeling induced-fit binding. Enter residues in the format Chain:ResidueName+Number, separated by commas (e.g., A:ARG120,A:TYR135).
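Use flexible residues for side chains expected to move on ligand binding, such as gatekeeper residues, catalytic residues, and residues known to adopt multiple rotamers.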
Note that flexible docking significantly increases computation time and search space complexity.
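The Chain:ResidueName+Number format described above is easy to validate before submitting a job. The parser below is a minimal sketch; it assumes single-character chain IDs and three-letter residue names, and the `FlexResidue` type is purely illustrative.

```python
import re
from typing import List, NamedTuple


class FlexResidue(NamedTuple):
    chain: str
    name: str
    number: int


# One chain character, a colon, a three-letter residue name, then the residue number.
_FLEX_PATTERN = re.compile(r"^([A-Za-z0-9]):([A-Za-z]{3})(-?\d+)$")


def parse_flexible_residues(spec: str) -> List[FlexResidue]:
    """Parse a comma-separated list such as 'A:ARG120,A:TYR135'."""
    residues = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas and extra whitespace
        match = _FLEX_PATTERN.match(token)
        if match is None:
            raise ValueError(f"invalid flexible residue: {token!r}")
        chain, name, number = match.groups()
        residues.append(FlexResidue(chain, name.upper(), int(number)))
    return residues
```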
Binding affinity is reported in kcal/mol. Lower (more negative) values indicate stronger predicted binding:
| Affinity (kcal/mol) | Interpretation |
|---|---|
| -4 to -6 | Weak binding |
| -6 to -8 | Moderate binding |
| -8 to -10 | Strong binding |
| < -10 | Very strong binding |
Values below -12 kcal/mol may indicate scoring artifacts and should be validated experimentally or with additional computational methods.
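The table and the -12 kcal/mol caveat can be applied programmatically when triaging many results. In this sketch, boundary values are assigned to the weaker category (so exactly -10 counts as strong, consistent with the "< -10" row); that convention and the label for scores above -4 are assumptions.

```python
def interpret_affinity(kcal_per_mol: float) -> str:
    """Map a predicted affinity (kcal/mol) to the interpretation bands above."""
    if kcal_per_mol < -12.0:
        return "Very strong binding (possible scoring artifact; validate)"
    if kcal_per_mol < -10.0:
        return "Very strong binding"
    if kcal_per_mol < -8.0:
        return "Strong binding"
    if kcal_per_mol < -6.0:
        return "Moderate binding"
    if kcal_per_mol <= -4.0:
        return "Weak binding"
    return "Weaker than the tabulated ranges"
```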
Vina generates multiple binding poses ranked by predicted affinity. The top-ranked pose represents the most favorable binding mode, but examining the top 3-5 poses is recommended. Alternative binding modes may be biologically relevant.
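To review the top poses outside a viewer, the per-pose scores can be read from the docked output. The sketch below assumes the output is a multi-model PDBQT file in which each pose carries a `REMARK VINA RESULT:` line with the affinity and two RMSD bounds, which is the format standalone Vina writes as far as we know; the function name is illustrative.

```python
def parse_vina_results(pdbqt_text: str):
    """Return a list of dicts, one per pose, in the order they appear in the file."""
    poses = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields after the tag: affinity, RMSD lower bound, RMSD upper bound.
            affinity, rmsd_lb, rmsd_ub = (float(v) for v in line.split()[3:6])
            poses.append({
                "rank": len(poses) + 1,
                "affinity": affinity,
                "rmsd_lb": rmsd_lb,
                "rmsd_ub": rmsd_ub,
            })
    return poses


def top_poses(pdbqt_text: str, n: int = 5):
    """Convenience helper: the first n poses for manual inspection."""
    return parse_vina_results(pdbqt_text)[:n]
```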
Vina achieves a ~87% success rate (ligand RMSD < 2 Å from the crystal structure) on benchmark datasets. Predicted affinities have a standard error of approximately 2.85 kcal/mol, so relative rankings are more reliable than absolute values.
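To see why absolute values deserve caution, it helps to translate the standard error into dissociation-constant terms. The sketch below assumes the standard thermodynamic relation ΔG = RT ln(Kd) at 298.15 K with a 1 M reference state, and treats the predicted affinity as a binding free energy; the function names are illustrative.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def affinity_to_kd(delta_g: float, temperature: float = 298.15) -> float:
    """Convert a binding free energy (kcal/mol) to a dissociation constant (M)."""
    return math.exp(delta_g / (R_KCAL * temperature))


def fold_uncertainty(std_error: float, temperature: float = 298.15) -> float:
    """Multiplicative uncertainty in Kd implied by an error in delta G."""
    return math.exp(std_error / (R_KCAL * temperature))


kd_at_minus_8 = affinity_to_kd(-8.0)   # on the order of 1e-6 M (micromolar)
kd_fold = fold_uncertainty(2.85)       # on the order of 100-fold
```

Under these assumptions, an error of 2.85 kcal/mol corresponds to roughly two orders of magnitude in Kd, which is why rankings within one target are the safer use of the scores.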
Smina is a Vina fork with additional scoring function options and a more flexible command-line interface.
GNINA adds convolutional neural network rescoring for improved pose prediction accuracy.
DiffDock uses a diffusion generative model and excels at blind docking when the binding site is unknown.
We recommend starting with default parameters (exhaustiveness 8, 9 poses) for initial screening.
For publication-quality results, increase exhaustiveness to 16-32 and validate top hits with molecular dynamics or experimental binding assays.
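For readers running standalone Vina instead of a hosted interface, the two parameter regimes above might translate into command lines like the ones this sketch builds. The flags shown are ones we believe AutoDock Vina 1.2 accepts; the executable name `vina` and all file names are placeholders, and the command is only assembled here, not executed.

```python
import shlex


def build_vina_command(receptor, ligand, config, out,
                       exhaustiveness=8, num_modes=9,
                       scoring="vina", seed=None):
    """Assemble an argument list for a Vina run (does not execute anything)."""
    if scoring not in ("vina", "vinardo"):
        # AutoDock4 scoring needs precomputed maps, which this sketch does not cover.
        raise ValueError("this sketch supports only 'vina' and 'vinardo'")
    cmd = [
        "vina",
        "--receptor", receptor,
        "--ligand", ligand,
        "--config", config,  # box center and size, e.g. from a config file
        "--out", out,
        "--exhaustiveness", str(exhaustiveness),
        "--num_modes", str(num_modes),
        "--scoring", scoring,
    ]
    if seed is not None:
        cmd += ["--seed", str(seed)]  # fix the random seed for reproducibility
    return cmd


screening = build_vina_command("receptor.pdbqt", "ligand.pdbqt", "box.txt", "screen.pdbqt")
refined = build_vina_command("receptor.pdbqt", "ligand.pdbqt", "box.txt", "refined.pdbqt",
                             exhaustiveness=32, seed=42)
print(shlex.join(refined))
```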
When the binding site is known, define the search space manually. This reduces runtime and focuses sampling on the relevant region.
For complex ligands (peptides, macrocycles, molecules >150 atoms), prepare PDBQT files externally using Meeko or OpenBabel.
Runtime scales with three factors: search space volume, ligand flexibility, and exhaustiveness. Large search boxes (especially Auto mode on big proteins) dramatically increase computation time because Vina must sample more conformational space.
To speed things up: define a manual search box around the known binding site, reduce exhaustiveness for initial screening, or use a smaller ligand. For very large proteins, consider whether blind docking is truly necessary.
Vina only returns poses within the specified energy range of the best pose. If you requested 9 poses but only got 3, the remaining poses exceeded your energy cutoff or were too similar (below the min RMSD threshold).
This often indicates the binding site has limited conformational diversity for your ligand. Try increasing the energy range parameter or decreasing the min RMSD threshold to generate more poses.
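The effect of the energy cutoff can be reproduced with a small filter. This sketch covers only the energy-window part of the behavior described above (the min RMSD de-duplication is omitted because it needs coordinates), and the default values of 9 poses and a 3 kcal/mol window are assumed to mirror Vina's defaults.

```python
def select_poses(energies, num_modes=9, energy_range=3.0):
    """Keep up to num_modes scores lying within energy_range of the best score."""
    ranked = sorted(energies)  # most negative (best) first
    if not ranked:
        return []
    best = ranked[0]
    kept = [e for e in ranked if e - best <= energy_range]
    return kept[:num_modes]


scores = [-9.0, -8.5, -7.0, -5.5, -5.0, -4.0]
default_window = select_poses(scores)                   # only 3 of 6 survive
wider_window = select_poses(scores, energy_range=5.0)   # all 6 survive
```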
Exhaustiveness determines how many independent docking runs Vina performs. Each run starts from a random position and explores the search space using the BFGS optimization algorithm.
Higher exhaustiveness reduces the probability of missing the global energy minimum but increases runtime linearly. Think of it as "how hard should Vina search?" For quick screening, 8 is adequate. For publication or lead optimization, use 16-32.
Predicted affinities below -12 kcal/mol are often scoring artifacts rather than true predictions. Common causes include ligands with many rotatable bonds, very large binding pockets, or ligands that clash with the protein despite appearing to fit.
We recommend treating such results skeptically. Validate with alternative methods like GNINA for CNN-based rescoring, or check the 3D pose visually for unrealistic binding modes.
Binding affinities should not be compared across different proteins. Vina's binding affinity predictions are only meaningful for comparing ligands docked to the same protein structure. The scoring function is calibrated relative to a specific binding site.
Comparing affinities across different proteins, even related family members, is not valid. Each protein-ligand system must be evaluated independently.
Use Vina (default) for most applications—it offers the best speed-accuracy trade-off for general docking. Use Vinardo when ranking compounds in virtual screening campaigns, as it was optimized for pose discrimination rather than absolute affinity prediction.
Use AutoDock4 specifically for metalloproteins (zinc fingers, heme enzymes, metalloenzymes). The classical force field handles metal coordination better than the empirical Vina function.
The protein must be prepared before docking. AutoDock Vina expects a properly protonated protein with no missing residues in the binding site. Use PDB Fixer to add missing atoms and residues automatically.
Crystal structures often have missing loops, alternate conformations, and no hydrogen atoms. Skipping preparation can lead to failed jobs or unreliable results.
Peptides and macrocycles can be docked, but with limitations. Vina handles molecules up to about 150 atoms reasonably well. For larger peptides, cyclic peptides, or macrocycles, prepare PDBQT files externally using Meeko to ensure proper torsion handling.
Very flexible ligands (>15 rotatable bonds) exponentially increase search space complexity. Consider increasing exhaustiveness proportionally or using DiffDock, which handles flexibility differently.
Use flexible residues when you know specific active site residues undergo conformational changes upon ligand binding (induced fit). Common candidates include gatekeeper residues, catalytic residues, and residues known to adopt multiple rotamers.
Adding flexibility significantly increases runtime and search complexity. Start with rigid docking, then add flexibility only if poses show steric clashes with known flexible residues or if redocking a crystal ligand fails.
Several factors can cause pose prediction failures: incorrect protonation states, missing water molecules that mediate binding, metal coordination issues, or insufficient exhaustiveness. Vina's ~87% success rate means approximately 1 in 8 dockings will fail to reproduce the native pose.
Try increasing exhaustiveness, using a different scoring function, or using GNINA, which adds CNN rescoring for improved pose accuracy.
If you have a reference ligand or crystal structure, center the box on the bound ligand. Many PDB structures include co-crystallized ligands: download the structure, extract the ligand's atomic coordinates, and calculate their geometric center.
For novel targets without known ligands, use cavity detection tools or literature to identify likely binding pockets. Consider using Auto mode for initial blind docking, then refine with manual coordinates around the identified site.
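For the co-crystallized ligand case, the geometric center can be computed directly from the HETATM records of a PDB file. The sketch below relies on the fixed-column PDB layout (residue name in columns 18-20, coordinates in columns 31-54); the function name and the 5 Å padding used to suggest box dimensions are assumptions, not recommendations from Vina.

```python
def ligand_box_from_pdb(pdb_text: str, resname: str, padding: float = 5.0):
    """Return (center, size) for a box around all HETATM atoms of one residue name.

    center: unweighted mean of atom coordinates (geometric center), in angstroms.
    size: ligand extent along each axis plus `padding` on both sides.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and line[17:20].strip() == resname:
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    if not coords:
        raise ValueError(f"no HETATM records found for residue {resname!r}")
    n = len(coords)
    center = tuple(sum(c[i] for c in coords) / n for i in range(3))
    size = tuple(max(c[i] for c in coords) - min(c[i] for c in coords)
                 + 2.0 * padding for i in range(3))
    return center, size
```

If the same ligand name occurs in several chains, filter the input to a single copy first; otherwise the center falls between the copies.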
GNINA is built on Vina's codebase but adds convolutional neural network (CNN) scoring. It generates poses the same way but rescores them using a model trained to distinguish correct from incorrect binding modes.
Use Vina for speed and established workflows. Use GNINA when pose accuracy is critical and you can tolerate slightly longer runtimes. GNINA often recovers correct poses that Vina's scoring ranks poorly.
DiffDock uses a diffusion generative model rather than physics-based docking. It excels at blind docking when you don't know the binding site, and handles protein flexibility implicitly.
Use DiffDock for exploratory docking on novel targets. Use Vina when you know the binding site, need interpretable scoring, want faster turnaround, or require reproducibility with a specific random seed.