SigmaDock

Fragment-based SE(3) equivariant diffusion docking for protein-ligand binding pose prediction

Related tools

SurfDock

SurfDock is a surface-informed diffusion generative model for protein-ligand docking, published in Nature Methods 2024. It leverages protein surface geometry to guide a diffusion process for reliable and accurate protein-ligand complex prediction.

DiffDock-L

DiffDock-L is a state-of-the-art molecular docking tool that uses diffusion models to predict how small molecule ligands bind to protein targets. It generates multiple binding poses with confidence scores.

DynamicBind

DynamicBind is an AI-powered protein-ligand binding prediction tool that recovers ligand-induced conformational changes from unbound protein structures. It predicts both ligand binding poses and protein conformational changes.

AutoDock-GPU

GPU-accelerated molecular docking using the AutoDock4 force field. Up to 56x faster than serial AutoDock via CUDA parallelization of the Lamarckian Genetic Algorithm.

AutoDock Vina

AutoDock Vina is a widely used molecular docking tool that predicts protein-ligand binding modes using an empirical scoring function. It is fast and remains a standard baseline for structure-based drug discovery.

GNINA

GNINA is a molecular docking tool that combines traditional physics-based docking with deep learning CNN scoring for protein-small-molecule complexes. It provides accurate binding predictions with confidence scores, optimized for high-throughput virtual screening.

PandaDock

Open-source molecular docking platform using physics-based scoring functions. CPU-optimized algorithms achieve sub-angstrom accuracy (0.014 Å RMSD) without GPU requirements.

SMINA

SMINA is a fork of AutoDock Vina with enhanced scoring functions, custom scoring support, and 10-20x faster minimization. Ideal for scoring function development, pose refinement, and high-performance docking workflows.

ColabDock

ColabDock is a protein-protein docking framework that uses AlphaFold2 to predict complex structures guided by experimental restraints from cross-linking mass spectrometry, NMR, or other sources.

DFMDock

DFMDock (Denoising Force Matching Dock) is a diffusion model that unifies sampling and ranking for protein-protein docking within a single framework. It predicts docked poses for protein-protein complexes from unbound structures using denoising score matching with optional clash force guidance.

What is SigmaDock?

SigmaDock is a molecular docking method that predicts how small-molecule ligands bind to protein targets. Developed at the University of Oxford and presented at ICLR 2026, it achieves a Top-1 success rate of approximately 80% (RMSD <2 Å and PB-valid) on the PoseBusters benchmark, making it the first deep learning method to surpass classical physics-based docking on this metric.

Rather than treating ligand flexibility through torsional angles, SigmaDock decomposes ligands into rigid-body fragments at their rotatable bonds and learns SE(3) transformations for each fragment. This design improves both the physical validity of generated poses and the model's ability to generalize to proteins not seen during training.

How to use SigmaDock online

ProteinIQ hosts SigmaDock on GPU infrastructure, so no software installation or local GPU is needed. Submit a protein structure and a ligand, and the model generates multiple ranked binding poses within minutes.

Inputs

Input | Description
Protein | PDB file or RCSB PDB ID. Must contain protein atoms; maximum 1000 residues.
Ligand | SMILES string, SDF file, MOL/MOL2 file, or PubChem CID. Organics only; metal-containing compounds are not supported.

SigmaDock expects a standard protein receptor structure that upstream parsing tools can read cleanly. Receptors with malformed coordinates, unusual fibril assemblies, or heavily modified/non-standard residues may be rejected before docking rather than producing unreliable poses.
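
A quick local check can catch an oversized receptor before submission. The sketch below counts distinct residues in the ATOM records of a PDB-format string using the fixed-column layout of the PDB format. The names `count_protein_residues`, `pdb_record`, and `MAX_RESIDUES` are illustrative and not part of SigmaDock or ProteinIQ, and the demo coordinates are made up.

```python
MAX_RESIDUES = 1000  # limit stated in the inputs table


def pdb_record(serial, atom, res_name, chain, res_seq, x, y, z, record="ATOM"):
    """Format one fixed-column PDB coordinate record (used only for the demo)."""
    return (f"{record:<6s}{serial:5d} {atom:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def count_protein_residues(pdb_text):
    """Count distinct residues appearing in ATOM records of a PDB-format string."""
    residues = set()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 27:
            # PDB columns: chain ID (22), residue number (23-26), insertion code (27)
            residues.add((line[21], line[22:26].strip(), line[26]))
    return len(residues)


# Made-up demo structure: two protein residues plus one ligand atom (HETATM is ignored).
demo_pdb = "\n".join([
    pdb_record(1, "N", "ALA", "A", 1, 11.104, 6.134, -6.504),
    pdb_record(2, "CA", "ALA", "A", 1, 11.639, 6.071, -5.147),
    pdb_record(3, "N", "GLY", "A", 2, 13.000, 7.000, -4.000),
    pdb_record(4, "C1", "LIG", "A", 301, 20.000, 20.000, 20.000, record="HETATM"),
])
n_residues = count_protein_residues(demo_pdb)
print(n_residues, "residues:", "OK" if n_residues <= MAX_RESIDUES else "trim before submitting")
```

This only checks size. It does not detect malformed coordinates or non-standard residues, which the receptor parser may still reject.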

Settings

Docking parameters

Setting | Description
Number of poses | How many candidate binding poses to generate (1–40, default 10). More poses increase diversity and the chance of finding the correct binding mode, at the cost of computation time.
Diffusion steps | Number of SE(3) denoising steps in the diffusion process (10–50, default 30). Higher values refine poses more thoroughly but run slower.
Scoring method | Function used to rank generated poses. Vinardo (default) is recommended for most applications; Vina is available as an alternative.
Noise scale | Controls the diversity-fidelity tradeoff (0–1, default 0). At 0, sampling is deterministic and maximizes pose fidelity. Increasing the value adds stochastic noise at each diffusion step, producing more diverse but potentially less accurate poses.
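
For scripted workflows it can help to validate settings before a run. The following is a minimal sketch of a settings container that enforces the ranges and defaults listed above. The class `DockingSettings` and its field names are hypothetical; they mirror the web form and are not a documented ProteinIQ or SigmaDock API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockingSettings:
    """Hypothetical container mirroring the docking parameters of the web form."""
    num_poses: int = 10              # allowed range 1-40
    diffusion_steps: int = 30        # allowed range 10-50
    scoring_method: str = "vinardo"  # "vinardo" or "vina"
    noise_scale: float = 0.0         # allowed range 0-1; 0 means deterministic sampling

    def __post_init__(self):
        if not 1 <= self.num_poses <= 40:
            raise ValueError("num_poses must be between 1 and 40")
        if not 10 <= self.diffusion_steps <= 50:
            raise ValueError("diffusion_steps must be between 10 and 50")
        if self.scoring_method not in ("vinardo", "vina"):
            raise ValueError("scoring_method must be 'vinardo' or 'vina'")
        if not 0.0 <= self.noise_scale <= 1.0:
            raise ValueError("noise_scale must be between 0 and 1")


# More poses with a little stochastic noise, trading fidelity for diversity.
settings = DockingSettings(num_poses=20, noise_scale=0.2)
print(settings)
```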

Binding site

SigmaDock requires a known binding pocket — it is not a blind docking method. By default, the binding site is automatically determined from the co-crystallized ligand in the PDB file. For structures without a bound ligand (e.g., AlphaFold predictions or apo crystal structures), the pocket center can be specified manually.

Setting | Description
Pocket center X/Y/Z | Cartesian coordinates of the binding pocket center in Ångströms. Leave blank to use automatic detection.
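
When a homologous structure with a bound ligand is available, one way to obtain manual pocket coordinates is to average the ligand's atom positions. The sketch below does this for HETATM records in a PDB-format string. Using the centroid is an assumption made for illustration; the page does not state how the automatic detection computes its center. The names `ligand_centroid` and `hetatm` are illustrative, and the residue name and coordinates are made up.

```python
def hetatm(serial, atom, res_name, x, y, z):
    """Format one fixed-column HETATM record on chain A, residue 301 (demo helper)."""
    return f"HETATM{serial:5d} {atom:<4s} {res_name:>3s} A 301    {x:8.3f}{y:8.3f}{z:8.3f}"


def ligand_centroid(pdb_text, ligand_resname):
    """Return the mean (x, y, z) of HETATM atoms whose residue name matches."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and line[17:20].strip() == ligand_resname:
            # PDB columns 31-38, 39-46, 47-54 hold x, y, z in Ångströms
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"no HETATM atoms found for residue {ligand_resname!r}")
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))


# Made-up three-atom ligand named LIG.
demo = "\n".join([
    hetatm(1, "C1", "LIG", 10.0, 20.0, 30.0),
    hetatm(2, "C2", "LIG", 12.0, 22.0, 30.0),
    hetatm(3, "O1", "LIG", 14.0, 24.0, 33.0),
])
center = ligand_centroid(demo, "LIG")
print("Pocket center X/Y/Z:", center)
```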

Results

Each run produces a set of ranked poses as SDF files, displayed in a 3D interactive viewer alongside the input protein. A spreadsheet summarizes the key metrics for each pose.

Column | Description
Rank | Pose ranking by score (1 = best)
Score | Vinardo or Vina binding affinity in kcal/mol; more negative indicates stronger predicted binding
CNN Score | GNINA neural network binding confidence (0–1); higher values indicate greater confidence that the pose represents a true binding mode
CNN Affinity | GNINA neural network predicted binding affinity in kcal/mol; an independent affinity estimate complementing the physics-based score
PB Valid | PoseBusters average validity score (0–1); summarizes 28 checks for bond geometry, steric clashes, and stereochemistry. Higher values indicate more physically realistic poses
RMSD | Root-mean-square deviation from the top-ranked pose, useful for gauging pose diversity
File | SDF file for the docked pose
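
The RMSD column compares each pose with the top-ranked one. A minimal sketch of such a comparison follows, assuming both poses list the same atoms in the same order and that no superposition or symmetry correction is applied; the hosted tool may treat symmetric atoms differently. The function name `pose_rmsd` and the coordinates are illustrative.

```python
import math


def pose_rmsd(coords_a, coords_b):
    """In-place RMSD (Å) between two poses with identical atom ordering."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up example: the second pose is the first one shifted by 1 Å along x.
top_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0)]
other_pose = [(x + 1.0, y, z) for x, y, z in top_pose]
print(f"RMSD to top pose: {pose_rmsd(top_pose, other_pose):.2f} Å")
```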

Interpreting scores

The primary Score column reflects the chosen scoring function (Vinardo or Vina). As a general guide for Vinardo/Vina scores:

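Affinity (kcal/mol) | Interpretation
< −10 | Very strong predicted binding (roughly tens of nanomolar or tighter, if the score is read as a free energy at room temperature)
−7 to −10 | Strong predicted binding (roughly low micromolar to tens of nanomolar)
−5 to −7 | Moderate binding
> −5 | Weak binding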
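
Scores in kcal/mol can be related to dissociation constants through ΔG = RT ln K_d. The sketch below performs that conversion under the strong assumption that a docking score behaves like a binding free energy at 298.15 K. Vinardo and Vina scores are only loosely calibrated against experiment, so the output is an order-of-magnitude guide at best. The function name `score_to_kd_molar` is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def score_to_kd_molar(delta_g_kcal, temperature_k=298.15):
    """Kd in mol/L from dG = RT ln(Kd), treating the score as a free energy."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


for score in (-5.0, -7.0, -10.0):
    print(f"{score:6.1f} kcal/mol -> Kd ~ {score_to_kd_molar(score):.1e} M")
```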

The CNN Score provides an orthogonal assessment: poses on which the physics-based score and the CNN score agree tend to be more reliable. A CNN Score above 0.5 combined with a strong Vinardo affinity is a good indicator of a trustworthy pose. The PB Valid score flags poses that may look energetically favorable but contain geometric distortions; a PB Valid score below 0.7 warrants visual inspection of the pose.
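
These rules of thumb can be applied to an exported results table. The sketch below assumes a CSV export whose headers match the column names in the results table; the actual export format is not specified on this page. The function name `triage_poses`, the label strings, and the sample rows are made up for illustration. The −7 kcal/mol cutoff for a "strong" score is taken from the guide table.

```python
import csv
import io

SAMPLE_CSV = """Rank,Score,CNN Score,PB Valid,RMSD
1,-9.8,0.82,0.96,0.0
2,-9.1,0.35,0.93,1.4
3,-10.4,0.71,0.61,3.2
"""


def triage_poses(csv_text, score_max=-7.0, cnn_min=0.5, pb_min=0.7):
    """Label each pose by rank using the PB Valid, CNN Score, and Score thresholds."""
    labels = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["PB Valid"]) < pb_min:
            label = "inspect geometry"   # possible distortions despite a good score
        elif float(row["CNN Score"]) > cnn_min and float(row["Score"]) <= score_max:
            label = "scores agree"       # physics-based and CNN scores both favorable
        else:
            label = "scores disagree"
        labels[int(row["Rank"])] = label
    return labels


labels = triage_poses(SAMPLE_CSV)
for rank, label in labels.items():
    print(rank, label)
```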

How SigmaDock works

Molecular docking requires sampling a very large conformational space efficiently. Classical methods like Vina explore this space with stochastic search guided by an empirical scoring function; diffusion models like DiffDock denoise the ligand's translation, rotation, and torsion angles. SigmaDock takes a different route.

Fragment decomposition

A ligand is broken at its rotatable bonds into rigid fragments. Each fragment maintains its fixed internal geometry (bond lengths, angles) throughout the docking process. The docking problem is then reframed as predicting an SE(3) transformation (translation + rotation in 3D space) for each fragment rather than a set of torsional angles.
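
The decomposition can be pictured as a graph operation: delete the rotatable bonds and take the connected components that remain. The sketch below implements that idea on a toy bond list. It takes the set of rotatable bonds as an input and does not decide which bonds are rotatable; the published method's exact fragmentation rules may differ. The function name `rigid_fragments` and the toy molecule are illustrative.

```python
def rigid_fragments(num_atoms, bonds, rotatable_bonds):
    """Split atoms into fragments by removing rotatable bonds from the bond graph."""
    cut = {frozenset(bond) for bond in rotatable_bonds}
    adjacency = {i: set() for i in range(num_atoms)}
    for a, b in bonds:
        if frozenset((a, b)) not in cut:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, fragments = set(), []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Depth-first search collects one connected component (one rigid fragment).
        stack, component = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            component.append(atom)
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        fragments.append(sorted(component))
    return fragments


# Toy molecule: two three-membered rings joined by one rotatable bond (2-3).
bonds = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
fragments = rigid_fragments(6, bonds, rotatable_bonds=[(2, 3)])
print(fragments)
```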

SE(3) diffusion

The model uses Riemannian diffusion on the SE(3) manifold. During training, fragment poses are progressively randomized via Brownian motion on the translation and rotation spaces. The neural network learns to reverse this process — denoising noisy fragment configurations back into plausible bound poses. At inference, docking starts from randomized fragment poses and iteratively applies the learned denoising steps.
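
To make the forward (noising) direction concrete, the sketch below applies one random rigid-body move to a fragment: a rotation about its centroid followed by a translation. It is a simplification made for illustration. The rotation angle is drawn from a plain Gaussian about a random axis rather than from a proper Brownian motion on the rotation group, and the names `rotate` and `perturb_fragment` and the noise scales are hypothetical. What it does show is the defining property of an SE(3) move: internal distances within the fragment are unchanged.

```python
import math
import random


def rotate(point, axis, angle):
    """Rotate a 3D point about a unit axis through the origin (Rodrigues' formula)."""
    ux, uy, uz = axis
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    dot = ux * x + uy * y + uz * z
    cross = (uy * z - uz * y, uz * x - ux * z, ux * y - uy * x)
    return tuple(p * c + cr * s + u * dot * (1 - c) for p, cr, u in zip(point, cross, axis))


def perturb_fragment(coords, sigma_trans, sigma_rot, rng):
    """Apply one random rigid-body move: rotate about the centroid, then translate."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    raw = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(v * v for v in raw))
    axis = tuple(v / norm for v in raw)        # random unit rotation axis
    angle = rng.gauss(0.0, sigma_rot)          # simplified angle noise (radians)
    shift = tuple(rng.gauss(0.0, sigma_trans) for _ in range(3))
    moved = []
    for p in coords:
        centered = tuple(p[i] - centroid[i] for i in range(3))
        r = rotate(centered, axis, angle)
        moved.append(tuple(r[i] + centroid[i] + shift[i] for i in range(3)))
    return moved


fragment = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0)]
noisy = perturb_fragment(fragment, sigma_trans=1.0, sigma_rot=0.5, rng=random.Random(0))
print("bond length before:", math.dist(fragment[0], fragment[1]))
print("bond length after: ", math.dist(noisy[0], noisy[1]))
```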

Architecture

SigmaDock extends EquiformerV2, a graph transformer with SO(3)-equivariance, with a hierarchical graph topology using virtual nodes and edges to capture multi-scale protein-ligand interactions. The prediction head is SO(3)-equivariant to avoid local coordinate frame ambiguities. Soft geometric constraints via triangulation distance conditioning enforce chemical validity across fragment junctions throughout the diffusion trajectory.

Inductive biases over scale

SigmaDock intentionally uses a relatively small training set (~19,000 examples) but achieves AlphaFold3-comparable accuracy through well-designed inductive biases: equivariance, fragment rigidity, and geometric constraints. This contrasts with purely data-driven approaches that rely on much larger training corpora.

Performance

On the PoseBusters benchmark with a temporal train/test split:

Method | Top-1 (RMSD <2 Å & PB-valid)
SigmaDock | ~80%
AlphaFold3 | comparable
DiffDock-L | 12–31%
AutoDock Vina | lower
GOLD | lower

SigmaDock also maintains strong performance on proteins with low sequence similarity to its training set, a regime in which deep learning docking methods commonly fail.
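
The Top-1 metric counts a complex as a success only if its top-ranked pose is both within 2 Å RMSD of the reference and PB-valid. A minimal sketch of that computation, with made-up data and an illustrative function name `top1_success_rate`:

```python
def top1_success_rate(top_poses, rmsd_cutoff=2.0):
    """Fraction of complexes whose top-ranked pose has RMSD < cutoff and is PB-valid.

    `top_poses` holds one (rmsd_to_reference, pb_valid) pair per complex.
    """
    if not top_poses:
        raise ValueError("at least one complex is required")
    hits = sum(1 for rmsd, pb_valid in top_poses if rmsd < rmsd_cutoff and pb_valid)
    return hits / len(top_poses)


# Made-up results for four complexes; the second is accurate but not PB-valid.
results = [(0.8, True), (1.9, False), (3.5, True), (1.2, True)]
print(f"Top-1 success rate: {top1_success_rate(results):.0%}")
```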

Limitations

  • Not a blind docking method — a binding site must be known. The tool auto-detects it from co-crystallized ligands in the PDB, or coordinates can be supplied manually.
  • Protein backbone is held rigid; conformational changes upon ligand binding (induced fit) are not modeled.
  • Receptor parsing is strict. Standard protein PDBs work best; malformed exports, fibrils, and structures with modified/non-standard residues may need cleanup before submission.
  • Metal ions and metal-coordinating ligands are not supported; the method is designed for organic small molecules.
  • Proteins over 1000 residues must be trimmed to the binding domain before submission.
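
For the size limit in the last item, one simple way to trim a large receptor is to keep only residues near the pocket. The sketch below selects residues with at least one atom within a chosen radius of the pocket center. The 20 Å default is an arbitrary illustrative value, not a recommendation from the SigmaDock authors, and the function name `residues_near_pocket` and the input layout are hypothetical. A purely distance-based cut can leave chain breaks, so trimming along domain boundaries may be preferable.

```python
import math


def residues_near_pocket(atoms, center, radius=20.0):
    """Return sorted (chain, residue number) keys with an atom within `radius` Å of `center`.

    `atoms` is an iterable of (chain, residue_number, (x, y, z)) tuples.
    """
    keep = set()
    for chain, res_seq, xyz in atoms:
        if math.dist(xyz, center) <= radius:
            keep.add((chain, res_seq))
    return sorted(keep)


# Made-up atoms: residues 10 and 11 sit near the pocket, residue 400 is far away.
atoms = [
    ("A", 10, (1.0, 2.0, 3.0)),
    ("A", 10, (2.0, 2.5, 3.5)),
    ("A", 11, (8.0, 9.0, 4.0)),
    ("A", 400, (60.0, 55.0, 40.0)),
]
kept = residues_near_pocket(atoms, center=(0.0, 0.0, 0.0))
print(kept)
```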