SigmaDock is a molecular docking method that predicts how small-molecule ligands bind to protein targets. Developed at the University of Oxford and presented at ICLR 2026, it achieves a Top-1 success rate of approximately 80% (RMSD < 2 Å and PB-valid) on the PoseBusters benchmark, making it the first deep learning method to surpass classical physics-based docking on this metric.
Rather than treating ligand flexibility through torsional angles, SigmaDock decomposes ligands into rigid-body fragments at their rotatable bonds and learns SE(3) transformations for each fragment. This design improves both the physical validity of generated poses and the model's ability to generalize to proteins not seen during training.
ProteinIQ hosts SigmaDock on GPU infrastructure, so no software installation or local GPU is needed. Submit a protein structure and a ligand, and the model generates multiple ranked binding poses within minutes.
| Input | Description |
|---|---|
| Protein | PDB file or RCSB PDB ID. Must contain protein atoms; maximum 1000 residues. |
| Ligand | SMILES string, SDF file, MOL/MOL2 file, or PubChem CID. Organic molecules only; metal-containing compounds are not supported. |
| Setting | Description |
|---|---|
| Number of poses | How many candidate binding poses to generate (1–40, default 10). More poses increase diversity and the chance of finding the correct binding mode, at the cost of computation time. |
| Diffusion steps | Number of SE(3) denoising steps in the diffusion process (10–50, default 30). Higher values refine poses more thoroughly but run slower. |
| Scoring method | Function used to rank generated poses. Vinardo (default) is recommended for most applications; Vina is available as an alternative. |
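The ranges and defaults in the settings table can be captured in a small validation helper. This is a minimal sketch only: the class name `DockingSettings`, its field names, and the lowercase scoring labels are hypothetical and do not correspond to any documented ProteinIQ or SigmaDock API.

```python
from dataclasses import dataclass

# Ranges and defaults mirror the settings table above.
# The class, field names, and string labels are hypothetical.
@dataclass
class DockingSettings:
    num_poses: int = 10          # allowed range: 1 to 40
    diffusion_steps: int = 30    # allowed range: 10 to 50
    scoring: str = "vinardo"     # "vinardo" (default) or "vina"

    def validate(self) -> None:
        """Raise ValueError if any setting falls outside the documented range."""
        if not 1 <= self.num_poses <= 40:
            raise ValueError("num_poses must be between 1 and 40")
        if not 10 <= self.diffusion_steps <= 50:
            raise ValueError("diffusion_steps must be between 10 and 50")
        if self.scoring not in ("vinardo", "vina"):
            raise ValueError("scoring must be 'vinardo' or 'vina'")

settings = DockingSettings(num_poses=20)
settings.validate()  # passes silently for in-range values
```

Checking values locally before submitting a job avoids waiting on a run that would be rejected for an out-of-range setting.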
Each run produces a set of ranked poses as PDB files, displayed in a 3D interactive viewer. A spreadsheet summarises the key metrics for each pose.
| Column | Description |
|---|---|
| Rank | Pose ranking by score (1 = best) |
| Score | Vinardo or Vina binding score in kcal/mol; more negative indicates stronger predicted binding |
| RMSD | Root-mean-square deviation from the top-ranked pose, useful for gauging pose diversity |
| File | PDB file for the docked pose |
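To make the RMSD column concrete, the sketch below computes the root-mean-square deviation between two poses of the same ligand. It assumes both poses list their atoms in the same order and compares coordinates in place, without superposition, since docked poses share the protein's frame. Whether the reported value is restricted to heavy atoms or corrected for molecular symmetry is not stated here, so treat this as an illustration rather than the exact calculation used.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two poses with identical atom ordering (no superposition)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy two-atom poses: the second is the first shifted by 1 Å along z.
top_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
other_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
print(rmsd(top_pose, other_pose))  # 1.0
```

A low RMSD for every pose suggests the model converged on a single binding mode; larger values indicate alternative placements worth inspecting in the viewer.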
Molecular docking requires sampling a very large conformational space efficiently. Classical methods such as AutoDock Vina explore this space with stochastic search and local optimization guided by an empirical scoring function; diffusion models such as DiffDock instead diffuse over the ligand's translation, rotation, and torsion angles. SigmaDock takes a different route.
A ligand is broken at its rotatable bonds into rigid fragments. Each fragment maintains its fixed internal geometry (bond lengths, angles) throughout the docking process. The docking problem is then reframed as predicting an SE(3) transformation (translation + rotation in 3D space) for each fragment rather than a set of torsional angles.
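The fragment view described above can be illustrated with a toy example. The sketch below treats the ligand as a bond graph, cuts a supplied list of rotatable bonds, and takes the connected components as rigid fragments; it then applies one rotation and translation to a fragment and checks that a bond length is unchanged. The function names and the toy molecule are hypothetical, and SigmaDock's actual fragmentation rules (for example, how rotatable bonds are detected or how ring systems are handled) may differ.

```python
import math
from collections import defaultdict

def split_into_fragments(num_atoms, bonds, rotatable_bonds):
    """Return connected components of the bond graph once rotatable bonds are cut."""
    cut = {frozenset(b) for b in rotatable_bonds}
    neighbours = defaultdict(list)
    for i, j in bonds:
        if frozenset((i, j)) not in cut:
            neighbours[i].append(j)
            neighbours[j].append(i)
    seen, fragments = set(), []
    for start in range(num_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for nxt in neighbours[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        fragments.append(sorted(component))
    return fragments

def apply_rigid_transform(rotation, translation, coords):
    """Apply x -> R x + t to every atom of one fragment."""
    return [
        tuple(sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
              for r in range(3))
        for p in coords
    ]

# Toy six-atom chain 0-1-2-3-4-5 with one rotatable bond between atoms 2 and 3.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
fragments = split_into_fragments(6, bonds, rotatable_bonds=[(2, 3)])
print(fragments)  # [[0, 1, 2], [3, 4, 5]]

# Rotate the first fragment by 90 degrees about z, then translate it.
theta = math.pi / 2
rot_z = [[math.cos(theta), -math.sin(theta), 0.0],
         [math.sin(theta), math.cos(theta), 0.0],
         [0.0, 0.0, 1.0]]
frag_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0)]
moved = apply_rigid_transform(rot_z, (4.0, -1.0, 2.0), frag_coords)

# Internal geometry is preserved: the 0-1 distance is the same before and after.
before = math.dist(frag_coords[0], frag_coords[1])
after = math.dist(moved[0], moved[1])
```

Because each fragment only ever receives a rotation and a translation, bond lengths and angles inside it cannot be distorted, which is the property the fragment representation relies on.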
The model uses Riemannian diffusion on the SE(3) manifold. During training, fragment poses are progressively randomized via Brownian motion on the translation and rotation spaces. The neural network learns to reverse this process, denoising noisy fragment configurations back into plausible bound poses. At inference, docking starts from randomized fragment poses and iteratively applies the learned denoising steps.
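The forward (noising) half of this process can be sketched for a single fragment pose. The toy below perturbs the translation with Gaussian steps and the rotation with small random axis-angle rotations composed through Rodrigues' formula, a simple geodesic random walk that approximates Brownian motion on the rotation group. The function names, the step count, and the constant `sigma` are assumptions for illustration; SigmaDock's actual noise schedule and parameterization are not described here.

```python
import math
import random

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_so3(w):
    """Rodrigues' formula: rotation matrix for the axis-angle vector w."""
    theta = math.sqrt(sum(c * c for c in w))
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:
        return identity
    x, y, z = (c / theta for c in w)
    k = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    k2 = matmul(k, k)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + s * k[i][j] + c * k2[i][j] for j in range(3)]
            for i in range(3)]

def noise_pose(rotation, translation, steps, sigma, rng):
    """Random-walk a fragment pose: Gaussian steps in R^3, small rotations in SO(3)."""
    scale = sigma * math.sqrt(1.0 / steps)
    for _ in range(steps):
        translation = [t + scale * rng.gauss(0.0, 1.0) for t in translation]
        w = [scale * rng.gauss(0.0, 1.0) for _ in range(3)]
        rotation = matmul(exp_so3(w), rotation)
    return rotation, translation

rng = random.Random(0)
start_rotation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_noisy, t_noisy = noise_pose(start_rotation, [0.0, 0.0, 0.0],
                              steps=30, sigma=1.0, rng=rng)
```

However large the noise, the result is still a valid rotation plus a translation, so the fragment itself stays rigid. A trained network would run this walk in reverse, one denoising step at a time.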
SigmaDock extends EquiformerV2, a graph transformer with SO(3)-equivariance, with a hierarchical graph topology using virtual nodes and edges to capture multi-scale protein-ligand interactions. The prediction head is SO(3)-equivariant to avoid local coordinate frame ambiguities. Soft geometric constraints via triangulation distance conditioning enforce chemical validity across fragment junctions throughout the diffusion trajectory.
SigmaDock intentionally uses a relatively small training set (~19,000 examples) but achieves AlphaFold3-comparable accuracy through well-designed inductive biases: equivariance, fragment rigidity, and geometric constraints. This contrasts with purely data-driven approaches that rely on much larger training corpora.
On the PoseBusters benchmark with a temporal train/test split:
| Method | Top-1 (RMSD <2 Å & PB-valid) |
|---|---|
| SigmaDock | ~80% |
| AlphaFold3 | comparable |
| DiffDock-L | 12–31% |
| AutoDock Vina | lower |
| GOLD | lower |
SigmaDock also maintains strong performance on proteins with low sequence similarity to its training set, a common failure mode for deep learning docking methods.
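The Top-1 criterion used in the table counts a complex as a success only when its top-ranked pose is both within 2 Å RMSD of the reference and PB-valid. A minimal sketch of that tally follows; the function name and the toy results are made up for illustration and are not benchmark data.

```python
def top1_success_rate(results, rmsd_cutoff=2.0):
    """Fraction of complexes whose top-ranked pose has RMSD < cutoff and is PB-valid.

    results: one (rmsd_of_top_pose, pb_valid) pair per complex.
    """
    if not results:
        raise ValueError("results must not be empty")
    hits = sum(1 for rmsd, pb_valid in results if rmsd < rmsd_cutoff and pb_valid)
    return hits / len(results)

# Invented values: three of the five complexes satisfy both conditions.
toy_results = [(0.8, True), (1.9, False), (3.2, True), (1.1, True), (1.5, True)]
print(top1_success_rate(toy_results))  # 0.6
```

Requiring both conditions matters: a pose can sit within 2 Å of the reference and still contain clashes or distorted geometry that the PoseBusters validity checks reject.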