
Generate novel drug-like molecules in protein binding pockets using AI-powered structure-based design.
PocketFlow is a structure-based deep generative model that designs novel drug-like molecules inside protein binding pockets. Published in Nature Machine Intelligence in March 2024 by researchers at Sichuan University, it combines autoregressive flow modeling with explicit chemical knowledge to generate molecules with 100% chemical validity.
What sets PocketFlow apart is its experimental validation. The authors applied PocketFlow to design inhibitors for two epigenetic targets (HAT1 and YTHDC1) and obtained bioactive lead compounds confirmed in wet-lab assays. According to the authors, this makes PocketFlow the first structure-based molecular deep generative model whose designed molecules have been validated experimentally.
In computational benchmarks on the CrossDocked2020 dataset, PocketFlow outperforms previous methods while maintaining perfect chemical validity and high drug-likeness scores.
PocketFlow generates molecules atom-by-atom within a protein binding pocket using an autoregressive approach. At each step, the model decides what atom type to add, where to place it in 3D space, and how to connect it to existing atoms. Chemical rules guide these decisions to ensure valid molecules.
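The loop described above can be sketched in a few lines, assuming each learned component is treated as a plain callable. This is a toy illustration, not PocketFlow's code: `generate`, `pick_focus`, `pick_element`, `place_atom` and `pick_bonds` are hypothetical names, and the stand-ins below are hard-coded rules in place of trained networks. The focus step corresponds to choosing which existing atom to grow from.

```python
import random
from dataclasses import dataclass, field

@dataclass
class PartialMolecule:
    elements: list = field(default_factory=list)   # element symbol per atom
    positions: list = field(default_factory=list)  # (x, y, z) per atom, in Å
    bonds: list = field(default_factory=list)      # (i, j, order) tuples

def generate(pick_focus, pick_element, place_atom, pick_bonds, max_atoms=30):
    """Grow a ligand one atom per step; stop when no focal atom is returned."""
    mol = PartialMolecule()
    for _ in range(max_atoms):
        focus = pick_focus(mol)              # atom to extend, None = stop
        if focus is None and mol.elements:   # the first atom needs no focus
            break
        element = pick_element(mol, focus)   # what atom type to add
        position = place_atom(mol, focus)    # where to put it in the pocket
        mol.elements.append(element)
        mol.positions.append(position)
        new = len(mol.elements) - 1
        for partner, order in pick_bonds(mol, new):   # how to connect it
            mol.bonds.append((partner, new, order))
    return mol

# Toy stand-ins for the learned components (all hypothetical).
rng = random.Random(0)

def pick_focus(mol):
    if not mol.elements or len(mol.elements) >= 5:
        return None
    return len(mol.elements) - 1                # always extend the newest atom

def pick_element(mol, focus):
    return rng.choice(["C", "C", "N", "O"])

def place_atom(mol, focus):
    if focus is None:
        return (0.0, 0.0, 0.0)                  # seed atom at the pocket origin
    x, y, z = mol.positions[focus]
    return (x + 1.5, y, z)                      # ~1.5 Å step along x

def pick_bonds(mol, new):
    return [(new - 1, 1)] if new > 0 else []    # single bond to the previous atom

mol = generate(pick_focus, pick_element, place_atom, pick_bonds)
```

With these toy rules the result is a five-atom chain; in the real model each callable would be a network conditioned on the pocket and the partial ligand.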
The core of PocketFlow is the Geometric Double Bottleneck Perceptron (GDBP), an SE(3)-equivariant graph neural network that models the 3D geometry of the protein-ligand complex. GDBP improves upon earlier geometric neural networks (GVP and GBP) by adding bottleneck layers for both scalar and vector features.
This architecture processes 3D coordinates directly while maintaining equivariance to rotations and translations. The model can generate atom positions in 3D space without needing to first predict internal coordinates.
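To make "equivariance" concrete, here is a toy NumPy layer in the spirit of GVP-style networks, with a bottleneck on both the scalar and the vector path. It is an illustrative assumption, not the published GDBP: the weight names and sizes are hypothetical. The key idea is that weight matrices only mix vector channels and never the x/y/z axis, and only invariant quantities (vector norms) cross into the scalar path. Translation invariance would come from feeding relative vectors such as displacements between atoms, which a shift of the whole complex leaves unchanged.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bottleneck_layer(s, V, p):
    """Toy GVP-style update with a bottleneck on both the scalar and vector paths.

    s: (n_s,) rotation-invariant scalar features
    V: (n_v, 3) vector features, one 3D vector per channel (row)
    """
    # Vector path: mix channels down to n_b, then back up. The matrices act on
    # the channel axis only, so they commute with any rotation of the xyz axis.
    V_hidden = p["W_down"] @ V                     # (n_b, 3)
    V_out = p["W_up"] @ V_hidden                   # (n_v, 3)
    # Vector norms are rotation-invariant, so they can safely feed the scalar path.
    norms = np.linalg.norm(V_hidden, axis=1)       # (n_b,)
    # Scalar path: bottleneck down to n_b, nonlinearity, back up to n_s.
    s_hidden = np.tanh(p["W_s_down"] @ np.concatenate([s, norms]))
    s_out = p["W_s_up"] @ s_hidden                 # (n_s,)
    # Gate each output vector by a function of an invariant (its own norm).
    gate = sigmoid(np.linalg.norm(V_out, axis=1, keepdims=True))
    return s_out, gate * V_out

rng = np.random.default_rng(0)
n_s, n_v, n_b = 4, 3, 2
p = {
    "W_down": rng.normal(size=(n_b, n_v)),
    "W_up": rng.normal(size=(n_v, n_b)),
    "W_s_down": rng.normal(size=(n_b, n_s + n_b)),
    "W_s_up": rng.normal(size=(n_s, n_b)),
}
s, V = rng.normal(size=n_s), rng.normal(size=(n_v, 3))
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # random orthogonal 3x3 matrix

s1, V1 = bottleneck_layer(s, V, p)
s2, V2 = bottleneck_layer(s, V @ R.T, p)          # rotate every input vector
print(np.allclose(s1, s2), np.allclose(V1 @ R.T, V2))
```

Rotating the inputs leaves the scalar outputs unchanged and rotates the vector outputs by the same matrix, which is the property that lets a network of this kind predict 3D positions directly.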
The autoregressive generation uses three specialized components working together:
Atom Flow predicts the type of each new atom (carbon, nitrogen, oxygen, etc.) using a normalizing flow. This probabilistic approach captures the distribution of atom types conditioned on the current molecular state and pocket environment.
Position Predictor determines where to place each new atom in 3D space relative to the binding pocket. The GDBP network encodes spatial relationships between existing atoms, protein residues, and potential placement sites.
Bond Flow predicts connectivity between the new atom and existing atoms using another normalizing flow. This component receives explicit chemical knowledge guidance to ensure reasonable bond patterns.
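The flow-based Atom Flow step can be sketched with a single context-conditioned affine layer standing in for the full flow. This is a simplification under stated assumptions: the vocabulary `ATOM_TYPES`, the weights `W_shift` and `W_log_scale`, and the way `temperature` scales the latent noise are all hypothetical illustrations, not PocketFlow's actual layers or settings.

```python
import numpy as np

ATOM_TYPES = ["C", "N", "O", "F", "P", "S", "Cl"]   # hypothetical vocabulary

def sample_atom_type(context, W_shift, W_log_scale, temperature=1.0, rng=None):
    """Sample an atom type by inverting one context-conditioned affine flow layer.

    Training would map a dequantized one-hot type x to a latent
    z = (x - shift) * exp(-log_scale) and maximize its Gaussian likelihood.
    Sampling runs the map backwards from z ~ N(0, temperature**2 * I).
    """
    rng = np.random.default_rng() if rng is None else rng
    shift = W_shift @ context                 # (n_types,), depends on pocket + ligand state
    log_scale = W_log_scale @ context         # (n_types,)
    z = temperature * rng.standard_normal(len(ATOM_TYPES))
    x = z * np.exp(log_scale) + shift         # inverse of the affine transform
    return ATOM_TYPES[int(np.argmax(x))]      # read off the most active channel

# Toy usage: a context that makes oxygen the most likely type.
context = np.ones(3)
W_shift = np.zeros((len(ATOM_TYPES), 3))
W_shift[2] = 1.0                              # shift = 3.0 for "O", 0.0 elsewhere
W_log_scale = np.zeros((len(ATOM_TYPES), 3))
rng = np.random.default_rng(0)
cold = [sample_atom_type(context, W_shift, W_log_scale, 0.1, rng) for _ in range(200)]
hot = [sample_atom_type(context, W_shift, W_log_scale, 3.0, rng) for _ in range(200)]
```

At a low temperature the noise is too small to overturn the preferred type, so every sample is oxygen; at a high temperature other types appear. Bond Flow can be pictured the same way, with bond orders in place of atom types.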
Unlike purely data-driven approaches, PocketFlow incorporates chemical knowledge directly into the generation process. The bond predictor checks whether proposed bonds satisfy valence rules and reasonable bonding patterns.
If the model proposes an unreasonable bond, it resamples until it finds a valid connection. This explicit guidance is critical: ablation studies show that removing the chemical constraints significantly degrades both the validity and the drug-likeness of generated molecules.
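A minimal sketch of the check-and-resample idea, assuming bonds are proposed one pair at a time with a probability per bond order. The valence table is a deliberately simplified, hypothetical one (neutral atoms, no hypervalent sulfur or phosphorus, no aromatic bonds), and `sample_checked_bond` and `max_tries` are illustrative names rather than PocketFlow's API.

```python
import random

# Simplified neutral-atom valences (hypothetical table; real chemistry allows
# hypervalent S and P, charges, and aromatic bonds).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "S": 2}

def valence_used(atom, bonds):
    """Sum of bond orders already attached to `atom`; bonds are (i, j, order)."""
    return sum(order for i, j, order in bonds if atom in (i, j))

def sample_checked_bond(elements, bonds, new_atom, partner, order_probs,
                        rng, max_tries=10):
    """Sample a bond order (0 = no bond) and resample while it breaks a valence rule."""
    orders = list(order_probs)
    weights = [order_probs[o] for o in orders]
    for _ in range(max_tries):
        order = rng.choices(orders, weights=weights)[0]
        if order == 0:
            return 0
        fits_new = valence_used(new_atom, bonds) + order <= MAX_VALENCE[elements[new_atom]]
        fits_partner = valence_used(partner, bonds) + order <= MAX_VALENCE[elements[partner]]
        if fits_new and fits_partner:
            return order
    return 0  # no valid proposal: leave this pair unbonded

# Toy usage: atom 2 (O) proposes a bond to atom 0, a carbon that already has a C=O.
elements = ["C", "O", "O"]
bonds = [(0, 1, 2)]
probs = {0: 0.0, 1: 0.2, 2: 0.3, 3: 0.5}   # model strongly (and wrongly) favors a triple bond
picked = [sample_checked_bond(elements, bonds, 2, 0, probs, random.Random(seed))
          for seed in range(100)]
```

Even though the toy probabilities favor a triple bond, it is never accepted because it would overload both the carbon and the oxygen; the cap on retries keeps generation from looping forever when no valid order exists.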
PocketFlow uses a two-stage training process. The model is first pretrained on the ZINC 3D database of drug-like molecules to learn general molecular patterns. It is then fine-tuned on CrossDocked2020, a dataset of protein-ligand complexes, to learn pocket-specific generation.
Provide your binding pocket structure in PDB format. You can upload a file, paste PDB content directly, or fetch from RCSB PDB.
The pocket should contain the protein residues surrounding the binding site where you want molecules generated. Typical pocket extractions include residues within 10Å of a reference ligand. We recommend using clean structures without waters or ions unless they're critical for binding.
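If you need to prepare a pocket yourself, the sketch below keeps every protein residue with at least one atom within a cutoff of any reference-ligand atom, reading coordinates from the fixed-column PDB layout. It assumes a single-model file and a ligand supplied as separate PDB text; `parse_atoms` and `extract_pocket` are hypothetical helpers. It drops all HETATM records (waters, ions, buffer molecules), so adjust it if a structural water or cofactor matters for binding.

```python
import math

def parse_atoms(pdb_text):
    """Read ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append({
            "record": record,
            "res_id": (line[21], line[22:27].strip()),   # (chain, resSeq + insertion code)
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            "line": line,
        })
    return atoms

def extract_pocket(protein_pdb, ligand_pdb, cutoff=10.0):
    """Return PDB text holding every protein residue with any atom within
    `cutoff` Å of any ligand atom. All HETATM records (waters, ions, buffer
    molecules) are dropped; whole residues are kept, not single atoms."""
    ligand_xyz = [a["xyz"] for a in parse_atoms(ligand_pdb)]
    protein = [a for a in parse_atoms(protein_pdb) if a["record"] == "ATOM"]
    near = {a["res_id"] for a in protein
            if any(math.dist(a["xyz"], q) <= cutoff for q in ligand_xyz)}
    kept = [a["line"] for a in protein if a["res_id"] in near]
    return "\n".join(kept + ["END"]) + "\n"
```

Keeping whole residues rather than individual atoms avoids handing the model truncated side chains at the pocket boundary.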
These advanced settings control the stochastic generation process.
Temperature parameters affect sampling diversity. Lower temperatures produce more conservative, predictable structures while higher temperatures explore more unusual chemical space.
Values below 1.0 favor common atom types, while values above 1.0 increase diversity.
Focus parameters control how the model selects which atom to extend next during generation.
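One plausible way focus settings could act is sketched below: a classifier scores each existing atom, atoms below a probability threshold are excluded, and the next growth site is either the top-scoring candidate or a weighted random pick. The names `threshold` and `choose_max` are hypothetical and may not match the settings exposed in the interface. Returning `None` when no atom passes the threshold is what ends generation.

```python
import numpy as np

def choose_focal_atom(focus_logits, threshold=0.5, choose_max=True, rng=None):
    """Pick the index of the atom to extend next, or None to stop generating.

    focus_logits: per-atom scores from a (hypothetical) focal-atom classifier.
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(focus_logits, dtype=float)))
    candidates = np.flatnonzero(probs >= threshold)
    if candidates.size == 0:
        return None                          # no atom is a confident growth site
    if choose_max:
        return int(candidates[np.argmax(probs[candidates])])
    rng = np.random.default_rng() if rng is None else rng
    p = probs[candidates] / probs[candidates].sum()
    return int(rng.choice(candidates, p=p))  # sample among confident sites
```

For example, `choose_focal_atom([-3.0, 2.0, 0.5])` keeps atoms 1 and 2 as candidates and returns 1, the higher-scoring one; raising the threshold makes generation stop earlier and tends to produce smaller molecules.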
PocketFlow ranks generated molecules by QED (Quantitative Estimate of Drug-likeness) and provides standard molecular properties.
QED ranges from 0 to 1, with higher values indicating more drug-like properties. The score combines molecular weight, lipophilicity (logP), hydrogen bond donors and acceptors, polar surface area, rotatable bonds, aromatic rings, and structural alerts into a single metric.
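QED maps each property onto a desirability score between 0 and 1 and then takes a weighted geometric mean of those scores. The sketch below shows only that final aggregation step and takes the desirabilities as inputs; the per-property desirability curves and the published weights are not reproduced here. With `weights=None` it computes the unweighted variant. For the full calculation on real molecules, RDKit provides `rdkit.Chem.QED.qed`.

```python
import math

def qed_from_desirabilities(d, weights=None):
    """Combine per-property desirability scores d_i in (0, 1] into a QED value.

    QED is the weighted geometric mean: exp(sum(w_i * ln d_i) / sum(w_i)).
    With weights=None every property counts equally (the unweighted variant).
    """
    if weights is None:
        weights = [1.0] * len(d)
    total = sum(weights)
    return math.exp(sum(w * math.log(x) for w, x in zip(weights, d)) / total)
```

Because it is a geometric mean, one very poor property pulls the score down sharply: seven perfect properties and one at 0.01 give about 0.56, not the 0.88 an arithmetic mean would report.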
QED > 0.7: Highly drug-like
QED 0.5-0.7: Moderate drug-likeness
QED < 0.5: Less drug-like, may require optimization
Download individual molecules as SDF files for further analysis in molecular modeling software. The 3D coordinates correspond to the predicted binding pose within the pocket.
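If you post-process results offline, ranking by QED and applying the bands listed above takes only a few lines. The `(name, qed)` pair format is an assumption for illustration; adapt it to however you store the downloaded molecules.

```python
def qed_tier(score):
    """Map a QED score onto the three drug-likeness bands."""
    if score > 0.7:
        return "highly drug-like"
    if score >= 0.5:
        return "moderate"
    return "less drug-like"

def rank_by_qed(molecules):
    """Sort (name, qed) pairs best-first and attach a tier label."""
    ranked = sorted(molecules, key=lambda m: m[1], reverse=True)
    return [(name, score, qed_tier(score)) for name, score in ranked]
```

A score of exactly 0.7 falls in the moderate band, since the top band starts strictly above 0.7.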
Extract binding pockets with sufficient context (8-12Å from the binding site center). Pockets that are too small can over-constrain generation, while overly large pockets add noise.
Remove crystallographic artifacts like buffer molecules, and consider whether to keep structural waters based on their role in binding.
Use PocketFlow as part of an iterative design workflow:
Based on: Jiang, Y., Zhang, G., You, J. et al. PocketFlow is a data-and-knowledge-driven structure-based molecular generative model. Nat Mach Intell 6, 326–337 (2024). https://doi.org/10.1038/s42256-024-00808-8