GenMol is NVIDIA's discrete diffusion model for generating drug-like small molecules. It is designed for the common medicinal chemistry case where the desired output is not one molecule but a set of plausible analogs to rank, filter, dock, and refine.
The model works with SAFE, a fragment-based molecular representation. Instead of treating a molecule as a left-to-right SMILES string, SAFE represents molecules as fragment blocks with attachment points. GenMol can therefore generate a molecule from scratch, connect two fragments with a linker, grow from a motif, decorate a scaffold, or build a larger superstructure from an existing fragment using the same model family.
GenMol is useful early in discovery, before expensive structure-based or experimental filtering. Its outputs are candidate molecular graphs, not proof of binding, potency, selectivity, or synthesizability.
Run GenMol online by choosing de novo generation or a fragment-constrained task, setting the number of molecules and sampling parameters, then submitting the job. ProteinIQ returns generated SMILES, ranked molecule properties, and downloadable SDF files for follow-up analysis, filtering, visualization, or docking.
| Input | Required | Description |
|---|---|---|
| Fragment | No for De novo; yes for Fragment-constrained | SMILES or SMI text with * attachment points. Linker design uses two fragments separated by a period (.). |
| Job name | No | Optional label for identifying the run in job history. |
| Task | Example input | What it asks GenMol to do |
|---|---|---|
| Linker design (one-step) | CC(*)c1ccccc1.*c1ccc(F)cc1 | Connect two fragments with a newly generated linker. |
| Motif extension | c1ccc(*)cc1 | Grow substituents from a known motif. |
| Scaffold decoration | c1cc(*)cc(*)c1 | Add groups at multiple scaffold attachment sites. |
| Superstructure generation | c1ccc(*)cc1 | Generate larger molecules that retain the supplied fragment context. |
Attachment points matter. A chemically unreasonable * position can force GenMol into poor local chemistry even when the parent fragment looks drug-like. For fragment work, prepare the fragment at the bond where synthetic elaboration or medicinal chemistry exploration is intended.
GenMol fragment-constrained mode does not accept an ordinary complete SMILES as a starting molecule. The input must be a fragment SMILES with one or more * attachment markers that tell GenMol where new chemistry can be generated.
| Requirement | Valid pattern | Invalid pattern | Why it matters |
|---|---|---|---|
| De novo mode has no fragment input | Empty fragment field | CCO in the fragment field | De novo generation starts from the learned chemical distribution, not from a supplied molecule. |
| Motif extension has one growth point | c1ccc(*)cc1 | c1ccccc1 | The * marks the atom where GenMol should grow new chemistry. |
| Scaffold decoration can mark several positions | c1cc(*)cc(*)c1 | c1cc(*)cc(*)c1.* | Multiple * markers belong on the same molecular scaffold, not as a separate bare fragment. |
| Linker design has two molecular fragments | CC(*)c1ccccc1.*c1ccc(F)cc1 | CC(*)c1ccccc1.* | A standalone * is only an attachment marker. It is not a second fragment for GenMol to connect. |
| Each linker side has one attachment point | warhead*.*recruiter | warhead*.**recruiter* | Linker design expects one open bond on each side of the linker. Extra markers make the connection ambiguous. |
PROTAC linker terminology
Warhead and recruiter are standard PROTAC linker-design terms. The warhead binds the protein of interest, the recruiter binds the E3 ligase, and GenMol designs chemistry between their attachment points. For non-PROTAC linker design, read the same pattern as fragmentA*.*fragmentB: two molecular fragments, one * attachment point on each side.
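The input rules above can be pre-checked before submitting a job. The sketch below is a string-level check only, written under stated assumptions: the function name `check_fragment_input` and the task labels (`de_novo`, `motif_extension`, and so on) are hypothetical and are not part of any ProteinIQ or GenMol API. It does not parse SMILES or judge chemistry.

```python
# Hypothetical task labels for illustration; not an official API.
LINKER_TASKS = {"linker_one_step", "linker_two_step"}


def check_fragment_input(task: str, fragment: str) -> list[str]:
    """Return a list of problems; an empty list means the pre-check passed."""
    text = fragment.strip()
    if task == "de_novo":
        # De novo generation takes no starting fragment.
        return ["de novo mode expects an empty fragment field"] if text else []
    if not text:
        return ["fragment-constrained tasks need a fragment SMILES"]

    problems = []
    parts = text.split(".")  # "." separates disconnected SMILES components
    stars = text.count("*")
    if any(part in ("", "*") for part in parts):
        problems.append("a bare * (or empty component) is not a molecular fragment")
    if stars == 0:
        problems.append("no * attachment point found")

    if task in LINKER_TASKS:
        if len(parts) != 2:
            problems.append("linker design expects exactly two fragments")
        elif any(part.count("*") != 1 for part in parts):
            problems.append("each linker fragment needs exactly one *")
    else:
        if len(parts) != 1:
            problems.append("expected one connected fragment, not several")
        if task == "motif_extension" and stars > 1:
            problems.append("motif extension expects a single growth point")
    return problems


print(check_fragment_input("linker_one_step", "CC(*)c1ccccc1.*c1ccc(F)cc1"))
print(check_fragment_input("linker_one_step", "CC(*)c1ccccc1.*"))
```

A check like this catches formatting mistakes early, but a fragment that passes can still be chemically unreasonable.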
GenMol is trained for drug-like organic small molecules. Fragment inputs should use ordinary medicinal-chemistry atoms such as C, N, O, S, P, F, Cl, Br, and I. Inorganic cages, organosilicon scaffolds such as [Si], salts, disconnected non-fragment species, and highly unusual query-style structures are outside the intended input space and may fail before molecule generation begins.
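As a rough illustration of the element scope described above, the following sketch scans a SMILES string for element symbols and reports any that fall outside the listed medicinal-chemistry atoms. It is a simplified regular-expression scan, not a SMILES parser; the allowed set (with hydrogen added) and the function name `unusual_elements` are assumptions made for this example.

```python
import re

# Atoms named in the text above, plus H; an assumption for illustration.
ALLOWED = {"C", "N", "O", "S", "P", "F", "Cl", "Br", "I", "H"}

# Bracket atoms such as [Si] or [nH], then two-letter halogens,
# then single-letter organic-subset atoms, then the * attachment marker.
TOKEN = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z]{1,2}|\*)[^\]]*\]|Cl|Br|[BCNOPSFI]|[cnospb]|\*"
)


def unusual_elements(smiles: str) -> set[str]:
    """Return element symbols in the string that are not in ALLOWED."""
    found = set()
    for match in TOKEN.finditer(smiles):
        symbol = match.group(1) or match.group(0)
        if symbol == "*":
            continue  # attachment marker, not an element
        found.add(symbol.capitalize())  # aromatic "c" counts as "C"
    return found - ALLOWED


print(unusual_elements("c1ccc(*)cc1"))
print(unusual_elements("*[H][Si](C)(C)O*"))
```

The scan does not detect salts or disconnected species; those need a separate check on the "." separator or a real cheminformatics toolkit.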
In practical terms, a good fragment input is a chemically meaningful partial ligand with clearly marked attachment points:

- c1ccc(*)cc1: phenyl motif with one growth point.
- CC(*)c1ccccc1.*c1ccc(F)cc1: two ligand fragments prepared for linker design.
- c1cc(*)cc(*)c1: scaffold with two decoration positions.

These are not equivalent:

- CCO: a complete molecule with no generation site.
- *: an attachment marker without a molecular fragment.
- CC(*)c1ccccc1.*: one valid fragment plus one missing fragment.
- *[H][Si](C)(C)O*: an organosilicon fragment outside GenMol's drug-like organic fragment scope.

| Setting | Range or values | Default | Description |
|---|---|---|---|
| Generation mode | De novo, Fragment-constrained | De novo | De novo generates molecules without a starting structure. Fragment-constrained conditions generation on the supplied fragment input. |
| Fragment task | Linker design (one-step), Linker design (two-step), Motif extension, Scaffold decoration, Superstructure generation | Linker design (one-step) | Selects the fragment-constrained generation behavior. This setting only affects Fragment-constrained runs. |
| Number of molecules | 10 to 200 | 50 | Number of molecules to generate. Larger runs sample more chemical diversity and take longer. |
| Softmax temperature | 0.5 to 2.0 | 1.0 | Controls token-level diversity during sampling. Lower values are more conservative; higher values produce broader chemistry with more risk of low-quality outputs. |
| Randomness | 0.1 to 5.0 | 0.3 | Controls stochasticity during generation. GenMol V2 de novo examples commonly use 0.3; fragment tasks often need higher values such as 1.0 to 3.0. |
| Molecular context guidance (gamma) | 0.0 to 1.0 | 0.3 | Controls how strongly fragment-constrained generation follows the supplied context. Higher values keep closer to the input fragment; lower values allow more exploration. |
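When runs are scripted or logged, it can help to capture the numeric settings in one place and check them against the documented ranges. The sketch below assumes a hypothetical `GenMolSettings` container with hypothetical field names; only the ranges and defaults come from the table above.

```python
from dataclasses import dataclass

# Ranges copied from the settings table; the key names are hypothetical.
RANGES = {
    "num_molecules": (10, 200),
    "temperature": (0.5, 2.0),
    "randomness": (0.1, 5.0),
    "gamma": (0.0, 1.0),
}


@dataclass(frozen=True)
class GenMolSettings:
    """Illustrative settings record using the documented defaults."""
    num_molecules: int = 50
    temperature: float = 1.0
    randomness: float = 0.3
    gamma: float = 0.3

    def __post_init__(self):
        # Reject any value outside its documented range.
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")


# A fragment-task style configuration with higher randomness.
print(GenMolSettings(num_molecules=100, randomness=2.0))
```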
| Goal | Best mode | Practical note |
|---|---|---|
| Explore broad ligand-like chemical space | De novo | Start with 50 molecules, then increase to 100 or 200 if the first run has promising property ranges. |
| Design a PROTAC or bivalent compound linker | Linker design (one-step) | Use two fragments with one attachment point each. Follow with length, polarity, and flexibility filtering. |
| Build a more conservative linker | Linker design (two-step) | Useful when one-step sampling produces linkers that look too abrupt or chemically strained. |
| Elaborate a known hit fragment | Motif extension | Best for one attachment point where the fragment should remain recognizable. |
| Explore SAR around a core | Scaffold decoration | Mark every intended substitution position with *; avoid marking positions that should remain fixed. |
| Search around a fragment in a larger chemical context | Superstructure generation | More exploratory than simple motif extension. Expect more diversity and a wider property spread. |
GenMol returns a ranked table of generated molecules, SMILES strings, molecular property columns, and SDF files. The viewer tab prioritizes generated molecules by drug-likeness, but the right candidate depends on the next assay or computational filter.
| Result | Description |
|---|---|
| Rank | ProteinIQ ranking of generated molecules, primarily useful for scanning the output table. |
| QED | Quantitative Estimate of Drug-likeness, from 0 to 1. Higher values usually indicate a property profile closer to known oral drugs. |
| SA | Synthetic accessibility score. Lower values are generally easier to synthesize. |
| MW (Da) | Molecular weight in Daltons. Many oral small molecules sit below roughly 500 Da, although PROTACs and macrocycles often exceed that range. |
| LogP | Predicted octanol-water partition coefficient. Higher values usually mean more hydrophobic molecules. |
| SMILES | Canonical SMILES for downstream filtering, property prediction, or docking preparation. |
| Download | Downloadable SDF for each generated molecule. |
QED should not be used as a single pass or fail decision. It rewards molecular property profiles similar to approved drugs, which is useful for triage, but it does not know the target, binding pocket, assay format, liabilities, or project-specific design constraints.
| Pattern | Interpretation |
|---|---|
| High QED, low SA | Often the easiest candidates to inspect first. These molecules combine drug-like property balance with comparatively accessible chemistry. |
| High QED, high SA | Attractive on paper but potentially difficult to make. Check ring systems, stereochemistry, and unusual functional groups before prioritizing. |
| Low QED, low SA | Chemically accessible molecules that may need property optimization. These can still be useful for fragment or tool-compound exploration. |
| Low QED, high SA | Usually lower priority unless the structure fits a specific project hypothesis. |
The quality summary used in GenMol examples combines QED of at least 0.6 with SA no greater than 4. That is a useful first filter for many drug-like campaigns, not a universal medicinal chemistry rule.
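The filter and the four QED/SA patterns can be applied to an exported results table with a few lines of code. This sketch assumes the cutoffs stated above (QED at least 0.6, SA no greater than 4) also serve as the boundary between "high" and "low", which is a simplification chosen for illustration; the function names are hypothetical.

```python
def passes_quality(qed: float, sa: float,
                   min_qed: float = 0.6, max_sa: float = 4.0) -> bool:
    """True when a molecule meets both quality cutoffs."""
    return qed >= min_qed and sa <= max_sa


def triage_bucket(qed: float, sa: float,
                  min_qed: float = 0.6, max_sa: float = 4.0) -> str:
    """Label a molecule with one of the four QED/SA patterns."""
    qed_label = "High QED" if qed >= min_qed else "Low QED"
    sa_label = "low SA" if sa <= max_sa else "high SA"
    return f"{qed_label}, {sa_label}"


# Made-up example rows: (SMILES, QED, SA).
rows = [("CCOc1ccccc1", 0.72, 1.8), ("CCCCCCCCCC", 0.45, 1.2)]
for smiles, qed, sa in rows:
    print(smiles, triage_bucket(qed, sa), passes_quality(qed, sa))
```

Project-specific cutoffs can be passed through `min_qed` and `max_sa` when the defaults are too strict or too loose.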
| Metric | Meaning |
|---|---|
| Validity | Fraction of requested molecules that were valid molecular structures. |
| Uniqueness | Fraction of valid generated SMILES that are non-duplicates. |
| Diversity | Average pairwise fingerprint distance across unique molecules. Higher values indicate a broader chemical set. |
| Quality | Fraction of requested molecules meeting the run's quality criteria, commonly based on QED and SA. |
High diversity is useful for hit discovery. Lower diversity can be acceptable for scaffold decoration or linker design when the goal is a focused analog series around a fixed starting point.
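To make the four run metrics concrete, here is a small sketch that computes them from a list of already-validated molecules. Several parts are assumptions for illustration: the fingerprint is a toy set of SMILES character trigrams standing in for a real molecular fingerprint, duplicates are counted once in the quality fraction, and uniqueness is only meaningful if the SMILES are canonical. It is not how ProteinIQ or GenMol computes these values.

```python
from itertools import combinations


def toy_fingerprint(smiles: str) -> frozenset:
    """Stand-in fingerprint: the set of character trigrams in the string."""
    return frozenset(smiles[i:i + 3] for i in range(max(len(smiles) - 2, 1)))


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """One minus the Tanimoto (Jaccard) similarity of two feature sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def run_metrics(n_requested, valid, min_qed=0.6, max_sa=4.0,
                fingerprint=toy_fingerprint):
    """valid: list of (smiles, qed, sa) for molecules that parsed correctly."""
    unique = {smiles: (qed, sa) for smiles, qed, sa in valid}
    fps = {smiles: fingerprint(smiles) for smiles in unique}
    pairs = list(combinations(unique, 2))
    diversity = (
        sum(tanimoto_distance(fps[a], fps[b]) for a, b in pairs) / len(pairs)
        if pairs else 0.0
    )
    good = sum(1 for qed, sa in unique.values()
               if qed >= min_qed and sa <= max_sa)
    return {
        "validity": len(valid) / n_requested,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "diversity": diversity,
        "quality": good / n_requested,
    }


# Four molecules requested, three valid, one of them a duplicate.
print(run_metrics(4, [("CCO", 0.7, 2.0), ("CCO", 0.7, 2.0),
                      ("c1ccccc1", 0.5, 1.0)]))
```

Swapping `toy_fingerprint` for a chemistry-aware fingerprint from a cheminformatics toolkit would give diversity values closer to those reported for real runs.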
The SDF files are convenient for visualization and downstream tools, but they should not be interpreted as binding poses. The coordinates are generated molecular conformers, not docked orientations in a protein pocket. For target-specific pose assessment, use DiffDock, AutoDock Vina, or GNINA after preparing a receptor structure.
GenMol combines masked discrete diffusion with a transformer model trained on SAFE molecular strings. The diffusion process starts from masked molecular tokens and iteratively predicts replacements, refining a full molecule over repeated denoising steps.
This differs from autoregressive SMILES generation. Autoregressive models generate one token after another, so every token depends heavily on the order chosen so far. GenMol predicts masked regions with bidirectional context, which lets the model use information from both sides of a fragment arrangement and sample multiple positions in parallel.
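The difference from left-to-right generation can be illustrated with a toy unmasking loop. This is a deliberately simplified sketch, not GenMol's implementation: `toy_logits` is a made-up stand-in for the transformer, the four-token vocabulary is arbitrary, and the rule of committing the most confident proposals at each step is one common choice for masked discrete generation rather than a documented GenMol detail. The output is a token string, not a valid molecule.

```python
import math
import random

MASK = "?"
VOCAB = ["C", "N", "O", "c"]


def toy_logits(tokens, pos):
    """Stand-in model: score each token using both neighbours (bidirectional)."""
    left = tokens[pos - 1] if pos > 0 else None
    right = tokens[pos + 1] if pos + 1 < len(tokens) else None
    return [1.0 + (tok == left) + (tok == right) for tok in VOCAB]


def softmax(logits, temperature):
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(length, steps, temperature=1.0, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * length  # start from a fully masked sequence
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        # Propose a token for every masked position in the same step.
        proposals = {}
        for i in masked:
            probs = softmax(toy_logits(tokens, i), temperature)
            choice = rng.choices(range(len(VOCAB)), weights=probs)[0]
            proposals[i] = (VOCAB[choice], probs[choice])
        # Commit the most confident proposals; the rest stay masked.
        n_commit = math.ceil(len(masked) / (steps - step))
        ranked = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:n_commit]:
            tokens[i] = proposals[i][0]
    return "".join(tokens)


print(generate(length=8, steps=4, temperature=1.0))
```

Because each proposal looks at tokens on both sides, positions filled early influence later ones regardless of where they sit in the string, which is the property the text contrasts with autoregressive decoding.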
SAFE stands for Sequential Attachment-based Fragment Embedding. It decomposes molecules into fragment blocks, often using BRICS-style chemistry rules, and records how fragments attach. A linker design input such as CC(*)c1ccccc1.*c1ccc(F)cc1 contains two fragment contexts and attachment markers for the region that should be generated.
Fragment order is less important in SAFE than in ordinary SMILES. That property fits the way GenMol uses bidirectional context: the model can reason over the fragment set instead of only the next token in a string.
GenMol's exploration strategy can mask whole fragments rather than only individual tokens. During generation, uncertain or replaceable fragments can be masked again and regenerated in the context of the surrounding molecule. This makes the search operate at a chemically meaningful level, closer to replacing a substituent or linker block than editing a single atom at a time.
Molecular context guidance, controlled by the gamma setting, increases the influence of supplied fragments during constrained generation. Higher gamma values usually produce molecules that preserve the intended fragment context more strongly. Lower values give the model more room to depart from the input and may increase novelty.
For de novo exploration:

1. Start with De novo generation with 50 molecules for a first pass.
2. Increase to 100 or 200 molecules if the first run returns valid, diverse chemistry.
3. Filter and rank the results by QED, SA, MW (Da), and LogP.

For fragment elaboration:

1. Mark each intended growth position on the fragment with *.
2. Choose Motif extension for one growth point or Scaffold decoration for several substitution positions.
3. Keep Softmax temperature near 1.0 to 1.2 for conservative analog generation.
4. Increase Randomness only when repeated runs return near-duplicates or overfit to the starting fragment.

For linker design:

1. Format the input as warhead*.*recruiter.
2. Use Linker design (one-step) for broad exploration.
3. Switch to Linker design (two-step) if the first run produces linkers that are too strained, too hydrophobic, or too similar.

GenMol is a ligand generation tool. It proposes candidate molecules or analogs, then other tools test whether those candidates satisfy project constraints.
| Need | Better starting point |
|---|---|
| Generate new ligand-like molecules from scratch | GenMol De novo |
| Grow or decorate a known fragment | GenMol Motif extension or Scaffold decoration |
| Connect two ligand fragments | GenMol Linker design |
| Generate ligands inside a known protein pocket | PocketFlow or PocketXMol |
| Predict binding poses for generated molecules | DiffDock, AutoDock Vina, or GNINA |
| Convert generated SMILES into structure files | SMILES to SDF, SMILES to PDB, or SMILES to MOL2 |
| Filter property and liability risks | ADMET-AI, Lipinski's Rule of 5, PAINS filter, or Brenk filter |
GenMol generates molecular candidates, not validated drugs. A high-ranked molecule still needs target-aware modeling, liability screening, retrosynthetic review, and experimental validation.
Fragment-constrained runs depend strongly on input quality. Poor attachment points, unstable fragments, salts, disconnected chemistry outside the intended format, or very unusual motifs can reduce validity and usefulness.
The SDF output is not a substitute for docking or conformational analysis. It is a convenient structure file for downstream work, while binding geometry must be assessed in a receptor context.