LigandMPNN

Design protein sequences around ligands, metals, and nucleotides for enzyme engineering and binding-site optimization.

Related tools

HyperMPNN

Design thermostable protein sequences using ProteinMPNN trained on hyperthermophilic organism structures. Generates sequences optimized for improved thermal stability without requiring ligands or additional context.

ProteinMPNN

Design protein sequences for given backbone structures using deep learning. Fast and accurate inverse folding with state-of-the-art sequence recovery (52.4%).

SolubleMPNN

Specialized model for soluble protein sequence design. Trained exclusively on soluble proteins for optimized performance on cytoplasmic and extracellular proteins.

IgDesign

Design antibody CDR sequences via inverse folding. Generates complementarity-determining region (CDR) sequences for antibodies targeting therapeutic antigens using deep learning. Optimizes CDR loops (HCDR1, HCDR2, HCDR3) based on antibody-antigen complex structures.

AntiFold

Inverse folding for antibody variable domains and nanobodies. Predicts amino acid sequences compatible with antibody structures using IMGT numbering while preserving upstream AntiFold chain handling and structural constraints.

ESM-IF1

Inverse folding with ESM-IF1. Design protein sequences for given 3D backbone structures using a geometric deep learning model. Generate multiple sequence variants optimized for your target structure.

ProFam

ProFam-1 is a protein family language model for family-conditioned sequence generation. Provide a protein family FASTA/MSA and generate new sequences with model likelihood scores for downstream ranking and screening.

RFdiffusion 2

RFdiffusion2 is an atom-level enzyme active site scaffolding tool that generates protein scaffolds around your input motif. It requires an input PDB structure containing the active site residues to scaffold. For ligand-aware design, ligands must be embedded in the input PDB as HETATM records.

RFdiffusion3

All-atom generative diffusion model for protein design with complex constraints. Design binders, enzymes, and symmetric protein assemblies.

PepMLM

Design linear peptide binders for target proteins using a target sequence-conditioned masked language model. PepMLM generates peptide sequences optimized to bind specific protein targets based on ESM-2 protein language modeling.

What is LigandMPNN?

LigandMPNN is an inverse folding model that designs protein sequences while accounting for non-protein atoms—ligands, metals, nucleotides, and cofactors. Standard inverse folding methods like ProteinMPNN only see the protein backbone, which limits their accuracy at binding sites where interactions with small molecules matter most. LigandMPNN solves this by incorporating atomic context from heteroatoms during sequence prediction.

The improvement is substantial. On native-backbone sequence recovery, LigandMPNN reaches 63.3% at small-molecule binding sites compared to 50.5% for ProteinMPNN. Metal coordination sites show an even larger gap: 77.5% versus 40.6%. These gains carry over to experiments: over 100 designs have been validated, and redesigns of Rosetta-designed small-molecule binders improved binding affinity by as much as 100-fold.

Developed by Dauparas, Lee, and colleagues at the Institute for Protein Design, LigandMPNN was published in Nature Methods (2025).

How does LigandMPNN work?

LigandMPNN extends ProteinMPNN's graph neural network by processing three interconnected graphs instead of one: a protein-only graph (residues as nodes), a ligand-only graph (heteroatoms as nodes), and a protein-ligand graph that captures residue-atom interactions. This architecture allows the model to learn chemical element identities and geometric constraints at binding interfaces.

The model consists of a protein backbone encoder (3 layers), a protein-ligand encoder (2 layers), and an autoregressive decoder that generates both amino acid sequences and sidechain torsion angles. Training used 8–16 context atoms per residue and added 0.1 Å Gaussian noise to coordinates, improving generalization to novel structures.
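As a rough illustration of the two training details above, the following sketch picks the nearest ligand atoms for one residue and perturbs a coordinate with Gaussian noise. It is a toy under stated assumptions, not the model's actual featurization code: the function names, the use of a single reference point per residue, and the choice of k are all hypothetical.

```python
import math
import random

def nearest_context_atoms(residue_xyz, ligand_atoms, k=16):
    """Return the k ligand atoms closest to one residue position.

    ligand_atoms is a list of (element, (x, y, z)) tuples.
    """
    ranked = sorted(ligand_atoms, key=lambda atom: math.dist(residue_xyz, atom[1]))
    return ranked[:k]

def add_coordinate_noise(xyz, sigma=0.1, rng=random):
    """Perturb each coordinate with zero-mean Gaussian noise (sigma in angstroms)."""
    return tuple(c + rng.gauss(0.0, sigma) for c in xyz)

# Toy data (hypothetical): one residue reference point and three heteroatoms.
residue = (0.0, 0.0, 0.0)
ligand = [("ZN", (2.0, 0.0, 0.0)), ("O", (0.0, 5.0, 0.0)), ("C", (9.0, 9.0, 9.0))]

# Keep only the two closest heteroatoms as context for this residue.
context = nearest_context_atoms(residue, ligand, k=2)

# Seeded generator so the perturbation is reproducible.
rng = random.Random(0)
noisy_residue = add_coordinate_noise(residue, sigma=0.1, rng=rng)
```

Here `context` keeps the zinc and oxygen atoms and drops the distant carbon. Noise of this kind is normally applied only during training, so a sketch like this describes how the model was built rather than anything a user needs to run.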

How to use LigandMPNN online

ProteinIQ hosts LigandMPNN on cloud GPU infrastructure, eliminating the need to install Python dependencies or configure the model locally.

Inputs

| Input | Description |
| --- | --- |
| Protein | PDB, .ent, or .cif file (up to 50 MB), or an RCSB PDB ID. Ligand atoms are read from HETATM records already present in the structure, so no separate ligand file is needed for structures fetched from RCSB. |
| Ligand | Optional. SMILES string, SDF/MOL file, or PubChem CID for reference. LigandMPNN reads atomic context directly from HETATM records in the PDB; the ligand input is informational context only. |
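Because ligand context depends entirely on HETATM records, it can be worth checking a structure before submitting it. The sketch below scans PDB-format text for HETATM lines using the fixed column layout of the PDB format. The helper names are hypothetical, and skipping water (residue name HOH) is an assumption about what counts as useful context, not a documented behavior of this tool.

```python
def ligand_atoms_from_pdb(pdb_text, skip_water=True):
    """Collect (residue name, chain, element) for every HETATM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()      # columns 18-20
        if skip_water and res_name == "HOH":
            continue
        chain = line[21:22].strip()         # column 22
        # Element symbol lives in columns 77-78; fall back to the atom name.
        element = line[76:78].strip() or line[12:16].strip()
        atoms.append((res_name, chain, element))
    return atoms

def make_pdb_line(record, serial, name, res_name, chain, res_seq, xyz, element):
    """Build one fixed-width coordinate record (helper for the toy example only)."""
    x, y, z = xyz
    return (
        f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"
        + " " * 4
        + f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
        + " " * 10
        + f"{element:>2}"
    )

# Toy structure (hypothetical): one protein atom, one zinc ion, one water.
pdb_text = "\n".join([
    make_pdb_line("ATOM", 1, " CA", "ALA", "A", 1, (11.104, 6.134, -6.504), "C"),
    make_pdb_line("HETATM", 2, "ZN", "ZN", "A", 201, (12.0, 7.0, -5.0), "ZN"),
    make_pdb_line("HETATM", 3, " O", "HOH", "A", 301, (15.0, 9.0, -2.0), "O"),
])

ligand_atoms = ligand_atoms_from_pdb(pdb_text)
has_ligand_context = len(ligand_atoms) > 0
```

If `has_ligand_context` is false for a real structure, the design run would see only the protein, whatever is entered in the Ligand field.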

Core settings

| Setting | Description |
| --- | --- |
| Number of sequences | How many sequence variants to generate (1–48, default 8). More sequences give better coverage of sequence space but increase runtime. Start with 8–10 for initial testing; use 20–40 for comprehensive exploration. |
| Sampling temperature | Controls diversity (0.05–1.0, default 0.1). Lower values produce conservative designs with high recovery; higher values explore more diverse solutions. For metal sites, 0.1–0.15 is recommended. |
| Use atom context | Include ligands, metals, and nucleotides during design (default on). Disabling this makes LigandMPNN behave like ProteinMPNN, which is useful only as a control. |
| Use side chain context | Include fixed residue sidechains as geometric constraints. Enable when catalytic or binding residues are fixed and their geometry should influence neighboring positions. |
| Random seed | Integer for reproducible results (0–99999, default 111). Same seed and settings produce identical output. |
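To make the effect of sampling temperature concrete, here is a minimal sketch of temperature-scaled softmax at a single position. The logits are invented for illustration. The optional additive `bias` argument mirrors the Amino acid biases setting described under Design options, and adding it to the logits before dividing by the temperature is an assumption about ordering rather than something this page specifies.

```python
import math

def sampling_probs(logits, temperature=0.1, bias=None):
    """Softmax over (logit + bias) / temperature for one position."""
    bias = bias or {}
    scaled = {aa: (logits[aa] + bias.get(aa, 0.0)) / temperature for aa in logits}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {aa: math.exp(value - peak) for aa, value in scaled.items()}
    total = sum(weights.values())
    return {aa: weight / total for aa, weight in weights.items()}

# Hypothetical logits for three candidate residues at a metal site.
logits = {"H": 2.0, "C": 1.5, "D": 1.0}

cold = sampling_probs(logits, temperature=0.1)   # sharply peaked on H
warm = sampling_probs(logits, temperature=1.0)   # flatter, more diverse
no_cys = sampling_probs(logits, temperature=1.0, bias={"C": -25.0})
```

At temperature 0.1 nearly all probability mass falls on the top-scoring residue, while at 1.0 the three candidates are sampled in comparable proportions. A bias of −25 pushes the cysteine probability to a negligible value, which is why such a bias behaves like an exclusion.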

Design options

| Setting | Description |
| --- | --- |
| Chains to design | Specify which chains to redesign (e.g., A,B); all others stay fixed. Simpler than listing every fixed residue for multi-chain proteins. |
| Homo-oligomer | Enable symmetric design for proteins with identical chains. All chains receive the same sequence. |
| Fixed positions | Residues to keep unchanged. Format: A15, A1-10, B1-20, or C for an entire chain. Comma-separate multiple entries. |
| Redesigned positions | Inverse of Fixed positions: specify what to design and fix everything else. Cannot be used simultaneously with Fixed positions. |
| Parse chains only | Parse only specified chains from the PDB, ignoring all others. Useful for large assemblies where only a subset of chains is relevant. |
| Scoring cutoff (Å) | Distance cutoff for selecting residues scored with ligand context (default 8.0 Å). Decrease to focus design on residues very close to the ligand; increase for broader context. |
| Include zero-occupancy atoms | Include atoms with zero occupancy from crystal structures, which indicate partially occupied or disordered positions. Off by default. |
| Exclude amino acids | Globally exclude one or more amino acids from all designed positions. Enter one-letter codes without separators (e.g., CW to exclude cysteine and tryptophan). |
| Amino acid biases | Adjust sampling frequency per amino acid. Positive values (0 to +5) increase likelihood; negative values (down to −25) decrease it. Setting a value of −25 effectively excludes that amino acid entirely. |
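The Fixed positions and Redesigned positions fields share one text format. A small parser along these lines shows how entries such as A15, A1-10, and C could be expanded. It is a hypothetical helper for sanity-checking input before submission, not the tool's own parser, and it assumes single-letter chain IDs and positive residue numbers without insertion codes.

```python
import re

def parse_positions(spec):
    """Parse 'A15, A1-10, C' into {chain: set of residue numbers}.

    A chain listed without numbers maps to None, meaning the whole chain.
    """
    selection = {}
    for token in (part.strip() for part in spec.split(",")):
        if not token:
            continue
        match = re.fullmatch(r"([A-Za-z])(?:(\d+)(?:-(\d+))?)?", token)
        if match is None:
            raise ValueError(f"Unrecognized position entry: {token!r}")
        chain, start, end = match.groups()
        if start is None:
            selection[chain] = None  # entire chain selected
            continue
        if chain in selection and selection[chain] is None:
            continue  # chain already fully selected
        first, last = int(start), int(end or start)
        if last < first:
            raise ValueError(f"Range runs backwards: {token!r}")
        selection.setdefault(chain, set()).update(range(first, last + 1))
    return selection

fixed = parse_positions("A15, A1-10, B1-20, C")
```

For the example string, chain A resolves to residues 1 through 10 plus 15, chain B to residues 1 through 20, and chain C to the whole chain.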

Output

Each designed sequence includes:

| Column | Description |
| --- | --- |
| Sequence ID | Unique identifier for the design (seq_1, seq_2, …) |
| Sequence | Designed amino acid sequence, with chains separated by / in FASTA output |
| Length | Total residue count across all designed chains |
| Overall confidence | Model confidence (0–1). Computed as $\exp[\text{mean}(\log p_i)]$ over residues, where $p_i$ is the probability the model assigns to the residue chosen at position $i$. Higher values indicate stronger sequence-structure compatibility. |
| Ligand confidence | Confidence specifically at residues within the scoring cutoff distance of ligand atoms. Distinct from overall confidence; provides a focused view of binding site quality. |
| Seq recovery | Fraction of positions matching the input sequence |
| Mutation count | Number of positions that differ from the input |
| Identity % | Percent identity to the input sequence |

Results are available as FASTA, CSV, JSON, and backbone PDB files.
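The confidence and recovery columns are simple to recompute, which helps when ranking designs downstream. The sketch below assumes confidence is the exponential of the mean log-probability, so that it falls between 0 and 1, and that Seq recovery and Identity % are the same quantity on different scales. The per-residue probabilities, the ligand-proximity mask, and the function names are all made up for illustration.

```python
import math

def confidence(log_probs):
    """exp(mean(log p_i)): the geometric mean of per-residue probabilities."""
    return math.exp(sum(log_probs) / len(log_probs))

def compare_to_input(designed, native):
    """Return (recovery fraction, mutation count, identity percent)."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    mutations = sum(1 for d, n in zip(designed, native) if d != n)
    recovery = 1.0 - mutations / len(native)
    return recovery, mutations, 100.0 * recovery

# Hypothetical per-residue probabilities and a mask of residues near the ligand.
probs = [0.9, 0.8, 0.5, 0.7]
near_ligand = [False, True, True, False]
log_probs = [math.log(p) for p in probs]

overall = confidence(log_probs)
ligand = confidence([lp for lp, near in zip(log_probs, near_ligand) if near])

# One substitution (E -> H at position 3) in a five-residue toy sequence.
recovery, mutations, identity = compare_to_input("MKHLV", "MKELV")
```

In this toy case the ligand confidence comes out lower than the overall confidence, the kind of pattern that would flag a weak binding site inside an otherwise well-scored design.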

LigandMPNN vs ProteinMPNN

| Feature | ProteinMPNN | LigandMPNN |
| --- | --- | --- |
| Input context | Protein backbone only | Protein + ligands, metals, nucleotides |
| Binding site recovery | 50.5% | 63.3% |
| Metal coordination recovery | 40.6% | 77.5% |
| Nucleotide binding recovery | 34.0% | 50.5% |
| Model size | 1.66M parameters | 2.62M parameters |
| Architecture | Single protein graph | Three-graph (protein, ligand, protein-ligand) |
| Sidechain prediction | No | Yes (torsion angles) |
| Speed | Faster | ~2× slower |
| Credit cost | 25 | 50 |

When to use each

Use ProteinMPNN for general protein design where no ligands or cofactors are involved—de novo protein scaffolds, antibody frameworks away from CDRs, or soluble protein cores.

Use LigandMPNN whenever the design involves binding sites: enzyme active sites, cofactor-binding pockets (NAD, FAD, heme), metal coordination spheres, or nucleic acid interfaces. The sequence recovery improvements at these sites translate directly to higher experimental success rates.

Limitations

  • Ligand context comes from HETATM records already present in the PDB. Structures without these records default to ProteinMPNN-like behavior regardless of whether a separate ligand is provided.
  • Computational cost is roughly double that of ProteinMPNN due to the additional encoder layers.
  • Designed sequences require experimental validation—high confidence scores improve success rates but do not guarantee function.