ProteinIQ

LigandMPNN

Design protein sequences in the context of ligands, metals, and nucleotides. Suited to enzyme engineering and binding site optimization, where it recovers native residues at interaction interfaces more accurately than backbone-only methods.

What is LigandMPNN?

LigandMPNN is an inverse folding model that designs protein sequences while accounting for non-protein atoms: ligands, metals, nucleotides, and cofactors. Standard inverse folding methods like ProteinMPNN see only the protein backbone, which limits their accuracy at binding sites, where interactions with small molecules matter most. LigandMPNN addresses this by incorporating atomic context from heteroatoms during sequence prediction.

The improvement is substantial. For residues at small molecule binding sites, LigandMPNN achieves 63.3% native sequence recovery compared to 50.5% for ProteinMPNN. Metal coordination sites show an even larger gap: 77.5% versus 40.6%. The authors also report more than 100 experimentally validated small-molecule and DNA-binding designs, and redesigning Rosetta-designed small-molecule binders with LigandMPNN increased binding affinity by as much as 100-fold.

Developed by Dauparas, Lee, and colleagues at the Institute for Protein Design, LigandMPNN was published in Nature Methods (2025).

How does LigandMPNN work?

LigandMPNN extends ProteinMPNN's graph neural network by processing three interconnected graphs instead of one: a protein-only graph (residues as nodes), a ligand-only graph (heteroatoms as nodes), and a protein-ligand graph that captures residue-atom interactions. This architecture allows the model to learn chemical element identities and geometric constraints at binding interfaces.
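
The three-graph idea can be illustrated with a small nearest-neighbor sketch. This is not the model's actual featurization: the function names, the neighbor counts, the fully connected ligand graph, and the toy coordinates are all assumptions chosen to keep the example short.

```python
import math

def dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, points, k, skip=None):
    """Indices of the k points closest to query, optionally skipping one index."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: dist(query, points[i]),
    )
    return order[:k]

def build_graphs(residue_xyz, ligand_xyz, k_residues=2, k_atoms=2):
    """Return edge lists for the protein, ligand, and protein-ligand graphs."""
    # Protein graph: each residue connects to its nearest residues.
    protein_edges = [
        (i, j)
        for i, xyz in enumerate(residue_xyz)
        for j in nearest(xyz, residue_xyz, k_residues, skip=i)
    ]
    # Ligand graph: fully connected here for simplicity (an assumption).
    ligand_edges = [
        (a, b)
        for a in range(len(ligand_xyz))
        for b in range(len(ligand_xyz))
        if a != b
    ]
    # Protein-ligand graph: each residue connects to its nearest heteroatoms.
    protein_ligand_edges = [
        (i, a)
        for i, xyz in enumerate(residue_xyz)
        for a in nearest(xyz, ligand_xyz, k_atoms)
    ]
    return protein_edges, ligand_edges, protein_ligand_edges

# Toy data: four residue positions on a line and three heteroatoms.
residues = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
ligand = [("ZN", (3.8, 2.0, 0.0)), ("O", (7.6, 2.5, 0.0)), ("C", (20.0, 0.0, 0.0))]
ligand_xyz = [xyz for _, xyz in ligand]

p_edges, l_edges, pl_edges = build_graphs(residues, ligand_xyz)
```

In this toy case the distant carbon atom never appears in `pl_edges`, which mirrors the idea that only nearby heteroatoms inform the residues at a binding interface.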

The model consists of a protein backbone encoder (3 layers), a protein-ligand encoder (2 layers), and an autoregressive decoder that generates both amino acid sequences and sidechain torsion angles. Training used 8–16 context atoms per residue and added 0.1 Å Gaussian noise to coordinates, improving generalization to novel structures.
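
The coordinate-noise step is simple to sketch. The helper below is a hypothetical illustration of adding zero-mean Gaussian noise with a 0.1 Å standard deviation to each coordinate; it is not taken from the LigandMPNN training code, and the coordinates are made up.

```python
import random

def add_coordinate_noise(coords, sigma=0.1, seed=None):
    """Return a copy of coords with Gaussian noise (std sigma, in Å) on every axis."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sigma) for c in xyz) for xyz in coords]

# Hypothetical backbone atom positions in Å.
backbone = [(0.0, 0.0, 0.0), (1.46, 0.0, 0.0), (2.0, 1.4, 0.0)]

noisy = add_coordinate_noise(backbone, sigma=0.1, seed=111)
```

Perturbing the input this way keeps a model from relying on sub-ångström details of any one crystal structure, which is the usual motivation for this kind of augmentation.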

How to use LigandMPNN online

ProteinIQ hosts LigandMPNN on cloud GPU infrastructure, eliminating the need to install Python dependencies or configure the model locally.

Inputs

| Input | Description |
| --- | --- |
| Protein | PDB file (up to 50 MB) or RCSB PDB ID. Must contain ATOM records for the protein and HETATM records for ligands/metals. |
| Ligand | SMILES string, SDF/MOL file, or PubChem CID. Required for specifying the binding context. |
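
Before uploading, it can help to confirm that a PDB file actually contains both record types. The following is a minimal local check, not part of ProteinIQ: the function name is hypothetical, the example records are invented, and skipping water (HOH) is an assumption about what counts as useful ligand context.

```python
def count_record_types(pdb_text):
    """Count ATOM and non-water HETATM records in PDB-formatted text."""
    counts = {"ATOM": 0, "HETATM": 0}
    for line in pdb_text.splitlines():
        record = line[:6].strip()          # record name: columns 1-6
        residue_name = line[17:20].strip() # residue name: columns 18-20
        if record == "ATOM":
            counts["ATOM"] += 1
        elif record == "HETATM" and residue_name != "HOH":
            # Assumption: water molecules are not treated as ligand context.
            counts["HETATM"] += 1
    return counts

# Invented example records laid out in fixed PDB columns.
example_pdb = "\n".join([
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  MET A   1      12.560  13.300   2.200  1.00 20.00           C",
    "HETATM  100 ZN    ZN A 201      15.000  10.000   5.000  1.00 30.00          ZN",
    "HETATM  101  O   HOH A 301      20.000  20.000  20.000  1.00 40.00           O",
])

counts = count_record_types(example_pdb)
has_ligand_context = counts["ATOM"] > 0 and counts["HETATM"] > 0
```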

Core settings

| Setting | Description |
| --- | --- |
| Number of sequences | How many sequence variants to generate (1–48, default 8). More sequences cover more of sequence space but take longer to run. Start with 8–10 for initial testing; use 20–40 for comprehensive exploration. |
| Sampling temperature | Controls diversity (0.05–1.0, default 0.1). Lower values produce conservative designs with high recovery; higher values explore diverse solutions. For metal sites, stay at 0.1–0.15. |
| Use atom context | Include ligands, metals, and nucleotides during design. Keep enabled for binding site work, since this is LigandMPNN's core advantage. |
| Use side chain context | Include fixed residue sidechains as geometric constraints. Enable when preserving catalytic or binding residues. |
| Random seed | For reproducible results (0–99999, default 111). |
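
The effect of sampling temperature can be seen in a generic softmax sketch. The logits below are hypothetical scores for three amino acids at one position, and the helper names are illustrative; this shows the general technique rather than ProteinIQ's or LigandMPNN's exact code.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.5, 0.5]  # hypothetical per-amino-acid scores

cold = softmax_with_temperature(logits, 0.1)  # nearly deterministic
hot = softmax_with_temperature(logits, 1.0)   # noticeably more diverse

rng = random.Random(111)  # a fixed seed makes the draw reproducible
choice = sample_index(hot, rng)
```

At temperature 0.1 the top-scoring option takes over 99% of the probability mass, while at 1.0 it takes a little over half, which is why low temperatures give conservative designs and a fixed seed is needed to reproduce a particular sample.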

Design options

| Setting | Description |
| --- | --- |
| Homo-oligomer | Enable symmetric design for proteins with identical chains. All chains receive the same sequence. |
| Fixed positions | Residues to keep unchanged. Format: A15, A1-10, B1-20, or C for entire chain. Comma-separate multiple entries. |
| Redesigned positions | Inverse of fixed: specify what to design, fix everything else. Cannot be used with fixed positions. |
| Amino acid biases | Adjust sampling frequency per amino acid. Positive values (0 to +2) increase likelihood; negative values decrease it. Set to −25 to exclude an amino acid entirely. |
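
The position format and the bias setting can both be sketched in a few lines. The parser below is a hypothetical reading of the `A15, A1-10, C` format (it does not handle insertion codes or negative residue numbers), and `apply_bias` assumes the bias is simply added to each amino acid's logit before sampling.

```python
import math
import re

def parse_positions(spec):
    """Expand 'A15, A1-10, C' into a set of (chain, residue_number) pairs.

    A bare chain letter becomes (chain, None), meaning the entire chain.
    """
    selected = set()
    for token in (t.strip() for t in spec.split(",")):
        if not token:
            continue
        match = re.fullmatch(r"([A-Za-z])(?:(\d+)(?:-(\d+))?)?", token)
        if match is None:
            raise ValueError(f"Unrecognized position entry: {token!r}")
        chain, start, end = match.groups()
        if start is None:
            selected.add((chain, None))
            continue
        first, last = int(start), int(end or start)
        if last < first:
            raise ValueError(f"Range runs backwards: {token!r}")
        selected.update((chain, n) for n in range(first, last + 1))
    return selected

def apply_bias(logits, bias):
    """Add per-amino-acid bias values to a dict of logits (assumed additive)."""
    return {aa: score + bias.get(aa, 0.0) for aa, score in logits.items()}

fixed = parse_positions("A15, A1-3, C")
biased = apply_bias({"C": 1.0, "A": 0.5}, {"C": -25.0})

# Relative weight of a -25 shift after exponentiation at temperature 1.
suppression = math.exp(-25.0)
```

A shift of −25 multiplies an amino acid's unnormalized probability by roughly 1.4e-11, which is why that value effectively removes it from sampling.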

Output

Each designed sequence includes:

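| Column | Description |
| --- | --- |
| Sequence ID | Unique identifier for the design |
| Sequence | Designed amino acid sequence |
| Length | Number of residues |
| Overall Confidence | Model confidence score (0–1). Higher values mean the model assigns higher probability to the designed residues; the score is not a direct prediction of binding affinity. |
| Seq Recovery | Percent identity to the input sequence |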

Results can be exported as FASTA, CSV, or JSON.
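
As a rough illustration of how the output columns relate, the sketch below computes length and sequence recovery for two invented designs and serializes them as FASTA and JSON. The field names and sequences are hypothetical, and recovery is computed over the full sequence here, which may differ from how the service scores partially fixed designs.

```python
import json

def sequence_recovery(native, designed):
    """Percent of positions where the designed residue matches the input."""
    if not native or len(native) != len(designed):
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(native, designed))
    return 100.0 * matches / len(native)

def to_fasta(records):
    """Render records as FASTA text, one header and one sequence line each."""
    return "".join(f">{r['id']}\n{r['sequence']}\n" for r in records)

native = "MKTAYIAK"  # invented input sequence
designs = [
    {"id": "design_1", "sequence": "MKTVYIAR"},
    {"id": "design_2", "sequence": "MKTAYIAK"},
]

for design in designs:
    design["length"] = len(design["sequence"])
    design["seq_recovery"] = sequence_recovery(native, design["sequence"])

fasta_text = to_fasta(designs)
json_text = json.dumps(designs, indent=2)
```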

LigandMPNN vs ProteinMPNN

| Feature | ProteinMPNN | LigandMPNN |
| --- | --- | --- |
| Input context | Protein backbone only | Protein + ligands, metals, nucleotides |
| Small-molecule binding site recovery | 50.5% | 63.3% |
| Metal coordination recovery | 40.6% | 77.5% |
| Nucleotide binding recovery | 34.0% | 50.5% |
| Model size | 1.66M parameters | 2.62M parameters |
| Architecture | Single protein graph | Three-graph (protein, ligand, protein-ligand) |
| Sidechain prediction | No | Yes (torsion angles) |
| Speed | Faster | ~2× slower |
| Credit cost | 25 | 50 |

When to use each

Use ProteinMPNN for general protein design where no ligands or cofactors are involved, such as de novo protein scaffolds, antibody frameworks away from CDRs, or soluble protein cores.

Use LigandMPNN whenever the design involves binding sites: enzyme active sites, cofactor-binding pockets (NAD, FAD, heme), metal coordination spheres, or nucleic acid interfaces. These are the sites where its sequence recovery advantage over ProteinMPNN is largest, which is expected to improve experimental success rates.

Limitations

  • Requires heteroatoms in the PDB file for optimal performance. Structures without HETATM records default to ProteinMPNN-like behavior.
  • Computational cost is roughly double that of ProteinMPNN due to the additional encoder layers.
  • Designed sequences require experimental validation: high confidence scores improve success rates but do not guarantee function.

Related tools

  • ProteinMPNN: Backbone-only inverse folding for general protein design
  • SolubleMPNN: ProteinMPNN variant optimized for soluble expression
  • ESM-IF1: Language model approach to inverse folding
  • ThermoMPNN: Design thermostable proteins
  • ESMFold: Predict structures from designed sequences