New inverse folding suite: ProteinMPNN, LigandMPNN, and SolubleMPNN

Dr. Matic Broz · 4 min read · November 8, 2025

We've integrated the complete MPNN family (ProteinMPNN, LigandMPNN, and SolubleMPNN), giving you state-of-the-art sequence design for protein-only, ligand-bound, and solubility-focused projects.

Inverse folding reverses the central problem of structural biology. Instead of asking "what structure will this sequence fold into?", it asks "what sequences will fold into this structure?" This enables rational protein engineering: designing novel enzymes, stabilizing therapeutic proteins, creating binding sites for small molecules, and optimizing sequences for expression—all grounded in 3D structural context.

Running these models traditionally requires managing conda environments, CUDA dependencies, GPU access, and complex preprocessing pipelines. ProteinIQ makes them accessible—upload a protein structure, configure your design parameters, and get ranked sequence candidates in roughly a minute.

Three models, one complete toolkit

We've implemented all three variants from the Institute for Protein Design because each solves a distinct problem:

ProteinMPNN is the foundation. It achieves 52.4% sequence recovery on native backbones versus 32.9% for Rosetta, the previous standard. Use it for general protein design when your structure contains only protein atoms. Jobs are quick (they run under a 5-minute timeout) and cost-effective (25 credits). Published in Science (2022) and cited thousands of times, it is one of the most widely validated inverse folding models in the field.

LigandMPNN extends the architecture to handle non-protein atoms: small molecules, cofactors, metals, and nucleotides. It achieves 63.3% sequence recovery at ligand binding sites (versus 50.5% for ProteinMPNN) and 77.5% for metal coordination (versus 36.0% for Rosetta). This is more than an incremental improvement: the model explicitly processes heteroatom context through a three-graph neural network architecture that learns residue-ligand interactions, which ProteinMPNN cannot see at all. Use it for enzyme design, metalloprotein engineering, or any system where non-protein atoms define the binding chemistry. Published in Nature Methods (2025), with over 100 experimentally validated designs confirming high-affinity binding.

SolubleMPNN specializes in cytoplasmic and extracellular proteins. Trained exclusively on soluble protein structures (membrane proteins excluded), it's optimized for antibody engineering, secreted proteins, and expression optimization in E. coli or mammalian systems. It uses the same architecture as ProteinMPNN; only the training data differs, filtered so the model learns soluble-specific patterns.
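
The sequence recovery figures quoted above are, in essence, the fraction of positions where a designed sequence reproduces the native residue. A minimal sketch of that metric follows. The function name and the optional `positions` argument (for restricting the count to, say, binding-site residues) are illustrative assumptions, not part of any MPNN codebase or of ProteinIQ.

```python
from typing import Iterable, Optional


def sequence_recovery(native: str, designed: str,
                      positions: Optional[Iterable[int]] = None) -> float:
    """Fraction of compared positions where designed matches native.

    `positions` (0-based indices) is a hypothetical convenience for scoring
    only a subset of residues, e.g. those near a ligand.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must be the same length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to compare")
    matches = sum(native[i] == designed[i] for i in idx)
    return matches / len(idx)


if __name__ == "__main__":
    native = "MKTAYIAK"
    designed = "MKSAYLAK"
    print(sequence_recovery(native, designed))             # whole sequence
    print(sequence_recovery(native, designed, [2, 3, 5]))  # chosen subset
```

Published benchmarks average this quantity over many backbones, so a single design's recovery is only a rough guide to how a model behaves on your structure.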

Performance and infrastructure

Our implementation runs on NVIDIA A100 GPUs with optimized inference pipelines. Typical jobs complete in 30-90 seconds. We handle all preprocessing automatically—you provide raw PDB files or RCSB IDs, and we prepare structures, assign atom types, and configure search parameters.

All three models support advanced design controls:

1-48 sequence candidates per backbone with configurable sampling temperature
Homo-oligomer mode for symmetric complexes
Fixed/redesigned position specification for preserving catalytic residues or binding sites
Per-residue amino acid biases for controlling composition
Reproducible seeding for experimental validation
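
To make these controls concrete, here is a simplified sketch of how temperature, amino acid biases, fixed positions, and seeding interact when sampling a sequence. It rests on loud assumptions: it treats every position independently and takes per-position logits as given, whereas the real MPNN models decode autoregressively on a GPU. The function names and the point at which the bias is added are hypothetical, chosen only for illustration.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def sample_residue(logits, temperature, bias=None, rng=random):
    """Sample one amino acid from temperature-scaled, bias-shifted logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    bias = bias or {}
    # Assumption: bias is added to the logit before dividing by temperature.
    scaled = [(logit + bias.get(aa, 0.0)) / temperature
              for aa, logit in zip(ALPHABET, logits)]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(ALPHABET, weights=weights, k=1)[0]


def design_sequence(position_logits, temperature=0.1, fixed=None,
                    bias=None, seed=None):
    """Sample a sequence; `fixed` maps 0-based index -> residue to preserve."""
    rng = random.Random(seed)  # the same seed reproduces the same sequence
    fixed = fixed or {}
    residues = []
    for i, logits in enumerate(position_logits):
        if i in fixed:
            residues.append(fixed[i])  # e.g. a catalytic residue kept as-is
        else:
            residues.append(sample_residue(logits, temperature, bias, rng))
    return "".join(residues)


if __name__ == "__main__":
    flat = [[0.0] * 20 for _ in range(10)]  # toy logits with no preference
    print(design_sequence(flat, temperature=0.3, fixed={4: "H"}, seed=42))
```

Lower temperatures concentrate probability on the highest-scoring residue at each position, which is why 0.1 behaves conservatively and 0.2-0.3 yields more diverse candidates.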

When to use each model

Use ProteinMPNN for de novo backbone design or stabilizing existing folds. Use LigandMPNN when your structure includes cofactors, metals, or small molecules that define binding chemistry. Use SolubleMPNN for antibodies, secreted therapeutics, or improving expression in recombinant systems.
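
As a rough aid for this choice, the sketch below scans a PDB file for non-water HETATM records and suggests a model accordingly. It is a heuristic under stated assumptions, not ProteinIQ's own logic: the function names are hypothetical, the residue name is read from columns 18-20 of the fixed-width PDB format, and not every HETATM entry is a functional ligand (crystallization additives and modified residues also appear there), so treat the result as a prompt to inspect the structure.

```python
WATER_NAMES = {"HOH", "WAT", "DOD"}  # common water residue names


def hetero_residue_names(pdb_text: str) -> set:
    """Collect residue names of non-water HETATM records in PDB text."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and len(line) >= 20:
            name = line[17:20].strip()  # PDB columns 18-20 hold the residue name
            if name and name not in WATER_NAMES:
                names.add(name)
    return names


def suggest_model(pdb_text: str, soluble_target: bool = False) -> str:
    """Suggest a model name following the guidance in the text above.

    Assumption: the presence of any non-water heteroatom takes priority
    over the soluble-target flag.
    """
    if hetero_residue_names(pdb_text):
        return "LigandMPNN"
    return "SolubleMPNN" if soluble_target else "ProteinMPNN"


if __name__ == "__main__":
    example = "\n".join([
        "ATOM      1  CA  ALA A   1",
        "HETATM 1234 ZN    ZN A 301",
        "HETATM 2000  O   HOH A 401",
    ])
    print(hetero_residue_names(example), suggest_model(example))
```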

Getting started

All three models are available now on ProteinIQ:

ProteinMPNN
LigandMPNN
SolubleMPNN

For typical usage:

Upload a PDB file or provide an RCSB PDB ID
(LigandMPNN only) Add your ligand as SMILES or SDF
Configure sequences (start with 8-10), temperature (0.1 for conservative, 0.2-0.3 for diversity)
Optionally specify fixed positions or amino acid biases
Submit and review ranked sequences with confidence scores in ~1 minute

Best practices:

Start with 8 sequences at temperature 0.1 for initial screening
Increase to 20-40 sequences at temperature 0.2-0.3 for comprehensive libraries
Use fixed positions to preserve catalytic residues or structural motifs
For metal binding sites in LigandMPNN, stay conservative with temperature 0.1-0.15
Check confidence scores: higher values indicate better predicted stability

Each output includes FASTA, CSV, and JSON export options. Sequences integrate directly into synthesis workflows or downstream validation pipelines.
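
If you keep design parameters and results in your own scripts, the recommendations above can be captured in a small settings object and written out in the same three formats. This is a sketch under assumptions: `DesignSettings`, the two presets, and the candidate fields (`name`, `sequence`, `confidence`) are hypothetical names for illustration and do not describe ProteinIQ's actual export schema or API.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DesignSettings:
    """Hypothetical container for the parameters discussed in the text."""
    num_sequences: int = 8
    temperature: float = 0.1
    seed: int = 0

    def __post_init__(self):
        if not 1 <= self.num_sequences <= 48:
            raise ValueError("num_sequences must be between 1 and 48")
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")


SCREENING = DesignSettings(num_sequences=8, temperature=0.1)   # initial screen
LIBRARY = DesignSettings(num_sequences=40, temperature=0.3)    # broader library


def ranked(candidates):
    """Order candidates from highest to lowest confidence."""
    return sorted(candidates, key=lambda c: c["confidence"], reverse=True)


def to_fasta(candidates) -> str:
    return "".join(
        f">{c['name']} confidence={c['confidence']:.3f}\n{c['sequence']}\n"
        for c in candidates
    )


def to_csv(candidates) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "sequence", "confidence"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(candidates)
    return buf.getvalue()


def to_json(candidates, settings: DesignSettings) -> str:
    # Storing settings next to results keeps a run reproducible and shareable.
    return json.dumps({"settings": asdict(settings), "candidates": candidates},
                      indent=2)


if __name__ == "__main__":
    designs = ranked([
        {"name": "design_2", "sequence": "MKSAYLAK", "confidence": 0.81},
        {"name": "design_1", "sequence": "MKTAYIAK", "confidence": 0.93},
    ])
    print(to_fasta(designs))
    print(to_csv(designs))
    print(to_json(designs, SCREENING))
```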

What this enables

Accessible inverse folding means:

Rapid protein engineering: Design thermostable variants or novel enzymes without manual sequence alignment
Binding site optimization: Engineer high-affinity interactions with small molecules or cofactors
Expression improvement: Redesign surface residues for better solubility and yield
Educational use: Students learn computational protein design without GPU cluster access
Collaborative research: Share reproducible design parameters across teams

Further information

ProteinMPNN was developed by Justas Dauparas and colleagues at the Institute for Protein Design. LigandMPNN extends this work to heteroatom-conditioned design. SolubleMPNN provides specialized training for soluble protein applications.

ProteinMPNN: Dauparas et al. (2022). Science. DOI: 10.1126/science.add2187
LigandMPNN: Dauparas et al. (2025). Nature Methods. DOI: 10.1038/s41592-025-02626-1
GitHub: dauparas/ProteinMPNN, dauparas/LigandMPNN

For technical details, see individual tool pages: ProteinMPNN, LigandMPNN, SolubleMPNN. For questions about infrastructure, pricing, or custom deployments, contact our team.

Every researcher deserves access to the tools that accelerate discovery. We're here to make that happen.