ProteinMPNN

Design amino acid sequences for protein backbones with fixed positions, amino acid biases, and sequence diversity controls.

Related tools

HyperMPNN

Design thermostable protein sequences using ProteinMPNN trained on hyperthermophilic organism structures. Generates sequences optimized for improved thermal stability without requiring ligands or additional context.

LigandMPNN

Design protein sequences with atomic context from ligands, metals, and nucleotides. Achieves 63.3% sequence recovery for residues that contact small molecules, significantly outperforming ProteinMPNN (50.5%).

SolubleMPNN

Specialized model for soluble protein sequence design. Trained exclusively on soluble proteins for optimized performance on cytoplasmic and extracellular proteins.

IgDesign

Design antibody CDR sequences via inverse folding. Generates complementarity-determining region (CDR) sequences for antibodies targeting therapeutic antigens using deep learning. Optimizes CDR loops (HCDR1, HCDR2, HCDR3) based on antibody-antigen complex structures.

AntiFold

Inverse folding for antibody variable domains and nanobodies. Predicts amino acid sequences compatible with antibody structures using IMGT numbering, keeping the chain handling and structural constraints of the upstream AntiFold implementation.

ESM-IF1

Inverse folding with ESM-IF1. Design protein sequences for given 3D backbone structures using a geometric deep learning model. Generate multiple sequence variants optimized for your target structure.

ProFam

ProFam-1 is a protein family language model for family-conditioned sequence generation. Provide a protein family FASTA/MSA and generate new sequences with model likelihood scores for downstream ranking and screening.

PepMLM

Design linear peptide binders for target proteins using a target sequence-conditioned masked language model. PepMLM generates peptide sequences optimized to bind specific protein targets based on ESM-2 protein language modeling.

BindCraft

Design de novo protein binders using AlphaFold2 backpropagation, ProteinMPNN sequence optimization, and PyRosetta relaxation. BindCraft generates novel protein sequences that bind to user-specified target surfaces.

EvoPro

Optimize protein binders using genetic algorithms combined with AlphaFold2 fitness evaluation and ProteinMPNN sequence design. EvoPro evolves protein sequences to maximize binding affinity and structural quality through iterative cycles of mutation, selection, and validation.

What is ProteinMPNN?

ProteinMPNN solves the inverse folding problem: given a protein backbone structure, what amino acid sequences will fold into that shape? This reverses the structure prediction question—instead of asking what structure a sequence adopts, it asks what sequences can adopt a given structure.

Developed at the Institute for Protein Design and published in Science (2022), ProteinMPNN achieves 52.4% native sequence recovery on test backbones, compared to 32.9% for the previous state-of-the-art Rosetta design software. Beyond accuracy, it runs in ~1 second per protein versus ~4 minutes for Rosetta.

Experimental validation has been extensive. In the original study, crystal structures and cryo-EM reconstructions confirmed that designed sequences folded into their intended structures, and ProteinMPNN rescued previously failed designs and enabled new applications ranging from nanomaterials to target-binding proteins.

How does ProteinMPNN work?

ProteinMPNN represents protein structures as graphs where residues are nodes and edges connect spatially proximate residues (the 32–48 nearest Cα neighbors). The neural network learns from this geometric representation without requiring evolutionary information or sequence alignments.
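
To make the graph construction concrete, here is a minimal pure-Python sketch (not ProteinMPNN's code) that links each residue to its k nearest neighbors by Cα–Cα distance. The default k = 48 is an assumption taken from the 32–48 range above, the function name knn_graph is hypothetical, and a residue is left out of its own neighbor list here, which real implementations may handle differently.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def knn_graph(ca_coords: Sequence[Coord], k: int = 48) -> List[List[int]]:
    """Hypothetical helper: for each residue (node), return the indices of its
    k nearest residues by Cα–Cα distance; each (i, j) pair is an edge i -> j.

    A residue is not its own neighbor, and k is capped at N - 1.
    """
    k = min(k, len(ca_coords) - 1)
    graph: List[List[int]] = []
    for i, ci in enumerate(ca_coords):
        # Sort all other residues by distance (ties broken by index).
        ranked = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(ca_coords) if j != i
        )
        graph.append([j for _, j in ranked[:k]])
    return graph


# Four toy Cα positions (Å); each residue keeps its two closest neighbors.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.0, 1.0, 0.0), (20.0, 0.0, 0.0)]
print(knn_graph(ca, k=2))  # [[1, 2], [2, 0], [1, 0], [2, 1]]
```

Because neighbors are chosen by distance rather than by sequence position, residues far apart in the chain but close in space still exchange messages.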

Encoding structure

The encoder (3 layers, 128 hidden dimensions) processes pairwise distances between backbone atoms: N, Cα, C, O, and a virtual Cβ. These interatomic distances capture inter-residue geometry more effectively than dihedral angles or coordinate frames. Message passing between nodes and edges propagates structural information throughout the graph.
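
The distance features described above can be sketched as follows. This is an illustrative reconstruction rather than ProteinMPNN's source: the virtual Cβ constants and the 16 Gaussian radial basis functions spanning 2–22 Å follow the public ProteinMPNN code as we understand it, and all function names are hypothetical. In this sketch each residue pair yields 5 × 5 = 25 distances, and each distance is expanded into 16 smooth features.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
ATOMS = ("N", "CA", "C", "O", "CB")  # CB is the virtual Cβ


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def virtual_cb(n: Vec, ca: Vec, c: Vec) -> Vec:
    """Idealized Cβ placed from N, Cα and C, so glycine gets one too.

    Constants assumed to match the public ProteinMPNN code (to our understanding).
    """
    b, cc = sub(ca, n), sub(c, ca)
    a = cross(b, cc)
    x, y, z = (-0.58273431 * a[i] + 0.56802827 * b[i] - 0.54067466 * cc[i] + ca[i]
               for i in range(3))
    return (x, y, z)


def pair_distances(res_i: Dict[str, Vec], res_j: Dict[str, Vec]) -> List[float]:
    """All 25 interatomic distances between two residues' N, CA, C, O, CB."""
    return [math.dist(res_i[a], res_j[b]) for a in ATOMS for b in ATOMS]


def rbf(d: float, d_min: float = 2.0, d_max: float = 22.0, count: int = 16) -> List[float]:
    """Expand one distance (Å) into `count` Gaussian radial basis features."""
    sigma = (d_max - d_min) / count
    centers = [d_min + k * (d_max - d_min) / (count - 1) for k in range(count)]
    return [math.exp(-(((d - mu) / sigma) ** 2)) for mu in centers]


def edge_features(res_i: Dict[str, Vec], res_j: Dict[str, Vec]) -> List[float]:
    """Concatenate RBF features for every distance: 25 x 16 = 400 values."""
    return [f for d in pair_distances(res_i, res_j) for f in rbf(d)]
```

For a typical backbone geometry the virtual Cβ lands about 1.5 Å from Cα, close to a real Cα–Cβ bond. The RBF expansion turns each raw distance into a smooth, bounded vector that a network can use more easily than a single number.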

Decoding sequences

Rather than generating amino acids sequentially from N- to C-terminus, ProteinMPNN uses order-agnostic autoregressive decoding. During training, the model learns to predict amino acids in random order. At inference, each position is decoded conditioned on the structural encoding and any previously decoded positions.
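
The decoding scheme can be illustrated with a short sketch. It builds a random decoding order in which fixed residues come first, then records which already-decoded positions each step may condition on. The function names are hypothetical; as we understand it, ProteinMPNN's code reaches a similar ordering with a randomized sort key, whereas this sketch simply shuffles fixed and designable positions separately.

```python
import random
from typing import Dict, List, Sequence


def decoding_order(designable: Sequence[bool], rng: random.Random) -> List[int]:
    """Hypothetical helper: random order with fixed (non-designable) positions first.

    Fixed residues already have known identities, so placing them first lets
    every designed position condition on them.
    """
    fixed = [i for i, d in enumerate(designable) if not d]
    free = [i for i, d in enumerate(designable) if d]
    rng.shuffle(fixed)
    rng.shuffle(free)
    return fixed + free


def visible_context(order: Sequence[int]) -> Dict[int, List[int]]:
    """For each position, the positions decoded before it (its autoregressive context)."""
    context: Dict[int, List[int]] = {}
    decoded: List[int] = []
    for pos in order:
        context[pos] = sorted(decoded)
        decoded.append(pos)
    return context


# Positions 0 and 3 are fixed; 1, 2 and 4 are redesigned.
order = decoding_order([False, True, True, False, True], random.Random(0))
print(order)
print(visible_context(order))
```

Whatever order the shuffle produces, every redesigned position sees both fixed residues, which is what lets the model design around a preserved active site.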

This flexibility enables practical design scenarios: fixing certain residues while redesigning others, enforcing identical sequences across homo-oligomer chains, or biasing toward specific amino acid compositions.
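
For the homo-oligomer case, one way to tie chains is to average the model's logits across equivalent positions and sample a single residue for all copies; as we understand it, this is roughly what ProteinMPNN does for tied positions. The sketch below shows only that tying step, with made-up logits standing in for decoder output and hypothetical function names; it ignores the autoregressive context.

```python
import math
import random
from typing import Dict, List, Sequence

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tied(copies: Sequence[Sequence[float]], rng: random.Random) -> str:
    """Hypothetical helper: pick one amino acid for a group of tied positions.

    Averaging logits favors residues acceptable in every copy's environment.
    """
    avg = [sum(col) / len(copies) for col in zip(*copies)]
    return rng.choices(ALPHABET, weights=softmax(avg), k=1)[0]


def design_homo_oligomer(chain_logits: Dict[str, List[List[float]]],
                         rng: random.Random) -> Dict[str, str]:
    """chain_logits[chain][pos] holds 20 logits; every chain gets the same sequence."""
    chains = list(chain_logits)
    length = len(chain_logits[chains[0]])
    seq = "".join(
        sample_tied([chain_logits[ch][pos] for ch in chains], rng)
        for pos in range(length)
    )
    return {ch: seq for ch in chains}
```

For example, if chain A's logits strongly prefer Leu and chain B's prefer Val at the same position, but both give Ile a fairly high score, averaging can make Ile the likely pick for both copies.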

How to use ProteinMPNN online

ProteinMPNN runs on ProteinIQ's GPU infrastructure, delivering sequence designs in seconds without local installation.

Inputs

Input | Description
Protein | PDB file (.pdb or .ent), mmCIF file (.cif), or RCSB PDB ID (e.g., 1ABC). The structure must contain backbone coordinates.
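
A quick way to check that a structure has usable backbone coordinates is to read them from the PDB file's ATOM records, which use fixed column positions. The sketch below is hypothetical and not ProteinIQ's parser: it keeps residues that have all of N, CA, C and O, and its optional chain filter and zero-occupancy flag loosely mirror the Parse chains only and Include zero-occupancy atoms settings listed under Design constraints. mmCIF (.cif) input would need a different reader, such as Biopython's MMCIFParser.

```python
from typing import Dict, Optional, Set, Tuple

BACKBONE = ("N", "CA", "C", "O")
Coord = Tuple[float, float, float]
ResidueKey = Tuple[str, int, str]  # (chain ID, residue number, insertion code)


def parse_backbone(pdb_text: str, chains: Optional[Set[str]] = None,
                   include_zero_occupancy: bool = False) -> Dict[ResidueKey, Dict[str, Coord]]:
    """Hypothetical helper: collect backbone coordinates per residue from ATOM records."""
    residues: Dict[ResidueKey, Dict[str, Coord]] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # skips HETATM ligands, waters, headers, etc.
        name, chain = line[12:16].strip(), line[21]
        if name not in BACKBONE or (chains is not None and chain not in chains):
            continue
        occ_field = line[54:60].strip()
        occupancy = float(occ_field) if occ_field else 1.0
        if occupancy == 0.0 and not include_zero_occupancy:
            continue
        key = (chain, int(line[22:26]), line[26].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(key, {}).setdefault(name, xyz)  # keep the first altLoc seen
    # Drop residues missing any backbone atom; they cannot anchor a designed position.
    return {k: atoms for k, atoms in residues.items() if all(a in atoms for a in BACKBONE)}
```

If the returned dictionary is empty, or much shorter than the expected chain length, the structure is probably missing backbone atoms.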

Settings

Core settings

Setting | Description
Number of sequences | Sequence variants to generate (1–48, default 8). More sequences explore broader sequence space at linear computational cost.
Sampling temperature | Diversity control (0.05–1.0, default 0.1). Lower values give conservative designs; higher values give diverse ones. See the interpretation below.
Random seed | Integer for reproducibility. The same seed and settings produce identical output.

Temperature interpretation

Temperature | Behavior
0.05–0.1 | Conservative designs with the highest predicted fitness. Best for maximizing sequence recovery.
0.2–0.3 | Moderate diversity while maintaining good recovery. Useful for variant libraries.
0.4–1.0 | High diversity at the cost of recovery. Use when exploring novel sequences matters more than optimality.
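
Temperature acts on the model's per-position scores before sampling: the logits are divided by T and passed through a softmax, so a low T concentrates probability on the top-scoring residue and a high T flattens the distribution. The sketch below illustrates this together with seeded sampling; the logits are invented, the function names are hypothetical, and Python's random module stands in for whatever random number generator ProteinIQ actually uses, so seeds here will not reproduce the service's outputs.

```python
import math
import random
from typing import List, Sequence

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def temperature_probs(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / T: T < 1 sharpens the distribution, T = 1 leaves it as is."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(position_logits: Sequence[Sequence[float]],
                    temperature: float, seed: int) -> str:
    """Hypothetical helper: one residue per position; same logits, T and seed give the same sequence."""
    rng = random.Random(seed)
    return "".join(
        rng.choices(ALPHABET, weights=temperature_probs(logits, temperature), k=1)[0]
        for logits in position_logits
    )


# One position whose logits favor Leu by a single unit over every other residue.
logits = [0.0] * 20
logits[ALPHABET.index("L")] = 1.0
for t in (0.1, 0.3, 1.0):
    print(t, round(temperature_probs(logits, t)[ALPHABET.index("L")], 3))
```

With these toy logits, Leu's probability falls from about 0.999 at T = 0.1 to about 0.125 at T = 1.0, which is the conservative-to-diverse trend the table describes.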

Design constraints

Setting | Description
Chains to design | Chains to redesign (e.g., A,B); all other chains stay fixed. Simpler than listing every fixed residue in multi-chain proteins.
Homo-oligomer | When enabled, all chains receive identical sequences. Intended for symmetric assemblies such as dimers or trimers.
Fixed positions | Residues to keep unchanged. Format: A15,A19,A1-10,B1-20. Useful for catalytic or binding sites.
Redesigned positions | The inverse of Fixed positions: list the residues to redesign, and everything else stays fixed. Same format. Cannot be combined with Fixed positions.
Parse chains only | Parse only the specified chains from the structure file and ignore all others. Useful for large multi-chain assemblies where only a subset is relevant.
Include zero-occupancy atoms | Include atoms with zero occupancy in crystal structures. Off by default.
Exclude amino acids | Globally exclude amino acids from all designed positions. Enter one-letter codes without separators (e.g., C or CW).
Amino acid biases | Bias per amino acid type, from −25 (effectively excluded) to +5 (strongly favored). Shifts amino acid composition without hard exclusions.
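
The position format and composition controls above can be sketched in a few lines. This is a hypothetical illustration, not ProteinIQ's implementation: it expands position strings such as A15,A19,A1-10,B1-20 into (chain, residue number) pairs (insertion codes and negative residue numbers are not handled), derives the designable set from either Fixed positions or Redesigned positions, and applies amino acid biases and exclusions to one position's logits. Adding biases to the logits and blocking excluded residues with negative infinity are assumptions about how these settings are wired.

```python
import math
import re
from typing import Dict, Iterable, List, Set, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
TOKEN = re.compile(r"^([A-Za-z]+)(\d+)(?:-(\d+))?$")  # e.g. A15 or A1-10
Position = Tuple[str, int]


def parse_positions(spec: str) -> Set[Position]:
    """Hypothetical helper: expand "A15,A19,A1-10,B1-20" into {(chain, residue_number), ...}."""
    positions: Set[Position] = set()
    for token in filter(None, (t.strip() for t in spec.split(","))):
        m = TOKEN.match(token)
        if m is None:
            raise ValueError(f"unrecognized position: {token!r}")
        chain, start = m.group(1), int(m.group(2))
        end = int(m.group(3)) if m.group(3) else start
        positions.update((chain, r) for r in range(start, end + 1))
    return positions


def designable_positions(all_positions: Iterable[Position],
                         fixed: str = "", redesigned: str = "") -> Set[Position]:
    """Fixed and Redesigned positions are mutually exclusive ways to pick the same set."""
    if fixed and redesigned:
        raise ValueError("use either fixed or redesigned positions, not both")
    if redesigned:
        return set(all_positions) & parse_positions(redesigned)
    return set(all_positions) - parse_positions(fixed)


def apply_composition(logits: List[float], bias: Dict[str, float], exclude: str) -> List[float]:
    """Add per-amino-acid biases (assumed logit units); excluded residues get -inf."""
    return [-math.inf if aa in exclude else x + bias.get(aa, 0.0)
            for aa, x in zip(ALPHABET, logits)]
```

Under this sketch's assumption, a bias of −25 lowers a residue's odds by a factor of at least e^25 (about 7 × 10^10) at any temperature up to 1.0, which is why it behaves like an exclusion in practice, while an entry in the exclude list removes the residue outright.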

Results

Each designed sequence includes:

Column | Description
Sequence ID | Identifier for the design (seq_1, seq_2, …)
Sequence | The designed amino acid sequence
Overall confidence | Model confidence score (0–1). Higher values mean the model assigns higher average probability to the designed residues given the backbone; it is a model score, not a guarantee of correct folding.
Seq recovery | Fraction of positions matching the original sequence
Mutation count | Number of positions that differ from the input
Identity % | Percent identity to the input sequence
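
The sequence-level columns can be computed directly from the input and designed sequences. The sketch below assumes, following the public ProteinMPNN/LigandMPNN outputs as we understand them, that overall confidence is the exponential of the mean per-residue log-probability (a geometric mean of the probabilities the model gave to the chosen residues); ProteinIQ may compute it differently, and the function name is hypothetical. Because designs keep the input length, identity % here is simply recovery × 100.

```python
import math
from typing import Dict, Sequence


def design_metrics(native: str, designed: str, log_probs: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: summary columns for one design.

    log_probs[i] is assumed to be the natural-log probability the model assigned
    to designed[i]; all three inputs must have the same length.
    """
    n = len(designed)
    if n == 0 or len(native) != n or len(log_probs) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(native, designed))
    recovery = matches / n
    return {
        "seq_recovery": recovery,
        "mutation_count": n - matches,
        "identity_pct": 100.0 * recovery,
        # exp(mean log p) is the geometric mean of per-residue probabilities.
        "overall_confidence": math.exp(sum(log_probs) / n),
    }


m = design_metrics("MKTAYIAK", "MKSAYLAK", [math.log(0.5)] * 8)
print(m)  # 6 of 8 positions match: recovery 0.75, 2 mutations, 75.0 % identity
```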

Results can be exported as FASTA, CSV, or JSON, with backbone PDB files available in the Files tab.

Applications

Inverse folding enables several protein engineering workflows:

  • De novo protein design: After generating a novel backbone with tools like RFdiffusion, ProteinMPNN provides sequences likely to fold into that structure
  • Sequence optimization: Generate variants of existing proteins with potentially improved expression, solubility, or stability
  • Functional homolog design: Create sequence-diverse proteins that keep a target fold, useful for evading immune recognition or working around intellectual property constraints
  • Rescue failed designs: Re-sequence backbones from computationally designed proteins that failed to express or fold

Limitations

ProteinMPNN designs sequences based solely on backbone geometry. It does not consider:

  • Ligand interactions: For proteins with bound small molecules, metals, or nucleotides, use LigandMPNN instead
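  • Membrane environment: ProteinMPNN has no explicit model of the lipid bilayer, and its PDB-derived training set was not restricted to soluble proteins. For optimizing soluble expression, consider SolubleMPNN, which was trained only on soluble proteins; it is not intended for designing transmembrane proteins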
  • Stability optimization: While designs often fold well, ProteinMPNN does not explicitly optimize thermostability. Consider ThermoMPNN for stability predictions

Experimental validation remains essential—computational metrics predict but do not guarantee foldability or function.