ProteinMPNN

Design amino acid sequences for protein backbone structures with a graph neural network. ProteinMPNN recovers 52.4% of native residues on test backbones and supports rational protein engineering.

What is ProteinMPNN?

ProteinMPNN solves the inverse folding problem: given a protein backbone structure, which amino acid sequences will fold into that shape? This reverses the structure prediction question. Instead of asking what structure a sequence adopts, it asks what sequences can adopt a given structure.

Developed at the Institute for Protein Design and published in Science (2022), ProteinMPNN achieves 52.4% native sequence recovery on test backbones, compared to 32.9% for the previous state-of-the-art Rosetta design software. Beyond accuracy, it runs in ~1 second per protein versus ~4 minutes for Rosetta.

The original study validated designs experimentally: crystal structures and cryo-EM reconstructions confirmed that designed sequences fold to their intended structures. The method also rescued previously failed designs and enabled new applications, from nanomaterials to target-binding proteins.

How does ProteinMPNN work?

ProteinMPNN represents protein structures as graphs where residues are nodes and edges connect spatially proximate residues (the 32–48 nearest Cα neighbors). The neural network learns from this geometric representation without requiring evolutionary information or sequence alignments.
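A minimal sketch of the graph construction, assuming Cα coordinates are already available as an (L, 3) NumPy array. It picks the k nearest residues by Cα–Cα distance, with k = 48 as an illustrative value within the range above. The function name `knn_graph` is hypothetical, and this is not ProteinMPNN's own featurization code; as in a plain top-k search, each residue appears in its own neighbor list at distance 0.

```python
import numpy as np

def knn_graph(ca: np.ndarray, k: int = 48) -> np.ndarray:
    """Return an (L, k') array of neighbor indices for each residue.

    ca: (L, 3) array of Cα coordinates in Å.
    k:  number of neighbors, capped at L so small proteins still work.
    """
    # Pairwise Cα–Cα Euclidean distances, shape (L, L)
    diff = ca[:, None, :] - ca[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    k_eff = min(k, ca.shape[0])
    # Sort each row by distance; the first k_eff columns are the nearest residues
    return np.argsort(dist, axis=1, kind="stable")[:, :k_eff]
```

For a five-residue toy chain spaced 4 Å apart along one axis with k = 3, residue 0's neighbors are itself plus residues 1 and 2.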

Encoding structure

The encoder (3 layers, 128 hidden dimensions) processes pairwise distances between backbone atoms: N, Cα, C, O, and a virtual Cβ. These interatomic distances capture inter-residue geometry more effectively than dihedral angles or coordinate frames. Message passing between nodes and edges propagates structural information throughout the graph.
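The per-pair distance features can be sketched as follows. The virtual Cβ is placed from N, Cα, and C with idealized-geometry coefficients that are widely used for this construction. The 5 × 5 = 25 distances between two residues' atoms are then expanded with Gaussian radial basis functions; the 16 centers from 2 to 22 Å are an assumption about the featurization, not a documented setting of this tool. All function names are hypothetical.

```python
import numpy as np

def virtual_cb(n, ca, c):
    """Place an idealized Cβ from backbone N, Cα, C coordinates (each (L, 3))."""
    b = ca - n
    c_vec = c - ca
    a = np.cross(b, c_vec)
    # Empirical coefficients for ideal tetrahedral geometry around Cα
    return -0.58273431 * a + 0.56802827 * b - 0.54067466 * c_vec + ca

def pair_distances(backbone, i, j):
    """All 25 distances between the five atoms of residue i and residue j.

    backbone: (L, 5, 3) array with atoms ordered N, Cα, C, O, Cβ.
    Returns (5, 5); entry [p, q] is |atom p of i - atom q of j|.
    """
    diff = backbone[i][:, None, :] - backbone[j][None, :, :]
    return np.linalg.norm(diff, axis=-1)

def rbf(d, d_min=2.0, d_max=22.0, count=16):
    """Expand distances into Gaussian radial basis features (assumed settings)."""
    mu = np.linspace(d_min, d_max, count)
    sigma = (d_max - d_min) / count
    return np.exp(-(((d[..., None] - mu) / sigma) ** 2))
```

On ideal alanine backbone coordinates (N at (−0.525, 1.363, 0), Cα at the origin, C at (1.526, 0, 0)), the virtual Cβ lands about 1.53 Å from Cα.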

Decoding sequences

Rather than generating amino acids sequentially from N- to C-terminus, ProteinMPNN uses order-agnostic autoregressive decoding. During training, the model learns to predict amino acids in random order. At inference, each position is decoded conditioned on the structural encoding and any previously decoded positions.
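The core of order-agnostic decoding is a mask that depends on a sampled order rather than on sequence position. A minimal NumPy sketch; `decoding_mask` is a hypothetical helper. In the model, a mask like this gates which already-decoded amino acid identities each position can see, while backbone features remain visible everywhere.

```python
import numpy as np

def decoding_mask(order: np.ndarray) -> np.ndarray:
    """Build an (L, L) mask for order-agnostic autoregressive decoding.

    order[t] is the residue index decoded at step t. mask[i, j] is True when
    residue j is decoded before residue i, so i may condition on j's identity.
    """
    L = order.shape[0]
    step = np.empty(L, dtype=int)
    step[order] = np.arange(L)             # step[i] = when residue i is decoded
    return step[None, :] < step[:, None]   # True where j comes earlier than i

rng = np.random.default_rng(0)
order = rng.permutation(6)  # a fresh random order, as during training
mask = decoding_mask(order)
```

Because the order is resampled, the same network learns to predict any residue from any subset of already-decoded neighbors.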

This flexibility enables practical design scenarios: fixing certain residues while redesigning others, enforcing identical sequences across homo-oligomer chains, or biasing toward specific amino acid compositions.
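Fixed residues fit naturally into this scheme: if they are placed first in the decoding order, every redesigned position is predicted with them already in context. The sketch below shows that idea, plus a simple way to tie homo-oligomer positions by sampling one amino acid from logits averaged across chain copies. Both helpers are hypothetical, and averaging logits is a simplifying assumption, not a statement of exactly how ProteinMPNN combines tied positions.

```python
import numpy as np

def order_with_fixed_first(fixed: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random decoding order in which fixed residues come first.

    fixed: boolean array (L,), True for residues whose identity is kept.
    """
    noise = rng.random(fixed.shape[0])
    # Fixed residues get keys in [0, 1); designable ones are shifted to [1, 2)
    keys = noise + (~fixed).astype(float)
    return np.argsort(keys)

def sample_tied(logits_per_copy: np.ndarray, rng: np.random.Generator) -> int:
    """Sample one amino acid index shared by all copies of a tied position.

    logits_per_copy: (n_copies, 20) logits for the same position in each chain.
    """
    mean_logits = logits_per_copy.mean(axis=0)
    probs = np.exp(mean_logits - mean_logits.max())  # stable softmax
    probs /= probs.sum()
    return int(rng.choice(probs.shape[0], p=probs))

rng = np.random.default_rng(0)
fixed = np.array([False, True, False, True, False])
order = order_with_fixed_first(fixed, rng)  # residues 1 and 3 come first
```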

How to use ProteinMPNN online

ProteinMPNN runs on ProteinIQ's GPU infrastructure, delivering sequence designs in seconds without local installation.

Inputs

Input	Description
`Protein`	PDB file or RCSB PDB ID (e.g., `1ABC`). Structure must contain backbone coordinates.
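Because the structure must contain backbone coordinates, a quick local check before uploading can save a failed job. A minimal sketch that reads N, CA, C, and O from fixed-column PDB `ATOM` records. It assumes a standard PDB-format file (not mmCIF), skips `HETATM` records such as selenomethionine, and keeps whichever alternate location appears last. The function name is hypothetical.

```python
import numpy as np

BACKBONE = ("N", "CA", "C", "O")

def backbone_coords(pdb_text: str) -> dict:
    """Map (chain, residue number) to a (4, 3) array of N, CA, C, O coordinates.

    Uses the fixed PDB columns: atom name 13-16, chain 22, residue number 23-26
    plus insertion code 27, x/y/z 31-54. Residues missing any backbone atom
    are dropped.
    """
    residues = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if name not in BACKBONE:
            continue
        key = (line[21], line[22:27].strip())
        xyz = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
        residues.setdefault(key, {})[name] = xyz
    return {k: np.array([v[a] for a in BACKBONE])
            for k, v in residues.items() if all(a in v for a in BACKBONE)}
```

An empty result, or far fewer residues than expected, suggests a Cα-only or otherwise incomplete model.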

Settings

Core settings

Setting	Description
`Number of sequences`	Sequence variants to generate (1–48, default 8). More sequences explore broader sequence space at linear computational cost.
`Sampling temperature`	Diversity control (0.05–1.0, default 0.1). Lower = conservative, higher = diverse. See interpretation below.
`Random seed`	Integer for reproducibility. Same seed + settings = identical output.

Temperature interpretation

Temperature	Behavior
0.05–0.1	Conservative designs with the highest model likelihood. Best for maximizing sequence recovery.
0.2–0.3	Moderate diversity while maintaining good recovery. Useful for variant libraries.
0.4–1.0	High diversity at the cost of recovery. Use when exploring novel sequences matters more than optimality.
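A toy sampler can illustrate how temperature changes the sampled sequences. Logits are divided by the temperature before the softmax, so low values sharpen each per-position distribution and high values flatten it, and a fixed seed makes the draws repeatable. This sketch treats positions independently and uses made-up logits, whereas the real model samples autoregressively. The alphabet order and function name are assumptions.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the logits

def sample_sequence(logits: np.ndarray, temperature: float, seed: int) -> str:
    """Sample one residue per position from temperature-scaled logits (L, 20)."""
    rng = np.random.default_rng(seed)  # same seed and inputs -> same draws
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=1, keepdims=True)
    return "".join(ALPHABET[rng.choice(len(ALPHABET), p=p)] for p in probs)

toy_logits = np.random.default_rng(1).normal(size=(10, 20))
for t in (0.1, 0.3, 1.0):
    print(t, sample_sequence(toy_logits, t, seed=42))
```

At 0.1 the output mostly follows each row's highest logit; at 1.0 lower-scoring residues appear far more often.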

Design constraints

Setting	Description
`Homo-oligomer`	When enabled, all chains receive identical sequences. For symmetric assemblies like dimers or trimers.
`Fixed positions`	Residues to preserve unchanged. Format: `A15,A19,A1-10,B1-20`. Useful for catalytic or binding sites.
`Redesigned positions`	Inverse of fixed positions: list the residues to redesign, and all others stay fixed. Uses the same format.
`Amino acid biases`	Per-residue type bias from −25 (exclude entirely) to +2 (favor). Controls amino acid composition.
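The position format can be expanded with a few lines of Python. This sketch assumes single-letter chain IDs and inclusive ranges, as in the `A1-10` example, and does not handle insertion codes; the helper names are hypothetical. It also shows how the two settings relate: fixed positions are every position not listed as redesigned.

```python
def parse_positions(spec: str) -> set[tuple[str, int]]:
    """Expand a spec like 'A15,A19,A1-10,B1-20' into (chain, residue) pairs."""
    positions = set()
    for token in spec.replace(" ", "").split(","):
        if not token:
            continue
        chain, rest = token[0], token[1:]
        if "-" in rest:
            # Inclusive range, e.g. A1-10 -> residues 1..10 on chain A
            start, end = (int(x) for x in rest.split("-"))
            positions.update((chain, i) for i in range(start, end + 1))
        else:
            positions.add((chain, int(rest)))
    return positions

def fixed_from_redesigned(redesigned, all_positions):
    """Everything not listed as redesigned stays fixed."""
    return set(all_positions) - set(redesigned)
```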

Results

Each designed sequence includes:

Column	Description
`Sequence`	The designed amino acid sequence
`Overall confidence`	Model confidence score (0–1). Higher values mean the model finds the sequence more compatible with the input backbone; this is a model score, not a guarantee that the sequence folds.
`Seq recovery`	Fraction of positions identical to the native sequence, when the input structure includes one
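Sequence recovery can be computed directly from two aligned sequences. A minimal sketch, assuming the designed and native sequences cover the same residues in the same order, which holds when designing on a fixed backbone. Whether fixed positions count toward the denominator in this tool's output is not stated here, so treat this as the basic definition only.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must be the same length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```

For example, `sequence_recovery("ACDE", "ACDF")` gives 0.75.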

Results can be exported as FASTA, CSV, or JSON.
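Exported FASTA files can be loaded back for downstream filtering with a small parser. A sketch assuming standard FASTA, with a `>` header line followed by one or more sequence lines. It makes no assumption about what the header text contains, and the function name is hypothetical.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may be wrapped over several lines
            records[header] += line
    return records
```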

Applications

Inverse folding enables several protein engineering workflows:

De novo protein design: After generating a novel backbone with tools like RFdiffusion, ProteinMPNN provides sequences likely to fold into that structure
Sequence optimization: Generate variants of existing proteins with potentially improved expression, solubility, or stability
Functional homolog design: Create sequence-diverse proteins that maintain a target fold, useful when avoiding immune recognition or intellectual property
Rescue failed designs: Re-sequence backbones from computationally designed proteins that failed to express or fold

Limitations

ProteinMPNN designs sequences based solely on backbone geometry. It does not consider:

Ligand interactions: For proteins with bound small molecules, metals, or nucleotides, use LigandMPNN instead
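Membrane environment: ProteinMPNN does not model a lipid bilayer. Standard ProteinMPNN was trained on PDB structures that include membrane proteins, so designs can carry hydrophobic surface patches. When soluble expression is the goal, consider SolubleMPNN, which was trained on soluble proteins only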
Stability optimization: While designs often fold well, ProteinMPNN does not explicitly optimize thermostability. Consider ThermoMPNN for stability predictions

Experimental validation remains essential: computational metrics predict, but do not guarantee, foldability or function.

Related tools

LigandMPNN: Inverse folding with ligand, metal, and nucleotide context (63.3% recovery at binding sites)
SolubleMPNN: ProteinMPNN variant trained exclusively on soluble proteins
ESM-IF1: Alternative inverse folding method from Meta AI
ThermoMPNN: Predict thermostability changes (ΔΔG) for mutations
RFdiffusion: Generate novel protein backbones to sequence with ProteinMPNN
