HyperMPNN

Design protein sequences optimized for thermal stability from backbone structures.

Related tools

LigandMPNN

Design protein sequences with atomic context from ligands, metals, and nucleotides. Achieves 63.3% sequence recovery at small-molecule binding sites, significantly outperforming ProteinMPNN (50.5%).

ProteinMPNN

Design protein sequences for given backbone structures using deep learning. Fast and accurate inverse folding with state-of-the-art sequence recovery (52.4%).

SolubleMPNN

Specialized model for soluble protein sequence design. Trained exclusively on soluble proteins for optimized performance on cytoplasmic and extracellular proteins.

IgDesign

Design antibody CDR sequences via inverse folding. Generates complementarity-determining region (CDR) sequences for antibodies targeting therapeutic antigens using deep learning. Optimizes CDR loops (HCDR1, HCDR2, HCDR3) based on antibody-antigen complex structures.

AntiFold

Inverse folding for antibody variable domains and nanobodies. Predicts amino acid sequences compatible with antibody structures using IMGT numbering while preserving upstream AntiFold chain handling and structural constraints.

ESM-IF1

Inverse folding with ESM-IF1. Design protein sequences for given 3D backbone structures using a geometric deep learning model. Generate multiple sequence variants optimized for your target structure.

ProFam

ProFam-1 is a protein family language model for family-conditioned sequence generation. Provide a protein family FASTA/MSA and generate new sequences with model likelihood scores for downstream ranking and screening.

PepMLM

Design linear peptide binders for target proteins using a target sequence-conditioned masked language model. PepMLM generates peptide sequences optimized to bind specific protein targets based on ESM-2 protein language modeling.

BindCraft

Design de novo protein binders using AlphaFold2 backpropagation, ProteinMPNN sequence optimization, and PyRosetta relaxation. BindCraft generates novel protein sequences that bind to user-specified target surfaces.

EvoPro

Optimize protein binders using genetic algorithms combined with AlphaFold2 fitness evaluation and ProteinMPNN sequence design. EvoPro evolves protein sequences to maximize binding affinity and structural quality through iterative cycles of mutation, selection, and validation.

What is HyperMPNN?

Many thermostability design projects start with a solved or predicted protein structure and a specific question: which amino acid sequence is more likely to keep this backbone folded at high temperature? HyperMPNN answers that inverse-folding question by using ProteinMPNN weights retrained on proteins from hyperthermophilic organisms.

The upstream HyperMPNN repository is not a separate inference engine. It supplies retrained model weights that run through the original ProteinMPNN code path with --path_to_model_weights and --model_name. ProteinIQ follows that upstream behavior and exposes the HyperMPNN checkpoints as protein-only sequence design models.
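
As a rough illustration of that code path, the sketch below assembles a command line for the upstream ProteinMPNN script with HyperMPNN weights. The function name, the file paths, and the choice of eight sequences are hypothetical; the flags are the ones named above plus standard ProteinMPNN options, and ProteinIQ's internal invocation may differ.

```python
def build_hypermpnn_command(
    pdb_path: str,
    weights_dir: str,
    out_folder: str,
    model_name: str = "v48_020_epoch300_hyper",
    num_sequences: int = 8,
    sampling_temp: float = 0.1,
    seed: int = 0,
    script: str = "protein_mpnn_run.py",  # assumed location of the upstream script
) -> list[str]:
    """Return an argv list that runs ProteinMPNN with HyperMPNN weights.

    weights_dir is the folder holding the HyperMPNN checkpoint files and
    model_name is the checkpoint name without its file extension.
    """
    return [
        "python", script,
        "--pdb_path", pdb_path,
        "--out_folder", out_folder,
        "--path_to_model_weights", weights_dir,
        "--model_name", model_name,
        "--num_seq_per_target", str(num_sequences),
        "--sampling_temp", str(sampling_temp),
        "--seed", str(seed),
    ]


# Hypothetical paths, shown only to illustrate the shape of the call.
command = build_hypermpnn_command("input.pdb", "hypermpnn_weights/", "designs/")
print(" ".join(command))
```

The list can be passed to `subprocess.run` once the upstream code and weights are available locally.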

HyperMPNN is most useful when thermal resilience is the design objective and the backbone is already chosen. Typical inputs include enzyme structures for high-temperature biocatalysis, vaccine nanoparticle components that need better storage stability, and scaffold proteins where functionally important residues can be fixed while the rest of the sequence is redesigned.

How to use HyperMPNN online

HyperMPNN runs online on a single protein backbone, supplied as a PDB file or RCSB PDB ID, and samples a chosen number of thermostability-biased sequence variants. ProteinIQ returns designed sequences, mutation lists, upstream ProteinMPNN scores, sequence recovery, a FASTA file, and optional score or probability arrays for downstream analysis.

Inputs

| Input | Accepted formats | Notes |
| --- | --- | --- |
| Protein | .pdb, .ent, or 4-character RCSB PDB ID such as 1CRN | The structure must contain protein atoms. HyperMPNN designs sequence for the provided backbone geometry and does not predict a new backbone. |

The input structure determines what HyperMPNN can preserve. Catalytic residues, metal-binding residues, disulfide cysteines, interface hot spots, and experimentally required mutations should be fixed before running broad redesigns.

Settings

Core settings

| Setting | Default | Description |
| --- | --- | --- |
| HyperMPNN model | v48_020_epoch300_hyper | Checkpoint from the upstream HyperMPNN retraining set. The default 0.20 Å noise model matches the main upstream example and standard ProteinMPNN training noise. |
| Number of sequences | 1 | Number of designs to sample, from 1 to 48. One sequence matches the upstream default. Initial screens usually benefit from 8 to 10 designs, while larger libraries can use 20 to 40. |
| Sampling temperature | 0.1 | Diversity control. Values near 0.05 to 0.1 produce conservative designs. Values around 0.2 to 0.3 explore more substitutions. Higher values can create diverse but less probable sequences. |
| Random seed | 0 | Upstream ProteinMPNN treats 0 as random seed selection. A nonzero integer makes a design run reproducible with the same input and settings. |
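
The effect of Sampling temperature can be illustrated with a temperature-scaled softmax over per-residue logits. This is a minimal sketch, not the upstream implementation: the logit values are invented, `sampling_probs` is a hypothetical helper, and the optional bias is assumed to be added to the logits before the division by temperature.

```python
import math


def sampling_probs(logits, temperature=0.1, bias=None):
    """Convert amino acid logits to sampling probabilities.

    logits: dict mapping one-letter codes to raw model scores (made-up here).
    bias:   optional dict of additive per-amino-acid biases (assumption:
            applied to the logits before temperature scaling).
    """
    bias = bias or {}
    scaled = {aa: (value + bias.get(aa, 0.0)) / temperature
              for aa, value in logits.items()}
    top = max(scaled.values())  # subtract the maximum for numerical stability
    weights = {aa: math.exp(value - top) for aa, value in scaled.items()}
    total = sum(weights.values())
    return {aa: weight / total for aa, weight in weights.items()}


toy_logits = {"A": 2.0, "K": 1.0, "C": 0.5}  # illustrative values only
conservative = sampling_probs(toy_logits, temperature=0.1)
exploratory = sampling_probs(toy_logits, temperature=0.3)
no_cysteine = sampling_probs(toy_logits, temperature=0.1, bias={"C": -25.0})
```

In this toy case the lower temperature concentrates almost all probability on the top-scoring residue, the higher temperature leaves more room for the alternatives, and a large negative bias drives the probability of the biased residue to effectively zero.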

Design controls

| Setting | Description |
| --- | --- |
| Chains to design | Comma-separated chain IDs such as A,B. Omitted chains stay fixed. Leaving the field empty designs all chains. |
| Homo-oligomer | Ties equivalent positions across selected chains, so homomeric chains receive the same sequence. Selected chains must have matching residue counts. |
| Fixed positions | Positions that must remain unchanged, using entries such as A15, A19, A1-10, or B. Fixed positions protect active sites, binding residues, and known stabilizing mutations. |
| Redesigned positions | Positions allowed to change; all other positions are fixed. This is useful for loop redesigns, surface patches, and local thermostability experiments. |
| Exclude amino acids | One-letter amino acid codes to omit globally, such as C or CW. Exclusions help when unwanted cysteines, oxidation-prone residues, or rare residues would complicate experiments. |
| Save score file | Saves upstream score arrays as .npz files for custom ranking or offline analysis. |
| Save probability file | Saves per-position probability arrays as .npz files for uncertainty analysis, sequence logos, or residue-level sampling diagnostics. |
| Amino acid biases | Adds global sampling biases for individual amino acids. Positive values increase sampling frequency. Negative values reduce it. A value near -25 effectively excludes an amino acid. |
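
The position syntax used by Fixed positions and Redesigned positions can be checked offline before submitting a job. The parser below is a hypothetical sketch that assumes single-letter chain IDs, positive residue numbers without insertion codes, and a bare chain ID meaning the whole chain; ProteinIQ's own validation may be stricter or more permissive.

```python
import re

WHOLE_CHAIN = "all"  # marker used here for entries such as "B"
_ENTRY = re.compile(r"^([A-Za-z])(?:(\d+)(?:-(\d+))?)?$")


def parse_positions(spec: str) -> dict:
    """Parse entries like 'A15, A19, A1-10, B' into {chain: positions}.

    Each value is either a set of residue numbers or WHOLE_CHAIN.
    """
    selection = {}
    for raw in spec.split(","):
        entry = raw.strip()
        if not entry:
            continue
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"Unrecognized position entry: {entry!r}")
        chain, start, end = match.groups()
        if start is None:
            # A bare chain ID selects every residue in that chain.
            selection[chain] = WHOLE_CHAIN
        elif selection.get(chain) != WHOLE_CHAIN:
            first = int(start)
            last = int(end) if end else first
            if last < first:
                raise ValueError(f"Descending range in entry: {entry!r}")
            selection.setdefault(chain, set()).update(range(first, last + 1))
    return selection


fixed = parse_positions("A15, A19, A1-10, B")
```

Here `fixed["A"]` holds residues 1 through 10 plus 15 and 19, and `fixed["B"]` is the whole-chain marker.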

Results

HyperMPNN returns both a human-readable table and upstream files. The FASTA output is the primary scientific output because it preserves ProteinMPNN headers and metadata.

| Output | Meaning |
| --- | --- |
| Sequence | Designed amino acid sequence. Multi-chain designs are separated in the upstream FASTA style. |
| Score | Average negative log probability over designed residues. Lower values mean the model assigned higher probability to the sampled sequence at the redesigned positions. |
| Global score | Average negative log probability over all residues in the structure. Lower values are better for overall sequence fit to the backbone. |
| Sequence recovery | Fraction of redesigned positions that match the input sequence. Low recovery means the model changed many residues; high recovery means the design stayed close to the starting sequence. |
| Mutations | Substitutions relative to the input structure, reported with chain and residue context. |
| Identity | Percent identity to the original sequence over the designed region. |
| FASTA file | Upstream ProteinMPNN sequence output with native and sampled records. |
| Score NPZ | Optional upstream score arrays when Save score file is enabled. |
| Probability NPZ | Optional per-position probability arrays when Save probability file is enabled. |
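
The Score, Global score, Sequence recovery, and Mutations definitions above reduce to simple per-position arithmetic. The sketch below uses invented log probabilities and a five-residue toy sequence; the helper names and the `A:K2R` mutation label format are assumptions for illustration, not the tool's exact output format.

```python
def mean_negative_log_prob(log_probs, mask):
    """Average of -log p over the positions where mask is truthy."""
    picked = [-lp for lp, keep in zip(log_probs, mask) if keep]
    if not picked:
        raise ValueError("mask selects no positions")
    return sum(picked) / len(picked)


def sequence_recovery(native, design, designed_mask):
    """Fraction of designed positions where the design matches the native residue."""
    pairs = [(n, d) for n, d, keep in zip(native, design, designed_mask) if keep]
    if not pairs:
        raise ValueError("mask selects no positions")
    return sum(n == d for n, d in pairs) / len(pairs)


def list_mutations(native, design, chain="A", first_residue=1):
    """Substitutions as '<chain>:<native><number><design>' labels (format assumed)."""
    return [
        f"{chain}:{n}{first_residue + i}{d}"
        for i, (n, d) in enumerate(zip(native, design))
        if n != d
    ]


native, design = "MKTAY", "MRTAW"
log_probs = [-0.1, -1.0, -2.0, -3.0, -2.0]   # log p of each sampled residue (toy values)
designed = [0, 1, 1, 1, 1]                   # position 1 was held fixed

score = mean_negative_log_prob(log_probs, designed)              # designed residues only
global_score = mean_negative_log_prob(log_probs, [1] * 5)        # every residue
recovery = sequence_recovery(native, design, designed)
mutations = list_mutations(native, design)
```

Because the fixed first residue has a high probability in this toy case, the global score is lower than the score over designed positions, which is the pattern to expect when confidently modeled fixed residues dominate a structure.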

Interpreting HyperMPNN designs

HyperMPNN scores rank sequences by model likelihood, not by measured melting temperature. A low Score or Global score indicates that the sequence is compatible with the input backbone under the HyperMPNN model, but experimental thermostability still depends on expression, folding kinetics, oligomerization, cofactors, and the validity of the backbone.

Global score is the better broad ranking field for complete designs. Score is more useful when the redesign is local and fixed residues dominate the structure. When both scores are similar across candidates, mutation patterns usually matter more than small score differences.

Sequence recovery has a different meaning in thermostability redesign than in native-sequence recovery benchmarks. A low value is not automatically bad. HyperMPNN is expected to move sequences toward hyperthermophile-like composition, so useful designs may have substantial substitutions on the surface, in the core, or around flexible regions.

Practical ranking workflow

  • Initial sampling: 8 to 10 designs at Sampling temperature 0.1 or 0.15 gives enough diversity for a first pass.
  • Functional filter: Designs that mutate protected functional residues should be discarded unless those residues were intentionally left redesignable.
  • Model ranking: Strong candidates combine low Global score, reasonable mutation load, and no obvious disruption of active-site chemistry or interface contacts.
  • Enzyme designs: Catalytic residues usually belong in Fixed positions. The redesigned sequences then need stability and function-specific evaluation.
  • Mutation-focused studies: ThermoMPNN is more direct for single-site or saturation mutagenesis stability questions because it predicts mutation-level thermostability changes rather than full sequences.
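
The functional filter and model ranking steps can be sketched as a small script over the FASTA output. The header layout in the example (`T=..., sample=..., score=..., global_score=..., seq_recovery=...`, with the native record first) is an assumption based on upstream ProteinMPNN output, and the sequences, scores, and protected index are invented.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def header_fields(header):
    """Split 'key=value, key=value' header text into a dict of strings."""
    fields = {}
    for part in header.split(", "):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def rank_designs(fasta_text, protected=()):
    """Drop designs that mutate protected 0-based indices, then sort by global score."""
    records = read_fasta(fasta_text)
    native = records[0][1]  # assumption: first record is the input sequence
    kept = []
    for header, sequence in records[1:]:
        fields = header_fields(header)
        if "sample" not in fields:
            continue
        if any(sequence[i] != native[i] for i in protected):
            continue  # functional filter: a protected residue changed
        kept.append({
            "sample": int(fields["sample"]),
            "global_score": float(fields["global_score"]),
            "sequence": sequence,
        })
    return sorted(kept, key=lambda design: design["global_score"])


example = """>demo, score=1.50, global_score=1.50, seed=37
MKTAYIAK
>T=0.1, sample=1, score=0.90, global_score=1.00, seq_recovery=0.75
MRTAYIAR
>T=0.1, sample=2, score=0.80, global_score=0.95, seq_recovery=0.75
MKTGYIAR
>T=0.1, sample=3, score=0.85, global_score=0.98, seq_recovery=0.875
MKTAYLAK
"""
ranked = rank_designs(example, protected={3})
```

Sample 2 has the lowest global score but mutates the protected position, so it is removed before ranking. Mutation load and active-site inspection still need a manual pass on whatever survives.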

How HyperMPNN works

HyperMPNN fine-tunes the ProteinMPNN inverse-folding architecture on structures from hyperthermophilic organisms. The published workflow started from 96,738 sequences, clustered them at 50 percent identity, filtered AlphaFold2-predicted structures by pLDDT above 70, and trained on 29,042 protein structures from organisms adapted to extreme heat.

During inference, HyperMPNN receives the same information as ProteinMPNN: backbone geometry, chain selection, fixed-position masks, tied positions, amino acid omissions, and optional amino acid biases. The difference is the learned sequence distribution. HyperMPNN shifts sampling toward amino acid patterns observed in thermostable proteins rather than the broader Protein Data Bank distribution.

The HyperMPNN paper reports several composition-level trends in hyperthermophilic proteins and HyperMPNN designs:

| Structural region | Reported shift versus mesophilic proteins |
| --- | --- |
| Surface | More positively charged residues |
| Surface | More apolar residues |
| Surface | Fewer polar uncharged residues |
| Core | More apolar residues |

These shifts are consistent with common thermostability mechanisms, including tighter hydrophobic packing and stronger electrostatic networks. HyperMPNN should not be read as a salt-bridge maximizer. The reported median salt bridge count in native hyperthermophilic proteins was close to the mesophilic comparison set, while HyperMPNN designs recovered hyperthermophile-like salt bridge counts better than standard ProteinMPNN in the authors' analysis.

The experimental validation case in the HyperMPNN preprint used the I53-50B pentamer from an icosahedral nanoparticle system. Designs selected from the HyperMPNN workflow retained stability at 95 °C, while the parent sequence had a melting temperature of 65 °C. That result is a strong proof of concept, not a guarantee that every redesigned backbone will gain 30 °C of thermal tolerance.

When to use HyperMPNN vs alternatives

| Tool | Best fit | Main caveat |
| --- | --- | --- |
| HyperMPNN | Full-sequence or region-specific redesign with thermostability as the main objective | Does not model ligands, cofactors, or assay-specific function directly. |
| ProteinMPNN | General inverse folding when no special property bias is desired | Less targeted toward hyperthermophile-like sequence composition. |
| LigandMPNN | Redesign around ligands, metals, nucleotides, or fixed side-chain context | Better for binding-site preservation than thermostability bias. |
| SolubleMPNN | Protein-only sequence design where soluble expression is the priority | Solubility and thermostability can conflict, so expression checks still matter. |
| ESM-IF1 | Alternative inverse-folding model for generating backbone-compatible sequences | Not specifically trained for thermostable sequence composition. |
| ThermoMPNN | Scoring point mutations for predicted stability changes | Mutation scoring does not generate full redesigned sequences. |

When no experimental structure is available, the backbone should first be predicted from sequence, for example with AlphaFold2, and then passed to HyperMPNN. Low-confidence or flexible regions need cautious interpretation because HyperMPNN can only design against the geometry it receives.
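
For AlphaFold2 models, per-residue confidence (pLDDT) is stored in the B-factor column of the PDB file, so low-confidence regions can be flagged before design. The sketch below reads C-alpha records using the fixed PDB column layout; the function name is hypothetical, and the cutoff of 70 is borrowed from the HyperMPNN training filter as an assumption rather than a rule for design inputs.

```python
def low_confidence_residues(pdb_text, threshold=70.0):
    """Return (chain, residue_number, plddt) for C-alpha atoms below threshold.

    Assumes an AlphaFold-style PDB file in which the B-factor column
    (columns 61-66) holds pLDDT rather than a crystallographic B-factor.
    """
    flagged = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":      # columns 13-16: atom name
            continue
        chain = line[21]                     # column 22: chain ID
        residue_number = int(line[22:26])    # columns 23-26: residue number
        plddt = float(line[60:66])           # columns 61-66: B-factor field
        if plddt < threshold:
            flagged.append((chain, residue_number, plddt))
    return flagged
```

Residues returned by this check are candidates for trimming, for fixing at their native identity, or at least for more skeptical reading of the resulting scores.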

Design caveats that affect experiments

  • Expression can drop: Thermostable sequence patterns may reduce soluble expression in mesophilic hosts such as E. coli. The HyperMPNN validation work observed that expression host and purification strategy matter.
  • Function is not guaranteed: Active sites, binding residues, post-translational motifs, and interface residues should be fixed when function must be preserved.
  • Backbone errors propagate: A predicted or low-resolution structure can produce plausible sequences for the wrong geometry. Confidence in the backbone should be checked before interpreting model scores.
  • No ligand awareness: Bound ligands, metals, and cofactors are not part of HyperMPNN inference. Ligand-sensitive redesign should use LigandMPNN or fix residues that coordinate the non-protein atoms.
  • Composition bias is global: Amino acid biases affect sampling across designed positions. They are useful for broad constraints, but position-specific design intent is better expressed with fixed or redesigned residue lists.