HyperMPNN

Design protein sequences optimized for thermal stability. Suited to enzyme engineering and to proteins that must remain active at elevated temperatures.

What is HyperMPNN?

HyperMPNN is a deep learning method for designing protein sequences with enhanced thermal stability. Developed by researchers at Leipzig University, HyperMPNN retrains the ProteinMPNN neural network on predicted structures from hyperthermophilic organisms—microorganisms that thrive at temperatures above 80°C. The resulting model learns the amino acid composition patterns that enable proteins to remain folded and functional at extreme temperatures.

Standard ProteinMPNN, trained on the Protein Data Bank, fails to recover the distinctive amino acid preferences found in hyperthermophilic proteins. HyperMPNN addresses this limitation by learning directly from 29,042 AlphaFold2-predicted structures of hyperthermophile proteins, enabling it to generate sequences optimized for thermal resilience.

Applications

Enzyme engineering: Designing industrial enzymes that maintain activity at elevated process temperatures
Vaccine development: Creating thermostable protein nanoparticles that withstand storage and transport without cold chain requirements
Biocatalysis: Engineering proteins for high-temperature chemical manufacturing processes
Protein therapeutics: Improving shelf stability of protein-based drugs

How to use HyperMPNN online

ProteinIQ provides a web interface for running HyperMPNN without installation. Upload a protein structure, configure sampling parameters, and receive designed sequences optimized for thermal stability.

Inputs

Input	Description
`Protein`	The target protein backbone structure. Upload a PDB file or enter a 4-character PDB ID (e.g., `1UBQ`) to fetch from RCSB. HyperMPNN designs new sequences for the provided backbone geometry.
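
The two accepted input forms can be told apart with a small check. The following is a minimal sketch, not ProteinIQ's actual validation: the function names are hypothetical, the 4-character ID pattern (a digit followed by three alphanumerics) is the classic PDB ID form, and the accepted file suffixes `.pdb` and `.ent` are an assumption about the uploader.

```python
import re
from pathlib import Path

# Classic PDB IDs: one digit followed by three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")

# Assumption: the uploader accepts these structure file suffixes.
ALLOWED_SUFFIXES = {".pdb", ".ent"}


def classify_protein_input(value: str) -> str:
    """Return 'pdb_id' or 'file' for a user-supplied value, else raise ValueError."""
    value = value.strip()
    if PDB_ID_PATTERN.match(value):
        return "pdb_id"
    if Path(value).suffix.lower() in ALLOWED_SUFFIXES:
        return "file"
    raise ValueError(f"Not a 4-character PDB ID or a .pdb/.ent file: {value!r}")


def rcsb_download_url(pdb_id: str) -> str:
    """Build the RCSB file-download URL for a PDB-format entry."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


kind = classify_protein_input("1UBQ")
url = rcsb_download_url("1ubq")
```

Checking the ID pattern first means a bare ID such as `1UBQ` is never mistaken for a file name without an extension.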

Settings

Core settings

Setting	Description
`Number of sequences`	Sequence variants to generate (1–48, default 8). More sequences provide better coverage of thermostable sequence space. Use 20–40 for comprehensive exploration.
`Sampling temperature`	Controls sequence diversity (0.05–1.0, default 0.1). Lower values produce conservative designs closer to natural thermostable sequences. Higher values explore more diverse sequence space.
`Random seed`	Seed for reproducible results (0–99999, default 111). The same seed with identical settings produces identical designs.
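
To illustrate how sampling temperature and the random seed interact, here is a minimal sketch of temperature-scaled sampling over toy per-residue scores. It is not HyperMPNN's code: the logits are invented, and the function names are hypothetical. It only shows the general mechanism, where dividing scores by a small temperature sharpens the distribution and a fixed seed makes the draws repeatable.

```python
import math
import random


def temperature_softmax(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng):
    """Draw one index according to the temperature-scaled probabilities."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy scores for three candidate amino acids at one position (illustrative only).
toy_logits = [2.0, 1.5, 0.5]

low_t = temperature_softmax(toy_logits, 0.1)   # sharply peaked on the top score
high_t = temperature_softmax(toy_logits, 1.0)  # flatter, more diverse sampling

# Two generators with the same seed produce the same sequence of draws.
rng_a = random.Random(111)
rng_b = random.Random(111)
draws_a = [sample_index(toy_logits, 1.0, rng_a) for _ in range(10)]
draws_b = [sample_index(toy_logits, 1.0, rng_b) for _ in range(10)]
```

At temperature 0.1 nearly all probability mass sits on the highest-scoring option, which is why low values give conservative designs.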

Design options

Setting	Description
`Homo-oligomer`	Enable symmetric design for proteins with identical chains. All chains receive the same sequence, appropriate for homomeric assemblies.
`Fixed positions`	Residues to keep unchanged. Format: chain + position (e.g., `A15,A19,B1-20`). Useful for preserving catalytic sites or binding interfaces.
`Redesigned positions`	Positions to redesign; all others remain fixed. This is the inverse of `Fixed positions` and is convenient when only a few positions need to change.
`Amino acid biases`	Adjust sampling probabilities for each amino acid. Positive values (+0.1 to +2) increase frequency; negative values decrease it. Set to −25 to completely exclude an amino acid.
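
The position format can be expanded into explicit residues as follows. This is a hedged sketch with hypothetical function names, assuming that a token such as `B1-20` means residues 1 through 20 of chain B, inclusive, and that chain identifiers are single letters; the actual parser behind the web form may differ.

```python
import re

# One token: chain letter, start position, optional "-end" for an inclusive range.
TOKEN = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")


def parse_positions(spec: str) -> set:
    """Expand a spec like 'A15,A19,B1-20' into a set of (chain, position) pairs."""
    positions = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"Unrecognized position token: {token!r}")
        chain = match.group(1)
        start = int(match.group(2))
        stop = int(match.group(3)) if match.group(3) is not None else start
        if stop < start:
            raise ValueError(f"Range end before start in token: {token!r}")
        positions.update((chain, i) for i in range(start, stop + 1))
    return positions


def fixed_from_redesigned(redesigned: set, all_positions: set) -> set:
    """Redesigned positions are the inverse of fixed ones: fix everything else."""
    return set(all_positions) - set(redesigned)


fixed = parse_positions("A15,A19,B1-20")
```

With the example from the table, `fixed` holds 22 residues: two on chain A and twenty on chain B.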

Results

HyperMPNN returns a list of designed sequences with comparative analysis against the original.

Column	Description
`Sequence`	The designed amino acid sequence.
`Confidence`	Model confidence in the design (0–1). Higher values indicate designs the model considers more likely to fold correctly.
`Sequence recovery`	Percentage of positions matching the original sequence. Lower values indicate more extensive redesign.
`Mutations`	Number and location of mutations relative to the input sequence.
`Identity`	Sequence identity percentage compared to the original.
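
The comparison columns can be reproduced from two aligned sequences. The sketch below assumes the designed and original sequences have equal length (inverse folding keeps the backbone length fixed) and uses a hypothetical `compare_sequences` helper; the mutation labels follow the common `T3S` notation (original residue, 1-based position, new residue), which may not match the exact format of the results table.

```python
def compare_sequences(original: str, designed: str) -> dict:
    """Compute sequence recovery (%) and a mutation list for equal-length sequences."""
    if len(original) != len(designed):
        raise ValueError("Sequences must have the same length")
    if not original:
        raise ValueError("Sequences must not be empty")

    # Label each differing position as <original><1-based position><designed>.
    mutations = [
        f"{o}{i}{d}"
        for i, (o, d) in enumerate(zip(original, designed), start=1)
        if o != d
    ]
    matches = len(original) - len(mutations)
    return {
        "sequence_recovery": 100.0 * matches / len(original),
        "mutation_count": len(mutations),
        "mutations": mutations,
    }


report = compare_sequences("MKTAY", "MKSAY")
```

Here one of five positions differs, so `report` lists the single mutation `T3S` and a recovery of 80%.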

Interpreting confidence scores

> 0.8: High confidence design likely to fold as intended
0.6–0.8: Medium confidence; experimental validation recommended
< 0.6: Lower confidence; consider adjusting parameters or using as starting point for further optimization
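
When triaging many designs, the bands above can be applied programmatically. This is a small sketch with a hypothetical `confidence_tier` function; treating exactly 0.8 as medium and exactly 0.6 as medium follows the "> 0.8" and "< 0.6" boundaries listed above.

```python
def confidence_tier(score: float) -> str:
    """Map a confidence score in [0, 1] to the bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Confidence must be between 0 and 1, got {score}")
    if score > 0.8:
        return "high"
    if score >= 0.6:
        return "medium"
    return "low"


# Example: rank a few designs by confidence and label each one.
scores = [0.91, 0.72, 0.55]
tiers = [confidence_tier(s) for s in sorted(scores, reverse=True)]
```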

How does HyperMPNN work?

HyperMPNN applies transfer learning to protein sequence design. Rather than training from scratch, it fine-tunes the pre-trained ProteinMPNN model on structures from organisms adapted to extreme heat.

Training on hyperthermophile data

The training data started from 96,738 protein sequences from hyperthermophilic organisms. Clustering at 50% sequence identity to remove redundancy yielded 34,759 unique sequences. Their AlphaFold2-predicted structures were then filtered using a pLDDT confidence threshold of 70, leaving the 29,042 structures used for training.
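
A pLDDT filter of this kind can be sketched as follows. AlphaFold PDB files store per-residue pLDDT in the B-factor column, which is what the code reads. Two points are assumptions rather than details from the source: that the threshold is applied to the mean pLDDT over C-alpha atoms, and that a structure scoring exactly 70 passes. All function names are hypothetical.

```python
def mean_plddt(pdb_text: str) -> float:
    """Average pLDDT over CA atoms, read from the B-factor column (columns 61-66)."""
    scores = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not scores:
        raise ValueError("No CA atoms found")
    return sum(scores) / len(scores)


def passes_plddt_filter(pdb_text: str, threshold: float = 70.0) -> bool:
    """Assumption: keep a structure when its mean CA pLDDT meets the threshold."""
    return mean_plddt(pdb_text) >= threshold


def make_ca_line(res_number: int, plddt: float) -> str:
    """Build a fixed-column PDB ATOM record for a CA atom (toy data for the demo)."""
    return (
        f"ATOM  {res_number:5d}  CA  ALA A{res_number:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


# Two toy "structures": one confidently predicted, one not.
confident = "\n".join(make_ca_line(i, s) for i, s in enumerate([90.0, 80.0], start=1))
uncertain = "\n".join(make_ca_line(i, s) for i, s in enumerate([60.0, 50.0], start=1))
kept = [name for name, text in [("confident", confident), ("uncertain", uncertain)]
        if passes_plddt_filter(text)]
```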

Training used Gaussian noise of 0.2 Å added to backbone coordinates (matching standard ProteinMPNN training), 10% dropout, 300 epochs, and batches of 10,000 residues. The resulting model achieves a perplexity of 5.183 and an accuracy of 0.483, comparable to the original ProteinMPNN.
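
For readers unfamiliar with these metrics, the sketch below shows the standard definitions: perplexity is the exponential of the mean negative log-likelihood of the native residues, and accuracy is the fraction of positions where the top prediction matches the native residue. The function name and inputs are hypothetical, and the toy numbers are not from the HyperMPNN evaluation.

```python
import math


def perplexity_and_accuracy(native_probs, correct_flags):
    """Return (perplexity, accuracy) from per-position model outputs.

    native_probs: probability the model assigned to the native amino acid.
    correct_flags: True where the model's top-ranked amino acid was the native one.
    """
    if not native_probs or len(native_probs) != len(correct_flags):
        raise ValueError("Inputs must be non-empty and of equal length")
    mean_nll = sum(-math.log(p) for p in native_probs) / len(native_probs)
    accuracy = sum(1 for flag in correct_flags if flag) / len(correct_flags)
    return math.exp(mean_nll), accuracy


# Toy case: the model gives the native residue probability 0.25 at every position,
# which is as uncertain as choosing uniformly among four options.
ppl, acc = perplexity_and_accuracy([0.25] * 4, [True, False, True, False])
```

A perplexity near 5 can be read as the model being, on average, about as uncertain as a uniform choice among five amino acids out of twenty.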

Amino acid composition patterns

Hyperthermophilic proteins differ systematically from mesophilic proteins in their amino acid usage:

Region	Change vs. mesophiles
Surface	+3.9% positively charged residues
Surface	+4.1% apolar residues
Surface	−4.6% polar uncharged residues
Core	+4.4% apolar residues

These compositional shifts contribute to enhanced thermal stability through increased electrostatic interactions and improved hydrophobic packing.
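
Compositional shifts like these can be measured for any pair of sequences. The sketch below is illustrative only: the residue grouping is one common convention and may differ from the one used in the HyperMPNN analysis, the function names are hypothetical, and it does not separate surface from core residues, which would require solvent accessibility computed from a structure.

```python
# Assumption: one common residue grouping; the original analysis may group differently.
RESIDUE_CLASSES = {
    "positive": set("KRH"),
    "negative": set("DE"),
    "apolar": set("AVLIMFWPG"),
    "polar_uncharged": set("STNQCY"),
}


def class_fractions(sequence: str) -> dict:
    """Percentage of residues in each class for one sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("Sequence must not be empty")
    counts = {name: 0 for name in RESIDUE_CLASSES}
    for aa in seq:
        for name, members in RESIDUE_CLASSES.items():
            if aa in members:
                counts[name] += 1
                break
    return {name: 100.0 * n / len(seq) for name, n in counts.items()}


def composition_shift(reference: str, design: str) -> dict:
    """Change in each class, in percentage points, from reference to design."""
    ref, new = class_fractions(reference), class_fractions(design)
    return {name: new[name] - ref[name] for name in RESIDUE_CLASSES}


# Toy sequences: the design swaps a serine for a lysine.
shift = composition_shift("KAAS", "KKAA")
```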

Salt bridge formation

Contrary to some thermal stability theories, hyperthermophilic proteins do not show dramatically more salt bridges than mesophilic proteins (median 17.0 vs. 16.2). However, HyperMPNN-designed sequences consistently achieve the hyperthermophilic salt bridge count (median 17.0), while standard ProteinMPNN produces designs with fewer salt bridges (median 8.8).
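
Salt bridge counts depend on the definition used. The following is a minimal sketch under a widely used convention, which is an assumption here and not necessarily the criterion behind the medians above: a salt bridge is counted when a side-chain carboxylate oxygen of Asp or Glu lies within 4.0 Å of a side-chain nitrogen of Lys or Arg. The atom tuple layout and function name are hypothetical.

```python
import math

# Side-chain atoms that carry the charge (histidine is left out in this sketch).
ACIDIC_ATOMS = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASIC_ATOMS = {"LYS": {"NZ"}, "ARG": {"NE", "NH1", "NH2"}}


def count_salt_bridges(atoms, cutoff: float = 4.0) -> int:
    """Count unique acidic/basic residue pairs with charged atoms within the cutoff.

    atoms: iterable of (residue_id, residue_name, atom_name, (x, y, z)) tuples.
    """
    acidic = [(rid, xyz) for rid, res, name, xyz in atoms
              if name in ACIDIC_ATOMS.get(res, ())]
    basic = [(rid, xyz) for rid, res, name, xyz in atoms
             if name in BASIC_ATOMS.get(res, ())]
    pairs = set()
    for acid_id, acid_xyz in acidic:
        for base_id, base_xyz in basic:
            if math.dist(acid_xyz, base_xyz) <= cutoff:
                pairs.add((acid_id, base_id))  # each residue pair counts once
    return len(pairs)


# Toy coordinates: Asp10/Lys14 are 3 Å apart, Glu30/Arg31 are 6 Å apart.
toy_atoms = [
    (("A", 10), "ASP", "OD1", (0.0, 0.0, 0.0)),
    (("A", 14), "LYS", "NZ", (3.0, 0.0, 0.0)),
    (("A", 30), "GLU", "OE1", (20.0, 0.0, 0.0)),
    (("A", 31), "ARG", "NH1", (26.0, 0.0, 0.0)),
]
bridges = count_salt_bridges(toy_atoms)
```

Only the Asp10/Lys14 pair falls within the cutoff, so `bridges` is 1. Counting residue pairs rather than atom pairs avoids double-counting when both carboxylate oxygens are close to the same nitrogen.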

Experimental validation

HyperMPNN was validated using the I53-50B pentamer, a component of icosahedral protein nanoparticles used in vaccine development. The parent sequence had a melting temperature of 65°C. HyperMPNN designs remained stable at 95°C, a gain of at least 30°C in thermal tolerance.

Limitations

Expression challenges: HyperMPNN designs may show reduced soluble expression in mesophilic hosts like E. coli (0.4 mg/L vs. 20+ mg/L for parent sequences). Thermophilic expression hosts such as Thermus thermophilus may improve yields.
Backbone dependent: Like all inverse folding methods, HyperMPNN requires a fixed backbone structure. The designed sequence will only be thermostable if the backbone geometry supports it.
No ligand awareness: HyperMPNN does not consider bound ligands, cofactors, or metal ions when designing sequences. For ligand-binding proteins, consider combining with LigandMPNN.
Sequence recovery: Designs may have low sequence identity to the input, which could affect function if active site residues are not fixed.

Related tools

ProteinMPNN: The foundational inverse folding model trained on general protein structures
LigandMPNN: Sequence design with ligand, metal, and nucleotide context for binding site optimization
SolubleMPNN: Sequence design optimized for improved protein solubility
AlphaFold 2: Structure prediction for generating input backbones from sequence
ESMFold: Fast structure prediction alternative for preparing HyperMPNN inputs
