HyperMPNN is a deep learning method for designing protein sequences with enhanced thermal stability. Developed by researchers at Leipzig University, HyperMPNN retrains the ProteinMPNN neural network on predicted structures from hyperthermophilic organisms—microorganisms that thrive at temperatures above 80°C. The resulting model learns the amino acid composition patterns that enable proteins to remain folded and functional at extreme temperatures.
Standard ProteinMPNN, trained on the Protein Data Bank, fails to recover the distinctive amino acid preferences found in hyperthermophilic proteins. HyperMPNN addresses this limitation by learning directly from 29,042 AlphaFold2-predicted structures of hyperthermophile proteins, enabling it to generate sequences optimized for thermal resilience.
ProteinIQ provides a web interface for running HyperMPNN without installation. Upload a protein structure, configure sampling parameters, and receive designed sequences optimized for thermal stability.
| Input | Description |
|---|---|
| Protein | The target protein backbone structure. Upload a PDB file or enter a 4-character PDB ID (e.g., 1UBQ) to fetch from RCSB. HyperMPNN designs new sequences for the provided backbone geometry. |
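
As a rough illustration of the PDB ID input, the sketch below validates a 4-character ID and builds the corresponding RCSB download URL. The helper name `rcsb_download_url` is hypothetical, and nothing here is confirmed to be how ProteinIQ fetches structures; it only shows the ID format (one digit followed by three alphanumeric characters) and the public RCSB file endpoint.

```python
import re

# Classic PDB IDs: one digit (1-9) followed by three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def rcsb_download_url(pdb_id: str) -> str:
    """Return the RCSB download URL for a 4-character PDB ID (hypothetical helper)."""
    pdb_id = pdb_id.strip()
    if not PDB_ID_PATTERN.match(pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    # RCSB serves PDB-format files under this path; IDs are case-insensitive.
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


if __name__ == "__main__":
    print(rcsb_download_url("1ubq"))
```
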
| Setting | Description |
|---|---|
| Number of sequences | Sequence variants to generate (1–48, default 8). More sequences provide better coverage of thermostable sequence space. Use 20–40 for comprehensive exploration. |
| Sampling temperature | Controls sequence diversity (0.05–1.0, default 0.1). Lower values produce conservative designs closer to natural thermostable sequences. Higher values explore more diverse sequence space. |
| Random seed | Seed for reproducible results (0–99999, default 111). Same seed with identical settings produces identical designs. |
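
To make the effect of sampling temperature and random seed concrete, here is a minimal sketch of temperature-scaled sampling from per-residue scores. It assumes the usual softmax-with-temperature scheme; the function names and the three example logits are illustrative and are not taken from the HyperMPNN code.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(logits, temperature, rng):
    """Draw one index according to the temperature-scaled probabilities."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate residues
    print(temperature_probs(logits, 0.1))  # sharply peaked on the first entry
    print(temperature_probs(logits, 1.0))  # noticeably flatter
    rng = random.Random(111)  # a fixed seed makes the draws repeatable
    print([sample_index(logits, 1.0, rng) for _ in range(10)])
```

With these example scores, the top candidate receives a probability above 0.999 at temperature 0.1 and about 0.67 at temperature 1.0, which is why low temperatures give conservative designs. Reusing the same seed reproduces the same draws.
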
| Setting | Description |
|---|---|
| Homo-oligomer | Enable symmetric design for proteins with identical chains. All chains receive the same sequence, which is appropriate for homomeric assemblies. |
| Fixed positions | Residues to keep unchanged. Format: chain + position (e.g., A15,A19,B1-20). Useful for preserving catalytic sites or binding interfaces. |
| Redesigned positions | Specify only the positions to redesign; all others remain fixed. This is the inverse of fixed positions and is convenient when only a few positions need to change. |
| Amino acid biases | Adjust sampling probabilities for each amino acid. Positive values (+0.1 to +2) increase frequency; negative values decrease it. Set to −25 to completely exclude an amino acid. |
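
The position format and the bias setting can be sketched as follows. This is an assumption-laden illustration: `parse_positions`, `invert_positions`, and `biased_probs` are hypothetical helpers, single-letter chain IDs are assumed, and the bias is assumed to be added to the per-residue score before the softmax, which is a common scheme but is not confirmed by this page.

```python
import math
import re

# One token: chain letter, residue number, optional "-end" for a range.
TOKEN = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")


def parse_positions(spec):
    """Expand a string such as 'A15,A19,B1-20' into a set of (chain, number) pairs."""
    positions = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"cannot parse position {token!r}")
        chain, start = match.group(1), int(match.group(2))
        end = int(match.group(3)) if match.group(3) is not None else start
        if end < start:
            raise ValueError(f"range runs backwards in {token!r}")
        positions.update((chain, n) for n in range(start, end + 1))
    return positions


def invert_positions(redesigned, all_positions):
    """Positions that stay fixed when only `redesigned` may change."""
    return set(all_positions) - set(redesigned)


def biased_probs(logits, biases):
    """Softmax over per-amino-acid logits after adding per-amino-acid biases."""
    shifted = {aa: score + biases.get(aa, 0.0) for aa, score in logits.items()}
    peak = max(shifted.values())
    weights = {aa: math.exp(v - peak) for aa, v in shifted.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


if __name__ == "__main__":
    print(len(parse_positions("A15,A19,B1-20")))  # 2 single sites + 20 in the range
    # Equal made-up scores, then favour lysine and suppress cysteine.
    print(biased_probs({"A": 0.0, "C": 0.0, "K": 0.0}, {"K": 1.0, "C": -25.0}))
```

Under this additive assumption, a bias of −25 multiplies an amino acid's unnormalized weight by exp(−25), roughly 1.4e−11, which is why it behaves as an effective exclusion.
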
HyperMPNN returns a list of designed sequences, each with a comparative analysis against the original sequence.
| Column | Description |
|---|---|
| Sequence | The designed amino acid sequence. |
| Confidence | Model confidence in the design (0–1). Higher values indicate designs the model considers more likely to fold correctly. |
| Sequence recovery | Percentage of positions matching the original sequence. Lower values indicate more extensive redesign. |
| Mutations | Number and location of mutations relative to the input sequence. |
| Identity | Sequence identity percentage compared to the original. |
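
The recovery and mutation columns can be reproduced from two aligned sequences of equal length. The sketch below is a hypothetical helper, not the tool's own code, and it numbers mutations by 1-based sequence index, which may differ from the residue numbering in the uploaded PDB file.

```python
def compare_to_original(original, designed):
    """Return (percent of matching positions, list of mutations such as 'T3E')."""
    if not original or len(original) != len(designed):
        raise ValueError("sequences must be non-empty and of equal length")
    mutations = [
        f"{old}{index}{new}"
        for index, (old, new) in enumerate(zip(original, designed), start=1)
        if old != new
    ]
    recovery = 100.0 * (len(original) - len(mutations)) / len(original)
    return recovery, mutations


if __name__ == "__main__":
    # Made-up 8-residue example with two substitutions.
    recovery, mutations = compare_to_original("MKTAYIAK", "MKEAYIRK")
    print(recovery, mutations)
```
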
HyperMPNN applies transfer learning to protein sequence design. Rather than training from scratch, it fine-tunes the pre-trained ProteinMPNN model on structures from organisms adapted to extreme heat.
The training dataset was assembled in two steps. The original 96,738 sequences from hyperthermophilic organisms were clustered at 50% sequence identity to remove redundancy, yielding 34,759 unique sequences. Their AlphaFold2-predicted structures were then filtered with a pLDDT confidence threshold of 70, leaving the 29,042 predicted structures used for training.
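
The pLDDT filtering step can be sketched as below. AlphaFold2 PDB files store per-residue pLDDT in the B-factor column, so the sketch averages that column over Cα atoms. Whether the published threshold was applied to the mean pLDDT or in some other way is not stated here, so the averaging is an assumption, and both function names are hypothetical.

```python
def mean_plddt(pdb_text):
    """Average the B-factor column (pLDDT in AlphaFold2 models) over CA atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB format: atom name in columns 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no CA atoms found")
    return sum(scores) / len(scores)


def passes_plddt_filter(pdb_text, threshold=70.0):
    """Keep a model only if its mean pLDDT reaches the threshold (assumed rule)."""
    return mean_plddt(pdb_text) >= threshold
```
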
Training used 0.2 Å Gaussian noise added to backbone coordinates (matching standard ProteinMPNN training), 10% dropout, 300 epochs, and batch sizes of 10,000 residues. The resulting model achieves a perplexity of 5.183 and an accuracy of 0.483, comparable to the performance of the original ProteinMPNN.
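
Two of these training details lend themselves to a short illustration: the coordinate noise and the perplexity metric. The sketch assumes independent Gaussian noise per coordinate and the standard definition of perplexity as the exponential of the mean negative log-likelihood; the function names are hypothetical and the actual training code may differ.

```python
import math
import random


def add_backbone_noise(coords, sigma=0.2, rng=None):
    """Add independent Gaussian noise (standard deviation sigma, in Å) to each x, y, z."""
    rng = rng or random.Random()
    return [tuple(c + rng.gauss(0.0, sigma) for c in xyz) for xyz in coords]


def perplexity(log_probs):
    """exp(mean negative log-likelihood) of the true residues, natural logs."""
    if not log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


if __name__ == "__main__":
    backbone = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]  # made-up coordinates
    print(add_backbone_noise(backbone, sigma=0.2, rng=random.Random(0)))
    # A model that assigns 0.25 to every true residue has perplexity 4.
    print(perplexity([math.log(0.25)] * 4))
```

Under that definition, a perplexity of 5.183 corresponds to a mean negative log-likelihood of ln(5.183) ≈ 1.645 nats per residue, meaning the model is on average about as uncertain as a uniform choice among roughly five amino acids.
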
Hyperthermophilic proteins differ systematically from mesophilic proteins in their amino acid usage:
| Region | Change vs. mesophiles |
|---|---|
| Surface | +3.9% positively charged residues |
| Surface | +4.1% apolar residues |
| Surface | −4.6% polar uncharged residues |
| Core | +4.4% apolar residues |
These compositional shifts contribute to enhanced thermal stability through increased electrostatic interactions and improved hydrophobic packing.
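
A composition comparison of this kind can be sketched as follows. The grouping of amino acids into classes is an assumption, since the exact definition behind the table is not given here. Separating surface from core residues requires a structure-based burial measure, which this sketch leaves to the caller: pass in only the residues of the region of interest.

```python
# Assumed residue classes; other groupings (e.g. for H, C, Y, G) are also in use.
RESIDUE_CLASSES = {
    "positive": set("KR"),
    "negative": set("DE"),
    "polar_uncharged": set("STNQHCY"),
    "apolar": set("AVLIMFWPG"),
}


def class_fractions(sequence):
    """Percentage of residues in each class, ignoring non-standard letters."""
    counts = {name: 0 for name in RESIDUE_CLASSES}
    for residue in sequence.upper():
        for name, members in RESIDUE_CLASSES.items():
            if residue in members:
                counts[name] += 1
                break
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in sequence")
    return {name: 100.0 * n / total for name, n in counts.items()}


def composition_shift(reference, design):
    """Percentage-point change per class going from reference to design."""
    ref, des = class_fractions(reference), class_fractions(design)
    return {name: des[name] - ref[name] for name in RESIDUE_CLASSES}


if __name__ == "__main__":
    # Made-up surface residue strings for a parent and a redesigned protein.
    print(composition_shift("KAAA", "KKAA"))
```
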
Contrary to some theories of thermal stability, hyperthermophilic proteins do not have dramatically more salt bridges than mesophilic proteins (median 17.0 vs. 16.2, respectively). However, HyperMPNN-designed sequences consistently match the hyperthermophilic salt bridge count (median 17.0), while standard ProteinMPNN produces designs with far fewer salt bridges (median 8.8).
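
For readers who want to count salt bridges in their own designs, the following is one common way to do it, offered as a sketch. The criterion used for the medians above is not specified here; this version assumes a 4.0 Å distance cutoff between side-chain oxygens of Asp/Glu and side-chain nitrogens of Lys/Arg, and counts each residue pair once. The `Atom` record and the function name are hypothetical.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    chain: str
    resseq: int
    resname: str
    name: str
    x: float
    y: float
    z: float


# Charged side-chain atoms considered for each residue type (assumed definition).
ACIDIC_ATOMS = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASIC_ATOMS = {"LYS": {"NZ"}, "ARG": {"NE", "NH1", "NH2"}}


def count_salt_bridges(atoms, cutoff=4.0):
    """Count acidic/basic residue pairs with any charged atoms within `cutoff` Å."""
    acidic = [a for a in atoms if a.name in ACIDIC_ATOMS.get(a.resname, ())]
    basic = [a for a in atoms if a.name in BASIC_ATOMS.get(a.resname, ())]
    pairs = set()
    for a in acidic:
        for b in basic:
            if math.dist((a.x, a.y, a.z), (b.x, b.y, b.z)) <= cutoff:
                # Identify the bridge by its two residues so it is counted once.
                pairs.add(((a.chain, a.resseq), (b.chain, b.resseq)))
    return len(pairs)
```

The double loop is quadratic in the number of charged atoms, which is fine for single proteins; a spatial index would be the natural replacement for large assemblies.
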
HyperMPNN was validated using the I53-50B pentamer, a component of icosahedral protein nanoparticles used in vaccine development. The parent sequence had a melting temperature of 65°C. HyperMPNN designs remained stable at 95°C—a 30°C improvement in thermal tolerance.