ThermoMPNN

Predict protein thermostability changes (ΔΔG) for point mutations using deep learning. Identify stabilizing mutations for protein engineering.

What is ThermoMPNN?

ThermoMPNN is a graph neural network that predicts how single amino acid mutations affect protein thermostability. Developed by Henry Dieckhaus and colleagues at the Kuhlman Lab (University of North Carolina), ThermoMPNN predicts ΔΔG (change in free energy of folding) for point mutations, enabling rapid identification of stabilizing or destabilizing mutations.

ThermoMPNN uses transfer learning from ProteinMPNN, a model pretrained to recover native sequences from backbone structure. Rather than training a stability predictor from scratch, it extracts ProteinMPNN's learned structural embeddings and trains a lightweight prediction module on stability data. The authors report state-of-the-art accuracy on the benchmarks listed under Performance, at low computational cost.

ThermoMPNN was published in Proceedings of the National Academy of Sciences in 2024 and trained on the Megascale dataset containing over 270,000 stability measurements.

Model variants

The ThermoMPNN family has expanded since its initial release:

ThermoMPNN — The original model for single point mutations (this tool)
ThermoMPNN-D — Released August 2024 for predicting ΔΔG of double mutant pairs, addressing multi-position mutations
ThermoMPNN-I — Experimental variant (September 2024) for insertion and deletion predictions, with limited validation

Applications

Protein engineering — Identifying mutations that increase thermostability for industrial enzymes, therapeutic proteins, or research reagents
Disease variant interpretation — Predicting whether clinically observed mutations are likely to destabilize protein structure
Directed evolution guidance — Prioritizing mutation candidates for experimental testing in protein optimization campaigns
Enzyme stabilization — Finding mutations that improve thermal tolerance without disrupting catalytic activity
Therapeutic protein development — Enhancing shelf life and manufacturability of protein biologics through stability optimization

How to use ThermoMPNN online

ProteinIQ provides a web-based interface for running ThermoMPNN without command-line installation. Upload a protein structure, specify which chain to analyze, and receive ΔΔG predictions for all possible single mutations at each position (saturation mutagenesis).

Inputs

| Input | Description |
| --- | --- |
| `Protein Structure` | The protein to analyze. Upload a PDB file or enter a PDB ID (e.g., `1HSG`) to fetch from RCSB. |

Settings

| Setting | Description |
| --- | --- |
| `Chain to analyze` | Which chain to run predictions on. Leave empty to analyze all chains in the structure. |
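
To decide which chain to enter, it can help to list the chain identifiers in the file before uploading. The following is a minimal local sketch, not part of the tool: it assumes a standard fixed-column PDB file, where the chain identifier sits in column 22 of `ATOM` and `HETATM` records. The function name `list_chain_ids` and the three example records are illustrative.

```python
def list_chain_ids(pdb_text: str) -> list[str]:
    """Return chain IDs in order of first appearance in ATOM/HETATM records."""
    seen: list[str] = []
    for line in pdb_text.splitlines():
        # In the fixed-column PDB format the chain ID is column 22 (index 21).
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain_id = line[21]
            if chain_id.strip() and chain_id not in seen:
                seen.append(chain_id)
    return seen


# Three made-up, truncated ATOM records (coordinates omitted for brevity).
example_pdb = "\n".join([
    "ATOM      1  CA  GLY A   1",
    "ATOM      2  CA  GLY A   2",
    "ATOM      3  CA  GLY B   1",
])

print(list_chain_ids(example_pdb))  # ['A', 'B']
```

For a real file, pass the result of `open(path).read()` instead of `example_pdb`.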

Results

The output is a spreadsheet containing ΔΔG predictions for every possible mutation at each residue position. Results can be exported as CSV or JSON.

| Column | Description |
| --- | --- |
| `mutation_code` | Mutation identifier in format `ChainWildTypePositionMutant` (e.g., `AK45R` means lysine at position 45 on chain A mutated to arginine). |
| `position` | Residue number in the structure. |
| `chain` | Chain identifier. |
| `wild_type` | Original amino acid at this position (single-letter code). |
| `mutation` | Substituted amino acid (single-letter code). |
| `ddG` | Predicted change in free energy of folding (kcal/mol). |
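
As an illustration of the `mutation_code` format, the sketch below splits a code such as `AK45R` into its parts. It assumes a single-character chain identifier and an integer residue number without insertion codes; the `Mutation` type and `parse_mutation_code` function are hypothetical helpers, not part of the tool's output.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    chain: str
    wild_type: str
    position: int
    mutant: str


# Assumption: one-character chain ID, one-letter amino acids, integer position.
_MUTATION_RE = re.compile(r"^([A-Za-z0-9])([A-Z])(-?\d+)([A-Z])$")


def parse_mutation_code(code: str) -> Mutation:
    """Split a ChainWildTypePositionMutant code into its four fields."""
    match = _MUTATION_RE.match(code)
    if match is None:
        raise ValueError(f"Unrecognized mutation code: {code!r}")
    chain, wild_type, position, mutant = match.groups()
    return Mutation(chain, wild_type, int(position), mutant)


print(parse_mutation_code("AK45R"))
# Mutation(chain='A', wild_type='K', position=45, mutant='R')
```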

Interpreting ΔΔG values

The ΔΔG value represents the predicted change in thermodynamic stability upon mutation:

Negative ΔΔG — Stabilizing mutation (protein becomes more stable)
ΔΔG ≈ 0 — Neutral mutation (minimal stability change)
Positive ΔΔG — Destabilizing mutation (protein becomes less stable)

Typical interpretation thresholds:

| ΔΔG Range | Interpretation |
| --- | --- |
| < −1.0 kcal/mol | Strongly stabilizing |
| −1.0 to −0.5 kcal/mol | Moderately stabilizing |
| −0.5 to +0.5 kcal/mol | Neutral |
| +0.5 to +1.0 kcal/mol | Moderately destabilizing |
| > +1.0 kcal/mol | Strongly destabilizing |

The model's dynamic range is approximately −5 to +5 kcal/mol based on its training data. Predictions outside this range should be interpreted with caution.
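
The thresholds and the range check can be applied programmatically to exported results. This is a sketch under stated assumptions: the table leaves the exact boundary values (for example −0.5) shared between two ranges, so assigning −0.5 and +0.5 to "neutral" is a choice made here, and the function names are illustrative.

```python
def classify_ddg(ddg: float) -> str:
    """Map a predicted ddG (kcal/mol) to the interpretation bands listed above."""
    if ddg < -1.0:
        return "strongly stabilizing"
    if ddg < -0.5:
        return "moderately stabilizing"
    if ddg <= 0.5:  # assumption: both boundary values count as neutral
        return "neutral"
    if ddg <= 1.0:
        return "moderately destabilizing"
    return "strongly destabilizing"


def outside_training_range(ddg: float, limit: float = 5.0) -> bool:
    """Flag predictions beyond the approximate -5 to +5 kcal/mol range."""
    return abs(ddg) > limit


for value in (-1.4, -0.7, 0.1, 0.8, 6.2):
    print(value, classify_ddg(value), outside_training_range(value))
```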

Self-mutations

The output includes self-mutations (e.g., A→A) with ΔΔG values near zero. These serve as internal controls and confirm the model correctly predicts no stability change when the amino acid remains unchanged.
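
When ranking candidates from an exported CSV, self-mutations are usually dropped first. The sketch below assumes the CSV header uses the column names from the Results table; the sample rows and their ddG values are made up for illustration, and `rank_stabilizing` is a hypothetical helper.

```python
import csv
import io

# Hypothetical rows in the documented column layout; the ddG values are invented.
sample_csv = """mutation_code,position,chain,wild_type,mutation,ddG
AK45K,45,A,K,K,0.00
AK45R,45,A,K,R,-0.62
AK45P,45,A,K,P,1.85
AG46A,46,A,G,A,-1.10
"""


def rank_stabilizing(csv_text: str, top_n: int = 10) -> list[dict]:
    """Drop self-mutations and return the rows with the lowest predicted ddG."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    substitutions = [r for r in rows if r["wild_type"] != r["mutation"]]
    substitutions.sort(key=lambda r: float(r["ddG"]))  # most negative first
    return substitutions[:top_n]


for row in rank_stabilizing(sample_csv, top_n=2):
    print(row["mutation_code"], row["ddG"])
# AG46A -1.10
# AK45R -0.62
```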

How does ThermoMPNN work?

ThermoMPNN combines a frozen pretrained ProteinMPNN feature extractor with a lightweight stability prediction module. The model treats proteins as graphs where residues are nodes and spatial relationships between atoms define edges.
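
The graph view can be illustrated with a small sketch: each residue is connected to its nearest neighbors, and each edge distance is expanded into Gaussian radial basis features. This is not the ProteinMPNN code. The default of 48 neighbors follows the architecture description in this document, while the use of one coordinate per residue and the basis parameters (`d_min`, `d_max`, `n_centers`) are assumptions chosen for illustration.

```python
import math

Point = tuple[float, float, float]


def k_nearest_neighbors(coords: list[Point], k: int = 48) -> list[list[int]]:
    """For each residue, return the indices of its k nearest other residues."""
    neighbors = []
    for i, ci in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(ci, coords[j]),
        )
        neighbors.append(others[:k])  # fewer than k if the protein is small
    return neighbors


def rbf_encode(distance: float, d_min: float = 2.0, d_max: float = 22.0,
               n_centers: int = 16) -> list[float]:
    """Expand one distance (in angstroms) into Gaussian radial basis features."""
    step = (d_max - d_min) / (n_centers - 1)
    centers = [d_min + i * step for i in range(n_centers)]
    sigma = (d_max - d_min) / n_centers
    return [math.exp(-((distance - c) / sigma) ** 2) for c in centers]


# Four made-up residue positions along a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
edges = k_nearest_neighbors(coords, k=2)
print(edges[0])  # [1, 2]
edge_features = rbf_encode(math.dist(coords[0], coords[1]))
print(len(edge_features))  # 16
```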

Architecture

The architecture consists of three components:

ProteinMPNN feature extractor — A message-passing neural network with three encoder and three decoder layers. It processes structural information using Gaussian radial basis functions that encode distances to the 48 nearest neighboring residues. The encoder layers are frozen during training to preserve learned structural representations.
Light attention block — A self-attention mechanism with padded convolutions that reweights the extracted embeddings based on learned context. This allows the model to focus on residue features most relevant to stability prediction.
MLP prediction head — A multilayer perceptron with two hidden layers (sizes 64 and 32) that outputs ΔΔG predictions. The final value is computed by subtracting the predicted ΔG for the wild-type amino acid from the predicted ΔG of the mutant amino acid.
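
The final subtraction step can be sketched as follows. The sketch assumes the prediction head yields one value per amino acid at a given position, in the alphabet order shown; both the ordering and the function name `ddg_for_position` are illustrative assumptions, and the scores are made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering, for illustration only


def ddg_for_position(scores: list[float], wild_type: str) -> dict[str, float]:
    """Return ddG per substitution: mutant value minus wild-type value."""
    if len(scores) != len(AMINO_ACIDS):
        raise ValueError("expected one score per amino acid")
    wt_score = scores[AMINO_ACIDS.index(wild_type)]
    return {aa: score - wt_score for aa, score in zip(AMINO_ACIDS, scores)}


# Made-up per-amino-acid values for a position whose wild type is lysine (K).
scores = [float(i) for i in range(20)]
ddg = ddg_for_position(scores, wild_type="K")
print(ddg["R"])  # 6.0
```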

Transfer learning approach

Stability predictors trained from scratch need large amounts of experimental stability data. ThermoMPNN reduces this requirement by reusing what ProteinMPNN has already learned about structure-sequence relationships. ProteinMPNN was trained on experimentally determined structures from the Protein Data Bank rather than on stability labels, and the structural features it learned transfer effectively to stability prediction.

Training data

The primary training dataset is the Megascale dataset from Tsuboyama et al., containing 272,712 stability measurements across 298 proteins, both natural and de novo designed. These measurements derive from proteolysis sensitivity experiments with a dynamic range of approximately 5 kcal/mol.

The model was additionally validated on the Fireprot dataset (3,438 mutations across 100 proteins), which contains traditional biophysical measurements with a wider dynamic range (−9 to +12 kcal/mol).

Performance

Benchmark performance on held-out test sets:

| Dataset | Pearson Correlation | RMSE (kcal/mol) |
| --- | --- | --- |
| Megascale | 0.754 | 0.708 |
| Fireprot (homologue-free) | 0.650 | 1.51 |
| Ssym (direct) | 0.72 | n/a |

For identifying stabilizing mutations (ΔΔG < −0.5 kcal/mol), the positive predictive value is approximately 56% on Fireprot and 46% on Megascale.
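
For readers benchmarking predictions against their own measurements, the three metrics above can be computed as in this sketch. It assumes positive predictive value is defined as the fraction of mutations predicted below the threshold that were also measured below it; whether the published figures apply the same threshold to both sides is not stated here. The data are made up.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted: list[float], observed: list[float]) -> float:
    """Root-mean-square error in the same units as the inputs."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def stabilizing_ppv(predicted: list[float], observed: list[float],
                    threshold: float = -0.5) -> float:
    """Fraction of predicted-stabilizing mutations that were measured as stabilizing."""
    called = [(p, o) for p, o in zip(predicted, observed) if p < threshold]
    if not called:
        return float("nan")
    return sum(1 for _, o in called if o < threshold) / len(called)


# Made-up predicted and measured ddG values (kcal/mol).
predicted = [-1.0, -0.8, 0.2, 1.5]
observed = [-0.9, 0.1, 0.0, 2.0]
print(stabilizing_ppv(predicted, observed))  # 0.5
print(round(rmse(predicted, observed), 3), round(pearson(predicted, observed), 3))
```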

Limitations

Dynamic range constraint — Training on Megascale limits accurate predictions to approximately ±5 kcal/mol. Larger stability changes may show degraded performance.
Epistatic effects — Single-mutation predictions assume additive effects. A 2025 study in Protein Science demonstrated that stability models, including ThermoMPNN, struggle to capture epistatic interactions of double point mutations. For multiple mutations, consider using ThermoMPNN-D or validating experimentally.
Surface cysteine artifacts — The Megascale assay methodology artificially favors surface cysteines through intermolecular disulfide formation. Cysteine predictions at surface positions should be interpreted cautiously.
Hydrophobicity bias — The model exhibits a slight bias toward hydrophobic mutations, which could promote aggregation if used for comprehensive protein redesign rather than targeted single-site optimization.
Structure quality dependency — Performance on low-confidence structures (pLDDT < 0.75) or NMR structures may be reduced compared to high-resolution crystal structures.
Single mutations only — ThermoMPNN predicts effects of individual point mutations. For double mutations with epistatic effects, ThermoMPNN-D is available (separate tool).
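
To make the additivity assumption concrete, the sketch below estimates a double mutant by summing two single-mutation predictions. This is a naive baseline shown only for illustration: it ignores epistasis by construction, it is not how ThermoMPNN-D works, and the ddG values are made up.

```python
def additive_double_ddg(single_ddg: dict[str, float], first: str, second: str) -> float:
    """Naive double-mutant estimate: the sum of the two single-mutation ddG values."""
    return single_ddg[first] + single_ddg[second]


# Hypothetical single-mutation predictions (kcal/mol), keyed by mutation_code.
singles = {"AK45R": -0.6, "AG46A": -1.1}
estimate = additive_double_ddg(singles, "AK45R", "AG46A")
print(round(estimate, 2))  # -1.7
```

If the two positions are in contact, the true combined effect can differ substantially from this sum, which is why experimental validation is suggested above.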

Related tools

ProteinMPNN — The pretrained sequence design model that provides ThermoMPNN's feature extractor
LigandMPNN — Sequence design for proteins with bound ligands, also built on the MPNN architecture
SolubleMPNN — Sequence design optimized for protein solubility, another MPNN-family model
ESMFold — Structure prediction for generating input structures when experimental structures are unavailable
AlphaFold 2 — High-accuracy structure prediction for generating input coordinates
MolProbity — Structure validation to assess input quality before running stability predictions
NetSolP — Complementary property prediction for protein solubility
