CleaveNet is a deep learning model developed by Microsoft Research that predicts how efficiently matrix metalloproteinases (MMPs) cleave peptide sequences. Given a peptide up to 10 residues long, CleaveNet outputs z-scores indicating how efficiently each of 17 MMP variants would cleave that sequence.
Matrix metalloproteinases are zinc-dependent enzymes that digest extracellular matrix components. They play essential roles in tissue remodeling, wound healing, and embryonic development, but are also implicated in cancer progression, where MMPs facilitate tumor invasion and metastasis by degrading physical barriers between tissues. Understanding which sequences a particular MMP will cleave is valuable for designing protease-activated therapeutics and diagnostic biosensors, and for studying enzyme specificity.
CleaveNet uses a transformer architecture trained on mRNA display data from Kukreja et al. (2015), which profiled cleavage activity across thousands of peptide substrates. The model learns sequence patterns associated with efficient cleavage by different MMP family members.
ProteinIQ provides cloud-based access to CleaveNet, with results returned in seconds without requiring Python installation or GPU hardware.
| Field | Description |
|---|---|
| Peptide Sequences | One or more sequences, up to 10 residues each. Accepts FASTA format or plain text (one sequence per line). Shorter sequences are automatically padded. |
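
The input rules above can be checked locally before submitting. The following is a minimal sketch, not ProteinIQ's own parser: the function name `parse_peptides`, the restriction to the 20 standard amino acids, and the example peptides are all assumptions made for illustration.

```python
from typing import List

MAX_LEN = 10  # stated limit: up to 10 residues per peptide
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard residues only


def parse_peptides(text: str) -> List[str]:
    """Parse FASTA or one-sequence-per-line text into uppercase peptides."""
    peptides: List[str] = []
    current: List[str] = []  # sequence lines of the FASTA record being read
    in_fasta = False

    def flush() -> None:
        # Close the FASTA record in progress, if any.
        if current:
            peptides.append("".join(current))
            current.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            in_fasta = True
        elif in_fasta:
            current.append(line.upper())
        else:
            peptides.append(line.upper())
    flush()

    for pep in peptides:
        if len(pep) > MAX_LEN:
            raise ValueError(f"{pep!r} is longer than {MAX_LEN} residues")
        bad = set(pep) - AMINO_ACIDS
        if bad:
            raise ValueError(f"{pep!r} contains unsupported characters: {sorted(bad)}")
    return peptides


# Illustrative peptides only; they are not taken from the CleaveNet data.
fasta_input = ">pep1\nGPLGVRGK\n>pep2\nPLGLAG\n"
plain_input = "gplgvrgk\nPLGLAG\n"
```

Both `fasta_input` and `plain_input` yield the same two peptides, which mirrors the two accepted input styles.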
| Mode | Description |
|---|---|
| Predict cleavage | Predicts cleavage z-scores and uncertainty for the 17 supported MMPs. |
| Generate peptides | Samples peptide candidates without any conditioning input. |
| Generate from z-scores | Samples peptide candidates from a supplied CleaveNet z-score profile CSV. |
| Setting | Description |
|---|---|
| Predictor architecture | transformer or lstm for prediction mode. |
| Compare against true z-scores | Enables evaluation mode and accepts a second CSV of ground-truth z-scores. |
| Truth CSV has no header row | Matches the official --no-csv-header option for truth z-score CSVs. |
| Sequences per request | Number of peptides to generate in either generator mode. |
| Temperature | Sampling temperature used by the official generator. |
| Repeat penalty | Penalty term applied during generator inference. |
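
For readers unfamiliar with sampling temperature, the sketch below shows the generic technique of dividing logits by a temperature before applying softmax. It is a textbook illustration under assumed toy logits, not a description of how the CleaveNet generator applies its temperature or repeat penalty internally.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate residues
sharp = softmax_with_temperature(toy_logits, 0.5)  # low temperature
flat = softmax_with_temperature(toy_logits, 2.0)   # high temperature
```

In this generic scheme, a lower temperature concentrates probability on the top-scoring option, while a higher temperature spreads it more evenly and produces more varied samples.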
In Generate from z-scores mode, CleaveNet expects a conditioning CSV with the full official generator header:

    MMP1,MMP10,MMP11,MMP12,MMP13,MMP14,MMP15,MMP16,MMP17,MMP19,MMP2,MMP20,MMP24,MMP25,MMP3,MMP7,MMP8,MMP9
    0.1,0.0,0.2,0.1,0.3,0.0,0.1,0.1,0.2,0.0,0.1,0.1,0.0,0.2,0.3,0.1,0.0,0.2

CleaveNet predicts cleavage for 17 matrix metalloproteinases:
MMP1, MMP2, MMP3, MMP7, MMP8, MMP9, MMP10, MMP11, MMP12, MMP13, MMP14, MMP15, MMP16, MMP17, MMP19, MMP20, MMP25
The conditional generator uses the official 18-column header above, which also includes MMP24.
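
A conditioning CSV with this 18-column header can be produced with Python's standard `csv` module. This is a minimal sketch: the helper name `conditioning_csv` is hypothetical, and filling unspecified MMPs with `0.0` is an assumption about a sensible default, not documented generator behavior.

```python
import csv
import io
from typing import Dict, List

# Column order copied from the generator header documented above.
GENERATOR_HEADER = [
    "MMP1", "MMP10", "MMP11", "MMP12", "MMP13", "MMP14", "MMP15", "MMP16",
    "MMP17", "MMP19", "MMP2", "MMP20", "MMP24", "MMP25", "MMP3", "MMP7",
    "MMP8", "MMP9",
]


def conditioning_csv(profiles: List[Dict[str, float]], default: float = 0.0) -> str:
    """Render z-score profiles as CSV text, one conditioning row per profile."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(GENERATOR_HEADER)
    for profile in profiles:
        unknown = set(profile) - set(GENERATOR_HEADER)
        if unknown:
            raise ValueError(f"unknown MMP columns: {sorted(unknown)}")
        # Assumption: MMPs missing from the profile are written as `default`.
        writer.writerow([profile.get(name, default) for name in GENERATOR_HEADER])
    return buf.getvalue()


# Hypothetical profile asking for high MMP13 activity and low MMP9 activity.
csv_text = conditioning_csv([{"MMP13": 2.5, "MMP9": -1.0}])
```

Keeping the header in one constant avoids the easy mistake of sorting the columns numerically, since the documented order is lexicographic (MMP1, MMP10, MMP11, and so on).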
In prediction mode, results are returned as a spreadsheet with one row per input peptide. Each MMP variant has both a z-score and an uncertainty column.
| Column | Description |
|---|---|
| Sequence | The input peptide sequence |
| Length | Peptide length |
| Mmp1 Zscore, Mmp2 Zscore, ... | Z-score for each MMP variant |
| Mmp1 Uncertainty, Mmp2 Uncertainty, ... | Prediction uncertainty for each MMP variant |
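
If the prediction spreadsheet is exported as CSV, the paired columns can be regrouped per peptide as sketched below. The export format, the helper name `read_predictions`, and the sample row are assumptions; only the column naming pattern (`Mmp1 Zscore`, `Mmp1 Uncertainty`) comes from the table above.

```python
import csv
import io
from typing import Dict, List, Tuple

Prediction = Tuple[str, Dict[str, float], Dict[str, float]]


def read_predictions(csv_text: str) -> List[Prediction]:
    """Return (sequence, z-scores by MMP, uncertainties by MMP) for each row."""
    results: List[Prediction] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        z_scores: Dict[str, float] = {}
        uncertainties: Dict[str, float] = {}
        for column, value in row.items():
            if column.endswith(" Zscore"):
                z_scores[column[: -len(" Zscore")].upper()] = float(value)
            elif column.endswith(" Uncertainty"):
                uncertainties[column[: -len(" Uncertainty")].upper()] = float(value)
        results.append((row["Sequence"], z_scores, uncertainties))
    return results


# Made-up numbers, shown only to illustrate the column layout.
sample = (
    "Sequence,Length,Mmp1 Zscore,Mmp13 Zscore,Mmp1 Uncertainty,Mmp13 Uncertainty\n"
    "GPLGVRGK,8,0.5,2.5,0.2,0.4\n"
)
parsed = read_predictions(sample)
```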
When evaluation mode is enabled, ProteinIQ also returns the official CleaveNet output files, including score CSVs and any evaluation plots CleaveNet generates.
In generator modes, the spreadsheet output contains one row per generated peptide:
| Column | Description |
|---|---|
| Condition Index | Conditioning row index. Unconditional generation uses 1. |
| Sample Index | Sequence number within that condition. |
| Sequence | Generated peptide sequence |
| Length | Peptide length |
ProteinIQ also exposes the upstream artifact bundle as downloadable files, including all_scores.csv, all_uncertainty.csv, weighted_all_scores.csv, per-MMP top-hit CSVs, allmmp_top20_cleaved.csv, and generated sequence CSVs when available.
Z-scores quantify cleavage efficiency relative to the training distribution. Higher positive values indicate stronger predicted cleavage. The uncertainty value is reported separately and is not already folded into the z-score.
| Z-score | Interpretation |
|---|---|
| > 2.0 | Strong predicted cleavage |
| 1.0 to 2.0 | Moderate cleavage likelihood |
| 0 to 1.0 | Weak cleavage |
| < 0 | Below-average cleavage, unlikely substrate |
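
The interpretation bands in the table can be applied programmatically. In this small sketch the function name `interpret_z` is hypothetical, and the handling of exact boundary values (1.0 counted as moderate, 2.0 as moderate, 0 as weak) is an assumption, since the table does not say which band owns each boundary.

```python
def interpret_z(z: float) -> str:
    """Map a z-score to the interpretation bands listed in the table."""
    if z > 2.0:
        return "Strong predicted cleavage"
    if z >= 1.0:  # assumption: boundary values fall in the higher band
        return "Moderate cleavage likelihood"
    if z >= 0.0:
        return "Weak cleavage"
    return "Below-average cleavage, unlikely substrate"


labels = [interpret_z(z) for z in (2.5, 1.4, 0.3, -0.8)]
```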
When designing selective substrates, compare z-scores across MMP variants. A peptide with a high z-score for MMP13 but low scores for the other MMPs would be a candidate MMP13-selective substrate.
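
One simple way to rank candidates for selectivity is the gap between the target's z-score and the best off-target z-score. This margin is an illustrative heuristic, not a metric defined by CleaveNet; the function name and the example scores are hypothetical.

```python
from typing import Dict


def selectivity_margin(z_scores: Dict[str, float], target: str) -> float:
    """Target z-score minus the highest z-score among all other MMPs."""
    off_target = [score for mmp, score in z_scores.items() if mmp != target]
    if not off_target:
        raise ValueError("need at least one off-target MMP to compare against")
    return z_scores[target] - max(off_target)


# Made-up scores for a single peptide.
example_scores = {"MMP13": 2.5, "MMP1": 0.5, "MMP9": -0.5}
margin = selectivity_margin(example_scores, "MMP13")
```

A large positive margin points to a peptide that the target MMP is predicted to cleave much more efficiently than any other variant. Because uncertainty is reported separately, it is worth checking the uncertainty columns for both the target and the closest off-target MMP before trusting a small margin.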
CleaveNet uses a transformer neural network to predict cleavage from peptide sequence alone. The model tokenizes each amino acid in the input sequence, processes them through self-attention layers that capture dependencies between positions, and outputs a z-score for each MMP variant.
The predictor was trained on data from Kukreja et al. (2015), an mRNA display experiment that measured cleavage rates for thousands of randomized peptide substrates across multiple MMPs. Each peptide in the training set has experimentally measured cleavage values, allowing the model to learn which sequence motifs correlate with efficient cleavage by specific proteases.
Input peptides shorter than 10 residues are padded with special tokens. The model processes the padded sequence and predicts z-scores independently for each MMP. Each score represents how far above or below the mean cleavage rate a given peptide falls, in units of the standard deviation of the training distribution.
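
Padding and z-score normalization can be sketched in a few lines. The pad symbol `-`, right-side padding, and the use of the population standard deviation are all assumptions made for illustration; the text above does not specify how CleaveNet implements either step.

```python
from statistics import mean, pstdev
from typing import List

MAX_LEN = 10
PAD = "-"  # hypothetical pad symbol; the real special token may differ


def pad_peptide(seq: str) -> str:
    """Right-pad a peptide to MAX_LEN characters (padding side is an assumption)."""
    if len(seq) > MAX_LEN:
        raise ValueError(f"{seq!r} is longer than {MAX_LEN} residues")
    return seq + PAD * (MAX_LEN - len(seq))


def z_normalize(values: List[float]) -> List[float]:
    """Express each value as standard deviations above or below the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


padded = pad_peptide("PLGLAG")            # six residues plus four pad symbols
scores = z_normalize([1.0, 2.0, 3.0])     # toy cleavage measurements
```

After normalization, a value equal to the mean maps to 0, which is why negative z-scores read as below-average cleavage.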
An alternative LSTM architecture is also available in the original codebase for cases where the transformer may overfit to training patterns. The LSTM backbone can sometimes extrapolate better to novel cleavage motifs not well-represented in the training data.
The official CleaveNet codebase also includes a peptide generator. ProteinIQ now exposes both the unconditional generator and the z-score-conditioned generator through the same wrapper.
Protease-activated drug delivery: Peptide linkers connecting antibodies to cytotoxic payloads can be designed to release the drug only when cleaved by tumor-associated MMPs, reducing off-target toxicity.
Diagnostic biosensors: Peptides that fluoresce upon cleavage serve as activity-based probes. CleaveNet helps identify sequences with appropriate MMP selectivity for detecting specific cancers or inflammatory conditions.
Enzyme specificity studies: Comparing z-scores across the 17 MMP variants reveals which positions in a peptide drive selectivity, informing mechanistic understanding of MMP substrate preferences.