CleaveNet

Predict MMP cleavage z-scores, evaluate substrates, and generate conditional peptides.

Related tools

DeepImmuno

Predict peptide immunogenicity with DeepImmuno-CNN from peptide sequences and HLA alleles.

ThermoMPNN

Predict protein thermostability changes (ΔΔG) for point mutations using a graph neural network. Enables computational saturation mutagenesis screening to identify stabilizing mutations.

TLimmuno2

Predict MHC class II peptide immunogenicity (CD4+ T cell response) using transfer learning with LSTM.

Aggrescan3D

Faithful static-mode Aggrescan3D wrapper for per-residue aggregation propensity analysis from a single protein structure.

PROPKA 3

Predict pKa values of ionizable groups in proteins and protein-ligand complexes from 3D structure. PROPKA calculates environment-driven pKa shifts for standard ionizable residues, terminal groups, and supported ligand atom types.

Protein stability

Predict protein stability using validated Biopython methods: Instability Index, Aliphatic Index, GRAVY, flexibility analysis, and charge distribution.

AllMetal3D

Predict metal and water binding sites in protein structures using 3D convolutional neural networks (AllMetal3D + Water3D).

DR-BERT

DR-BERT is a compact protein language model that predicts intrinsically disordered regions (IDRs) in proteins. It outputs per-residue disorder probability scores (0–1) from amino acid sequences, enabling fast and accurate annotation of disordered regions without structural data.

FindPept

Match experimental peptide masses against theoretical digest fragments of a protein sequence. Identify peptides from mass spectrometry data by peptide mass fingerprinting.

Peptide cutter

Predict protease and chemical cleavage sites across a protein sequence for up to 39 enzymes simultaneously. Identify where each enzyme cuts, the cleavage residue, and context window around each site.

What is CleaveNet?

CleaveNet is a deep learning model developed by Microsoft Research that predicts how efficiently matrix metalloproteinases (MMPs) cleave peptide sequences. Given a peptide up to 10 residues long, CleaveNet outputs z-scores indicating how efficiently each of 17 MMP variants would cleave that sequence.

Matrix metalloproteinases are zinc-dependent enzymes that digest extracellular matrix components. They play essential roles in tissue remodeling, wound healing, and embryonic development, but are also implicated in cancer progression, where MMPs facilitate tumor invasion and metastasis by degrading physical barriers between tissues. Knowing which sequences a particular MMP will cleave is valuable for designing protease-activated therapeutics and diagnostic biosensors, and for studying enzyme specificity.

CleaveNet uses a transformer architecture trained on mRNA display data from Kukreja et al. (2015), which profiled cleavage activity across thousands of peptide substrates. The model learns sequence patterns associated with efficient cleavage by different MMP family members.

How to use CleaveNet online

ProteinIQ provides cloud-based access to CleaveNet, with results returned in seconds without requiring Python installation or GPU hardware.

Input

| Field | Description |
| --- | --- |
| Peptide Sequences | One or more sequences, up to 10 residues each. Accepts FASTA format or plain text (one sequence per line). Shorter sequences are automatically padded. |
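
The input rules above can be checked locally before submitting. The following is a minimal sketch, not part of CleaveNet or ProteinIQ: `parse_peptides` is a hypothetical helper, and restricting input to the 20 standard amino acids and to one sequence line per FASTA record are assumptions made for illustration.

```python
from typing import List, Tuple

MAX_LEN = 10  # maximum peptide length stated in the input description
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only


def parse_peptides(text: str) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs from FASTA or one-sequence-per-line text."""
    records: List[Tuple[str, str]] = []
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip() or None  # FASTA header names the next sequence
            continue
        seq = line.upper()
        if len(seq) > MAX_LEN:
            raise ValueError(f"{seq!r} has {len(seq)} residues; the limit is {MAX_LEN}")
        bad = set(seq) - STANDARD_AA
        if bad:
            raise ValueError(f"{seq!r} contains unexpected characters: {sorted(bad)}")
        # Plain-text lines have no header, so fall back to a positional name.
        records.append((name or f"peptide_{len(records) + 1}", seq))
        name = None
    return records
```

Calling `parse_peptides` on pasted text either returns the cleaned records or raises a `ValueError` naming the first sequence that breaks a rule.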

Modes

| Mode | Description |
| --- | --- |
| Predict cleavage | Predicts cleavage z-scores and uncertainty for the 17 supported MMPs. |
| Generate peptides | Samples peptide candidates without any conditioning input. |
| Generate from z-scores | Samples peptide candidates from a supplied CleaveNet z-score profile CSV. |

Settings

| Setting | Description |
| --- | --- |
| Predictor architecture | transformer or lstm for prediction mode. |
| Compare against true z-scores | Enables evaluation mode and accepts a second CSV of ground-truth z-scores. |
| Truth CSV has no header row | Matches the official --no-csv-header option for truth z-score CSVs. |
| Sequences per request | Number of peptides to generate in either generator mode. |
| Temperature | Sampling temperature used by the official generator. |
| Repeat penalty | Penalty term applied during generator inference. |
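
For readers unfamiliar with sampling temperature, the sketch below shows the generic idea: logits are divided by the temperature before a softmax, so low values concentrate probability on the top choice and high values flatten the distribution. This is a textbook illustration, not CleaveNet's sampler; `temperature_probs` and `sample_index` are hypothetical names, and the repeat penalty is deliberately left out because its exact form is not described here.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(logits, temperature=1.0, rng=None):
    """Draw one index according to the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

Comparing `temperature_probs(logits, 0.5)` with `temperature_probs(logits, 2.0)` for the same logits shows the first entry gaining probability at the lower temperature, which is why lower settings tend to give less diverse samples.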

Additional generator input

In Generate from z-scores mode, CleaveNet expects a conditioning CSV with the full official generator header:

```csv
MMP1,MMP10,MMP11,MMP12,MMP13,MMP14,MMP15,MMP16,MMP17,MMP19,MMP2,MMP20,MMP24,MMP25,MMP3,MMP7,MMP8,MMP9
0.1,0.0,0.2,0.1,0.3,0.0,0.1,0.1,0.2,0.0,0.1,0.1,0.0,0.2,0.3,0.1,0.0,0.2
```
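
A conditioning file with this header can be produced with Python's standard `csv` module. The sketch below is one possible approach: `conditioning_csv` is a hypothetical helper, and filling unspecified MMPs with `0.0` is an assumption for illustration, not documented CleaveNet behavior.

```python
import csv
import io

# Column order copied from the 18-column generator header shown above.
GENERATOR_HEADER = [
    "MMP1", "MMP10", "MMP11", "MMP12", "MMP13", "MMP14", "MMP15", "MMP16", "MMP17",
    "MMP19", "MMP2", "MMP20", "MMP24", "MMP25", "MMP3", "MMP7", "MMP8", "MMP9",
]


def conditioning_csv(profiles, default=0.0):
    """Serialize z-score profiles (dicts keyed by MMP name) in header order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(GENERATOR_HEADER)
    for profile in profiles:
        unknown = set(profile) - set(GENERATOR_HEADER)
        if unknown:
            raise KeyError(f"unknown MMP columns: {sorted(unknown)}")
        # Assumption: MMPs missing from a profile get the default value.
        writer.writerow([profile.get(mmp, default) for mmp in GENERATOR_HEADER])
    return buffer.getvalue()
```

For example, `conditioning_csv([{"MMP13": 2.5}])` yields the header plus one row in which only the MMP13 column is non-zero; each dictionary in the list becomes one conditioning row.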

Supported MMP variants

CleaveNet predicts cleavage for 17 matrix metalloproteinases:

MMP1, MMP2, MMP3, MMP7, MMP8, MMP9, MMP10, MMP11, MMP12, MMP13, MMP14, MMP15, MMP16, MMP17, MMP19, MMP20, MMP25

The conditional generator uses the official 18-column header above, which also includes MMP24.

Output

In prediction mode, results are returned as a spreadsheet with one row per input peptide. Each MMP variant has both a z-score and an uncertainty column.

| Column | Description |
| --- | --- |
| Sequence | The input peptide sequence |
| Length | Peptide length |
| Mmp1 Zscore, Mmp2 Zscore, ... | Z-score for each MMP variant |
| Mmp1 Uncertainty, Mmp2 Uncertainty, ... | Prediction uncertainty for each MMP variant |

When evaluation mode is enabled, ProteinIQ also returns the official CleaveNet output files, including score CSVs and any evaluation plots CleaveNet generates.

In generator modes, the spreadsheet output contains one row per generated peptide:

| Column | Description |
| --- | --- |
| Condition Index | Conditioning row index. Unconditional generation uses 1. |
| Sample Index | Sequence number within that condition. |
| Sequence | Generated peptide sequence |
| Length | Peptide length |

ProteinIQ also exposes the upstream artifact bundle as downloadable files, including all_scores.csv, all_uncertainty.csv, weighted_all_scores.csv, per-MMP top-hit CSVs, allmmp_top20_cleaved.csv, and generated sequence CSVs when available.

Interpreting z-scores

Z-scores quantify cleavage efficiency relative to the training distribution. Higher positive values indicate stronger predicted cleavage. The uncertainty value is reported separately and is not folded into the z-score.

| Z-score | Interpretation |
| --- | --- |
| > 2.0 | Strong predicted cleavage |
| 1.0 to 2.0 | Moderate cleavage likelihood |
| 0 to 1.0 | Weak cleavage |
| < 0 | Below-average cleavage, unlikely substrate |
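
The bands above translate directly into a small lookup function. This is a sketch only: `interpret_z` is a hypothetical name, and because the table does not say which band owns the boundary values 1.0 and 2.0, assigning 1.0 and 2.0 to the moderate band and 0 to the weak band is an assumption.

```python
def interpret_z(z: float) -> str:
    """Map a z-score to the interpretation bands listed above."""
    if z > 2.0:
        return "Strong predicted cleavage"
    if z >= 1.0:  # assumption: boundary values 1.0 and 2.0 count as moderate
        return "Moderate cleavage likelihood"
    if z >= 0.0:  # assumption: exactly 0 counts as weak
        return "Weak cleavage"
    return "Below-average cleavage, unlikely substrate"
```

The labels are the table's wording; the thresholds are guidelines for ranking, not calibrated probabilities.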

When designing selective substrates, compare z-scores across MMP variants. A peptide with a high z-score for MMP13 but low scores for the other MMPs would be a candidate MMP13-selective substrate.
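
One simple way to make this comparison concrete is a selectivity margin: the target MMP's z-score minus the highest z-score among all other MMPs. The sketch below assumes predictions have already been loaded into dictionaries keyed by MMP name; `selectivity_margin` and `rank_by_selectivity` are hypothetical helpers, and the margin itself is an illustrative heuristic rather than a metric defined by CleaveNet.

```python
def selectivity_margin(z_scores, target):
    """Target z-score minus the best off-target z-score."""
    if target not in z_scores:
        raise KeyError(f"{target} missing from z-scores")
    others = [z for mmp, z in z_scores.items() if mmp != target]
    if not others:
        raise ValueError("need at least one off-target MMP to compare against")
    return z_scores[target] - max(others)


def rank_by_selectivity(predictions, target):
    """Sort peptides by decreasing selectivity margin for the target MMP.

    predictions maps peptide sequence -> {MMP name: z-score}.
    """
    scored = [(seq, selectivity_margin(z, target)) for seq, z in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A positive margin means the target MMP is predicted to cleave the peptide more efficiently than any other MMP in the dictionary. Since uncertainty is reported separately, a small margin may not be meaningful when the uncertainties are large.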

How does CleaveNet work?

CleaveNet uses a transformer neural network to predict cleavage from peptide sequence alone. The model tokenizes each amino acid in the input sequence, processes them through self-attention layers that capture dependencies between positions, and outputs a z-score for each MMP variant.

The predictor was trained on data from Kukreja et al. (2015), an mRNA display experiment that measured cleavage rates for thousands of randomized peptide substrates across multiple MMPs. Each peptide in the training set has experimentally measured cleavage values, allowing the model to learn which sequence motifs correlate with efficient cleavage by specific proteases.

Input peptides shorter than 10 residues are padded with special tokens. The model processes the padded sequence and predicts z-scores independently for each MMP. These scores represent how far above or below the mean cleavage rate a given peptide falls, normalized by the standard deviation of the training distribution.
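
The two operations described above, padding to a fixed length and expressing a value as a z-score, can be sketched as follows. The pad symbol `-`, padding on the right, and the use of the population standard deviation are all assumptions made for illustration; the actual tokens and normalization constants are defined by the CleaveNet codebase and its training data.

```python
from statistics import mean, pstdev

PAD = "-"  # hypothetical pad symbol; the real model uses its own special token


def pad_peptide(seq, length=10, pad=PAD):
    """Right-pad a peptide to a fixed length (padding side is an assumption)."""
    if len(seq) > length:
        raise ValueError(f"{seq!r} is longer than {length} residues")
    return seq + pad * (length - len(seq))


def to_z_scores(values):
    """Standardize values: z = (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mu) / sigma for v in values]
```

In this formulation a z-score of 0 corresponds to the mean of the reference distribution, and each unit is one standard deviation above or below it.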

An alternative LSTM architecture is also available in the original codebase for cases where the transformer may overfit to training patterns. The LSTM backbone can sometimes extrapolate better to novel cleavage motifs not well-represented in the training data.

The official CleaveNet codebase also includes a peptide generator. ProteinIQ now exposes both the unconditional generator and the z-score-conditioned generator through the same wrapper.

Applications

Protease-activated drug delivery: Peptide linkers connecting antibodies to cytotoxic payloads can be designed to release the drug only when cleaved by tumor-associated MMPs, reducing off-target toxicity.

Diagnostic biosensors: Peptides that fluoresce upon cleavage serve as activity-based probes. CleaveNet helps identify sequences with appropriate MMP selectivity for detecting specific cancers or inflammatory conditions.

Enzyme specificity studies: Comparing z-scores across the 17 MMP variants reveals which positions in a peptide drive selectivity, informing mechanistic understanding of MMP substrate preferences.

Limitations

  • Predictions are calibrated for 10-residue peptides. The model has not been extensively validated on shorter or longer sequences.
  • CleaveNet predicts relative cleavage efficiency, not absolute kinetic rates. Z-scores indicate ranking, not quantitative Km or kcat values.
  • Training data comes from in vitro mRNA display experiments; in vivo cleavage may differ due to protein folding, localization, and inhibitor presence.
  • Generated peptides are model samples, not experimentally validated substrates.