DLKcat

Predict enzyme kcat values from protein sequences and substrate structures or names.

What is DLKcat?

DLKcat predicts enzyme turnover numbers (kcat values) from protein sequences and substrate structures. The turnover number is the maximum number of substrate molecules that a single enzyme molecule (more precisely, one active site) can convert to product per second when the enzyme is saturated with substrate. It is a fundamental kinetic parameter for understanding enzyme efficiency.
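
The definition can be made concrete with the Michaelis-Menten relation Vmax = kcat · [E]total, which rearranges to kcat = Vmax / [E]total. The sketch below applies this to hypothetical measurement values; the function name `turnover_number` is illustrative and not part of DLKcat.

```python
def turnover_number(vmax_m_per_s: float, enzyme_total_m: float) -> float:
    """Return kcat in 1/s from Vmax (M/s) and total active-site concentration (M).

    Rearranges the Michaelis-Menten relation Vmax = kcat * [E]_total.
    """
    if enzyme_total_m <= 0:
        raise ValueError("enzyme concentration must be positive")
    return vmax_m_per_s / enzyme_total_m


# Hypothetical measurement: Vmax = 5e-6 M/s with 10 nM enzyme
kcat = turnover_number(5e-6, 1e-8)
print(f"kcat = {kcat:.0f} 1/s")  # kcat = 500 1/s
```

DLKcat predicts this quantity directly from sequence and structure, without needing Vmax or enzyme concentration.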

Developed at Chalmers University of Technology and published in Nature Catalysis (2022), DLKcat combines a convolutional neural network (CNN) for processing protein sequences with a graph neural network (GNN) for analyzing substrate molecular structures. This dual-network architecture allows the model to learn patterns in enzyme-substrate interactions that correlate with catalytic rates.

The model was trained on over 16,000 experimentally measured kcat values from the BRENDA and SABIO-RK enzyme databases, covering both wild-type and engineered enzymes across diverse species.

How does DLKcat work?

DLKcat processes enzyme-substrate pairs through two parallel neural networks:

  • Protein encoding: The enzyme sequence is split into overlapping 3-gram amino acid fragments. A CNN with three layers extracts features that capture sequence patterns associated with catalytic activity.
  • Substrate encoding: The substrate's SMILES representation is converted to a molecular graph where atoms are nodes and bonds are edges. A GNN with three time steps propagates information through the graph, learning structural features relevant to enzymatic processing.
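
The protein-encoding step can be sketched as follows: the sequence is cut into overlapping 3-grams, and each distinct 3-gram receives an integer ID that an embedding layer would map to a learned vector. This is a minimal illustration under that assumption; the helper names are hypothetical, and DLKcat's own preprocessing (for example, any padding at the sequence ends) may differ.

```python
def split_into_ngrams(sequence: str, n: int = 3) -> list[str]:
    """Split a protein sequence into overlapping n-grams with stride 1.

    Sequences shorter than n yield an empty list.
    """
    sequence = sequence.strip().upper()
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def index_ngrams(ngrams: list[str], vocab: dict[str, int]) -> list[int]:
    """Assign each n-gram an integer ID, adding unseen n-grams to `vocab`."""
    return [vocab.setdefault(gram, len(vocab)) for gram in ngrams]


vocab: dict[str, int] = {}
grams = split_into_ngrams("MKTAYIAK")
ids = index_ngrams(grams, vocab)
print(grams)  # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
print(ids)    # [0, 1, 2, 3, 4, 5]
```

An n-residue sequence produces n − 2 overlapping 3-grams, so neighboring fragments share two residues and local sequence context is preserved for the CNN.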

The outputs from both networks are concatenated and passed through fully connected layers to predict log10(kcat). Substrates are represented with radius-2 subgraphs (each atom together with its neighborhood up to two bonds away), and the model uses 20-dimensional vector embeddings.
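
The idea of a radius-2 subgraph can be illustrated with a simplified, Weisfeiler-Lehman-style relabeling on a hand-written molecular graph. The sketch below is an assumption-laden stand-in rather than DLKcat's code: the original pipeline parses SMILES (via RDKit) and its exact fingerprint encoding, including bond information, may differ. Here, bond types, charges, and hydrogens are ignored, and `radius_labels` is a hypothetical name.

```python
from collections import defaultdict


def radius_labels(atoms, bonds, radius=2):
    """Return one hashable label per atom describing its neighborhood up to `radius` bonds.

    At each step, an atom's new label pairs its previous label with the
    sorted labels of its neighbors. Bond types, charges and hydrogens are
    ignored in this simplified sketch.
    """
    neighbors = defaultdict(list)
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    labels = list(atoms)  # radius 0: just the element symbol
    for _ in range(radius):
        labels = [
            (labels[i], tuple(sorted(labels[j] for j in neighbors[i])))
            for i in range(len(atoms))
        ]
    return labels


# Heavy-atom graphs only (hydrogens omitted)
propane = radius_labels(["C", "C", "C"], [(0, 1), (1, 2)])
ethanol = radius_labels(["C", "C", "O"], [(0, 1), (1, 2)])
print(propane[0] == propane[2])  # True: the two terminal carbons are equivalent
print(len(set(ethanol)))         # 3: every heavy atom has a distinct environment
```

In a GNN, each distinct label would be mapped to an integer ID and then to an embedding vector, which message passing then refines.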

How to use DLKcat online

ProteinIQ runs DLKcat on cloud GPUs, delivering turnover number predictions in seconds without local installation.

Inputs

Input | Description
Enzyme sequence(s) | Protein sequences in FASTA format. Multiple enzymes are supported for batch analysis.
Substrate(s) | SMILES strings or compound names, one per line. Every enzyme is paired with every substrate, so m enzymes and n substrates yield m × n predictions.
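
Pairing every enzyme with every substrate is a Cartesian product. The sketch below parses a small FASTA block and builds all enzyme-substrate pairs; the helper names and example sequences are hypothetical, and the sketch does not resolve compound names such as "glucose" to SMILES, which a real pipeline would have to do before the GNN step.

```python
from itertools import product


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; duplicate headers overwrite earlier ones."""
    records: dict[str, str] = {}
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip() or f"seq{len(records) + 1}"
            chunks = []
        elif header is not None:  # sequence lines before any header are skipped
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


fasta = """>enzA
MKTAYIAK
QRQISFVK
>enzB
MSTNPKPQ
"""
substrates = [s.strip() for s in "CCO\nglucose\n".splitlines() if s.strip()]
enzymes = parse_fasta(fasta)

# One (name, sequence, substrate) tuple per combination: 2 enzymes x 2 substrates
pairs = [(name, seq, sub) for (name, seq), sub in product(enzymes.items(), substrates)]
for name, _, sub in pairs:
    print(name, sub)
```

Because the number of predictions grows as m × n, large batches of both enzymes and substrates multiply quickly.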

Settings

Setting | Description
Temperature | Prediction temperature in °C (0–100, default 37). The model was not trained with temperature dependence, so this value is recorded for reference and does not affect the prediction.
pH | Prediction pH (4.0–10.0, default 7.0). Like temperature, it is recorded for documentation and does not influence predictions.

Results

The output table contains predicted kcat values for each enzyme-substrate pair:

Column | Description
Enzyme Sequence | The input protein sequence
Substrate | The input substrate (SMILES or name)
Predicted kcat (1/s) | Predicted turnover number: substrate molecules converted per enzyme molecule per second
Confidence | Model confidence score
Temperature (°C) | The temperature setting used
pH | The pH setting used

Interpreting predictions

DLKcat predicts log10(kcat), which is then converted back to a linear scale (kcat = 10^prediction). Predictions can span several orders of magnitude:

kcat (1/s) | Interpretation
> 1000 | Very fast enzyme
100–1000 | Fast catalytic rate
10–100 | Moderate catalytic rate
< 10 | Slow; many natural enzymes fall in this range, but a very low value may also indicate a poor substrate match
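
Converting a log10 prediction back to kcat and placing it in the bands of the table above takes only a few lines. This is a hypothetical helper sketch; the table does not say which band a boundary value (exactly 10, 100, or 1000) belongs to, so the comparisons below are one arbitrary choice.

```python
def to_linear_kcat(log10_kcat: float) -> float:
    """Convert a log10(kcat) prediction back to kcat in 1/s."""
    return 10.0 ** log10_kcat


def kcat_band(kcat: float) -> str:
    """Place a kcat value (1/s) into the interpretation bands.

    Boundary values go to the higher band except at 1000, which
    falls in "100–1000" (an arbitrary choice for this sketch).
    """
    if kcat > 1000:
        return "> 1000"
    if kcat >= 100:
        return "100–1000"
    if kcat >= 10:
        return "10–100"
    return "< 10"


for log_pred in (3.5, 2.0, 1.2, -0.3):
    kcat = to_linear_kcat(log_pred)
    print(f"log10 = {log_pred:5.1f} -> kcat = {kcat:9.2f} 1/s ({kcat_band(kcat)})")
```

Each unit on the log10 scale is a 10-fold change, so a prediction of 3.5 versus 2.0 differs by about 30-fold in linear kcat.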

The model achieves Pearson's r ≈ 0.71 and RMSE < 1 (in log10 units) on held-out test data, which means typical errors stay within roughly an order of magnitude of the measured kcat. Performance is best for enzymes similar to those in the training set; predictions become less reliable as sequence similarity to training enzymes decreases.
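
Both reported metrics are computed on log10(kcat) values. The sketch below implements them on made-up numbers (not DLKcat results) and converts the RMSE into an approximate fold error via 10^RMSE, a rough geometric-scale summary rather than an exact bound.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x: list[float], y: list[float]) -> float:
    """Root-mean-square error between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical log10(kcat) values for illustration only
measured = [0.0, 1.0, 2.0, 3.0]
predicted = [0.5, 0.8, 2.4, 2.7]
r = pearson_r(measured, predicted)
err = rmse(measured, predicted)
print(f"r = {r:.2f}, RMSE = {err:.2f} log10 units (~{10 ** err:.1f}-fold)")
```

An RMSE of exactly 1 log10 unit corresponds to a typical error of about 10-fold in linear kcat.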

Limitations

DLKcat has important constraints to consider:

Sequence similarity dependence: The model performs well when test enzymes share >70% sequence identity with training examples. For dissimilar enzymes, predictions may not outperform simple kcat averages.
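
To apply the 70% rule of thumb, one needs the query's best sequence identity to any training enzyme. In practice this comes from an alignment tool run against the training sequences; the sketch below only shows the identity arithmetic for sequences that are already aligned to equal length. The function names, the gap-excluding definition of identity (definitions vary), and the example sequences are all assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def is_low_similarity(identities: list[float], threshold: float = 70.0) -> bool:
    """True if no training enzyme reaches `threshold` percent identity to the query."""
    return max(identities, default=0.0) < threshold


query = "MKTAYIAK"
training_aligned = ["MKTVYIAK", "MRSAFLGK"]  # hypothetical, pre-aligned to the query
identities = [percent_identity(query, t) for t in training_aligned]
print(identities)                     # [87.5, 37.5]
print(is_low_similarity(identities))  # False: one training enzyme exceeds 70%
```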

No environmental factors: Despite the temperature and pH settings, the original model was not trained with environmental condition data. The tool includes these settings for record-keeping, but they do not affect predictions.

Mutant enzymes: Although the training data include some mutants, predictions for novel mutations, especially those affecting catalytic residues, should be validated experimentally.

Substrate coverage: Predictions are most reliable for substrates similar to those in the BRENDA/SABIO-RK training data.

Applications

Enzyme turnover predictions support several research applications:

  • Metabolic modeling: Parameterizing enzyme-constrained genome-scale models (ecGEMs) with predicted kcat values
  • Enzyme engineering: Screening mutant libraries computationally before experimental characterization
  • Pathway design: Identifying rate-limiting enzymes in synthetic biology pathways
  • Comparative enzymology: Analyzing kcat distributions across species or enzyme families
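
For the metabolic-modeling use case, the basic enzyme constraint used in ecGEM frameworks such as GECKO bounds each flux by v ≤ kcat · e, so carrying a flux v requires at least e = v / kcat of enzyme, and the total enzyme mass is limited by a protein pool. The sketch below computes that minimum enzyme demand for a hypothetical two-step pathway; it omits saturation factors, isoenzymes, and complexes, and all numbers and function names are illustrative.

```python
def enzyme_demand(flux_mmol_gdw_h: float, kcat_per_s: float) -> float:
    """Minimum enzyme amount (mmol/gDW) needed to carry a flux: e = v / kcat."""
    return flux_mmol_gdw_h / (kcat_per_s * 3600.0)  # convert kcat from 1/s to 1/h


def enzyme_mass(reactions: list[tuple[float, float, float]]) -> float:
    """Total enzyme mass (g/gDW) for (flux, kcat in 1/s, molecular weight in g/mol) tuples."""
    # mmol/gDW * g/mol / 1000 (mmol per mol) = g/gDW
    return sum(enzyme_demand(v, kcat) * mw / 1000.0 for v, kcat, mw in reactions)


# Hypothetical pathway: (flux in mmol/gDW/h, kcat in 1/s, MW in g/mol)
pathway = [(10.0, 100.0, 50_000.0), (10.0, 2.0, 40_000.0)]
total = enzyme_mass(pathway)
share = [enzyme_demand(v, k) * mw / 1000.0 / total for v, k, mw in pathway]
print(f"enzyme mass = {total:.4f} g/gDW; shares = {[round(s, 3) for s in share]}")
```

With equal fluxes, the slower enzyme dominates the protein cost, which is why kcat values matter for spotting rate-limiting steps in pathway design.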