DLKcat predicts enzyme turnover numbers (kcat values) from protein sequences and substrate structures. The turnover number represents how many substrate molecules an enzyme can convert to product per second under saturating conditions—a fundamental kinetic parameter for understanding enzyme efficiency.
Developed at Chalmers University of Technology and published in Nature Catalysis (2022), DLKcat combines a convolutional neural network (CNN) for processing protein sequences with a graph neural network (GNN) for analyzing substrate molecular structures. This dual-network architecture allows the model to learn patterns in enzyme-substrate interactions that correlate with catalytic rates.
The model was trained on over 16,000 experimentally measured kcat values from the BRENDA and SABIO-RK enzyme databases, covering both wild-type and engineered enzymes across diverse species.
DLKcat processes enzyme-substrate pairs through two parallel neural networks: a CNN that encodes the protein sequence and a GNN that encodes the substrate's molecular graph.
The outputs from both networks are concatenated and passed through fully connected layers to predict log2(kcat). Training used radius-2 substrate subgraphs and 20-dimensional vector embeddings.
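The idea of "radius-2 substrate subgraphs" can be made concrete with a small sketch. It assumes the subgraphs are built by a Weisfeiler-Lehman-style relabeling, which the text above does not confirm: each atom's label is repeatedly combined with its neighbors' labels, so after two rounds each ID summarizes everything within two bonds of that atom. The function name `extract_subgraph_ids`, the hand-written molecule graphs, and the atom-only relabeling are illustrative simplifications, not DLKcat's actual code.

```python
def extract_subgraph_ids(atoms, bonds, radius=2):
    """Return one integer ID per atom describing its radius-`radius` neighborhood.

    atoms: list of element symbols (hydrogens omitted in this sketch).
    bonds: list of (atom_index_a, atom_index_b, bond_type) tuples.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b, bond_type in bonds:
        neighbors[a].append((b, bond_type))
        neighbors[b].append((a, bond_type))

    vocab = {}  # maps a neighborhood signature to a small integer ID

    def intern(signature):
        # Reuse the ID if this signature was seen before, else assign the next one.
        return vocab.setdefault(signature, len(vocab))

    # Radius 0: an atom is described by its element symbol only.
    labels = [intern(("atom", symbol)) for symbol in atoms]

    # Each round widens the described neighborhood by one bond.
    for _ in range(radius):
        labels = [
            intern((labels[i],
                    tuple(sorted((labels[j], bond) for j, bond in neighbors[i]))))
            for i in range(len(atoms))
        ]
    return labels


# Ethanol (CCO) and propane (CCC) as hand-written heavy-atom graphs.
ethanol_ids = extract_subgraph_ids(["C", "C", "O"],
                                   [(0, 1, "single"), (1, 2, "single")])
propane_ids = extract_subgraph_ids(["C", "C", "C"],
                                   [(0, 1, "single"), (1, 2, "single")])
```

In ethanol all three atoms end up with different IDs, while in propane the two terminal carbons share an ID because their neighborhoods are identical. In a model of this kind, each ID would then index a learned embedding vector (20-dimensional, per the text above).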
ProteinIQ runs DLKcat on cloud GPUs, delivering turnover number predictions in seconds without local installation.
| Input | Description |
|---|---|
| Enzyme sequence(s) | FASTA format protein sequences. Multiple enzymes supported for batch analysis. |
| Substrate(s) | SMILES strings or compound names (one per line). The tool will pair each enzyme with each substrate. |
For specific enzyme-substrate pairs, switch to paired TSV input and provide one row per pair:
| Column | Description |
|---|---|
| Substrate Name | Compound name used for labeling and PubChem lookup when SMILES is blank |
| Substrate SMILES | Substrate SMILES, recommended for reliable predictions |
| Protein Sequence | Enzyme amino acid sequence |
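The two input modes are related: pairing every enzyme with every substrate is a Cartesian product, and the paired TSV lets you keep only the rows you want. The sketch below builds such a TSV with the standard library. The column names come from the table above, but the presence of a header row, the column order, the helper name `build_paired_tsv`, and the placeholder sequence are assumptions made for illustration.

```python
import csv
import io
from itertools import product


def build_paired_tsv(enzymes, substrates):
    """Build a paired TSV string with one row per enzyme-substrate pair.

    enzymes: dict mapping an enzyme label to its amino acid sequence.
    substrates: dict mapping a substrate name to its SMILES string.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Assumed header row; column names taken from the table above.
    writer.writerow(["Substrate Name", "Substrate SMILES", "Protein Sequence"])
    for (_, sequence), (name, smiles) in product(enzymes.items(),
                                                 substrates.items()):
        writer.writerow([name, smiles, sequence])
    return buffer.getvalue()


# Placeholder sequence for illustration only, not a real enzyme.
enzymes = {"example_enzyme": "MSTAGKVIKCKAAV"}
substrates = {"ethanol": "CCO", "acetaldehyde": "CC=O"}

tsv_text = build_paired_tsv(enzymes, substrates)
```

To submit only specific pairs, write the rows directly instead of taking the full product.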
| Setting | Description |
|---|---|
| Input mode | Choose separate enzyme/substrate inputs for all combinations, or paired TSV rows for specific enzyme-substrate pairs. |
The output table contains predicted kcat values for each enzyme-substrate pair:
| Column | Description |
|---|---|
| Substrate Name | The substrate label |
| Substrate SMILES | The substrate structure used for prediction |
| Protein Sequence | The full input protein sequence |
| Kcat value (1/s) | Turnover number in reactions per second |
| Log2 Kcat | Raw model output before conversion to linear scale |
DLKcat predicts log2(kcat), then converts to linear scale. Predictions span a wide range:
| kcat (1/s) | Interpretation |
|---|---|
| > 1000 | Fast enzyme, typical of metabolic enzymes |
| 100–1000 | Moderate catalytic rate |
| 10–100 | Slow enzyme |
| < 10 | Very slow, may indicate poor substrate match |
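As a rough illustration of how the two output columns relate, the sketch below converts a log2 value to a linear kcat and maps it onto the bands in the table above. The function names are hypothetical, and how the shared boundary values (exactly 10, 100, or 1000) are assigned is an assumption, since the table does not specify it.

```python
def kcat_from_log2(log2_kcat: float) -> float:
    """Convert the raw log2 output to a linear turnover number in 1/s."""
    return 2.0 ** log2_kcat


def interpret_kcat(kcat: float) -> str:
    """Map a linear kcat (1/s) onto the interpretation bands above.

    Boundary handling is an assumption: 1000 counts as moderate,
    100 as moderate, and 10 as slow.
    """
    if kcat > 1000:
        return "fast"
    if kcat >= 100:
        return "moderate"
    if kcat >= 10:
        return "slow"
    return "very slow"


# A log2 output of 10 corresponds to 2**10 = 1024 reactions per second.
example_kcat = kcat_from_log2(10.0)
example_band = interpret_kcat(example_kcat)
```

Because the model works on a logarithmic scale, a difference of 1 in the `Log2 Kcat` column corresponds to a twofold change in kcat.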
The model achieves Pearson's r ≈ 0.71 and RMSE < 1 (log10 scale) on held-out test data. Performance is best for enzymes similar to the training set—predictions become less reliable as sequence similarity to training enzymes decreases.
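To check these metrics against your own measurements, both can be computed on log10-transformed values. The following is a minimal sketch using only the standard library. The function name `log10_metrics` and the example numbers are illustrative, not data from the DLKcat evaluation.

```python
import math


def log10_metrics(measured_kcat, predicted_kcat):
    """Return (Pearson r, RMSE) computed on log10-transformed kcat values.

    Both inputs are sequences of positive linear kcat values in 1/s.
    """
    y = [math.log10(v) for v in measured_kcat]
    p = [math.log10(v) for v in predicted_kcat]
    n = len(y)

    # Root-mean-square error in log10 units (1.0 = one order of magnitude).
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / n)

    # Pearson correlation coefficient between the two log10 series.
    mean_y, mean_p = sum(y) / n, sum(p) / n
    covariance = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, p))
    spread = math.sqrt(sum((a - mean_y) ** 2 for a in y)
                       * sum((b - mean_p) ** 2 for b in p))
    return covariance / spread, rmse


# Illustrative values: every prediction is exactly tenfold too high.
r, rmse = log10_metrics([1.0, 10.0, 100.0], [10.0, 100.0, 1000.0])
```

In this toy case the correlation is perfect while the RMSE is one log10 unit, which shows why the two numbers should be read together: r captures ranking and trend, and RMSE captures the typical size of the error in orders of magnitude.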
DLKcat has important constraints to consider:
Sequence similarity dependence: The model performs well when test enzymes share >70% sequence identity with training examples. For dissimilar enzymes, predictions may not outperform simple kcat averages.
No environmental factors: DLKcat was not trained with temperature or pH condition data, so predictions depend only on enzyme sequence and substrate structure and do not reflect assay conditions.
Mutant enzymes: While trained on some mutant data, predictions for novel mutations—especially those affecting catalytic residues—should be validated experimentally.
Substrate coverage: Predictions are most reliable for substrates similar to those in the BRENDA/SABIO-RK training data.
Enzyme turnover predictions support several research applications: