Related tools

SMRTnet
Deep learning framework for predicting small molecule-RNA interactions using RNA secondary structure. Combines language models, CNNs, and graph attention networks for binding prediction.

ADMET-AI
Predict ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) properties from SMILES strings using machine learning models trained on Therapeutics Data Commons datasets.

Admetica
Predict 22 ADMET properties from SMILES strings with the upstream Admetica Chemprop models from Datagrok.

Brenk filter
Identify toxic, reactive, and pharmacokinetically problematic molecular fragments using structural alert patterns.

eToxPred
Predict toxicity and synthetic accessibility of small molecules using machine learning. eToxPred combines toxicity risk assessment with synthetic accessibility scoring to help prioritize drug candidates.

Lead-likeness filter
Screen for lead-like compounds using stricter molecular descriptor criteria than Lipinski or Veber rules for early-stage drug discovery.

PAINS filter
Screen compounds for Pan-Assay INterference patterns that cause false positives in biological assays.

QEPPI
Quantitative estimate for protein-protein interaction inhibitor potential. Evaluates drug-likeness for compounds targeting PPIs.

ToxPred 2.0 (Toxicity prediction)
Screen compounds for structural toxicity alerts using PAINS, Brenk, and NIH filters. For focused screening, see PAINS Filter, Brenk Filter, or Veber's Rule.

Veber's rule
Veber's Rule predicts oral bioavailability by evaluating molecular weight, LogP, hydrogen bond donors/acceptors, and rotatable bonds.
What is SPRINT?
SPRINT (Structure-aware PRotein ligand INTeraction) is an ultrafast deep learning framework for predicting drug-target interactions. Unlike physics-based docking methods like AutoDock Vina or GNINA, SPRINT predicts binding through learned embeddings rather than explicit 3D docking simulations.
The key advantage is speed. SPRINT can screen thousands of compounds against a protein target in seconds because interaction prediction reduces to a single dot product between pre-computed embeddings. At scale, querying a single protein against billions of compounds takes milliseconds when using vector database retrieval. This makes SPRINT ideal for virtual screening campaigns where large compound libraries need prioritization before more expensive docking or experimental validation.
SPRINT outputs a binding score and a binding probability for each compound. Higher scores indicate stronger predicted interactions. The method is best used as a first-pass filter to identify promising candidates, with top hits then validated using structure-based methods like DiffDock or AutoDock Vina.
How does SPRINT work?
SPRINT uses a co-embedding architecture that maps proteins and ligands into a shared vector space. Compounds that bind to a protein are embedded close together in this space, enabling interaction prediction via dot product similarity.
Protein embedding
SPRINT encodes proteins using SaProt, a structure-aware protein language model. SaProt augments standard amino acid tokens with structural tokens derived from Foldseek's 3Di alphabet, creating a vocabulary that captures both sequence and local geometry information. When 3D structure is unavailable, SPRINT uses AlphaFold2-predicted structures to generate these tokens.
Rather than averaging per-residue embeddings, SPRINT applies multi-head attention pooling to learn a sequence-dependent aggregation. This aggregated representation passes through a small MLP to produce the final protein embedding. The attention weights also provide interpretability, highlighting which residues contribute most to predicted interactions.
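To make the pooling step concrete, here is a minimal single-head sketch in plain Python. It is an illustration under stated assumptions, not SPRINT's implementation: the real model uses multiple heads and learned parameters, and the names `attention_pool`, `residues`, and `query` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(residue_embeddings, query):
    """Pool per-residue vectors into one vector using a single attention head.

    `query` stands in for a learned query vector (an assumption for this sketch).
    Returns the pooled vector and the per-residue attention weights.
    """
    dim = len(query)
    # Scaled dot-product score of each residue against the query.
    scores = [
        sum(q * r for q, r in zip(query, res)) / math.sqrt(dim)
        for res in residue_embeddings
    ]
    weights = softmax(scores)
    # Weighted sum of residue vectors, one output dimension at a time.
    pooled = [
        sum(w * res[i] for w, res in zip(weights, residue_embeddings))
        for i in range(dim)
    ]
    return pooled, weights

# Toy input: three residues with 2-dimensional embeddings.
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
pooled, weights = attention_pool(residues, query)
```

Unlike a plain mean, the weights depend on the residues themselves, and inspecting `weights` shows which positions dominate the pooled vector, which is the interpretability property described above.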
Ligand embedding
Compounds are encoded using Morgan fingerprints (2048 bits, radius 2), a standard cheminformatics representation that captures molecular substructures as a binary vector. This fingerprint passes through a small MLP to project into the same embedding space as proteins.
Binding prediction
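Binding prediction computes the cosine similarity between normalized embeddings and passes it through a scaled sigmoid:

$$p = \sigma\left(\alpha \cdot \frac{\mathbf{e}_P \cdot \mathbf{e}_L}{\lVert \mathbf{e}_P \rVert \, \lVert \mathbf{e}_L \rVert}\right)$$

where $\mathbf{e}_P$ and $\mathbf{e}_L$ are the protein and ligand embeddings, $\sigma$ is the sigmoid function, and $\alpha$ scales the similarity to saturate the sigmoid range. The raw cosine similarity ranges from -1 to 1, with higher values indicating stronger predicted binding.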
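The scoring step can be sketched in a few lines of plain Python. This is a hedged illustration: the function names are hypothetical, and the `scale` value of 5.0 is an arbitrary placeholder, since the scaling factor SPRINT actually uses is not stated here.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def binding_probability(score, scale=5.0):
    """Map a cosine similarity in [-1, 1] to (0, 1) with a scaled sigmoid.

    `scale` is a placeholder value chosen for this sketch only.
    """
    return 1.0 / (1.0 + math.exp(-scale * score))

# Toy embeddings (made-up numbers, not real model output).
protein_embedding = [0.2, 0.9, -0.1]
ligand_embedding = [0.1, 0.8, 0.0]
score = cosine_similarity(protein_embedding, ligand_embedding)
probability = binding_probability(score)
```

Because both embeddings are normalized inside the similarity, the score depends only on the angle between the two vectors, and the sigmoid preserves the ranking given by the raw score.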
How to use SPRINT
Inputs
| Input | Description |
|---|---|
| Protein Sequence | Protein sequence in FASTA or raw text, or a precomputed SaProt combined sequence with structure tokens. FASTA headers are used to name downloadable TopK artifacts. |
| Compounds (SMILES) | SMILES strings, one per line. Supports tab-separated format: name<TAB>SMILES. Files up to 50MB accepted in .csv, .smi, .smiles, or .txt formats. |
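For preparing compound files locally, the two accepted line shapes (bare SMILES, or name, tab, SMILES) can be parsed as in the sketch below. The function name `parse_compounds` and the `compound_N` naming scheme for unnamed entries are assumptions for illustration; the identifiers SPRINT auto-generates may look different.

```python
def parse_compounds(text):
    """Parse compound input into a list of (name, smiles) tuples.

    Each non-empty line is either `SMILES` or `name<TAB>SMILES`.
    Unnamed compounds get a hypothetical `compound_N` identifier.
    """
    compounds = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if "\t" in line:
            # Split only on the first tab: name on the left, SMILES on the right.
            name, smiles = (part.strip() for part in line.split("\t", 1))
        else:
            name, smiles = "", line
        if not name:
            name = f"compound_{len(compounds) + 1}"
        compounds.append((name, smiles))
    return compounds

example = "aspirin\tCC(=O)Oc1ccccc1C(=O)O\n\nCn1cnc2c1c(=O)n(C)c(=O)n2C\n"
parsed = parse_compounds(example)
```

This sketch does not check that each SMILES string is chemically valid; that would need a cheminformatics toolkit.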
Compound input example:

    aspirin	CC(=O)Oc1ccccc1C(=O)O
    ibuprofen	CC(C)Cc1ccc(cc1)C(C)C(=O)O
    caffeine	Cn1cnc2c1c(=O)n(C)c(=O)n2C

Settings
| Setting | Description |
|---|---|
| Return top K compounds | Number of top-scoring compounds to return (10–1000, default 100). Set higher for more candidates. |
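The effect of this setting can be sketched as a top-K selection over scored compounds. The function name `top_k` is hypothetical, and clamping out-of-range values to 10–1000 is an assumption of this sketch; the tool itself may reject such values instead.

```python
import heapq

def top_k(scored, k=100):
    """Return the k highest-scoring compounds as (rank, compound_id, score).

    `scored` is an iterable of (compound_id, score) pairs.
    k is clamped to the documented 10-1000 range (an assumption of this sketch).
    """
    k = max(10, min(1000, k))
    # heapq.nlargest avoids sorting the full library when k is small.
    ranked = heapq.nlargest(k, scored, key=lambda item: item[1])
    return [
        (rank, compound_id, score)
        for rank, (compound_id, score) in enumerate(ranked, start=1)
    ]

# Toy library of 50 compounds with made-up scores.
library = [(f"c{i}", i / 100) for i in range(50)]
hits = top_k(library, k=5)  # k=5 is below the minimum, so 10 hits are returned
```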
Results
Results are ranked by binding score, with the strongest predicted binders at the top.
| Column | Description |
|---|---|
| rank | Position in ranked list (1 = highest score) |
| compound_id | Name from input or auto-generated identifier |
| smiles | Chemical structure in SMILES notation |
| binding_score | Cosine similarity between embeddings (higher = stronger binding) |
| binding_probability | Sigmoid-scaled score between 0–1 |
SPRINT also returns two downloadable TopK artifacts aligned with the upstream workflow:
- `topk_mol_data_<query_id>.csv` with ranked compounds and the raw cosine similarity in the `CosineSimi` column
- `topk_mol_embeddings_<query_id>.npy` with the embeddings for the returned TopK compounds
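The CSV artifact can be post-processed with the standard library, as in the sketch below. Only the `CosineSimi` column name comes from the description above; the `SMILES` column, the sample rows, and the function name `load_topk` are made up for illustration, so check the header of a real file before relying on them.

```python
import csv
import io

def load_topk(csv_text, min_similarity=0.0):
    """Read TopK CSV text and keep rows with CosineSimi >= min_similarity.

    Rows are returned as dicts, sorted by similarity in descending order.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        similarity = float(row["CosineSimi"])
        if similarity >= min_similarity:
            rows.append({**row, "CosineSimi": similarity})
    rows.sort(key=lambda r: r["CosineSimi"], reverse=True)
    return rows

# Hypothetical file contents; the real artifact may have more columns.
sample = (
    "SMILES,CosineSimi\n"
    "Cn1cnc2c1c(=O)n(C)c(=O)n2C,0.10\n"
    "CC(=O)Oc1ccccc1C(=O)O,0.42\n"
)
kept = load_topk(sample, min_similarity=0.2)
```

For a file on disk, the same function can be fed the result of `open(path, newline="").read()`.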
Interpreting scores
Binding scores derive from cosine similarity and range from approximately -1 to 1 before sigmoid transformation.
| Binding probability | Interpretation |
|---|---|
| > 0.8 | Strong predicted interaction |
| 0.6–0.8 | Moderate interaction, worth validating |
| 0.4–0.6 | Weak or uncertain |
| < 0.4 | Unlikely binder |
SPRINT predictions serve as prioritization, not experimental measurements. High-scoring compounds should be validated with structure-based docking or experimental assays.
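When triaging many results, the bands in the table above can be applied programmatically. This is a small sketch with a hypothetical function name. The table does not say which band the exact boundary values belong to, so the handling of 0.8, 0.6, and 0.4 below is an assumption.

```python
def interpret_probability(probability):
    """Map a binding probability in [0, 1] to the interpretation bands.

    Boundary handling is an assumption: exactly 0.8 counts as moderate,
    while 0.6 and 0.4 fall into the higher of their two adjacent bands.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability > 0.8:
        return "Strong predicted interaction"
    if probability >= 0.6:
        return "Moderate interaction, worth validating"
    if probability >= 0.4:
        return "Weak or uncertain"
    return "Unlikely binder"

labels = [interpret_probability(p) for p in (0.95, 0.7, 0.5, 0.1)]
```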
Screening workflow
SPRINT works best as part of a multi-stage virtual screening pipeline:
- Filter compounds for drug-likeness using Lipinski's Rule of 5 or Molecular Descriptors
- Run SPRINT to rank compounds by predicted target engagement
- Assess top hits for toxicity and ADMET properties with ADMET-AI or eToxPred
- Validate promising candidates with structure-based docking (DiffDock, AutoDock Vina, or GNINA)
Invalid SMILES strings do not fail the job: unparseable inputs receive neutral scores instead. If many expected binders score poorly, check the input formatting first.
Limitations
SPRINT predicts whether a binding interaction exists, not quantitative binding affinity. The binding probability reflects confidence that an interaction exists, not the strength of that interaction.
Training data comes from PubChem, BindingDB, and ChEMBL. Performance may decrease for protein families or chemical scaffolds underrepresented in these databases.
SPRINT uses sequence-derived structural information from SaProt rather than explicit 3D coordinates. For targets where binding depends on specific conformational states or allosteric sites, structure-based docking may be more appropriate.
