SPRINT

Predict drug-target interactions and rank compound libraries for virtual screening.

Related tools

SMRTnet

Deep learning framework for predicting small molecule-RNA interactions using RNA secondary structure. Combines language models, CNNs, and graph attention networks for binding prediction.

ADMET-AI

Predict ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) properties from SMILES strings using machine learning models trained on Therapeutics Data Commons datasets.

Admetica

Predict 22 ADMET properties from SMILES strings with the upstream Admetica Chemprop models from Datagrok.

Brenk filter

Identify toxic, reactive, and pharmacokinetically problematic molecular fragments using structural alert patterns.

eToxPred

Predict toxicity and synthetic accessibility of small molecules using machine learning. eToxPred combines toxicity risk assessment with synthetic accessibility scoring to help prioritize drug candidates.

Lead-likeness filter

Screen for lead-like compounds in early-stage drug discovery using stricter molecular descriptor criteria than Lipinski's or Veber's rules.

PAINS filter

Screen compounds for Pan-Assay INterference patterns that cause false positives in biological assays.

QEPPI

Quantitative estimate for protein-protein interaction inhibitor potential. Evaluates drug-likeness for compounds targeting PPIs.

ToxPred 2.0 (Toxicity prediction)

Screen compounds for structural toxicity alerts using PAINS, Brenk, and NIH filters. For focused screening, see PAINS Filter, Brenk Filter, or Veber's Rule.

Veber's rule

Veber's rule predicts oral bioavailability from the number of rotatable bonds and the polar surface area (or the total count of hydrogen bond donors and acceptors).

What is SPRINT?

SPRINT (Structure-aware PRotein ligand INTeraction) is an ultrafast deep learning framework for predicting drug-target interactions. Unlike structure-based docking methods such as AutoDock Vina or GNINA, SPRINT predicts binding from learned embeddings rather than from explicit 3D docking simulations.

The key advantage is speed. SPRINT can screen thousands of compounds against a protein target in seconds because, once embeddings are pre-computed, scoring each compound reduces to a single dot product. At scale, querying a single protein against billions of compounds takes milliseconds when using vector database retrieval. This makes SPRINT well suited to virtual screening campaigns in which large compound libraries need prioritization before more expensive docking or experimental validation.
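
The speed argument can be sketched in a few lines of Python. This is a minimal brute-force illustration. It assumes that the protein and compound embeddings have already been computed and L2-normalized. The function name `screen_library` and the toy vectors are hypothetical. A vector database would replace the linear scan with an approximate nearest-neighbour index, but the per-compound score is the same dot product.

```python
import heapq

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def screen_library(protein_emb, compound_embs, k):
    """Score every compound against one protein and return the top-k (index, score) pairs.

    Assumes all embeddings are pre-computed and L2-normalized, so the dot
    product equals the cosine similarity. Cost is one dot product per compound.
    """
    scores = ((i, dot(protein_emb, c)) for i, c in enumerate(compound_embs))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])

# Toy 3-dimensional embeddings (hypothetical values, already unit length).
protein = [1.0, 0.0, 0.0]
library = [
    [0.0, 1.0, 0.0],   # orthogonal to the protein
    [0.6, 0.8, 0.0],   # partially aligned
    [1.0, 0.0, 0.0],   # perfectly aligned
    [-1.0, 0.0, 0.0],  # opposite direction
]
print(screen_library(protein, library, k=2))
```

`heapq.nlargest` keeps only k candidates in memory. This matters when the library is far larger than the number of hits requested.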

SPRINT outputs a binding score and a binding probability for each compound; higher scores indicate stronger predicted interactions. The method is best used as a first-pass filter to identify promising candidates, with top hits then validated by structure-based methods such as DiffDock or AutoDock Vina.

How does SPRINT work?

SPRINT uses a co-embedding architecture that maps proteins and ligands into a shared vector space. Compounds that bind a protein are embedded close to it in this space, so interactions can be predicted from the dot product (cosine similarity) of the two embeddings.

Protein embedding

SPRINT encodes proteins using SaProt, a structure-aware protein language model. SaProt augments standard amino acid tokens with structural tokens derived from Foldseek's 3Di alphabet, creating a vocabulary that captures both sequence and local geometry information. When 3D structure is unavailable, SPRINT uses AlphaFold2-predicted structures to generate these tokens.
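
A short helper can illustrate SaProt's combined, structure-aware sequence. This sketch assumes the convention used in the SaProt repository:

- Each residue is written as its uppercase amino acid letter followed by a lowercase Foldseek 3Di letter.
- `#` stands in for a missing or masked structure token, for example in low-confidence regions of a predicted structure.

The helper name `combine_saprot_tokens` is hypothetical. In practice the 3Di string would come from running Foldseek on an experimental or AlphaFold2 structure.

```python
def combine_saprot_tokens(aa_seq, di_seq, masked=None):
    """Interleave amino acid and 3Di letters into a SaProt-style combined sequence.

    aa_seq: amino acid sequence (e.g. "MEV").
    di_seq: Foldseek 3Di sequence of the same length (e.g. "dvp").
    masked: optional set of 0-based residue positions whose structure token
            is replaced by '#' (assumed masking convention).
    """
    if len(aa_seq) != len(di_seq):
        raise ValueError("sequence and 3Di string must have the same length")
    masked = masked or set()
    tokens = []
    for i, (aa, di) in enumerate(zip(aa_seq, di_seq)):
        struct_token = "#" if i in masked else di.lower()
        tokens.append(aa.upper() + struct_token)
    return "".join(tokens)

print(combine_saprot_tokens("MEVQL", "dvppv"))              # MdEvVpQpLv
print(combine_saprot_tokens("MEVQL", "dvppv", masked={0}))  # M#EvVpQpLv
```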

Rather than averaging per-residue embeddings, SPRINT applies multi-head attention pooling to learn a sequence-dependent aggregation. This aggregated representation passes through a small MLP to produce the final protein embedding. The attention weights also provide interpretability, highlighting which residues contribute most to predicted interactions.
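
The attention-pooling step can be sketched in plain Python. This is a simplified illustration, not SPRINT's implementation, and it assumes the following:

- Each head owns one learned query vector that scores every residue embedding with a scaled dot product.
- The scores are softmax-normalized into per-residue weights and used for a weighted sum.
- The head outputs are concatenated and passed through a one-hidden-layer ReLU MLP.

All weights below are hypothetical toy values. Production implementations typically also project keys and values and split the hidden dimension across heads.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(residues, queries):
    """Pool per-residue embeddings (L x d) with one learned query per head.

    Returns the concatenated pooled vector (n_heads * d) and the per-head
    attention weights over residues, which can be inspected for interpretability.
    """
    d = len(residues[0])
    pooled, weights_per_head = [], []
    for q in queries:
        scores = [sum(h_k * q_k for h_k, q_k in zip(h, q)) / math.sqrt(d) for h in residues]
        weights = softmax(scores)
        weights_per_head.append(weights)
        pooled.extend(sum(w * h[k] for w, h in zip(weights, residues)) for k in range(d))
    return pooled, weights_per_head

def linear(x, weight, bias):
    """y = W x + b for a weight matrix given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b for row, b in zip(weight, bias)]

def mlp(x, w1, b1, w2, b2):
    """One hidden layer with ReLU, producing the final protein embedding."""
    hidden = [max(0.0, v) for v in linear(x, w1, b1)]
    return linear(hidden, w2, b2)

# Toy example: 3 residues with 2-dim embeddings, 2 attention heads.
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[2.0, 0.0], [0.0, 2.0]]
pooled, attn = attention_pool(residues, queries)
protein_emb = mlp(pooled,
                  w1=[[0.5, 0.0, 0.0, 0.5], [0.0, 0.5, 0.5, 0.0]], b1=[0.0, 0.0],
                  w2=[[1.0, -1.0]], b2=[0.0])
print(len(pooled), [round(sum(w), 6) for w in attn], protein_emb)
```

Each head's weights form a distribution over residues. Reading off the largest weights is what gives the per-residue interpretability described above.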

Ligand embedding

Compounds are encoded using Morgan fingerprints (2048 bits, radius 2), a standard cheminformatics representation that captures molecular substructures as a binary vector. This fingerprint passes through a small MLP to project into the same embedding space as proteins.
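
A Morgan fingerprint is a binary vector, so the first linear layer of the ligand MLP reduces to summing the weight columns of the bits that are set. The sketch below assumes the on-bit indices are already available. With RDKit they could be obtained from `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)` and its `GetOnBits()` method. The layer sizes, weights, bit indices and function names are hypothetical toy values, not SPRINT's trained parameters.

```python
import random

N_BITS = 2048   # fingerprint length stated for SPRINT
HIDDEN = 8      # hypothetical hidden size for this toy example

def dense_first_layer(on_bits, weight_cols, bias):
    """Compute W x + b for a binary x given only its on-bit indices.

    weight_cols[j] is column j of W (the contribution of bit j); because x is
    0/1, W x is just the sum of the columns of the set bits.
    """
    out = list(bias)
    for j in on_bits:
        col = weight_cols[j]
        for i in range(len(out)):
            out[i] += col[i]
    return out

def to_dense(on_bits, n_bits=N_BITS):
    """Expand on-bit indices to an explicit 0/1 vector (used here only for checking)."""
    x = [0] * n_bits
    for j in on_bits:
        x[j] = 1
    return x

rng = random.Random(0)
weight_cols = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(N_BITS)]
bias = [0.1] * HIDDEN
on_bits = [1, 80, 650, 1380, 2047]   # hypothetical set bits for one molecule

sparse = dense_first_layer(on_bits, weight_cols, bias)
x = to_dense(on_bits)
dense = [sum(weight_cols[j][i] * x[j] for j in range(N_BITS)) + bias[i] for i in range(HIDDEN)]
print(max(abs(a - b) for a, b in zip(sparse, dense)))
```

The printed difference between the sparse and dense results should be zero up to floating-point rounding. The sparse form touches only the handful of set bits rather than all 2048 inputs.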

Binding prediction

Binding prediction computes the cosine similarity between the L2-normalized embeddings and passes it through a scaled sigmoid:

$$P(\text{binding}) = \sigma\left(\alpha \cdot \frac{\mathbf{Z}_d}{\|\mathbf{Z}_d\|} \cdot \frac{\mathbf{Z}_t}{\|\mathbf{Z}_t\|}\right)$$

The terms are as follows:

- $\mathbf{Z}_d$ and $\mathbf{Z}_t$ are the drug (compound) and target (protein) embeddings.
- $\sigma$ is the logistic sigmoid.
- $\alpha = 5$ scales the similarity so that it spans most of the sigmoid's range.

The raw cosine similarity ranges from -1 to 1, with higher values indicating stronger predicted binding.
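
The scoring formula translates directly into code. This sketch assumes plain Python lists for the two embeddings and uses the stated $\alpha = 5$. The function names are illustrative only.

```python
import math

ALPHA = 5.0  # scaling factor stated for SPRINT's binding formula

def cosine_similarity(z_d, z_t):
    """Dot product of the L2-normalized drug and target embeddings."""
    norm_d = math.sqrt(sum(v * v for v in z_d))
    norm_t = math.sqrt(sum(v * v for v in z_t))
    return sum(a * b for a, b in zip(z_d, z_t)) / (norm_d * norm_t)

def binding_probability(z_d, z_t, alpha=ALPHA):
    """P(binding) = sigmoid(alpha * cosine similarity)."""
    s = cosine_similarity(z_d, z_t)
    return 1.0 / (1.0 + math.exp(-alpha * s))

print(binding_probability([1.0, 2.0], [2.0, 4.0]))   # parallel vectors: sigmoid(5)
print(binding_probability([1.0, 0.0], [0.0, 3.0]))   # orthogonal vectors: 0.5
print(binding_probability([1.0, 0.0], [-1.0, 0.0]))  # opposite vectors: sigmoid(-5)
```

Because the cosine similarity is bounded by ±1, the probability with $\alpha = 5$ is confined to roughly 0.007 to 0.993.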

How to use SPRINT

Inputs

Input | Description
Protein Sequence | Protein sequence in FASTA or raw text, or a precomputed SaProt combined sequence with structure tokens. FASTA headers are used to name the downloadable TopK artifacts.
Compounds (SMILES) | SMILES strings, one per line. Tab-separated lines of the form name<TAB>SMILES are also supported. Files up to 50 MB are accepted in .csv, .smi, .smiles, or .txt format.

Compound input example:

aspirin	CC(=O)Oc1ccccc1C(=O)O
ibuprofen	CC(C)Cc1ccc(cc1)C(C)C(=O)O
caffeine	Cn1cnc2c1c(=O)n(C)c(=O)n2C
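
Reading this format takes only a few lines. The sketch below makes three assumptions:

- Blank lines are skipped.
- A line containing a tab is split into name and SMILES at the first tab.
- Bare SMILES lines get an auto-generated identifier.

The `compound_<n>` naming scheme is a hypothetical stand-in for whatever identifiers the tool actually generates. No SMILES validation is attempted here.

```python
def parse_compounds(text):
    """Parse 'name<TAB>SMILES' or bare 'SMILES' lines into (compound_id, smiles) pairs."""
    compounds = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if "\t" in line:
            name, smiles = line.split("\t", 1)
            name, smiles = name.strip(), smiles.strip()
        else:
            # Hypothetical auto-generated identifier based on the input line number.
            name, smiles = f"compound_{line_no}", line
        compounds.append((name, smiles))
    return compounds

example = ("aspirin\tCC(=O)Oc1ccccc1C(=O)O\n"
           "ibuprofen\tCC(C)Cc1ccc(cc1)C(C)C(=O)O\n"
           "\n"
           "Cn1cnc2c1c(=O)n(C)c(=O)n2C\n")
for compound_id, smiles in parse_compounds(example):
    print(compound_id, smiles)
```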

Settings

Setting | Description
Return top K compounds | Number of top-scoring compounds to return (10–1000, default 100). Increase it to return more candidates.

Results

Results are ranked by binding score, with the strongest predicted binders at the top.

Column | Description
rank | Position in the ranked list (1 = highest score)
compound_id | Name from the input, or an auto-generated identifier
smiles | Chemical structure in SMILES notation
binding_score | Cosine similarity between the protein and compound embeddings (higher = stronger predicted binding)
binding_probability | Sigmoid-scaled score between 0 and 1
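
Assembling these columns from raw scores needs only a sort and a sigmoid. This sketch assumes the cosine similarities are already computed and reuses the $\alpha = 5$ scaling from the binding formula. The helper name `rank_results` and the example scores are hypothetical.

```python
import math

def rank_results(compounds, scores, top_k, alpha=5.0):
    """Build result rows sorted by binding_score, highest first.

    compounds: list of (compound_id, smiles) pairs.
    scores: cosine similarities aligned with `compounds`.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    rows = []
    for rank, i in enumerate(order, start=1):
        compound_id, smiles = compounds[i]
        rows.append({
            "rank": rank,
            "compound_id": compound_id,
            "smiles": smiles,
            "binding_score": scores[i],
            "binding_probability": 1.0 / (1.0 + math.exp(-alpha * scores[i])),
        })
    return rows

compounds = [("aspirin", "CC(=O)Oc1ccccc1C(=O)O"),
             ("ibuprofen", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"),
             ("caffeine", "Cn1cnc2c1c(=O)n(C)c(=O)n2C")]
scores = [0.12, 0.41, -0.05]  # hypothetical cosine similarities, for illustration only
for row in rank_results(compounds, scores, top_k=2):
    print(row)
```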

SPRINT also returns two downloadable TopK artifacts aligned with the upstream workflow:

  • topk_mol_data_<query_id>.csv with ranked compounds and the raw cosine similarity in the CosineSimi column
  • topk_mol_embeddings_<query_id>.npy with the embeddings for the returned TopK compounds

Interpreting scores

Binding scores derive from cosine similarity and range from approximately -1 to 1 before sigmoid transformation.

Binding probability | Interpretation
> 0.8 | Strong predicted interaction
0.6–0.8 | Moderate interaction, worth validating
0.4–0.6 | Weak or uncertain
< 0.4 | Unlikely binder
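
The bands can be applied programmatically. Because $P = \sigma(\alpha s)$ is invertible, each probability cutoff also maps back to a cosine-similarity cutoff $s = \ln\left(p / (1 - p)\right) / \alpha$. This sketch assumes $\alpha = 5$ as stated in the binding formula. It also assumes that values of exactly 0.6 and 0.4 fall in the higher of the two adjacent bands, which the table leaves unspecified.

```python
import math

ALPHA = 5.0  # scaling factor from the binding formula

def interpret(p):
    """Map a binding probability to the interpretation bands in the table."""
    if p > 0.8:
        return "Strong predicted interaction"
    if p >= 0.6:  # boundary handling at 0.6 and 0.4 is an assumption
        return "Moderate interaction, worth validating"
    if p >= 0.4:
        return "Weak or uncertain"
    return "Unlikely binder"

def cosine_cutoff(p, alpha=ALPHA):
    """Invert P = sigmoid(alpha * s) to get the cosine similarity s for probability p."""
    return math.log(p / (1.0 - p)) / alpha

for p in (0.8, 0.6, 0.4):
    print(p, round(cosine_cutoff(p), 3))
print(interpret(0.9), "|", interpret(0.5))
```

With $\alpha = 5$, the 0.8 cutoff corresponds to a cosine similarity of about 0.28. The 0.6 and 0.4 cutoffs correspond to about ±0.08, so the "weak or uncertain" band covers a narrow slice of similarity around zero.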

SPRINT predictions serve as prioritization, not experimental measurements. High-scoring compounds should be validated with structure-based docking or experimental assays.

Screening workflow

SPRINT works best as part of a multi-stage virtual screening pipeline:

  1. Filter compounds for drug-likeness using Lipinski's Rule of 5 or Molecular Descriptors
  2. Run SPRINT to rank compounds by predicted target engagement
  3. Assess top hits for toxicity and ADMET properties with ADMET-AI or eToxPred
  4. Validate promising candidates with structure-based docking (DiffDock, AutoDock Vina, or GNINA)
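
The first two stages can be wired together as a filter-then-rank loop that produces a shortlist. This is a minimal sketch under the following assumptions:

- Descriptors and SPRINT scores are taken as precomputed inputs. In practice they would come from a descriptor tool and from SPRINT.
- The drug-likeness check uses the standard Lipinski cutoffs (molecular weight ≤ 500 Da, logP ≤ 5, ≤ 5 hydrogen bond donors, ≤ 10 acceptors), with at most one violation allowed.
- Stages 3 and 4 are represented only by the shortlist handed to them.

All compound names and numbers are hypothetical.

```python
def lipinski_violations(desc):
    """Count Rule-of-5 violations from precomputed descriptors."""
    return sum([
        desc["mol_weight"] > 500,
        desc["logp"] > 5,
        desc["h_donors"] > 5,
        desc["h_acceptors"] > 10,
    ])

def prioritize(library, sprint_scores, shortlist_size):
    """Stage 1: drug-likeness filter; stage 2: rank by SPRINT score; return a shortlist.

    library: dict of compound_id -> descriptor dict (hypothetical precomputed values).
    sprint_scores: dict of compound_id -> SPRINT binding_score.
    The returned IDs would then go to ADMET/toxicity checks and docking (stages 3-4).
    """
    drug_like = [cid for cid, desc in library.items() if lipinski_violations(desc) <= 1]
    ranked = sorted(drug_like, key=lambda cid: sprint_scores[cid], reverse=True)
    return ranked[:shortlist_size]

library = {
    "cmpd_a": {"mol_weight": 320.4, "logp": 2.1, "h_donors": 2, "h_acceptors": 5},
    "cmpd_b": {"mol_weight": 710.9, "logp": 6.3, "h_donors": 4, "h_acceptors": 12},
    "cmpd_c": {"mol_weight": 455.0, "logp": 4.2, "h_donors": 1, "h_acceptors": 7},
}
sprint_scores = {"cmpd_a": 0.18, "cmpd_b": 0.52, "cmpd_c": 0.33}
print(prioritize(library, sprint_scores, shortlist_size=2))  # ['cmpd_c', 'cmpd_a']
```

Here `cmpd_b` has the highest SPRINT score but is dropped at stage 1, which is the point of filtering before ranking.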

Invalid SMILES strings are handled gracefully: unparseable inputs receive neutral scores rather than causing the entire job to fail. If many expected binders score poorly, check the input formatting.

Limitations

SPRINT predicts binding interactions but not binding affinity ($K_d$ or $K_i$ values). The binding probability reflects confidence that an interaction exists, not the strength of that interaction.

Training data comes from PubChem, BindingDB, and ChEMBL. Performance may decrease for protein families or chemical scaffolds underrepresented in these databases.

SPRINT uses sequence-derived structural information from SaProt rather than explicit 3D coordinates. For targets where binding depends on specific conformational states or allosteric sites, structure-based docking may be more appropriate.