SPRINT

Ultrafast drug-target interaction prediction for virtual screening

What is SPRINT?

SPRINT (Structure-aware PRotein ligand INTeraction) is an ultrafast deep learning framework for predicting drug-target interactions. Unlike docking methods such as AutoDock Vina or GNINA, SPRINT predicts binding from learned embeddings rather than from explicit 3D docking simulations.

The key advantage is speed. SPRINT can screen thousands of compounds against a protein target in seconds because interaction prediction reduces to a single dot product between pre-computed embeddings. At scale, querying a single protein against billions of compounds takes milliseconds when using vector database retrieval. This makes SPRINT ideal for virtual screening campaigns where large compound libraries need prioritization before more expensive docking or experimental validation.

SPRINT outputs a binding score and a binding probability for each compound. Higher scores indicate stronger predicted interactions. The method is best used as a first-pass filter: identify promising candidates with SPRINT, then validate the top hits with structure-based methods like DiffDock or AutoDock Vina.

How does SPRINT work?

SPRINT uses a co-embedding architecture that maps proteins and ligands into a shared vector space. A compound that binds a protein is embedded close to that protein in this space, so interaction prediction reduces to a dot-product similarity between the two embeddings.

Protein embedding

SPRINT encodes proteins using SaProt, a structure-aware protein language model. SaProt augments standard amino acid tokens with structural tokens derived from Foldseek's 3Di alphabet, creating a vocabulary that captures both sequence and local geometry information. When 3D structure is unavailable, SPRINT uses AlphaFold2-predicted structures to generate these tokens.

Rather than averaging per-residue embeddings, SPRINT applies multi-head attention pooling to learn a sequence-dependent aggregation. This aggregated representation passes through a small MLP to produce the final protein embedding. The attention weights also provide interpretability, highlighting which residues contribute most to predicted interactions.
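The pooling step can be illustrated with a deliberately simplified sketch. It shows single-head attention pooling over per-residue embeddings with one query vector; the function name `attention_pool` and the `query` parameter are hypothetical, and this is not SPRINT's actual implementation, which uses multiple heads and learned parameters.

```python
import math


def attention_pool(residue_embeddings, query):
    """Pool per-residue embeddings into one vector with softmax attention.

    Simplified single-head illustration; `query` stands in for a learned
    parameter (assumption, not taken from the SPRINT code base).
    """
    dim = len(query)
    # Scaled dot-product score between the query and each residue embedding.
    scores = [
        sum(q * h for q, h in zip(query, emb)) / math.sqrt(dim)
        for emb in residue_embeddings
    ]
    # Numerically stable softmax over residues.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of residue embeddings, one value per dimension.
    pooled = [
        sum(w * emb[j] for w, emb in zip(weights, residue_embeddings))
        for j in range(dim)
    ]
    return pooled, weights
```

The returned `weights` play the role of the attention weights mentioned above: a residue with a larger weight contributes more to the pooled representation. With a zero query, the weights are uniform and the result is a plain average.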

Ligand embedding

Compounds are encoded using Morgan fingerprints (2048 bits, radius 2), a standard cheminformatics representation that captures molecular substructures as a binary vector. This fingerprint passes through a small MLP to project into the same embedding space as proteins.

Binding prediction

Binding prediction computes the cosine similarity between normalized embeddings:

$$P(\text{binding}) = \sigma\left(\alpha \cdot \frac{\mathbf{Z}_d}{\|\mathbf{Z}_d\|} \cdot \frac{\mathbf{Z}_t}{\|\mathbf{Z}_t\|}\right)$$

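where $\mathbf{Z}_d$ and $\mathbf{Z}_t$ are the drug (ligand) and target (protein) embeddings, $\sigma$ is the sigmoid function, and $\alpha = 5$ scales the similarity so that the output spans nearly the full sigmoid range (about 0.007 to 0.993). The raw cosine similarity ranges from -1 to 1, with higher values indicating stronger predicted binding.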
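As a minimal sketch of the formula, the following computes a binding probability from two embedding vectors given as plain Python lists. The function names are illustrative and not part of any SPRINT API; only the cosine similarity, the sigmoid, and $\alpha = 5$ come from the text.

```python
import math

ALPHA = 5.0  # scaling factor stated in the text


def cosine_similarity(z_d, z_t):
    """Cosine similarity between a ligand and a protein embedding."""
    dot = sum(a * b for a, b in zip(z_d, z_t))
    norm_d = math.sqrt(sum(a * a for a in z_d))
    norm_t = math.sqrt(sum(b * b for b in z_t))
    if norm_d == 0.0 or norm_t == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_d * norm_t)


def binding_probability(z_d, z_t, alpha=ALPHA):
    """Sigmoid of the alpha-scaled cosine similarity."""
    return 1.0 / (1.0 + math.exp(-alpha * cosine_similarity(z_d, z_t)))
```

Orthogonal embeddings give a probability of exactly 0.5, identical directions give about 0.993, and opposite directions give about 0.007.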

How to use SPRINT

Inputs

Input	Description
`Protein Sequence`	Amino acid sequence in FASTA or raw text. Standard residues only (ACDEFGHIKLMNPQRSTVWY).
`Compounds (SMILES)`	SMILES strings, one per line. Supports tab-separated format: `name<TAB>SMILES`. Files up to 50MB accepted.
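Before submitting, a protein sequence can be checked locally against the standard-residue requirement. The sketch below is an assumption about reasonable pre-validation, not the service's own parser: it accepts raw text or a single FASTA record, upper-cases the input, and rejects anything outside the 20 standard residues. The name `parse_protein_sequence` is hypothetical.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_protein_sequence(text):
    """Return a cleaned sequence from raw text or a single FASTA record."""
    lines = [line.strip() for line in text.strip().splitlines()]
    headers = [line for line in lines if line.startswith(">")]
    if len(headers) > 1:
        # Assumption: one protein per job, so multi-record FASTA is rejected.
        raise ValueError("expected a single FASTA record")
    body = [line for line in lines if line and not line.startswith(">")]
    sequence = "".join(body).upper()
    if not sequence:
        raise ValueError("empty sequence")
    invalid = sorted(set(sequence) - STANDARD_RESIDUES)
    if invalid:
        raise ValueError("non-standard residues: " + ", ".join(invalid))
    return sequence
```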

Compound input example:

aspirin	CC(=O)Oc1ccccc1C(=O)O
ibuprofen	CC(C)Cc1ccc(cc1)C(C)C(=O)O
caffeine	Cn1cnc2c1c(=O)n(C)c(=O)n2C
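The compound format can be read with a few lines of Python. This is a sketch of one plausible reading of the format described above: each non-empty line is either `name<TAB>SMILES` or a bare SMILES string. The `compound_N` naming for unnamed entries is an assumption; the text only says that an identifier is auto-generated.

```python
def parse_compounds(text):
    """Parse 'name<TAB>SMILES' or bare-SMILES lines into (name, smiles) pairs."""
    compounds = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if "\t" in line:
            name, smiles = line.split("\t", 1)
            name, smiles = name.strip(), smiles.strip()
        else:
            name, smiles = "", line
        if not name:
            # Hypothetical auto-generated identifier based on input position.
            name = f"compound_{len(compounds) + 1}"
        compounds.append((name, smiles))
    return compounds
```

No chemistry is checked here; validating the SMILES strings themselves would need a cheminformatics toolkit.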

Settings

Setting	Description
`Return top K compounds`	Number of top-scoring compounds to return (10–1000, default 100). Set higher for more candidates.

Results

Results are ranked by binding score, with the strongest predicted binders at the top.

Column	Description
`rank`	Position in ranked list (1 = highest score)
`compound_id`	Name from input or auto-generated identifier
`smiles`	Chemical structure in SMILES notation
`binding_score`	Cosine similarity between embeddings (higher = stronger binding)
`binding_probability`	Sigmoid-scaled score between 0–1
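The relationship between the result columns and the `Return top K compounds` setting can be sketched as follows. The function `build_results` and its input layout are hypothetical; the column names, the 10 to 1000 range, the default of 100, and the sigmoid scaling with $\alpha = 5$ are taken from this page.

```python
import math


def build_results(scores, top_k=100, alpha=5.0):
    """Rank (compound_id, smiles, cosine_similarity) tuples and keep the top K."""
    if not 10 <= top_k <= 1000:
        raise ValueError("top_k must be between 10 and 1000")
    # Highest cosine similarity first; ties keep their input order.
    ordered = sorted(scores, key=lambda item: item[2], reverse=True)
    rows = []
    for rank, (compound_id, smiles, score) in enumerate(ordered[:top_k], start=1):
        rows.append({
            "rank": rank,
            "compound_id": compound_id,
            "smiles": smiles,
            "binding_score": score,
            "binding_probability": 1.0 / (1.0 + math.exp(-alpha * score)),
        })
    return rows
```

Because the sigmoid is monotonic, ranking by `binding_score` and ranking by `binding_probability` give the same order.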

Interpreting scores

Binding scores are cosine similarities and range from -1 to 1. The binding probability is the sigmoid transformation of the scaled score, so the two values always rank compounds in the same order.

Binding probability	Interpretation
> 0.8	Strong predicted interaction
0.6–0.8	Moderate interaction, worth validating
0.4–0.6	Weak or uncertain
< 0.4	Unlikely binder
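The bands in the table can be turned into a small helper, and inverting the sigmoid shows which cosine similarity each threshold corresponds to. This is a sketch: the function names are illustrative, and the handling of values that fall exactly on a boundary (0.8, 0.6, 0.4) is an assumption, since the table does not specify it.

```python
import math


def interpret(probability):
    """Map a binding probability to the interpretation bands in the table."""
    if probability > 0.8:
        return "Strong predicted interaction"
    if probability >= 0.6:  # assumption: boundary values fall in the middle bands
        return "Moderate interaction, worth validating"
    if probability >= 0.4:
        return "Weak or uncertain"
    return "Unlikely binder"


def cosine_for_probability(probability, alpha=5.0):
    """Invert sigmoid(alpha * cos) to recover the cosine similarity."""
    return math.log(probability / (1.0 - probability)) / alpha
```

With $\alpha = 5$, a probability of 0.8 corresponds to a cosine similarity of about 0.28, 0.6 to about 0.08, and 0.4 to about -0.08, so the "weak or uncertain" band covers embeddings that are close to orthogonal.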

SPRINT predictions serve as prioritization, not experimental measurements. High-scoring compounds should be validated with structure-based docking or experimental assays.

Screening workflow

SPRINT works best as part of a multi-stage virtual screening pipeline:

1. Filter compounds for drug-likeness using Lipinski's Rule of 5 or Molecular Descriptors
2. Run SPRINT to rank compounds by predicted target engagement
3. Assess top hits for toxicity and ADMET properties with ADMET-AI or eToxPred
4. Validate promising candidates with structure-based docking (DiffDock, AutoDock Vina, or GNINA)

Invalid SMILES strings are handled gracefully—unparseable inputs receive neutral scores rather than failing the entire job. If many expected binders score poorly, verify input formatting.

Limitations

SPRINT predicts binding interactions but not binding affinity ($K_d$ or $K_i$ values). The binding probability reflects confidence that an interaction exists, not the strength of that interaction.

Training data comes from PubChem, BindingDB, and ChEMBL. Performance may decrease for protein families or chemical scaffolds underrepresented in these databases.

SPRINT uses sequence-derived structural information from SaProt rather than explicit 3D coordinates. For targets where binding depends on specific conformational states or allosteric sites, structure-based docking may be more appropriate.

Related tools

DiffDock: ML-based docking that predicts binding poses
AutoDock Vina: Physics-based molecular docking with scoring
GNINA: CNN-enhanced docking for pose prediction and ranking
DynamicBind: Docking with protein flexibility
ADMET-AI: ADMET property prediction for drug candidates
eToxPred: Toxicity prediction for small molecules
Lipinski's Rule of 5: Drug-likeness filtering
Molecular Descriptors: Physicochemical property calculations
Foldseek: Structure-based protein search (uses same 3Di alphabet as SaProt)
