SPRINT (Structure-aware PRotein ligand INTeraction) is an ultrafast deep learning framework for predicting drug-target interactions. Unlike physics-based docking methods such as AutoDock Vina or GNINA, SPRINT predicts binding from learned embeddings rather than from explicit 3D docking simulations.
The key advantage is speed. SPRINT can screen thousands of compounds against a protein target in seconds because interaction prediction reduces to a single dot product between pre-computed embeddings. At scale, querying a single protein against billions of compounds takes milliseconds when using vector database retrieval. This makes SPRINT ideal for virtual screening campaigns where large compound libraries need prioritization before more expensive docking or experimental validation.
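Because each protein-compound score is a dot product, screening a whole library against one target is a single matrix-vector product followed by a top-K selection. The sketch below shows that idea with exact brute-force search in NumPy. It assumes the embeddings are already computed and L2-normalized; `screen_library` is a hypothetical helper, not part of SPRINT's published interface, and a vector database at billion scale would use an approximate nearest-neighbor index instead of this exhaustive product.

```python
import numpy as np

def screen_library(protein_emb: np.ndarray, ligand_embs: np.ndarray, top_k: int = 100):
    """Rank a precomputed ligand library against one protein (hypothetical helper).

    protein_emb: shape (d,); ligand_embs: shape (n, d). Both are assumed to be
    L2-normalized already, so each dot product equals a cosine similarity.
    """
    scores = ligand_embs @ protein_emb          # one dot product per compound, shape (n,)
    k = min(top_k, scores.shape[0])
    # argpartition isolates the k largest scores in O(n); only those k are then sorted.
    idx = np.argpartition(-scores, k - 1)[:k]
    idx = idx[np.argsort(-scores[idx])]         # best first
    return idx, scores[idx]

# Usage with random stand-in embeddings (not real SPRINT outputs).
rng = np.random.default_rng(0)
library = rng.normal(size=(10_000, 64))
library /= np.linalg.norm(library, axis=1, keepdims=True)
target = rng.normal(size=64)
target /= np.linalg.norm(target)
top_idx, top_scores = screen_library(target, library, top_k=10)
```

The library embeddings only need to be computed once; every new target then costs one pass over the matrix, which is what makes this first-pass filtering cheap.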
SPRINT outputs a binding score and a binding probability for each compound. Higher scores indicate stronger predicted interactions. The method is best used as a first-pass filter to identify promising candidates; top hits should then be validated with structure-based methods such as DiffDock or AutoDock Vina.
SPRINT uses a co-embedding architecture that maps proteins and ligands into a shared vector space. Compounds that bind a protein are embedded close to that protein in this space, so interactions can be predicted from dot-product similarity.
SPRINT encodes proteins using SaProt, a structure-aware protein language model. SaProt augments standard amino acid tokens with structural tokens derived from Foldseek's 3Di alphabet, creating a vocabulary that captures both sequence and local geometry information. When 3D structure is unavailable, SPRINT uses AlphaFold2-predicted structures to generate these tokens.
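The structure-aware vocabulary can be pictured as pairing each residue letter with the Foldseek 3Di letter at the same position, so a residue `M` in 3Di state `d` becomes the single token `Md`. The helper below is a hypothetical illustration of that pairing, not SaProt's own tokenizer. The optional masking of low-confidence positions with a `#` placeholder, and the pLDDT cutoff of 70, are assumptions added for illustration.

```python
from typing import List, Optional, Sequence

def saprot_tokens(aa_seq: str, foldseek_3di: str,
                  plddt: Optional[Sequence[float]] = None,
                  plddt_cutoff: float = 70.0,   # assumed cutoff, for illustration only
                  mask_char: str = "#") -> List[str]:
    """Pair each residue with its 3Di state, e.g. ('M', 'd') -> 'Md' (hypothetical helper)."""
    if len(aa_seq) != len(foldseek_3di):
        raise ValueError("sequence and 3Di string must have equal length")
    tokens = []
    for i, (aa, state) in enumerate(zip(aa_seq.upper(), foldseek_3di.lower())):
        # Assumed convention: hide the structural letter where the predicted
        # structure is low-confidence, keeping only the amino acid identity.
        if plddt is not None and plddt[i] < plddt_cutoff:
            state = mask_char
        tokens.append(aa + state)
    return tokens

print(saprot_tokens("MKV", "dvp"))  # ['Md', 'Kv', 'Vp']
```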
Rather than averaging per-residue embeddings, SPRINT applies multi-head attention pooling to learn a sequence-dependent aggregation. This aggregated representation passes through a small MLP to produce the final protein embedding. The attention weights also provide interpretability, highlighting which residues contribute most to predicted interactions.
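Attention pooling replaces the uniform average with learned, input-dependent weights over residues. A minimal NumPy sketch follows, assuming one learned query vector per head with separate key and value projections; the exact parameterization SPRINT uses is not specified here, and all names and shapes are illustrative. The returned weight matrix is what would be inspected to see which residues a head attends to.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)     # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, Q, Wk, Wv):
    """Multi-head attention pooling of per-residue embeddings (illustrative sketch).

    H:  (L, d)           per-residue embeddings from the protein language model
    Q:  (n_heads, dk)    one learned query vector per head
    Wk: (n_heads, d, dk) key projection per head
    Wv: (n_heads, d, dv) value projection per head
    Returns the pooled vector (n_heads * dv,) and attention weights (n_heads, L).
    """
    K = np.einsum("ld,hdk->hlk", H, Wk)                          # keys: (n_heads, L, dk)
    V = np.einsum("ld,hdv->hlv", H, Wv)                          # values: (n_heads, L, dv)
    logits = np.einsum("hlk,hk->hl", K, Q) / np.sqrt(Q.shape[-1])
    attn = softmax(logits, axis=-1)                              # each head's weights sum to 1
    pooled = np.einsum("hl,hlv->hv", attn, V)                    # weighted sum over residues
    return pooled.reshape(-1), attn
```

The pooled vector has a fixed size regardless of sequence length, which is what lets the following MLP map every protein to an embedding of the same dimension.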
Compounds are encoded using Morgan fingerprints (2048 bits, radius 2), a standard cheminformatics representation that captures molecular substructures as a binary vector. This fingerprint passes through a small MLP to project into the same embedding space as proteins.
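The ligand branch can be sketched in two steps: featurize a SMILES string as a 2048-bit, radius-2 Morgan fingerprint with RDKit's `rdFingerprintGenerator`, then project it with a small MLP. The two-layer ReLU network and its sizes below are assumptions for illustration; SPRINT's actual layer widths and activations are not given in this section, and the helper names are hypothetical.

```python
import numpy as np

def morgan_bits(smiles: str, radius: int = 2, n_bits: int = 2048):
    """Morgan fingerprint as a float32 vector, or None if the SMILES does not parse."""
    from rdkit import Chem                       # imported lazily: only this helper needs RDKit
    from rdkit.Chem import rdFingerprintGenerator
    mol = Chem.MolFromSmiles(smiles)             # returns None for unparseable SMILES
    if mol is None:
        return None
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    return gen.GetFingerprintAsNumPy(mol).astype(np.float32)

def project_fingerprint(bits, W1, b1, W2, b2):
    """Hypothetical two-layer ReLU MLP mapping fingerprint(s) into the shared space."""
    hidden = np.maximum(bits @ W1 + b1, 0.0)
    return hidden @ W2 + b2

# Usage with random stand-in weights (hidden width 256 and output width 64 are assumptions).
rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(2048, 256)) * 0.01, np.zeros(256)
W2, b2 = rng.normal(size=(256, 64)) * 0.1, np.zeros(64)
fake_bits = (rng.random(2048) < 0.05).astype(np.float32)   # stand-in for morgan_bits(...)
ligand_emb = project_fingerprint(fake_bits, W1, b1, W2, b2)
```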
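Binding prediction computes the cosine similarity between the protein embedding $\mathbf{p}$ and the ligand embedding $\mathbf{m}$, then passes a scaled version of it through a sigmoid:

$$s = \cos(\mathbf{p}, \mathbf{m}) = \frac{\mathbf{p} \cdot \mathbf{m}}{\lVert \mathbf{p} \rVert \, \lVert \mathbf{m} \rVert}, \qquad P(\text{bind}) = \sigma(\alpha s) = \frac{1}{1 + e^{-\alpha s}}$$

where $\alpha$ scales the similarity to saturate the sigmoid range. The raw cosine similarity $s$ ranges from -1 to 1, with higher values indicating stronger predicted binding.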
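Scoring one protein-compound pair takes only a few lines: normalize both embeddings, take their dot product to get the cosine similarity, and pass a scaled copy through a sigmoid. In this sketch the scaling factor is a plain `scale` argument; SPRINT's trained value is not given here, so the value used below is an assumption.

```python
import numpy as np

def binding_score(protein_emb: np.ndarray, ligand_emb: np.ndarray) -> float:
    """Cosine similarity between the two embeddings, in [-1, 1]."""
    p = protein_emb / np.linalg.norm(protein_emb)
    m = ligand_emb / np.linalg.norm(ligand_emb)
    return float(p @ m)

def binding_probability(score: float, scale: float) -> float:
    """Sigmoid of the scaled cosine similarity; `scale` is the saturation factor."""
    return float(1.0 / (1.0 + np.exp(-scale * score)))

# With an assumed scale of 10, a cosine similarity of 0.3 maps to about 0.95.
print(round(binding_probability(0.3, scale=10.0), 4))  # 0.9526
```

A large scale means that modest cosine similarities already land near 0 or 1, which is why the probability column saturates faster than the raw score.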
| Input | Description |
|---|---|
| Protein Sequence | Amino acid sequence in FASTA or raw text. Standard residues only (ACDEFGHIKLMNPQRSTVWY). |
| Compounds (SMILES) | SMILES strings, one per line. Supports tab-separated format: name<TAB>SMILES. Files up to 50 MB accepted. |
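The input conventions above are simple enough to check before submitting a job. The sketch below is a hypothetical pre-flight parser, not the service's own validator: it strips FASTA headers, rejects non-standard residues, and reads compound lines as either bare SMILES or name<TAB>SMILES. The `compound_<line number>` naming for unnamed compounds is an assumed scheme, and only a single protein sequence per input is accepted.

```python
from typing import List, Tuple

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def parse_protein(text: str) -> str:
    """Accept FASTA or raw sequence text; return the bare uppercase sequence."""
    lines = [ln.strip() for ln in text.splitlines()]
    if sum(ln.startswith(">") for ln in lines) > 1:
        raise ValueError("expected a single protein sequence")
    seq = "".join(ln for ln in lines if ln and not ln.startswith(">")).upper()
    bad = sorted(set(seq) - STANDARD_RESIDUES)
    if not seq or bad:
        raise ValueError(f"empty sequence or non-standard residues: {bad}")
    return seq

def parse_compounds(text: str) -> List[Tuple[str, str]]:
    """One compound per line, either 'SMILES' or 'name<TAB>SMILES'."""
    compounds = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue                              # skip blank lines
        if "\t" in line:
            name, smiles = (part.strip() for part in line.split("\t", 1))
        else:
            name, smiles = f"compound_{lineno}", line   # hypothetical auto-ID scheme
        compounds.append((name, smiles))
    return compounds
```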
Compound input example:
```
aspirin	CC(=O)Oc1ccccc1C(=O)O
ibuprofen	CC(C)Cc1ccc(cc1)C(C)C(=O)O
caffeine	Cn1cnc2c1c(=O)n(C)c(=O)n2C
```

| Setting | Description |
|---|---|
| Return top K compounds | Number of top-scoring compounds to return (10–1000, default 100). Set higher for more candidates. |
Results are ranked by binding score, with the strongest predicted binders at the top.
| Column | Description |
|---|---|
| rank | Position in ranked list (1 = highest score) |
| compound_id | Name from input or auto-generated identifier |
| smiles | Chemical structure in SMILES notation |
| binding_score | Cosine similarity between embeddings (higher = stronger predicted binding) |
| binding_probability | Sigmoid-scaled score, between 0 and 1 |
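The ranking step and output columns can be reproduced from raw scores. This hypothetical helper enforces the documented 10–1000 range for top K (default 100), sorts by binding score, and emits one row per compound with the column names from the table above. The sigmoid `scale` is passed in because its trained value is not given here.

```python
import math
from typing import Dict, Iterable, List, Tuple

def rank_results(scored: Iterable[Tuple[str, str, float]], scale: float,
                 top_k: int = 100) -> List[Dict[str, object]]:
    """scored: (compound_id, smiles, cosine_score) triples. Returns rows, best first."""
    if not 10 <= top_k <= 1000:
        raise ValueError("top_k must be between 10 and 1000")
    ordered = sorted(scored, key=lambda row: row[2], reverse=True)[:top_k]
    return [
        {
            "rank": rank,
            "compound_id": cid,
            "smiles": smiles,
            "binding_score": score,
            "binding_probability": 1.0 / (1.0 + math.exp(-scale * score)),
        }
        for rank, (cid, smiles, score) in enumerate(ordered, start=1)
    ]
```

If the library holds fewer compounds than top K, every compound is returned.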
Binding scores are cosine similarities and therefore lie between -1 and 1 before the sigmoid transformation maps them to probabilities between 0 and 1.
| Binding probability | Interpretation |
|---|---|
| > 0.8 | Strong predicted interaction |
| 0.6–0.8 | Moderate interaction, worth validating |
| 0.4–0.6 | Weak or uncertain |
| < 0.4 | Unlikely binder |
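The interpretation bands translate directly into a triage function. The table does not say which band owns the exact boundary values, so this sketch assumes a probability of exactly 0.8 counts as moderate (the strong band is strictly greater than 0.8) and that 0.6 and 0.4 belong to the higher of the two bands they touch.

```python
def interpret(probability: float) -> str:
    """Map a binding probability to the documented interpretation band."""
    if probability > 0.8:
        return "Strong predicted interaction"
    if probability >= 0.6:                       # assumption: 0.6 counts as moderate
        return "Moderate interaction, worth validating"
    if probability >= 0.4:                       # assumption: 0.4 counts as weak/uncertain
        return "Weak or uncertain"
    return "Unlikely binder"
```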
SPRINT predictions are a prioritization tool, not a substitute for experimental measurements. High-scoring compounds should be validated with structure-based docking or experimental assays.
SPRINT works best as the first stage of a multi-stage virtual screening pipeline, narrowing a large library before structure-based docking and experimental assays.
Invalid SMILES strings are handled gracefully: unparseable inputs receive neutral scores rather than failing the entire job. If many expected binders score poorly, verify the input formatting.
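A minimal sketch of that fallback, assuming a "neutral" score means a cosine similarity of 0.0 (which the sigmoid maps to a probability of 0.5 for any scale); the documentation does not state the exact value. `featurize` and `score_fn` are hypothetical callables. In practice `featurize` could wrap RDKit's `Chem.MolFromSmiles`, which returns `None` for unparseable input. Returning the list of failed IDs makes it easy to spot a formatting problem, such as a missing tab, that silently turned many inputs invalid.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

def score_library(compounds: Iterable[Tuple[str, str]],
                  featurize: Callable[[str], Optional[Any]],
                  score_fn: Callable[[Any], float],
                  neutral_score: float = 0.0):   # assumed meaning of "neutral"
    """Score every compound; unparseable SMILES get neutral_score instead of raising."""
    results: List[Tuple[str, str, float]] = []
    invalid: List[str] = []
    for cid, smiles in compounds:
        features = featurize(smiles)             # None signals a parse failure
        if features is None:
            invalid.append(cid)
            results.append((cid, smiles, neutral_score))
        else:
            results.append((cid, smiles, score_fn(features)))
    return results, invalid
```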
SPRINT predicts whether a binding interaction occurs, not binding affinity values. The binding probability reflects confidence that an interaction exists, not the strength of that interaction.
Training data comes from PubChem, BindingDB, and ChEMBL. Performance may decrease for protein families or chemical scaffolds underrepresented in these databases.
SPRINT represents structure only through SaProt's per-residue Foldseek 3Di tokens, not through explicit 3D coordinates. For targets where binding depends on specific conformational states or allosteric sites, structure-based docking may be more appropriate.