ScanNet (Spatio-Chemical Arrangement of Neighbors Network) is a geometric deep learning model for predicting protein binding sites directly from 3D structures. It identifies where proteins interact with other proteins, antibodies, or intrinsically disordered proteins (IDPs).
Unlike traditional methods that rely on handcrafted features or structural homology, ScanNet learns spatio-chemical patterns end-to-end from atomic coordinates. The model constructs representations by examining the spatial and chemical arrangement of neighboring atoms, enabling it to detect binding sites even on novel protein folds not seen during training.
Published in Nature Methods in 2022, ScanNet demonstrated state-of-the-art accuracy on multiple benchmarks while remaining interpretable through visualization of learned filters.
Recommended ScanNet applications:
Known ScanNet limitations:
ProteinIQ provides a web interface for running ScanNet without command-line installation or local database setup. Upload a protein structure, select a prediction mode, and receive per-residue binding probabilities.
| Input | Description |
|---|---|
| Protein structure | The target protein to analyze. Upload a PDB or mmCIF file, enter a PDB ID (e.g., 1brs) to fetch it from RCSB, or enter a UniProt ID (e.g., P38398) to retrieve the corresponding AlphaFold model. |
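The structure field accepts three kinds of input. The helper below is hypothetical and only sketches how those kinds could be told apart. ProteinIQ's actual parser is not described here.

- The UniProt pattern follows the accession format documented by UniProt.
- The PDB pattern covers only classic four-character IDs, with the optional `_chain` suffix mentioned under chain selection. Extended PDB IDs are not handled.

```python
import re
from typing import Optional, Tuple

# Classic 4-character PDB ID (first character 1-9), optionally followed by "_<chain>".
PDB_RE = re.compile(r"^([1-9][A-Za-z0-9]{3})(?:_([A-Za-z0-9]+))?$")
# UniProt accession format as documented by UniProt (6- or 10-character accessions).
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def classify_structure_input(text: str) -> Tuple[str, str, Optional[str]]:
    """Hypothetical helper: return (kind, identifier, chain) for a structure input."""
    s = text.strip()
    # Uploaded files are recognized by extension (an assumption for this sketch).
    if s.lower().endswith((".pdb", ".cif", ".mmcif")):
        return ("file", s, None)
    m = PDB_RE.match(s)
    if m:
        # PDB IDs are case-insensitive; normalize to lowercase, keep the chain suffix.
        return ("pdb", m.group(1).lower(), m.group(2))
    if UNIPROT_RE.match(s.upper()):
        return ("uniprot", s.upper(), None)
    raise ValueError(f"Unrecognized structure input: {text!r}")
```

For example, `classify_structure_input("1brs_A")` returns `("pdb", "1brs", "A")`, while `"P38398"` is routed to the AlphaFold lookup.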
| Setting | Description |
|---|---|
| Prediction mode | Type of binding site to predict. General binding sites (default) detects protein-protein interfaces. Antibody epitopes identifies B-cell epitope regions. IDP binding sites locates interaction sites for intrinsically disordered proteins. |
| Chain selection | Specific chain(s) to analyze (e.g., A or AB). Leave empty to analyze all chains. For PDB IDs, underscore notation also works in the input field (e.g., 1brs_A). |
| Skip MSA | When enabled (default), predictions use the structure alone, without a multiple sequence alignment (MSA). Recommended for designed proteins and for faster results. Disable for natural proteins when evolutionary conservation data could improve accuracy. |
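The three settings can be pictured as one small configuration object. This is a hypothetical sketch: the class name, the mode keys `general`, `antibody` and `idp`, and the chain parser are illustrative, not ProteinIQ's API. The chain parser assumes single-character chain IDs, as in the A / AB examples.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical keys for the three prediction modes in the settings table.
MODES = {"general", "antibody", "idp"}


@dataclass
class ScanNetSettings:
    mode: str = "general"                            # general binding sites is the default
    chains: List[str] = field(default_factory=list)  # empty list = analyze all chains
    skip_msa: bool = True                            # structure-only prediction is the default

    def __post_init__(self) -> None:
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {sorted(MODES)}, got {self.mode!r}")


def parse_chain_selection(text: str) -> List[str]:
    """'AB' -> ['A', 'B']; '' -> [] (all chains). Assumes one-character chain IDs."""
    # Spaces and commas are tolerated as separators (an assumption of this sketch).
    return [c for c in text.strip() if not c.isspace() and c != ","]
```

A natural protein with conservation data available might use `ScanNetSettings(mode="antibody", chains=parse_chain_selection("A"), skip_msa=False)`.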
ScanNet outputs a table of per-residue binding site predictions.
| Column | Description |
|---|---|
| Residue | Position in the protein sequence. |
| Chain | Chain identifier from the input structure. |
| AA | Single-letter amino acid code at this position. |
| Probability | Predicted likelihood (0–1) that this residue is part of a binding interface. |
| Class | Classification based on probability thresholds: high, medium, or low. |
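The Class column is a thresholding of the Probability column. The page does not state which cut-offs separate high, medium and low. The values below are placeholders chosen only to illustrate the mapping.

```python
# Placeholder cut-offs: the actual thresholds behind the Class column are not documented here.
HIGH_CUTOFF = 0.5
MEDIUM_CUTOFF = 0.3


def classify_probability(p: float) -> str:
    """Map a per-residue binding probability to a confidence tier."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    if p >= HIGH_CUTOFF:
        return "high"
    if p >= MEDIUM_CUTOFF:
        return "medium"
    return "low"
```

Raising or lowering the cut-offs trades recall against precision when shortlisting residues for mutagenesis or docking.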
The summary statistics provide an overview: total residues analyzed, counts in each confidence tier, mean probability across the protein, and the maximum probability observed. A high mean probability suggests an extensive binding surface, while a few isolated high-probability residues indicate a more localized interaction site.
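The summary statistics can be recomputed from the per-residue table. This is a minimal sketch. It assumes each row is a dict whose `probability` and `class` keys mirror the Probability and Class columns; that layout and the function name are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def summarize(rows: List[Dict]) -> Dict:
    """Summary over per-residue rows with 'probability' and 'class' keys."""
    probs = [r["probability"] for r in rows]
    tiers = Counter(r["class"] for r in rows)  # count residues in each confidence tier
    return {
        "total_residues": len(rows),
        "high": tiers["high"],
        "medium": tiers["medium"],
        "low": tiers["low"],
        "mean_probability": mean(probs),
        "max_probability": max(probs),
    }
```

A high maximum paired with a low mean and few high-tier residues matches the "localized interaction site" pattern described above.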
ScanNet employs a hierarchical architecture that processes protein structures at both atomic and amino acid scales, learning to recognize spatio-chemical patterns associated with binding interfaces.
At the core of ScanNet are trainable filters that detect specific spatial arrangements of atoms with particular chemical properties. For each atom in the structure, ScanNet extracts its neighboring atoms and expresses their positions in a local coordinate frame centered on that atom. These point clouds pass through linear filters that respond to arrangements such as "hydrophobic atoms surrounding a polar center" or "aromatic ring flanked by charged residues."
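The idea of a spatio-chemical filter can be made concrete with a simplified NumPy sketch. It shows a single filter: a Gaussian spatial kernel, centered at a point `mu` in the local frame, multiplied by a linear score of each neighbor's chemical features.

Two parts of this sketch are assumptions, not ScanNet's exact parameterization:

- the frame is built from the atom and two neighbors by Gram-Schmidt;
- the filter shape is a single Gaussian times a linear chemical score. ScanNet's learned filters are richer.

Because neighbors are expressed in the local frame, the response does not change under a global rotation or translation of the structure.

```python
import numpy as np


def local_frame(center, neighbor1, neighbor2):
    """Orthonormal frame (rows = axes) from an atom and two neighbors via Gram-Schmidt."""
    e1 = neighbor1 - center
    e1 = e1 / np.linalg.norm(e1)
    v = neighbor2 - center
    e2 = v - np.dot(v, e1) * e1  # remove the component along e1
    e2 = e2 / np.linalg.norm(e2)
    e3 = np.cross(e1, e2)        # right-handed third axis
    return np.stack([e1, e2, e3])


def filter_response(center, frame, coords, attrs, mu, sigma, w):
    """One simplified spatio-chemical filter applied to a neighborhood point cloud."""
    local = (coords - center) @ frame.T        # neighbor coordinates in the local frame
    d2 = np.sum((local - mu) ** 2, axis=1)     # squared distance to the kernel center
    spatial = np.exp(-d2 / (2.0 * sigma**2))   # Gaussian spatial weight per neighbor
    chemical = attrs @ w                       # linear chemical score per neighbor
    return float(np.sum(spatial * chemical))
```

Here `attrs` holds one feature row per neighbor, for example a one-hot atom type. Training would learn `mu`, `sigma` and `w` so that the filter fires on a recurring binding-site motif.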
The filters are parameterized to be interpretable: each can be visualized to reveal the molecular pattern it detects. This interpretability sets ScanNet apart from black-box deep learning approaches.
ScanNet builds representations at two scales:
This multi-scale approach allows ScanNet to integrate information from individual atomic contacts up to residue-level surface geometry.
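Moving from the atomic scale to the amino acid scale requires aggregating atom-level features into one vector per residue. The sketch below uses max pooling purely for illustration. ScanNet's actual aggregation operator may differ, and the function name is hypothetical.

```python
import numpy as np


def pool_atoms_to_residues(atom_features, atom_residue_index, n_residues):
    """Max-pool per-atom feature vectors (n_atoms x F) into per-residue vectors (n_residues x F)."""
    n_features = atom_features.shape[1]
    pooled = np.full((n_residues, n_features), -np.inf)
    # Unbuffered in-place maximum: each atom row updates its residue's row.
    np.maximum.at(pooled, atom_residue_index, atom_features)
    # Residues with no atoms would stay at -inf; set them to zero instead.
    counts = np.bincount(atom_residue_index, minlength=n_residues)
    pooled[counts == 0] = 0.0
    return pooled
```

Max pooling keeps the strongest atomic signal in each residue, so one strongly responding side-chain atom is enough to flag the whole residue.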
When MSA data is available, ScanNet incorporates position-weight matrices derived from sequence alignments. Conservation patterns provide complementary evidence, since residues conserved across evolution often participate in functionally important interfaces. For designed proteins or sequences without homologs, the noMSA mode relies purely on structural features.
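A position-weight matrix summarizes, for each alignment column, how often each amino acid occurs. This sketch builds one from equal-length aligned sequences using a simple pseudocount. ScanNet's exact preprocessing, such as sequence weighting or its pseudocount scheme, is not described here and may differ.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_weight_matrix(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-column amino-acid frequencies from aligned sequences of equal length."""
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive (also avoids all-gap columns dividing by zero)")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pwm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            aa = seq[col].upper()
            if aa in counts:  # gaps ('-') and non-standard letters are skipped
                counts[aa] += 1
        total = sum(counts.values())
        pwm.append({aa: c / total for aa, c in counts.items()})
    return pwm
```

A column dominated by one amino acid, such as the first column of `["AC", "AC", "A-"]`, signals a conserved position.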
ScanNet was trained on protein-protein binding sites from the MaSIF-site benchmark dataset. Testing revealed strong generalization: even for proteins with no sequence or fold similarity to training examples, ScanNet maintained high accuracy. This contrasts with homology-based methods, whose accuracy degrades on novel folds.
ScanNet achieves state-of-the-art accuracy on protein-protein binding site prediction benchmarks.
| Metric | Value |
|---|---|
| AUCPR (test set) | 0.694 |
| Accuracy | 87.7% |
| Precision at 50% recall | 73.5% |
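To compute comparable metrics on your own labeled predictions, this sketch uses the standard definitions of a precision-recall curve, precision at a target recall, and a step-wise area under the curve. The page does not say which AUCPR estimator the ScanNet authors used, so `average_precision` is an assumption. Tied scores are not grouped.

```python
from typing import List, Sequence, Tuple


def precision_recall_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(recall, precision) points from lowering the threshold one residue at a time."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)  # number of true binding-site residues (must be > 0)
    tp = fp = 0
    curve = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        curve.append((tp / n_pos, tp / (tp + fp)))
    return curve


def precision_at_recall(curve, target_recall: float) -> float:
    """Best precision among thresholds that reach at least the target recall."""
    return max(p for r, p in curve if r >= target_recall)


def average_precision(curve) -> float:
    """Step-wise area under the PR curve: sum of precision x recall increments."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in curve:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For instance, with scores `[0.9, 0.8, 0.7, 0.6]` and labels `[1, 0, 1, 0]`, precision at 50% recall is 1.0 and the step-wise area is 0.5 + 1/3.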
Performance remains robust across different levels of similarity to training data. While structural homology methods excel when close homologs exist, their accuracy degrades rapidly for distant or novel folds. ScanNet maintains consistent performance across all homology levels, indicating that it generalizes rather than memorizing training examples.
Predictions are also robust to conformational changes between bound and unbound structures: accuracy drops only slightly from bound to unbound conformations (88.3% to 86.6% on simulated data, 91.9% to 91.3% on experimental structures).
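To put "only slightly" in numbers, the drops quoted above can be expressed in absolute percentage points and relative to the bound-state accuracy:

$$
\Delta_{\text{sim}} = 88.3 - 86.6 = 1.7 \text{ points}, \qquad \frac{1.7}{88.3} \approx 1.9\%
$$

$$
\Delta_{\text{exp}} = 91.9 - 91.3 = 0.6 \text{ points}, \qquad \frac{0.6}{91.9} \approx 0.65\%
$$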