ProteinIQ

ScanNet

Predict protein binding sites using geometric deep learning on 3D structures.

What is ScanNet?

ScanNet (Spatio-Chemical Arrangement of Neighbors Network) is a geometric deep learning model for predicting protein binding sites directly from 3D structures. It identifies where proteins interact with other proteins, antibodies, or intrinsically disordered proteins (IDPs).

Unlike traditional methods that rely on handcrafted features or structural homology, ScanNet learns spatio-chemical patterns end-to-end from atomic coordinates. The model constructs representations by examining the spatial and chemical arrangement of neighboring atoms, enabling it to detect binding sites even on novel protein folds not seen during training.

Published in Nature Methods in 2022, ScanNet demonstrated state-of-the-art accuracy on multiple benchmarks while remaining interpretable through visualization of learned filters.

Applications

Recommended ScanNet applications:

  • Protein-protein interface mapping: Identifying which residues mediate interactions between protein binding partners
  • Epitope prediction: Locating antibody binding sites on antigens for vaccine design and therapeutic development
  • IDP binding site detection: Finding regions where structured proteins interact with intrinsically disordered proteins
  • Drug target analysis: Characterizing binding surfaces for structure-based drug design
  • Structural biology: Guiding mutagenesis experiments to validate predicted interaction sites

Limitations

Known ScanNet limitations:

  • Static structure assumption: ScanNet analyzes a single conformational state. Proteins with multiple binding-competent conformations or significant induced-fit changes may require ensemble analysis.
  • MSA dependency for some proteins: While noMSA mode is fast and works well for designed proteins, natural proteins with available homologs benefit from evolutionary information. Optimal results may require MSA generation.
  • Binding partner agnostic: ScanNet predicts general binding propensity, not partner-specific interfaces. A residue predicted as high probability may participate in any protein-protein interaction, not necessarily the one of interest.
  • Resolution sensitivity: Very low-resolution structures or models with significant coordinate errors may produce unreliable predictions.

How to use ScanNet online

ProteinIQ provides a web interface for running ScanNet without command-line installation or local database setup. Upload a protein structure, select a prediction mode, and receive per-residue binding probabilities.

Inputs

| Input | Description |
| --- | --- |
| Protein Structure | The target protein to analyze. Upload a PDB or mmCIF file, enter a PDB ID (e.g., 1brs) to fetch from RCSB, or enter a UniProt ID (e.g., P38398) to retrieve the AlphaFold model. |

Settings

Prediction settings

| Setting | Description |
| --- | --- |
| Prediction mode | Type of binding site to predict. General binding sites (default) detects protein-protein interfaces. Antibody epitopes identifies B-cell epitope regions. IDP binding sites locates interaction sites for disordered proteins. |
| Chain selection | Specific chain(s) to analyze (e.g., A or AB). Leave empty to analyze all chains. For PDB IDs, underscore notation also works in the input field (e.g., 1brs_A). |
| Skip MSA | When enabled (default), predictions use structure alone without multiple sequence alignment. Recommended for designed proteins and faster results. Disable for natural proteins when evolutionary conservation data would improve accuracy. |
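
The underscore notation described above can be illustrated with a small parser. This is a minimal sketch, not ProteinIQ's implementation: the function names `parse_structure_input` and `rcsb_download_url` are hypothetical, the pattern assumes classic four-character PDB IDs and single-character chain IDs, and the URL follows the public RCSB file-download scheme.

```python
import re


def parse_structure_input(text):
    """Split an input such as '1brs_A' into (pdb_id, chains).

    An empty chain list means "analyze all chains".
    Assumes a classic 4-character PDB ID (digit + 3 alphanumerics)
    and single-character chain identifiers.
    """
    text = text.strip()
    match = re.fullmatch(r"([0-9][A-Za-z0-9]{3})(?:_([A-Za-z0-9]+))?", text)
    if match is None:
        raise ValueError(f"not a PDB ID with optional chain suffix: {text!r}")
    pdb_id = match.group(1).lower()
    chains = list(match.group(2) or "")
    return pdb_id, chains


def rcsb_download_url(pdb_id):
    """Build the RCSB download URL for a PDB-format file (hypothetical helper)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


print(parse_structure_input("1brs_A"))
print(rcsb_download_url("1brs"))
```

Returning an empty list for "all chains" mirrors the "leave empty" behavior of the Chain selection setting.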

Results

ScanNet outputs a table of per-residue binding site predictions.

| Column | Description |
| --- | --- |
| Residue | Position in the protein sequence. |
| Chain | Chain identifier from the input structure. |
| AA | Single-letter amino acid code at this position. |
| Probability | Predicted likelihood (0–1) that this residue is part of a binding interface. |
| Class | Classification based on probability thresholds: high, medium, or low. |

Interpreting binding probabilities

  • High (≥0.7): Strong prediction of binding site involvement. These residues are likely critical for protein-protein interactions.
  • Medium (0.4 to below 0.7): Moderate confidence. May be peripheral to the binding interface or involved in weaker interactions.
  • Low (below 0.4): Unlikely to participate in binding. These residues are predicted to be outside functional interfaces.

The summary statistics provide an overview: total residues analyzed, counts in each confidence tier, mean probability across the protein, and the maximum probability observed. A high mean probability suggests an extensive binding surface, while a few isolated high-probability residues indicate a more localized interaction site.
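
The tiers and summary statistics can be reproduced from the results table with a few lines of Python. This is a minimal sketch: the row layout `(residue, chain, aa, probability)` and the function names are assumptions made for illustration, and the example rows are made-up values, not real ScanNet output.

```python
from statistics import mean

HIGH_THRESHOLD = 0.7    # probabilities at or above this are "high"
MEDIUM_THRESHOLD = 0.4  # probabilities at or above this (and below 0.7) are "medium"


def classify(probability):
    """Map a binding probability to the tier names used in the Class column."""
    if probability >= HIGH_THRESHOLD:
        return "high"
    if probability >= MEDIUM_THRESHOLD:
        return "medium"
    return "low"


def summarize(rows):
    """Compute summary statistics for rows of (residue, chain, aa, probability)."""
    probabilities = [row[3] for row in rows]
    counts = {"high": 0, "medium": 0, "low": 0}
    for probability in probabilities:
        counts[classify(probability)] += 1
    return {
        "total": len(probabilities),
        "counts": counts,
        "mean": mean(probabilities),
        "max": max(probabilities),
    }


# Illustrative values only.
rows = [
    (35, "A", "D", 0.82),
    (36, "A", "W", 0.70),
    (37, "A", "G", 0.55),
    (38, "A", "L", 0.12),
]
summary = summarize(rows)
print(summary["counts"], summary["max"])
```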

How does ScanNet work?

ScanNet employs a hierarchical architecture that processes protein structures at both atomic and amino acid scales, learning to recognize spatio-chemical patterns associated with binding interfaces.

Spatio-chemical filters

At the core of ScanNet are trainable filters that detect specific spatial arrangements of atoms with particular chemical properties. For each atom in the structure, neighboring atoms within a local coordinate frame are extracted. These point clouds pass through linear filters that respond to arrangements such as "hydrophobic atoms surrounding a polar center" or "aromatic ring flanked by charged residues."

The filters are parameterized to be interpretable—each can be visualized to understand what molecular pattern it detects. This interpretability distinguishes ScanNet from black-box deep learning approaches.
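
To make the idea of a spatio-chemical filter concrete, here is a toy sketch in plain Python. It is not ScanNet's implementation. It assumes a translation-only neighborhood (a real local frame would also rotate coordinates), a Gaussian spatial kernel, and a per-chemical-class weight; the names `local_neighborhood` and `filter_response` and the chemical labels are hypothetical.

```python
import math


def local_neighborhood(atoms, center_index, radius):
    """Return neighbors of one atom as (offset, chem) pairs.

    Offsets are coordinates relative to the central atom. This sketch only
    translates; it does not build a rotated local frame.
    """
    center = atoms[center_index]["xyz"]
    neighbors = []
    for index, atom in enumerate(atoms):
        if index == center_index:
            continue
        if math.dist(atom["xyz"], center) <= radius:
            offset = tuple(a - c for a, c in zip(atom["xyz"], center))
            neighbors.append((offset, atom["chem"]))
    return neighbors


def filter_response(neighbors, kernels):
    """Sum Gaussian-weighted contributions of neighbors to one filter.

    Each kernel is (kernel_center, width, chem_weights): it responds most
    strongly to atoms of a favored chemical class near kernel_center.
    """
    response = 0.0
    for offset, chem in neighbors:
        for kernel_center, width, chem_weights in kernels:
            squared = sum((o - k) ** 2 for o, k in zip(offset, kernel_center))
            spatial = math.exp(-squared / (2.0 * width ** 2))
            response += chem_weights.get(chem, 0.0) * spatial
    return response


# Toy structure: a polar center with one nearby and one distant hydrophobic atom.
atoms = [
    {"xyz": (0.0, 0.0, 0.0), "chem": "polar"},
    {"xyz": (1.0, 0.0, 0.0), "chem": "hydrophobic"},
    {"xyz": (10.0, 0.0, 0.0), "chem": "hydrophobic"},
]
kernels = [((1.0, 0.0, 0.0), 1.0, {"hydrophobic": 1.0})]
neighbors = local_neighborhood(atoms, 0, radius=5.0)
response = filter_response(neighbors, kernels)
print(len(neighbors), response)
```

Because each kernel has an explicit position, width, and chemical preference, it can be drawn in 3D, which is the sense in which such filters are interpretable.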

Hierarchical representation

ScanNet builds representations at two scales:

  1. Atomic level: Spatio-chemical filters process local atomic neighborhoods, producing per-atom features that capture fine structural details.
  2. Amino acid level: Atomic features are aggregated within each residue and combined with amino acid attributes (type, secondary structure). A second round of spatio-chemical filtering operates on amino acid neighborhoods.

This multi-scale approach allows ScanNet to integrate information from individual atomic contacts up to residue-level surface geometry.
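
The atom-to-residue aggregation step can be sketched as follows. The text above does not say how atomic features are aggregated, so the element-wise maximum used here is an assumption. The function names, the residue keys, and the feature values are hypothetical.

```python
from collections import defaultdict


def pool_atoms_to_residues(atom_features, atom_to_residue):
    """Aggregate per-atom feature vectors into one vector per residue.

    Uses an element-wise maximum over the atoms of each residue
    (an assumed pooling choice for illustration).
    """
    grouped = defaultdict(list)
    for features, residue in zip(atom_features, atom_to_residue):
        grouped[residue].append(features)
    return {
        residue: [max(column) for column in zip(*vectors)]
        for residue, vectors in grouped.items()
    }


def add_residue_attributes(pooled, attributes):
    """Concatenate pooled atomic features with per-residue attributes."""
    return {residue: pooled[residue] + attributes[residue] for residue in pooled}


# Three atoms, two features each; the first two atoms belong to residue A:1.
atom_features = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.3]]
atom_to_residue = ["A:1", "A:1", "A:2"]
pooled = pool_atoms_to_residues(atom_features, atom_to_residue)

# Hypothetical residue attributes, e.g. an encoded type or secondary structure.
attributes = {"A:1": [1.0], "A:2": [0.0]}
residue_inputs = add_residue_attributes(pooled, attributes)
print(residue_inputs)
```

The resulting per-residue vectors are what a second, residue-level round of filtering would take as input.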

Evolutionary information

When MSA data is available, ScanNet incorporates position-weight matrices derived from sequence alignments. Conservation patterns provide complementary evidence—residues conserved across evolution often participate in functionally important interfaces. For designed proteins or sequences without homologs, the noMSA mode relies purely on structural features.
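
A position-weight matrix can be derived from an alignment by counting amino acids per column. The sketch below uses plain, unweighted frequencies and skips gaps. Real pipelines typically also reweight similar sequences, and nothing here is taken from ScanNet's code; the function name and the toy alignment are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_weight_matrix(alignment, pseudocount=0.0):
    """Return one {amino_acid: frequency} dict per alignment column.

    Gap characters and non-standard letters are ignored when counting.
    """
    length = len(alignment[0])
    if any(len(sequence) != length for sequence in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pwm = []
    for column in zip(*alignment):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pwm.append({
            aa: (counts[aa] + pseudocount) / total if total else 0.0
            for aa in AMINO_ACIDS
        })
    return pwm


# Toy alignment of three sequences; '-' marks a gap.
alignment = ["ACD", "ACE", "A-D"]
pwm = position_weight_matrix(alignment)
print(pwm[0]["A"], pwm[2]["D"])
```

A column dominated by a single amino acid, such as the first column here, is the kind of conservation signal described above.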

Training and generalization

ScanNet was trained on protein-protein binding sites from the MaSIF-site benchmark dataset. Testing revealed strong generalization: even for proteins with no sequence or fold similarity to training examples, ScanNet maintained high accuracy. This contrasts with homology-based methods that fail on novel folds.

Performance

ScanNet achieves state-of-the-art accuracy on protein-protein binding site prediction benchmarks.

| Metric | Value |
| --- | --- |
| AUCPR (test set) | 0.694 |
| Accuracy | 87.7% |
| Precision at 50% recall | 73.5% |
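
For readers unfamiliar with the "precision at 50% recall" metric, the sketch below shows one common way to compute it from per-residue labels and scores. The source does not describe how the reported numbers were calculated, so this is an illustrative definition. The function name and the example data are hypothetical, and tied scores are not treated specially.

```python
def precision_at_recall(labels, scores, target_recall=0.5):
    """Precision at the highest score cutoff where recall reaches the target.

    labels: 1 for true binding-site residues, 0 otherwise.
    scores: predicted binding probabilities, same order as labels.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / positives >= target_recall:
            return true_positives / rank
    raise ValueError("target_recall must be at most 1.0")


# Made-up example: four true binding residues among six.
labels = [1, 0, 1, 0, 1, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
print(precision_at_recall(labels, scores))
```

In this example, half of the true sites are recovered after three predictions, two of which are correct, so the precision at 50% recall is 2/3.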

Performance remains robust across different levels of similarity to training data. Structural homology methods excel when close homologs exist, but their accuracy degrades rapidly for distant or novel folds. ScanNet maintains consistent performance across all homology levels, which indicates that it generalizes to unseen folds rather than memorizing training examples.

Predictions are also robust to conformational changes between bound and unbound structures, with only minor accuracy drops (88.3% to 86.6% on simulated data, 91.9% to 91.3% on experimental structures).

Related tools

  • ParaSurf: Surface-based deep learning for antibody paratope prediction, the complementary problem of finding binding sites on antibodies rather than antigens
  • BindCraft: De novo protein binder design, useful after ScanNet identifies target epitopes
  • HADDOCK3: Integrative protein-protein docking that can use predicted binding sites as restraints
  • LightDock: Protein-protein docking with swarm optimization
  • ESMFold: Fast structure prediction for proteins lacking experimental structures, providing input for ScanNet
  • AlphaFold 2: High-accuracy structure prediction when experimental structures are unavailable