ParaSurf

Surface-based deep learning for paratope-antigen interaction prediction

What is ParaSurf?

ParaSurf is a surface-based deep learning model that predicts paratope binding sites on antibodies. A paratope is the region of an antibody that physically contacts and binds to an antigen—identifying these sites is essential for understanding immune recognition, designing therapeutic antibodies, and developing vaccines.

The model is antigen-agnostic: it only requires the antibody structure, not the antigen. This makes ParaSurf useful for predicting binding sites before experimental characterization or when the target antigen is unknown.

ParaSurf achieves state-of-the-art accuracy on multiple benchmarks, with particularly strong performance on the highly variable CDR3 loops that are critical for antigen specificity. If you don't have an antibody structure yet, you can generate one using ESMFold, Chai-1, or Boltz-2.

How does ParaSurf work?

Surface representation

Rather than operating on raw atomic coordinates, ParaSurf extracts the solvent-accessible surface of the antibody and samples points across this surface. Each surface point becomes a local prediction target.

For each surface point, the model constructs a $41 \times 41 \times 41$ voxel grid centered on that point at 1 Å resolution, so the grid extends roughly 20 Å from the center along each axis. The grid is oriented using the surface normal at that point rather than the global coordinate frame, which reduces sensitivity to arbitrary rotations of the input structure.
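
To make the grid geometry concrete, here is a minimal sketch of how an atom position could be mapped to a voxel in a 41-voxel, 1 Å grid centered on a surface point. It assumes the grid's third axis points along the surface normal; the exact convention used by ParaSurf is not stated here, and the function names (`local_frame`, `voxel_index`) are hypothetical.

```python
import math

GRID_SIZE = 41          # voxels per axis, as described above
RESOLUTION = 1.0        # Å per voxel, as described above
HALF = GRID_SIZE // 2   # 20 voxels on each side of the center voxel


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def local_frame(normal):
    """Orthonormal frame (u, v, w) with w along the surface normal (assumed convention)."""
    w = _normalize(normal)
    # Choose a helper axis that is not parallel to w.
    helper = (1.0, 0.0, 0.0) if abs(w[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _normalize(_cross(helper, w))
    v = _cross(w, u)
    return u, v, w


def voxel_index(atom, center, normal):
    """Return the (i, j, k) voxel containing `atom`, or None if it lies outside the grid."""
    d = tuple(a - c for a, c in zip(atom, center))
    idx = []
    for axis in local_frame(normal):
        coord = sum(di * ai for di, ai in zip(d, axis))  # projection onto this axis
        i = int(round(coord / RESOLUTION)) + HALF
        if not 0 <= i < GRID_SIZE:
            return None
        idx.append(i)
    return tuple(idx)
```

Because positions are expressed in the local frame, an atom 5 Å above the surface point along the normal lands in the same voxel however the whole structure is rotated, which is the point of the orientation step.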

Input features

ParaSurf encodes 22 features per voxel, grouped into three categories:

Chemical features (18 channels): Nine atom type classes (C, N, O, S, etc.), hybridization state, valence metrics, partial charge, and SMARTS-based descriptors for hydrophobicity, aromaticity, hydrogen bond donor/acceptor properties, and ring membership.

Electrostatic features (4 channels): Force field values from AMBER and CHARMM, plus atomic radii calculated via PDB2PQR.

Geometric features: Van der Waals surface representation with outward-pointing normal vectors.

Neural network architecture

The extracted features pass through a hybrid architecture:

3D CNN blocks extract initial volumetric features
Four residual layers with dilated convolutions capture multi-scale spatial patterns
Compression layer reduces 2048 channels to 256
Transformer block applies self-attention to model long-range dependencies between surface points
Global average pooling and a fully connected layer produce a binding probability between 0 and 1 for each surface point

From surface points to residues

Individual surface point predictions are aggregated to residue-level scores using the maximum:

$$\text{Res}_{\text{score}} = \max\bigl(P(sp_1), P(sp_2), \ldots, P(sp_n)\bigr)$$

where $P(sp_i)$ is the predicted binding probability for surface point $i$ belonging to that residue. Residues with scores above 0.5 are classified as binding sites.
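
The aggregation rule can be sketched in a few lines of Python. This is an illustration of the max-pooling and 0.5 threshold described above, not ParaSurf's own code; the function names and the `(chain, residue number)` keys are assumptions made for the example.

```python
from collections import defaultdict


def residue_scores(point_probs, point_residues):
    """Max-pool surface-point probabilities into one score per residue.

    point_probs[i] is the predicted probability for surface point i;
    point_residues[i] identifies the residue that point belongs to.
    """
    scores = defaultdict(float)  # probabilities are non-negative, so 0.0 is a safe start
    for prob, residue in zip(point_probs, point_residues):
        scores[residue] = max(scores[residue], prob)
    return dict(scores)


def binding_residues(scores, threshold=0.5):
    """Residues whose score is strictly above the threshold."""
    return sorted(res for res, score in scores.items() if score > threshold)
```

For example, a residue with surface points scored 0.2 and 0.9 receives a residue score of 0.9 and is classified as binding, while a residue whose points score 0.4 and 0.45 is not.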

Inputs & settings

Antibody structure

Upload a PDB file containing your antibody structure, or fetch one directly from the RCSB PDB by entering the 4-letter code. The structure should include the Fab region (variable and constant domains of heavy and light chains).

Before running ParaSurf, ensure your structure is properly prepared. Use PDB Fixer to add missing atoms, fix non-standard residues, or remove water molecules and ions that may interfere with surface generation.
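
As a rough illustration of one part of that cleanup, the sketch below drops water and a few common ion records from PDB text in plain Python. It is not PDB Fixer: it does not add missing atoms or repair non-standard residues. The function name and the ion list are assumptions chosen for the example.

```python
WATER_NAMES = {"HOH", "WAT"}
ION_NAMES = {"NA", "CL", "K", "MG", "CA", "ZN"}  # illustrative, not exhaustive


def strip_solvent_and_ions(pdb_text):
    """Return PDB text without HETATM records for water or the listed ions."""
    unwanted = WATER_NAMES | ION_NAMES
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            resname = line[17:20].strip()  # residue name, PDB columns 18-20
            if resname in unwanted:
                continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Other HETATM records, such as modified residues, are deliberately kept so that a dedicated tool can decide how to handle them.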

Model selection

ParaSurf offers four model variants trained on different benchmark datasets:

Paragraph Expanded: Trained on 1,086 antibody-antigen complexes. We recommend this for general use—it has the largest and most diverse training set.
PECAN: Trained on 460 complexes from the PECAN benchmark. Use this if you want predictions consistent with PECAN-based literature.
Paragraph - Heavy Chains Only: Predicts binding sites only on the heavy chain. Useful when you're specifically interested in heavy chain contributions.
Paragraph - Light Chains Only: Predicts binding sites only on the light chain.

Surface mesh density

Controls the resolution of the molecular surface sampling. Lower values (toward 0.1) generate a coarser mesh with fewer surface points, resulting in faster predictions but less detail. Higher values (toward 1.0) produce a denser mesh with more precise predictions at the cost of longer computation time.

For most antibody structures, the default value provides a good balance between accuracy and speed. Increase the density for detailed analysis of specific binding interfaces.

Understanding the results

ParaSurf outputs a PDB file with predicted binding scores encoded in the B-factor column. The interactive viewer colors residues by their predicted binding probability:

High scores (red/orange): Strong prediction of paratope involvement—these residues are likely to contact the antigen
Low scores (blue/white): Unlikely to be part of the binding interface
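
If you want to work with the scores outside the viewer, they can be read back from the B-factor field, which occupies columns 61-66 of each ATOM record in the fixed-width PDB format. The sketch below assumes every atom of a residue carries the same score, so it keeps the last value seen per residue; `read_residue_scores` is a hypothetical helper name.

```python
def read_residue_scores(pdb_text):
    """Map (chain, residue number, insertion code) to the B-factor value."""
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain = line[21]               # column 22
        resseq = int(line[22:26])      # columns 23-26
        icode = line[26].strip()       # column 27, common in antibody numbering
        bfactor = float(line[60:66])   # columns 61-66
        scores[(chain, resseq, icode)] = bfactor
    return scores
```

The insertion code is part of the key because antibody numbering schemes often label CDR residues as, for example, 100A and 100B.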

Evaluation regions

Performance is reported over three nested antibody regions, with accuracy varying between them:

Region	Description	Typical performance
CDR ± 2	Complementarity-determining regions plus 2 flanking residues	Highest accuracy
Fv	Full variable region (all CDRs and framework regions)	High accuracy
Fab	Entire antigen-binding fragment	Good accuracy

Predictions within the CDRs, especially CDR-H3 (AUC-ROC ~0.96), tend to be the most reliable, since these loops are directly responsible for antigen recognition.

Interpreting confidence

A residue score of 0.7 indicates the model is fairly confident that residue participates in antigen binding. Scores near 0.5 represent uncertainty—these residues may warrant experimental validation.

For therapeutic antibody development, focus on residues with scores above 0.6 as candidates for mutagenesis studies or epitope mapping experiments.
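
One way to apply this guidance is to sort residues into tiers. The sketch below uses the 0.6 cutoff mentioned above; the lower bound of 0.4 for the "uncertain" tier is an assumption made for the example (the text only says "near 0.5"), as are the function and tier names.

```python
def triage(scores, candidate_cutoff=0.6, uncertain_floor=0.4):
    """Group residues into tiers by score, each tier sorted from highest to lowest.

    candidate: score above candidate_cutoff
    uncertain: score from uncertain_floor up to candidate_cutoff (assumed band)
    unlikely:  everything below uncertain_floor
    """
    tiers = {"candidate": [], "uncertain": [], "unlikely": []}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for residue, score in ranked:
        if score > candidate_cutoff:
            tiers["candidate"].append(residue)
        elif score >= uncertain_floor:
            tiers["uncertain"].append(residue)
        else:
            tiers["unlikely"].append(residue)
    return tiers
```

Residues in the "candidate" tier are the natural starting point for mutagenesis, while the "uncertain" tier marks those that may warrant experimental validation.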

Use cases

ParaSurf is particularly valuable in several scenarios:

Therapeutic antibody engineering: Identify which residues to preserve or modify when humanizing or affinity-maturing antibodies
Epitope mapping: Predict the antibody side of the interface before conducting experimental mapping
Antibody selection: Screen computationally predicted antibody structures for favorable binding site characteristics
Structural biology: Guide experimental design by predicting which regions are most likely to form the binding interface

Limitations

ParaSurf requires the antibody structure to include the Fab region for accurate predictions. Predictions on isolated Fv fragments or single-domain antibodies (nanobodies) may be less reliable.

The model was trained on conventional antibody-antigen complexes. Performance on unusual binding modes (e.g., antibodies that bind through framework regions) has not been extensively validated.

Surface generation requires atomic coordinates—ParaSurf cannot process sequence-only inputs. Generate a structure first using a structure prediction tool if you only have the sequence.

Related tools

PDB Fixer: Prepare antibody structures before analysis
PDB Viewer: Visualize structures and binding site predictions
ColabDock: Dock antibody-antigen complexes
LightDock: Protein-protein docking simulations
ESMFold, Chai-1, Boltz-2: Predict antibody structure from sequence

Based on: Papadopoulos AM, et al. (2025). ParaSurf: A Surface-Based Deep Learning Approach for Paratope-Antigen Interaction Prediction. Bioinformatics. DOI: 10.1093/bioinformatics/btaf062
