AbLang

Restore missing residues in antibody heavy or light chains using sequence language models.



What is AbLang?

AbLang is an antibody-specific language model developed at the Oxford Protein Informatics Group (OPIG). It restores missing residues in antibody sequences. This is a common problem in B-cell receptor (BCR) repertoire sequencing: over 40% of the sequences in the Observed Antibody Space (OAS) database lack their first 15 N-terminal amino acids.

The model uses a RoBERTa transformer architecture trained exclusively on antibody sequences from OAS. This specialization lets AbLang outperform general protein language models such as ESM-1b and match the accuracy of IMGT germline-based restoration, without requiring any germline information.

How does AbLang work?

AbLang learns antibody-specific patterns through masked language modeling. During training, 1–25% of residues in each sequence are masked, and the model predicts the original amino acids from context. This training approach captures the statistical regularities of antibody sequences, enabling accurate restoration of missing positions.
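The masking step described above can be sketched as follows. This is a simplified illustration, not AbLang's training code: it assumes the masked fraction is drawn uniformly from 1–25% per sequence and that every selected position is replaced by a single mask symbol. The function name `mask_sequence` and the use of `*` as the mask symbol are hypothetical choices for this example.

```python
import random

MASK = "*"  # hypothetical mask symbol; the real model uses its own mask token


def mask_sequence(seq: str, rng: random.Random) -> tuple[str, dict[int, str]]:
    """Mask a random 1-25% of positions.

    Returns the masked sequence and a dict {position: original residue},
    i.e. the targets the model would be trained to predict from context.
    """
    fraction = rng.uniform(0.01, 0.25)              # assumed: uniform masking rate
    n_mask = max(1, round(fraction * len(seq)))     # always mask at least one residue
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    targets = {i: seq[i] for i in positions}
    masked = "".join(MASK if i in targets else aa for i, aa in enumerate(seq))
    return masked, targets


rng = random.Random(0)
seq = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMS"
masked, targets = mask_sequence(seq, rng)
print(masked)
print(targets)
```

Real masked-language-model training often uses extra tricks (for example, sometimes substituting random residues instead of the mask token); the sketch leaves those out to keep the core idea visible.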

Two separate models were trained:

  • Heavy chain model: 14.1 million sequences, 20 epochs
  • Light chain model: 187,000 sequences, 40 epochs

Each model consists of two components: AbRep generates a 768-dimensional embedding for each residue from its sequence context, and AbHead predicts amino acid likelihoods at each position. For restoration, the amino acid with the highest likelihood at each masked position is selected.
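The final selection step (taking the most likely amino acid at each masked position) can be illustrated with a toy likelihood table. In this sketch the per-position dictionaries stand in for AbHead's output; the probabilities and the function name `restore` are made up for illustration and do not come from the model.

```python
def restore(masked_seq: str, likelihoods: list[dict[str, float]]) -> str:
    """Replace each '*' with the amino acid of highest likelihood at that position.

    likelihoods[i] maps amino acid -> probability at position i
    (a hypothetical stand-in for the per-position output of AbHead).
    """
    restored = []
    for i, aa in enumerate(masked_seq):
        if aa == "*":
            probs = likelihoods[i]
            restored.append(max(probs, key=probs.get))  # argmax over amino acids
        else:
            restored.append(aa)  # observed residues are kept unchanged
    return "".join(restored)


likelihoods = [
    {"E": 0.97, "Q": 0.03},
    {"V": 0.99, "L": 0.01},
    {"Q": 0.90, "K": 0.06, "E": 0.04},
    {"L": 0.98, "V": 0.02},
]
print(restore("EV*L", likelihoods))  # EVQL
```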

Performance

On N-terminal restoration (first 15 positions):

Chain | AbLang accuracy | IMGT germline accuracy | ESM-1b accuracy
Heavy | ~98% | ~98% | 64%
Light | ~96% | ~96% | 54%
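To make the accuracy figures above concrete, here is one plausible way to score N-terminal restoration: the fraction of correctly predicted residues, pooled over the first 15 positions of every sequence. This definition is an assumption for illustration; the original evaluation may have computed accuracy differently (for example, per sequence and then averaged). The function name is hypothetical.

```python
def n_terminal_accuracy(true_seqs: list[str], restored_seqs: list[str], n: int = 15) -> float:
    """Fraction of correctly restored residues over the first n positions (pooled)."""
    correct = total = 0
    for true, pred in zip(true_seqs, restored_seqs):
        for t, p in zip(true[:n], pred[:n]):
            total += 1
            correct += t == p
    if total == 0:
        raise ValueError("no positions to score")
    return correct / total


true_seqs = ["EVQLVESGGGLVQPGGS", "QVQLQESGPGLVKPS"]
restored_seqs = ["EVQLVESGGGLVQPGGS", "EVQLQESGPGLVKPS"]  # one wrong residue (Q -> E)
print(n_terminal_accuracy(true_seqs, restored_seqs))  # 29 of 30 positions correct
```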

AbLang is also 7× faster than ESM-1b, processing 100 sequences in about 6.5 seconds versus 45 seconds.

How to use AbLang online?

Input

Field | Description
Antibody sequences | FASTA format. Mark missing residues with asterisks (*).

Missing residues can appear anywhere in the sequence, though the most common use case is N-terminal restoration. For example:

```text
>heavy_chain_example
EVQLVESGGGLVQP**SLRLSCAASGFTF**SYAMSWVRQAPGKGLEWVSAI
```
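Before submitting, it can help to check locally that each record is well formed and that the asterisks sit where you expect. The sketch below parses FASTA text and lists the 1-based positions of the `*` placeholders. It assumes the input should contain only the 20 standard amino acids plus `*`; the tool's actual validation rules are not documented here, and `parse_fasta` and `masked_positions` are hypothetical helpers.

```python
VALID = set("ACDEFGHIKLMNPQRSTVWY*")  # assumed alphabet: 20 standard residues + mask


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, rejecting unexpected characters."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate header: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name] += line.upper()
    for name, seq in records.items():
        bad = set(seq) - VALID
        if bad:
            raise ValueError(f"{name}: unexpected characters {sorted(bad)}")
    return records


def masked_positions(seq: str) -> list[int]:
    """1-based positions of residues marked as missing."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "*"]


example = """>heavy_chain_example
EVQLVESGGGLVQP**SLRLSCAASGFTF**SYAMSWVRQAPGKGLEWVSAI
"""
records = parse_fasta(example)
print(masked_positions(records["heavy_chain_example"]))  # [15, 16, 30, 31]
```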

Settings

Setting | Description
Chain type | Heavy chain or Light chain. AbLang uses separate models optimized for each chain type. Heavy chains typically begin with EVQ or QVQ; light chains with DIQ or EIV.

Output

The output is the restored sequence, with a predicted amino acid in place of each asterisk. The results show the original sequence alongside the restored version, with the restored residues highlighted.
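If you want to reproduce the side-by-side view outside the web page, a plain-text version can be built by aligning the input and the restored sequence and placing a caret under every position that was an asterisk. The sketch also checks that no observed residue changed. The layout and the function name `highlight_restored` are illustrative assumptions, not the tool's actual output format.

```python
def highlight_restored(original: str, restored: str) -> str:
    """Show input, restored sequence, and '^' under each restored position."""
    if len(original) != len(restored):
        raise ValueError("restored sequence must have the same length as the input")
    for o, r in zip(original, restored):
        if o != "*" and o != r:
            raise ValueError("an observed (non-asterisk) residue was changed")
    marks = "".join("^" if o == "*" else " " for o in original)
    return f"{original}\n{restored}\n{marks.rstrip()}"


print(highlight_restored("EV*LVE**GG", "EVQLVESGGG"))
```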

Limitations

AbLang works best on sequences that resemble its training data: antibody variable domains from the OAS database. Performance may degrade on highly unusual antibodies or on non-human sequences that are poorly represented in OAS.

For sequences with an unknown number of missing N-terminal residues (rather than known positions marked with asterisks), alignment-based restoration is possible, although it requires additional preprocessing.