AbLang is an antibody-specific language model developed at the Oxford Protein Informatics Group. It restores missing residues in antibody sequences—a common problem in B-cell receptor repertoire sequencing where over 40% of sequences in the Observed Antibody Space (OAS) database are missing their first 15 N-terminal amino acids.
The model uses a RoBERTa transformer architecture trained exclusively on antibody sequences from OAS. This specialization allows AbLang to outperform general protein language models like ESM-1b while matching the accuracy of IMGT germline-based restoration—without requiring any germline knowledge.
AbLang learns antibody-specific patterns through masked language modeling. During training, 1–25% of residues in each sequence are masked, and the model predicts the original amino acids from context. This training approach captures the statistical regularities of antibody sequences, enabling accurate restoration of missing positions.
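The masking step can be illustrated with a minimal sketch. It is not AbLang's training code: the `*` mask symbol, drawing the fraction uniformly per sequence, and always masking at least one residue are assumptions made for illustration.

```python
import random

MASK = "*"  # placeholder symbol (assumption); the real tokenizer's mask token may differ


def mask_sequence(seq, rng, low=0.01, high=0.25):
    """Mask a random fraction (between low and high) of the residues in seq.

    Returns the masked sequence and the sorted list of masked positions.
    """
    fraction = rng.uniform(low, high)
    # Assumption: mask at least one residue so every sequence gives a training signal.
    n_mask = max(1, round(fraction * len(seq)))
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    chars = list(seq)
    for i in positions:
        chars[i] = MASK
    return "".join(chars), positions


rng = random.Random(0)
original = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAI"
masked, positions = mask_sequence(original, rng)

# The training targets are the original residues at the masked positions.
targets = [original[i] for i in positions]
print(masked)
print(list(zip(positions, targets)))
```

During training, the model sees only `masked` and is scored on how well it recovers `targets`.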
Two separate models were trained, one for heavy chains and one for light chains.
Each model consists of two components: AbRep generates 768-dimensional embeddings from sequence context, and AbHead predicts amino acid likelihoods at each position. For restoration, the amino acid with highest likelihood at each masked position is selected.
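The selection rule for restoration can be sketched as follows. This is only an illustration of picking the highest-likelihood residue: the 20-letter alphabet ordering, the plain-list score format, and the helper `one_hot_like` are hypothetical and do not reflect AbHead's actual output format.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering, for illustration only


def restore(masked_seq, likelihoods, mask="*"):
    """Replace each masked position with the highest-likelihood amino acid.

    likelihoods[i] is a list of 20 scores for position i, ordered as AMINO_ACIDS.
    """
    restored = []
    for residue, scores in zip(masked_seq, likelihoods):
        if residue == mask:
            best = max(range(len(AMINO_ACIDS)), key=lambda k: scores[k])
            restored.append(AMINO_ACIDS[best])
        else:
            restored.append(residue)  # known residues are left untouched
    return "".join(restored)


def one_hot_like(aa, peak=5.0):
    """Hypothetical stand-in for model output: one high score for a single residue."""
    return [peak if a == aa else 0.0 for a in AMINO_ACIDS]


masked = "EV*LV"
scores = [one_hot_like(a) for a in "EVQLV"]
restored = restore(masked, scores)
print(restored)
```

Only positions marked with the mask symbol are changed; the scores at known positions are ignored.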
On N-terminal restoration (first 15 positions):
| Chain | AbLang accuracy | IMGT germline accuracy | ESM-1b accuracy |
|---|---|---|---|
| Heavy | ~98% | ~98% | 64% |
| Light | ~96% | ~96% | 54% |
AbLang is also about 7× faster than ESM-1b, processing 100 sequences in about 6.5 seconds versus 45 seconds.
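The speed comparison follows directly from the two timings quoted above. A short calculation, using only those figures:

```python
n_sequences = 100
ablang_seconds = 6.5
esm1b_seconds = 45.0

# Ratio of wall-clock times for the same batch of sequences.
speedup = esm1b_seconds / ablang_seconds

# Throughput in sequences per second for each model.
ablang_rate = n_sequences / ablang_seconds
esm1b_rate = n_sequences / esm1b_seconds

print(f"speedup: {speedup:.1f}x")
print(f"AbLang: {ablang_rate:.1f} seq/s, ESM-1b: {esm1b_rate:.1f} seq/s")
```

The ratio comes out just under 7, which is where the rounded figure comes from.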
| Field | Description |
|---|---|
| Antibody sequences | FASTA format. Mark missing residues with asterisks (*). |
Missing residues can appear anywhere in the sequence, though the most common use case is N-terminal restoration. For example:
>heavy_chain_example
EVQLVESGGGLVQP**SLRLSCAASGFTF**SYAMSWVRQAPGKGLEWVSAI
| Setting | Description |
|---|---|
| Chain type | Heavy chain or Light chain. AbLang uses separate models optimized for each chain type. Heavy chains typically begin with EVQ or QVQ; light chains with DIQ or EIV. |
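To make the input format and the chain-type hint concrete, here is a minimal sketch that parses a FASTA record, lists the asterisk positions, and guesses the chain type from the first three residues. The function names are hypothetical, and the prefix check is only the rule of thumb from the table above, not how the tool itself selects a model.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def masked_positions(seq, mask="*"):
    """Return the 0-based indices of residues marked as missing."""
    return [i for i, c in enumerate(seq) if c == mask]


def guess_chain_type(seq):
    """Guess the chain type from the first three residues; None if unrecognized."""
    prefix = seq[:3]
    if prefix in ("EVQ", "QVQ"):
        return "heavy"
    if prefix in ("DIQ", "EIV"):
        return "light"
    return None


fasta = """>heavy_chain_example
EVQLVESGGGLVQP**SLRLSCAASGFTF**SYAMSWVRQAPGKGLEWVSAI
"""
records = parse_fasta(fasta)
seq = records["heavy_chain_example"]
positions = masked_positions(seq)
chain = guess_chain_type(seq)
print(positions, chain)
```

The prefix heuristic returns `None` when the N-terminal residues are themselves missing, which is why the chain type is given as an explicit setting.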
The output is the restored sequence, with a predicted amino acid replacing each asterisk. Results display the original sequence alongside the restored version, highlighting the restored residues.
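A side-by-side display of this kind can be approximated in plain text. The sketch below wraps each restored residue in square brackets. The bracket convention and the function name are assumptions, and the "restored" residues in the example are placeholders, not model output.

```python
def highlight_restored(original, restored, mask="*"):
    """Return the restored sequence with residues at masked positions in brackets."""
    if len(original) != len(restored):
        raise ValueError("original and restored sequences must have the same length")
    out = []
    for o, r in zip(original, restored):
        out.append(f"[{r}]" if o == mask else r)
    return "".join(out)


original = "EVQLVESGGGLVQP**SLRL"
restored = "EVQLVESGGGLVQPGGSLRL"  # placeholder residues, for illustration only
highlighted = highlight_restored(original, restored)
print(original)
print(highlighted)
```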
AbLang works best for sequences that resemble those in its training data—antibody variable domains from the OAS database. Performance may degrade on highly unusual antibodies or non-human sequences not well-represented in OAS.
For sequences with unknown numbers of missing N-terminal residues (rather than known positions marked with asterisks), alignment-based restoration can be performed, though this requires additional preprocessing.
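One way such preprocessing might look is sketched below. The source does not describe how the alignment is done, so this is purely an assumed approach: it takes a hypothetical full-length reference of the same chain type, finds the ungapped offset where the start of the fragment matches best, and prepends that many asterisks. Real antibody alignment would normally rely on a numbering scheme rather than this simple sliding comparison.

```python
def pad_n_terminus(fragment, reference, probe_len=8, max_offset=30, mask="*"):
    """Prepend mask symbols for residues assumed missing at the N-terminus.

    The offset is the ungapped shift that gives the most identities between
    the first probe_len residues of the fragment and the reference.
    """
    probe = fragment[:probe_len]
    best_offset, best_score = 0, -1
    last_offset = min(max_offset, len(reference) - len(probe))
    for offset in range(0, last_offset + 1):
        window = reference[offset:offset + len(probe)]
        score = sum(a == b for a, b in zip(probe, window))
        if score > best_score:  # ties keep the smallest offset
            best_offset, best_score = offset, score
    return mask * best_offset + fragment


# Hypothetical full-length reference and a fragment missing its first 15 residues.
reference = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAI"
fragment = reference[15:]
padded = pad_n_terminus(fragment, reference)
print(padded)
```

The padded sequence can then be submitted like any other input with asterisk-marked positions.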