AntiFold

Design antibody sequences from structure with AI-powered inverse folding

What is AntiFold?

AntiFold is an inverse folding model specialized for antibody variable domains. Given an antibody structure, it predicts amino acid sequences likely to fold into that structure, the reverse of traditional structure prediction.

Developed by the Oxford Protein Informatics Group, AntiFold builds on Meta's ESM-IF1 model but is fine-tuned specifically on antibody structures: experimental structures from SAbDab (Structural Antibody Database) and predicted structures of sequences from OAS (Observed Antibody Space). This antibody-specific training substantially improves sequence recovery for antibody CDR loops compared to general-purpose inverse folding tools.

The model uses IMGT numbering, the international standard for immunoglobulin sequences, enabling precise targeting of specific antibody regions during design.

How does AntiFold work?

AntiFold takes a 3D antibody structure and predicts the probability distribution of amino acids at each position. The model learns structural constraints (which residues are compatible with a given backbone geometry) from experimental and predicted antibody structures.

During fine-tuning from ESM-IF1, several strategies improved CDR3 recovery: span masking to learn regional context, weighted masking that emphasizes CDR residues (3:1 ratio over frameworks), layer-wise learning rate decay, and augmentation with OAS predicted structures. These modifications increased CDRH3 amino acid recovery from 43% (ESM-IF1 baseline) to 60%.
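The page does not spell out the exact masking procedure used in training, so the sketch below only illustrates what a 3:1 CDR-to-framework weighting means when choosing residues to mask. `choose_mask_positions` is a hypothetical helper, region labels are assumed to be strings such as "FR2" or "CDR3", and span masking is not shown.

```python
import random

def choose_mask_positions(regions, n_mask, cdr_weight=3.0, fw_weight=1.0, seed=None):
    """Pick n_mask distinct positions to mask, favouring CDR positions.

    regions: one IMGT region label per residue, e.g. "FR1" or "CDR3".
    Each remaining position is drawn with weight cdr_weight (CDR) or
    fw_weight (framework), without replacement.
    """
    rng = random.Random(seed)
    candidates = list(range(len(regions)))
    weights = [cdr_weight if r.startswith("CDR") else fw_weight for r in regions]
    chosen = []
    for _ in range(min(n_mask, len(candidates))):
        # Draw one candidate in proportion to its weight, then remove it from the pool
        k = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(k))
        weights.pop(k)
    return sorted(chosen)

# Usage: 10 framework + 10 CDR residues, mask 5 of them
regions = ["FR1"] * 10 + ["CDR1"] * 10
print(choose_mask_positions(regions, 5, seed=0))
```

With equal numbers of CDR and framework residues, a single draw lands in a CDR about 75% of the time (30 of 40 weight units), which is what "emphasizing CDR residues" amounts to in this toy setting.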

For each position, the model outputs:

Log-likelihood scores: Log-probabilities for each of the 20 amino acids
Perplexity: A measure of structural tolerance to mutations at that position (higher values indicate that more substitutions are acceptable)
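A common way to turn a position's log-likelihoods into a perplexity is to exponentiate the Shannon entropy of the distribution, which gives the "effective number" of amino acids at that position. The sketch assumes this standard definition and natural-log probabilities; AntiFold's own implementation is not reproduced here, and `position_perplexity` is a hypothetical helper.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_perplexity(log_probs):
    """Perplexity of one position: exp(Shannon entropy) of its amino acid distribution.

    log_probs maps amino acid -> natural-log probability. Values are renormalized
    so the result is well defined even if they do not sum exactly to 1.
    """
    max_lp = max(log_probs.values())
    weights = [math.exp(lp - max_lp) for lp in log_probs.values()]
    total = sum(weights)
    entropy = 0.0
    for w in weights:
        p = w / total
        if p > 0:
            entropy -= p * math.log(p)
    return math.exp(entropy)

# A uniform distribution over 20 residues gives perplexity 20; a certain residue gives 1
uniform = {aa: math.log(1 / 20) for aa in AMINO_ACIDS}
print(round(position_perplexity(uniform), 3))
```

A position split evenly between two residues scores 2, so the value reads naturally as "how many amino acids the structure will accept here".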

From these probabilities, AntiFold samples sequences using temperature-controlled multinomial sampling. Lower temperatures produce conservative sequences that stay close to the most probable amino acid at each position; higher temperatures explore more diverse sequence space.
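Temperature sampling can be sketched as follows, assuming the usual formulation in which each position's log-probabilities are divided by the temperature and renormalized with a softmax (equivalently, probabilities are raised to the power 1/T). AntiFold's actual sampling code may differ in detail; `sample_residue` and `sample_sequence` are hypothetical helpers whose defaults mirror the page's default temperature (0.2) and random seed (42).

```python
import math
import random

def sample_residue(log_probs, temperature, rng):
    """Draw one amino acid from temperature-scaled log-probabilities."""
    if temperature <= 0:
        # Limit T -> 0: always take the most probable residue
        return max(log_probs, key=log_probs.get)
    residues = list(log_probs)
    scaled = [log_probs[aa] / temperature for aa in residues]
    top = max(scaled)
    # Softmax numerators, shifted by the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(residues, weights=weights, k=1)[0]

def sample_sequence(profile, temperature=0.2, seed=42):
    """Sample one sequence from a list of per-position log-probability dicts."""
    rng = random.Random(seed)  # fixed seed -> reproducible designs
    return "".join(sample_residue(lp, temperature, rng) for lp in profile)

# Toy profile: every position prefers A (70%) over G (30%)
profile = [{"A": math.log(0.7), "G": math.log(0.3)}] * 20
print(sample_sequence(profile, temperature=0.2))
print(sample_sequence(profile, temperature=1.5))
```

At T = 0.2 the 70/30 preference sharpens to roughly 99/1, while at T = 1.5 it flattens to roughly 64/36, which is why low temperatures give conservative designs and high temperatures give diverse ones.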

How to use AntiFold online

ProteinIQ provides GPU-accelerated AntiFold without installation. The tool automatically handles IMGT renumbering via ANARCI.

Input

Input	Description
`Antibody Structure`	PDB or mmCIF file containing the antibody variable domains. Structures can be uploaded directly or fetched by PDB ID.

The structure should contain paired heavy and light chains (VH/VL) or a single-domain antibody (VHH/nanobody). An optional antigen chain may be included but should be kept small for optimal performance.
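Before uploading, it can help to confirm which chains a PDB file actually contains. The sketch below is a hypothetical local check, not a ProteinIQ feature: it reads the chain identifier from column 22 of `ATOM` records (per the fixed-column PDB format), assumes the default chain IDs `H` and `L`, and reports any extra chains, such as an antigen. mmCIF files would need a dedicated parser instead.

```python
def chain_ids(pdb_text):
    """Chain identifiers in order of first appearance (PDB column 22, ATOM records only)."""
    seen = []
    for line in pdb_text.splitlines():
        # HETATM records (ligands, waters) are ignored
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen

def describe_antibody(pdb_text, heavy="H", light="L"):
    """Classify the structure as paired VH/VL or single-domain; list any other chains."""
    chains = chain_ids(pdb_text)
    if heavy not in chains:
        raise ValueError(f"heavy chain {heavy!r} not found; chains present: {chains}")
    kind = "paired VH/VL" if light in chains else "single-domain (VHH/nanobody)"
    others = [c for c in chains if c not in (heavy, light)]  # e.g. an antigen chain
    return kind, others

# Usage:
# with open("my_antibody.pdb") as fh:
#     print(describe_antibody(fh.read()))
```

If the file uses different chain letters, pass them explicitly, for example `describe_antibody(text, heavy="A", light="B")`, and enter the same IDs in the chain settings.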

Settings

Core settings

Setting	Description
`Heavy chain ID`	Chain identifier for the heavy chain (default: `H`).
`Light chain ID`	Chain identifier for the light chain (default: `L`).
`Number of sequences`	Sequence variants to generate per structure (1–100, default: 10).
`Sampling temperature`	Controls sequence diversity (0.0–1.5, default: 0.2). Values 0.1–0.3 produce conservative designs; 0.7–1.5 explores diverse sequence space.
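The documented defaults and ranges can be captured in a small settings object, for example when scripting many runs. The field names below are hypothetical and are not ProteinIQ's API; the values come from the settings tables on this page (the seed default is listed under advanced options).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AntiFoldSettings:
    """Hypothetical mirror of the web form, with the documented defaults and ranges."""
    heavy_chain: str = "H"
    light_chain: str = "L"
    num_sequences: int = 10   # allowed range 1-100
    temperature: float = 0.2  # allowed range 0.0-1.5
    seed: int = 42            # advanced option: random seed

    def __post_init__(self):
        if not 1 <= self.num_sequences <= 100:
            raise ValueError("num_sequences must be between 1 and 100")
        if not 0.0 <= self.temperature <= 1.5:
            raise ValueError("temperature must be between 0.0 and 1.5")
        if self.heavy_chain == self.light_chain:
            raise ValueError("heavy and light chain IDs must differ")

# A more exploratory library-design run
settings = AntiFoldSettings(num_sequences=100, temperature=1.0)
print(settings)
```

Validating up front keeps out-of-range values from reaching a job that would be rejected anyway.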

Region selection

Setting	Description
`IMGT regions to design`	Target specific regions: `All regions`, `CDRs only`, individual CDRs (`CDR1`, `CDR2`, `CDR3`), or `Frameworks only`.

Complementarity-determining regions (CDRs) form the antigen binding site. CDR1, CDR2, and CDR3 loops on both heavy and light chains determine specificity. Framework regions provide structural scaffolding.
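In IMGT numbering, region boundaries are fixed positions: CDR1 is 27–38, CDR2 is 56–65 and CDR3 is 105–117, with frameworks in between. The sketch below maps the region-selection options onto residues using those standard boundaries. How ProteinIQ applies a selection internally is an assumption here; `designable_indices` is hypothetical and does not distinguish heavy from light chains.

```python
# Standard IMGT region boundaries for a variable domain (positions 1-128)
IMGT_REGIONS = [
    ("FR1", 1, 26), ("CDR1", 27, 38), ("FR2", 39, 55), ("CDR2", 56, 65),
    ("FR3", 66, 104), ("CDR3", 105, 117), ("FR4", 118, 128),
]

def imgt_region(position):
    """Region for an IMGT position number; insertion codes (e.g. 111A) share their base number."""
    for name, start, end in IMGT_REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside IMGT 1-128")

def designable_indices(numbering, selection):
    """Indices of residues that a region selection would redesign.

    numbering: list of (position, insertion_code) tuples, one per residue.
    selection: "All regions", "CDRs only", "Frameworks only", "CDR1", "CDR2" or "CDR3".
    """
    picked = []
    for i, (position, _insertion) in enumerate(numbering):
        region = imgt_region(position)
        if (selection == "All regions"
                or (selection == "CDRs only" and region.startswith("CDR"))
                or (selection == "Frameworks only" and region.startswith("FR"))
                or selection == region):
            picked.append(i)
    return picked

numbering = [(26, ""), (27, ""), (111, "A"), (128, "")]
print(designable_indices(numbering, "CDRs only"))  # → [1, 2]
```

Residues outside the selection are kept fixed, so their original identities act as context for the redesigned loops.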

Advanced options

Setting	Description
`Random seed`	For reproducible results (default: 42).
`Include per-residue scores`	Output log-likelihoods and perplexity at each position.

Output

Results include designed sequences in FASTA format with associated metrics:

Column	Description
`score`	Average log-likelihood over the designed region
`global_score`	Average log-likelihood over all residues
`seq_recovery`	Fraction of positions matching the original sequence
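The three summary columns can be reproduced from per-residue values as sketched below. This assumes `residue_ll` holds the model's log-likelihood for the residue present in the designed sequence at each position. The page does not say whether recovery is counted over the designed region or the whole chain; the sketch counts designed positions only, which is an assumption. `design_metrics` is a hypothetical helper.

```python
def design_metrics(designed, original, residue_ll, designed_mask):
    """Compute score, global_score and seq_recovery for one designed sequence.

    residue_ll: per-position log-likelihood of the designed residue.
    designed_mask: True where the position belongs to the designed region.
    """
    if not (len(designed) == len(original) == len(residue_ll) == len(designed_mask)):
        raise ValueError("all inputs must have the same length")
    idx = [i for i, d in enumerate(designed_mask) if d]
    if not idx:
        raise ValueError("no designed positions")
    score = sum(residue_ll[i] for i in idx) / len(idx)
    global_score = sum(residue_ll) / len(residue_ll)
    # Assumption: recovery is measured over the designed positions only
    recovery = sum(designed[i] == original[i] for i in idx) / len(idx)
    return {"score": score, "global_score": global_score, "seq_recovery": recovery}

print(design_metrics("ACDE", "ACDF", [-0.1, -0.2, -0.3, -0.4], [False, False, True, True]))
```

Because log-likelihoods are negative, scores closer to zero indicate sequences the model finds more compatible with the structure.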

When per-residue scoring is enabled, a CSV file provides position-by-position analysis including perplexity values and individual amino acid log-likelihoods.

Interpreting perplexity

Perplexity reflects how many amino acids are structurally compatible at a position:

Perplexity	Interpretation
1–3	Highly constrained; few substitutions tolerated
4–8	Moderately flexible; several alternatives possible
10–14	Structurally tolerant; typical for exposed CDR positions
> 14	Very permissive; likely surface-exposed or disordered

Lower perplexity at a position suggests that the backbone geometry strongly constrains the amino acid identity—useful for identifying structurally critical residues.
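To shortlist structurally critical residues from per-residue output, positions can be filtered by the "highly constrained" band in the table (perplexity up to 3). The position labels and values below are made-up toy data, and `constrained_positions` is a hypothetical helper.

```python
def constrained_positions(perplexity_by_position, threshold=3.0):
    """Positions whose perplexity is at or below threshold, most constrained first."""
    hits = [(pos, ppl) for pos, ppl in perplexity_by_position.items() if ppl <= threshold]
    return sorted(hits, key=lambda item: item[1])

# Illustrative values only, not real AntiFold output
per_residue = {"H23": 1.4, "H104": 2.1, "H111A": 12.0, "L50": 5.0}
print(constrained_positions(per_residue))  # → [('H23', 1.4), ('H104', 2.1)]
```

Positions returned here are natural candidates to leave untouched when mutating for affinity or developability.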

Applications

CDR optimization: Redesign binding loops while preserving framework stability
Humanization: Generate human-compatible sequences for therapeutic antibodies
Affinity maturation: Explore sequence variants that maintain structural integrity
Developability improvement: Design out problematic residues while keeping the fold
Library design: Generate diverse yet structurally sound sequences for screening

AntiFold vs ProteinMPNN for antibodies

ProteinMPNN is a general-purpose inverse folding model trained on diverse protein structures. While powerful, it can produce artifacts that are problematic for antibodies: reordered chains, gaps in IMGT-numbered structures, and weaker CDR3 predictions.

AntiFold addresses these issues with antibody-specific training. On experimental structures, it achieves 60% CDRH3 amino acid recovery, versus 43% for ESM-IF1 and 56% for AbMPNN (ProteinMPNN fine-tuned on antibody structures). For the notoriously variable CDRH3 loop, AntiFold's per-position perplexity falls in a narrower range of about 4–8 (versus 6–10 for AbMPNN), producing more focused designs that better preserve backbone geometry.

Limitations

Requires IMGT numbering (applied automatically, but unusual numbering may cause issues)
Optimized for variable domains (IMGT positions 1–128); not designed for constant regions
Large antigens may slow processing; keep antigen chains minimal when possible
Does not model backbone flexibility—generated sequences assume the input backbone is fixed

Related tools

ESM-IF1: General-purpose inverse folding model (AntiFold's base model)
ProteinMPNN: Alternative inverse folding with different architecture
IgDesign: CDR sequence design for antibody-antigen complexes
BioPhi: Antibody humanization and humanness scoring
AbLang-2: Antibody language model for sequence analysis