Humatch

AI-powered antibody humanization with optimal V-gene matching

What is Humatch?

Humatch is a computational antibody humanization tool that transforms non-human (typically murine) antibody variable regions into sequences that resemble human antibodies. Reducing the "foreignness" of therapeutic antibodies is critical for avoiding anti-drug antibody (ADA) responses in patients, but humanization must preserve the original binding specificity encoded in the CDR loops.

What distinguishes Humatch from earlier approaches is that it humanizes heavy and light chains jointly. Most humanization methods treat VH and VL independently, but the pairing of heavy and light chains affects expression, stability, and whether immunogenic epitopes form across the VH/VL interface. Humatch uses three lightweight convolutional neural networks (CNNs) trained on millions of antibody sequences from the Observed Antibody Space (OAS) to guide mutations toward a specific target human V-gene while simultaneously optimizing VH/VL pairing compatibility.

Humatch was developed by Lewis Chinery, Jeliazko R. Jeliazkov, and Charlotte M. Deane at the University of Oxford and GSK R&D.

How does Humatch work?

Three CNN classifiers

Humatch trains three CNNs, each with 40 convolutional filters (kernel size 10, stride 1) operating on Kidera factor encodings of aligned antibody sequences:

CNN-H: Classifies heavy chains into 8 classes (non-human + 7 heavy V-gene families: HV1-7)
CNN-L: Classifies light chains into 18 classes (non-human + 10 lambda + 7 kappa V-gene families)
CNN-P: Classifies VH/VL pairs as naturally paired vs. artificially paired
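
The NumPy sketch below makes the layer dimensions concrete. It applies one convolution with 40 filters (kernel size 10, stride 1) to Kidera-encoded inputs of 200 positions (a single chain) and 410 positions (a VH/VL pair). The text above does not say whether Humatch pads its convolutions or which activation, pooling, and output layers follow. The sketch therefore assumes unpadded ("valid") convolution and uses random, untrained weights, so only the output shapes are meaningful.

```python
import numpy as np

N_KIDERA = 10   # features per aligned position (Kidera factors)
N_FILTERS = 40  # filters per CNN
KERNEL = 10     # kernel size
STRIDE = 1      # stride


def conv1d_valid(x: np.ndarray, w: np.ndarray, b: np.ndarray, stride: int = 1) -> np.ndarray:
    """Unpadded ("valid") 1-D convolution over sequence positions.

    x: (length, in_channels); w: (kernel, in_channels, filters); b: (filters,)
    Returns an array of shape (out_length, filters).
    """
    k = w.shape[0]
    out_len = (x.shape[0] - k) // stride + 1
    out = np.empty((out_len, w.shape[2]))
    for i in range(out_len):
        window = x[i * stride : i * stride + k]  # (kernel, in_channels)
        # Contract over kernel offset and input channel, leaving one value per filter.
        out[i] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return out


rng = np.random.default_rng(0)
w = rng.normal(size=(KERNEL, N_KIDERA, N_FILTERS))  # random, untrained weights
b = np.zeros(N_FILTERS)

single_chain = rng.normal(size=(200, N_KIDERA))  # one aligned VH or VL
paired = rng.normal(size=(410, N_KIDERA))        # VH + 10-position pad + VL

print(conv1d_valid(single_chain, w, b, STRIDE).shape)  # (191, 40)
print(conv1d_valid(paired, w, b, STRIDE).shape)        # (401, 40)
```

With stride 1 and no padding, each layer yields length - 10 + 1 positions, so every filter produces 191 activations per chain and 401 per pair.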

All three classifiers were trained on data from the OAS database: 8.26 million human and 3.77 million non-human heavy chains, 12.73 million human and 1.41 million non-human light chains, and 1.67 million natural pairs plus 5.01 million artificially mis-paired sequences.

Input sequences are aligned to 200 IMGT-numbered positions using ANARCI, with missing positions filled by gap tokens. For the paired classifier, heavy and light chains are concatenated with a 10-residue pad separator, yielding a 410-position input.
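
The following is a minimal sketch of the fixed-length layout described above. It assumes that the numbering arrives as `((number, insertion_code), residue)` tuples, which resembles ANARCI's IMGT output but is not a call into ANARCI. The separator token `*` and the 200-label position grid are hypothetical placeholders. The real grid includes IMGT insertion codes such as `111A`, so with the placeholder grid used here that residue is dropped.

```python
GAP = "-"  # gap token for IMGT positions with no residue
PAD = "*"  # hypothetical separator token between VH and VL


def numbering_to_dict(numbering: list[tuple[tuple[int, str], str]]) -> dict[str, str]:
    """Convert ((number, insertion_code), residue) tuples to {"111A": "G", ...}.

    Residues reported as "-" are treated as absent.
    """
    return {f"{num}{ins.strip()}": aa for (num, ins), aa in numbering if aa != GAP}


def to_fixed_length(residues: dict[str, str], positions: list[str]) -> list[str]:
    """Lay residues onto a fixed position grid, filling missing slots with GAP."""
    return [residues.get(pos, GAP) for pos in positions]


def paired_input(heavy: list[str], light: list[str], pad_len: int = 10) -> list[str]:
    """Concatenate aligned VH and VL with a pad separator (200 + 10 + 200 = 410)."""
    return heavy + [PAD] * pad_len + light


# Placeholder grid of 200 labels; the real list includes IMGT insertion codes.
POSITIONS = [str(i) for i in range(1, 201)]

vh = numbering_to_dict([((1, " "), "E"), ((2, " "), "V"), ((3, " "), "-"), ((111, "A"), "G")])
heavy_row = to_fixed_length(vh, POSITIONS)
light_row = to_fixed_length({"1": "D", "2": "I"}, POSITIONS)

print(vh)                                       # {'1': 'E', '2': 'V', '111A': 'G'}
print(heavy_row[:4])                            # ['E', 'V', '-', '-']
print(len(paired_input(heavy_row, light_row)))  # 410
```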

Iterative humanization algorithm

Humanization proceeds in two phases.

Phase 1: Germline-likeness matching. Before engaging the CNNs, the algorithm computes a germline-likeness (GL) score for each chain. At every IMGT position, precomputed amino acid frequency tables for the target V-gene define how "germline-like" each residue is. The mean frequency across all positions gives the GL score. Mutations that maximize GL increase are applied iteratively until the GL score reaches the target threshold (default: 0.40). This initial step places the sequence on a sensible humanization trajectory without requiring expensive CNN inference.
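
The GL step can be sketched as a greedy loop, assuming that "maximize GL increase" means applying, at each step, the single substitution whose residue frequency gain is largest. Because the GL score is a mean over positions, the largest per-position gain is also the largest score gain. The frequency tables below are toy values, not Humatch's precomputed tables. `frozen` is a hypothetical way to keep positions such as CDRs fixed. Gap positions are skipped rather than filled.

```python
Freqs = dict[str, dict[str, float]]  # IMGT position -> {amino acid: frequency}


def gl_score(seq: dict[str, str], freqs: Freqs) -> float:
    """Mean target-germline frequency of the residues at the tabulated positions."""
    vals = [table.get(seq.get(pos, "-"), 0.0) for pos, table in freqs.items()]
    return sum(vals) / len(vals) if vals else 0.0


def gl_match(seq, freqs, target=0.40, frozen=frozenset(), max_steps=50):
    """Greedily apply the substitution with the largest GL gain until the target is met."""
    seq = dict(seq)
    steps = []
    while gl_score(seq, freqs) < target and len(steps) < max_steps:
        best = None  # (gain, position, new residue)
        for pos, table in freqs.items():
            if pos in frozen or pos not in seq:  # skip frozen (e.g. CDR) and gap positions
                continue
            aa, f = max(table.items(), key=lambda kv: kv[1])
            gain = f - table.get(seq[pos], 0.0)
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, pos, aa)
        if best is None:  # no substitution can raise the score further
            break
        _, pos, aa = best
        steps.append((pos, seq[pos], aa))
        seq[pos] = aa
    return seq, steps


freqs = {"1": {"E": 0.8, "Q": 0.2}, "2": {"V": 0.9, "L": 0.1}, "3": {"Q": 0.5, "K": 0.5}}
humanized, steps = gl_match({"1": "Q", "2": "L", "3": "K"}, freqs)
print(steps)                                 # [('2', 'L', 'V')]
print(round(gl_score(humanized, freqs), 3))  # 0.533
```

A single substitution lifts the toy score from about 0.267 to 0.533, which clears the 0.40 default, so the loop stops there.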

Phase 2: CNN-guided mutation selection. The algorithm then generates all possible single-point variants at non-CDR positions, scores each with all three CNNs, and selects the mutation that best improves CNN scores toward their targets. The selection formula accounts for:

Net change in CNN prediction relative to the unmutated sequence
Germline-specific amino acid frequency scaling at that position
Distance of each CNN score from its target threshold (scores already above target contribute zero weight)
Combined heavy, light, and paired objectives through element-wise score addition

The process repeats until all three CNN scores reach their target thresholds or the maximum number of mutations is exhausted.
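
The exact selection formula is not reproduced on this page. The loop below is a sketch under one plausible reading of the factors listed above:

- Each objective's net score change is weighted by how far that score sits below its target, with zero weight once the target is met.
- The weighted changes are summed across heavy, light, and paired objectives.
- The sum is scaled by a germline frequency for the candidate residue.

`score` and `germ_freq` are hypothetical callables standing in for the three CNNs and the frequency tables. The toy `score` simply counts matches to a reference sequence.

```python
from typing import Callable

Scores = tuple[float, float, float]  # (CNN-H, CNN-L, CNN-P)
Mutation = tuple[str, str]           # (position, new residue)


def target_weights(scores: Scores, targets: Scores) -> Scores:
    """Distance below each target; a score at or above its target gets weight 0."""
    return tuple(max(t - s, 0.0) for s, t in zip(scores, targets))


def apply(seq: dict[str, str], mut: Mutation) -> dict[str, str]:
    pos, aa = mut
    new = dict(seq)
    new[pos] = aa
    return new


def humanize(seq, positions, alphabet, score: Callable[[dict], Scores],
             germ_freq: Callable[[Mutation], float], targets: Scores, max_edits: int):
    edits = []
    while len(edits) < max_edits:
        base = score(seq)
        w = target_weights(base, targets)
        if not any(w):  # every CNN score has reached its target
            break
        best, best_val = None, 0.0
        for pos in positions:  # non-CDR positions only
            for aa in alphabet:
                if aa == seq[pos]:
                    continue
                new = score(apply(seq, (pos, aa)))
                # Net change per objective, weighted by distance to target, then summed.
                delta = sum(wi * (n - b) for wi, n, b in zip(w, new, base))
                val = delta * germ_freq((pos, aa))  # germline-frequency scaling
                if val > best_val:
                    best, best_val = (pos, aa), val
        if best is None:  # no single mutation improves the weighted objective
            break
        edits.append((best[0], seq[best[0]], best[1]))
        seq = apply(seq, best)
    return seq, edits


HUMAN = {"1": "E", "2": "V", "3": "Q"}


def toy_score(seq):
    # Stand-in for the three CNNs: fraction of positions matching HUMAN.
    frac = sum(seq[p] == aa for p, aa in HUMAN.items()) / len(HUMAN)
    return (frac, frac, frac)


result, edits = humanize({"1": "Q", "2": "L", "3": "Q"}, ["1", "2"], "EVQL",
                         toy_score, lambda m: 1.0, (0.9, 0.9, 0.9), max_edits=20)
print(result)  # {'1': 'E', '2': 'V', '3': 'Q'}
print(edits)   # [('1', 'Q', 'E'), ('2', 'L', 'V')]
```

The weighting stops rewarding an objective once it crosses its target. This lets the remaining budget of edits go to whichever of the heavy, light, or paired scores is still lagging.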

Why Kidera factors?

Rather than one-hot amino acid encodings (which allowed spurious mutations in testing) or protein language model embeddings (which would bloat model size), Humatch uses 10-dimensional Kidera factor vectors that capture physicochemical properties of each amino acid. Combined with early stopping during training, this produces classifiers that generalize smoothly across sequence space rather than memorizing sharp decision boundaries.
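
The short comparison below shows why a property-based encoding behaves more smoothly than one-hot vectors. Under one-hot encoding, every pair of distinct residues is exactly the same distance apart. Under physicochemical vectors, similar residues sit close together. The vectors in `toy_table` are placeholders chosen for illustration and are **not** the published Kidera factors. Encoding gap and pad tokens as all-zero rows is an assumption, not something this page states.

```python
import numpy as np

N_FACTORS = 10
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def encode_kidera(tokens: list[str], table: dict[str, list[float]]) -> np.ndarray:
    """(positions, 10) matrix; tokens without an entry (gaps, pads) stay all-zero."""
    out = np.zeros((len(tokens), N_FACTORS))
    for i, tok in enumerate(tokens):
        if tok in table:
            out[i] = table[tok]
    return out


def encode_one_hot(tokens: list[str]) -> np.ndarray:
    """(positions, 20) matrix with a single 1 per canonical residue."""
    out = np.zeros((len(tokens), len(ALPHABET)))
    for i, tok in enumerate(tokens):
        j = ALPHABET.find(tok)
        if j >= 0:
            out[i, j] = 1.0
    return out


# PLACEHOLDER vectors for illustration only; these are NOT the published Kidera factors.
toy_table = {"I": [1.0] * 10, "L": [0.9] * 10, "D": [-1.0] * 10}

oh = encode_one_hot(["I", "L", "D"])
kf = encode_kidera(["I", "L", "D"], toy_table)

# One-hot: I-L and I-D are both sqrt(2) apart.
print(np.linalg.norm(oh[0] - oh[1]), np.linalg.norm(oh[0] - oh[2]))
# Property-based vectors: I is much closer to L than to D.
print(np.linalg.norm(kf[0] - kf[1]) < np.linalg.norm(kf[0] - kf[2]))  # True
```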

How to use Humatch online

ProteinIQ provides browser-based access to Humatch, running the full humanization pipeline on cloud infrastructure without requiring Python, TensorFlow, or ANARCI installation.

Inputs

Input	Description
`Heavy Chain (VH)`	Antibody heavy chain variable region sequence. Raw amino acid sequence or FASTA format. Typically ~120 residues.
`Light Chain (VL)`	Antibody light chain variable region sequence. Raw amino acid sequence or FASTA format. Typically ~110 residues.

Both chains are required. Sequences must contain only standard amino acids (20 canonical residues) and must be recognizable as antibody variable domains for IMGT numbering to succeed.
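
For preparing inputs locally, the following is a minimal pre-check along the lines of these requirements. It accepts a raw sequence or a single FASTA record and rejects anything outside the 20 canonical residues. Upper-casing the input and stripping whitespace are assumptions about reasonable normalization, not documented ProteinIQ behavior. Passing this check does not guarantee that IMGT numbering will succeed, because that still requires ANARCI to recognize the sequence as a variable domain.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def parse_chain(text: str) -> str:
    """Accept a raw sequence or a single FASTA record; return the uppercase sequence."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop the FASTA header
    if any(ln.startswith(">") for ln in lines):
        raise ValueError("expected a single sequence, found multiple FASTA records")
    seq = "".join("".join(lines).split()).upper()  # remove any internal whitespace
    if not seq:
        raise ValueError("empty sequence")
    bad = sorted(set(seq) - CANONICAL)
    if bad:
        raise ValueError(f"non-canonical residues: {', '.join(bad)}")
    return seq


print(parse_chain(">my_vh\nEVQLVESG\nGGLVQ"))  # EVQLVESGGGLVQ
```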

Settings

Humanization options

Setting	Description
`Minimum humanness score`	CNN score threshold for accepting a humanized sequence (0.5-0.95, default 0.7). Higher values produce more human-like sequences but may require more mutations.
`Maximum edits per chain`	Upper bound on amino acid substitutions per chain (5-50, default 20). Lower values preserve more of the original sequence.
`Preserve CDR regions`	When enabled (default), CDR residues are excluded from mutation candidates to maintain antigen-binding specificity.
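
The settings and their documented ranges can be captured in a small configuration object. The class and field names below are hypothetical and do not correspond to a ProteinIQ or Humatch API. The `mutable` helper shows one way the CDR option could restrict mutation candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HumanizationSettings:
    """Hypothetical container mirroring the documented ranges and defaults."""
    min_humanness: float = 0.7     # allowed 0.5-0.95
    max_edits_per_chain: int = 20  # allowed 5-50
    preserve_cdrs: bool = True

    def __post_init__(self) -> None:
        if not 0.5 <= self.min_humanness <= 0.95:
            raise ValueError(f"min_humanness must be in [0.5, 0.95], got {self.min_humanness}")
        if not 5 <= self.max_edits_per_chain <= 50:
            raise ValueError(
                f"max_edits_per_chain must be in [5, 50], got {self.max_edits_per_chain}"
            )

    def mutable(self, position_is_cdr: list[bool]) -> list[int]:
        """Indices eligible for mutation; CDR indices are excluded when preserve_cdrs is set."""
        return [i for i, is_cdr in enumerate(position_is_cdr)
                if not (self.preserve_cdrs and is_cdr)]


settings = HumanizationSettings(min_humanness=0.8)
print(settings.mutable([False, True, True, False]))  # [0, 3]
```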

Output options

Setting	Description
`Include sequence alignment`	Show alignment between original and humanized sequences.
`Include V-gene details`	Report the predicted target human V-gene (IGHV/IGLV/IGKV) for each chain.
`Output format`	Download format: `CSV` (default), `TSV`, or `JSON`.

Results

The output table summarizes the humanization outcome for each chain:

Property	Description
Original sequence	The input VH/VL sequence before humanization
Humanized sequence	The modified sequence with framework mutations applied
Predicted V-gene	The human V-gene family the CNN targets (e.g., HV3, KV1)
Humanness score	CNN probability that the sequence belongs to the predicted human V-gene class (0-1)
Paired score	CNN-P probability that the VH/VL pair resembles a naturally occurring human pair
Edit count	Number of amino acid substitutions relative to the input
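
Humatch applies point substitutions only, so the edit count is the Hamming distance between the original and humanized sequences. The sketch below computes it and writes rows in the three download formats. The record and column names are hypothetical and may not match the columns ProteinIQ actually exports.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class ChainResult:
    chain: str        # "VH" or "VL"
    original: str
    humanized: str
    v_gene: str       # e.g. "HV3", "KV1"
    humanness: float  # CNN-H or CNN-L probability
    paired: float     # CNN-P probability for the VH/VL pair

    @property
    def edit_count(self) -> int:
        """Substitutions only, so the count is the Hamming distance."""
        if len(self.original) != len(self.humanized):
            raise ValueError("original and humanized sequences differ in length")
        return sum(a != b for a, b in zip(self.original, self.humanized))


def serialize(rows: list[ChainResult], fmt: str = "csv") -> str:
    records = [{**asdict(r), "edit_count": r.edit_count} for r in rows]
    if fmt == "json":
        return json.dumps(records, indent=2)
    delimiter = {"csv": ",", "tsv": "\t"}[fmt]  # KeyError for unsupported formats
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), delimiter=delimiter)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


vh = ChainResult("VH", "QVQLQQSG", "EVQLVESG", "HV3", 0.91, 0.78)
print(vh.edit_count)  # 3
print(serialize([vh], "json"))
```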

Interpreting scores

Score	Range	Interpretation
Humanness (CNN-H/L)	> 0.95	High confidence the sequence resembles the target human V-gene
Humanness (CNN-H/L)	0.7 - 0.95	Moderately human-like; may benefit from additional framework optimization
Humanness (CNN-H/L)	< 0.7	Substantially non-human character remains
Paired (CNN-P)	> 0.5	Pairing resembles natural human VH/VL combinations
Paired (CNN-P)	< 0.5	Pairing may have stability or immunogenicity concerns
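
The interpretation table can be expressed as two small helpers for screening many results at once. The table does not say how to classify the exact boundary values, so this sketch treats a humanness score of exactly 0.95 as moderate and a paired score of exactly 0.5 as a potential concern.

```python
def humanness_band(score: float) -> str:
    """Bands from the table; 0.95 itself falls in the moderate band."""
    if score > 0.95:
        return "high"
    if score >= 0.7:
        return "moderate"
    return "low"


def paired_band(score: float) -> str:
    """A score of exactly 0.5 is treated as a potential concern (an assumption)."""
    return "natural-like" if score > 0.5 else "possible concern"


for h, p in [(0.97, 0.81), (0.82, 0.43), (0.55, 0.62)]:
    print(h, humanness_band(h), p, paired_band(p))
```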

High CNN-P scores correlate with higher melting temperatures among therapeutic antibodies, suggesting that optimizing VH/VL pairing contributes to developability as well as to reduced immunogenicity.

Limitations

CDR preservation is a tradeoff: Excluding CDR residues from humanization protects binding affinity but may leave immunogenic non-human residues within the CDRs themselves. Manual inspection of framework mutations adjacent to the CDRs is still recommended.
V-gene coverage: Performance varies by V-gene family. Classes with sparse training data (e.g., KV7 with ~4,000 sequences) have lower classification accuracy than well-represented families (HV3, KV1 with millions of sequences).
No structural modeling: Humatch operates purely on sequence and does not verify that mutations are structurally compatible. Running the humanized sequences through a structure prediction tool such as ImmuneBuilder can help assess structural impact.
Single-pair input: Each run humanizes one VH/VL pair. For batch processing of antibody panels, CSV input through the command-line tool may be more practical.
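
The CDR-adjacent mutations mentioned above can be flagged automatically. The sketch below uses the IMGT CDR boundaries (CDR1 27-38, CDR2 56-65, CDR3 105-117) and takes mutations as `(position, old, new)` tuples with IMGT position labels. That input format is an assumption, and the 2-position window is an arbitrary illustrative choice. Insertion codes such as `111A` resolve to their integer part, which places them inside CDR3 rather than next to it.

```python
import re

# IMGT CDR boundaries (inclusive).
IMGT_CDRS = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}


def imgt_number(pos: str) -> int:
    """Integer part of an IMGT position label, e.g. "111A" -> 111."""
    m = re.match(r"\d+", pos)
    if m is None:
        raise ValueError(f"not an IMGT position: {pos!r}")
    return int(m.group())


def cdr_adjacent(mutations: list[tuple[str, str, str]], window: int = 2):
    """Return (position, old, new, CDR) for framework mutations within `window` of a CDR."""
    flagged = []
    for pos, old, new in mutations:
        n = imgt_number(pos)
        for name, (start, end) in IMGT_CDRS.items():
            if start - window <= n < start or end < n <= end + window:
                flagged.append((pos, old, new, name))
    return flagged


muts = [("25", "S", "A"), ("40", "M", "V"), ("80", "K", "R"), ("103", "T", "A")]
print(cdr_adjacent(muts))
# [('25', 'S', 'A', 'CDR1'), ('40', 'M', 'V', 'CDR1'), ('103', 'T', 'A', 'CDR3')]
```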

Related tools

BioPhi: Alternative antibody humanization using Sapiens (deep learning) and OASis (peptide-based humanness scoring), also trained on OAS data
ANARCI: Antibody numbering and V-gene annotation using IMGT, Chothia, or Kabat schemes (used internally by Humatch for sequence alignment)
IgBLAST: V/D/J gene segment identification and CDR delineation from NCBI
AbLang-2: Antibody language model for predicting non-germline residues and generating sequence embeddings
ImmuneBuilder: 3D structure prediction for antibodies, nanobodies, and T-cell receptors
DiffAb: Diffusion model-based CDR sequence and structure design for antigen-specific antibodies
