Humatch

Humanize antibodies with optimal V-gene matching.



Related tools

IgGM

IgGM is a generative foundation model for antibody and nanobody design against a target antigen. Supports CDR design, affinity maturation, inverse design, and framework design. Requires an antigen structure (PDB) and antibody sequences with "X" marking positions to design.

IgDesign

Design antibody CDR sequences via inverse folding. Generates complementarity-determining region (CDR) sequences for antibodies targeting therapeutic antigens using deep learning. Optimizes CDR loops (HCDR1, HCDR2, HCDR3) based on antibody-antigen complex structures.

mBER

Design VHH nanobody binders using AlphaFold-Multimer with structure templates and sequence conditioning. mBER (Manifold Binder Engineering and Refinement) generates novel VHH antibody sequences that bind to user-specified target proteins.

AntiFold

Inverse folding for antibody variable domains and nanobodies. Predicts amino acid sequences compatible with antibody structures using IMGT numbering while preserving upstream AntiFold chain handling and structural constraints.

RFantibody

Structure-based de novo antibody and nanobody design pipeline combining antibody-tuned RFdiffusion, ProteinMPNN sequence design, and antibody-tuned RoseTTAFold2 filtering.

BioPhi

Antibody humanization and humanness evaluation platform from Merck. Sapiens mode uses deep learning trained on the Observed Antibody Space (OAS) to humanize antibody sequences, while OASis mode evaluates humanness using 9-mer peptide search against human antibody databases.

BoltzGen

BoltzGen is a state-of-the-art AI model for designing protein and peptide binders against any biomolecular target. Using generative diffusion models, it creates novel binders (proteins, peptides, nanobodies) with nanomolar-level binding affinity.

PepMLM

Design linear peptide binders for target proteins using a target sequence-conditioned masked language model. PepMLM generates peptide sequences optimized to bind specific protein targets based on ESM-2 protein language modeling.

DiffAb

AI-powered antibody CDR design using equivariant diffusion models. Generates optimized complementarity-determining region (CDR) sequences and structures for antibodies targeting specific antigens. Supports single CDR, multi-CDR co-design, and fixed-backbone sequence design modes.

PocketFlow

PocketFlow is a structure-based molecular generative model that designs novel drug-like molecules within protein binding pockets. It uses autoregressive flow modeling with chemical knowledge to generate 100% chemically valid, highly drug-like compounds.

What is Humatch?

Humatch is a computational antibody humanization tool that transforms non-human (typically murine) antibody variable regions into sequences that resemble human antibodies. Reducing the "foreignness" of therapeutic antibodies is critical for avoiding anti-drug antibody (ADA) responses in patients, but humanization must preserve the original binding specificity encoded in the CDR loops.

What distinguishes Humatch from earlier approaches is that it humanizes heavy and light chains jointly. Most humanization methods treat VH and VL independently, but the pairing of heavy and light chains affects expression, stability, and whether immunogenic epitopes form across the VH/VL interface. Humatch uses three lightweight convolutional neural networks (CNNs) trained on millions of antibody sequences from the Observed Antibody Space (OAS) to guide mutations toward a specific target human V-gene while simultaneously optimizing VH/VL pairing compatibility.

Humatch was developed by Lewis Chinery, Jeliazko R. Jeliazkov, and Charlotte M. Deane at the University of Oxford and GSK R&D.

How does Humatch work?

Three CNN classifiers

Humatch trains three CNNs, each with 40 convolutional filters (kernel size 10, stride 1) operating on Kidera factor encodings of aligned antibody sequences:

  • CNN-H: Classifies heavy chains into 8 classes (non-human + 7 heavy V-gene families, HV1 to HV7)
  • CNN-L: Classifies light chains into 18 classes (non-human + 10 lambda + 7 kappa V-gene families)
  • CNN-P: Classifies VH/VL pairs as naturally paired vs. artificially paired

All three classifiers were trained on data from the OAS database: 8.26 million human and 3.77 million non-human heavy chains, 12.73 million human and 1.41 million non-human light chains, and 1.67 million natural pairs plus 5.01 million artificially mis-paired sequences.

Input sequences are aligned to 200 IMGT-numbered positions using ANARCI, with missing positions filled by gap tokens. For the paired classifier, heavy and light chains are concatenated with a 10-residue pad separator, yielding a 410-position input.
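The fixed-width layout described above can be sketched in a few lines. This is a minimal illustration and not Humatch's own code: it assumes the ANARCI alignment is already available as a mapping from a 0-based position index to a residue, and the gap token, pad token, and function names are hypothetical.

```python
N_POSITIONS = 200  # aligned positions per chain, as stated above
PAD_LENGTH = 10    # separator length between heavy and light chains
GAP = "-"          # assumed gap token for missing positions
PAD = "-"          # assumed pad token; the real tool may use another symbol


def to_fixed_width(aligned: dict[int, str]) -> str:
    """Place residues at their aligned indices and fill the rest with gaps."""
    for idx in aligned:
        if not 0 <= idx < N_POSITIONS:
            raise ValueError(f"position {idx} is outside 0..{N_POSITIONS - 1}")
    return "".join(aligned.get(i, GAP) for i in range(N_POSITIONS))


def pair_input(heavy: dict[int, str], light: dict[int, str]) -> str:
    """Concatenate heavy and light chains with a pad separator (paired model input)."""
    return to_fixed_width(heavy) + PAD * PAD_LENGTH + to_fixed_width(light)


# Toy alignments: only a handful of positions are occupied.
heavy = {0: "E", 1: "V", 2: "Q", 5: "L"}
light = {0: "D", 1: "I", 3: "Q"}
paired = pair_input(heavy, light)
print(len(to_fixed_width(heavy)), len(paired))  # 200 410
```

Because every chain occupies the same 200 slots, a given column always refers to the same IMGT position, which is what lets a convolutional model compare sequences of different raw lengths.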

Iterative humanization algorithm

Humanization proceeds in two phases.

Phase 1: Germline-likeness matching. Before engaging the CNNs, the algorithm computes a germline-likeness (GL) score for each chain. Precomputed amino acid frequency tables for the target V-gene define, at every IMGT position, how "germline-like" each residue is, and the GL score is the mean of these frequencies across all positions. At each iteration the mutation that gives the largest GL increase is applied, and this repeats until the GL score reaches the target threshold (default: 0.40). This initial step places the sequence on a sensible humanization trajectory without requiring expensive CNN inference.
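A small sketch may help make Phase 1 concrete. It assumes the frequency tables are given as one dictionary per position and that the caller supplies the indices allowed to mutate; the function names and toy frequencies are invented for illustration and are not taken from Humatch.

```python
def gl_score(seq: str, freqs: list[dict[str, float]]) -> float:
    """Mean target-gene frequency of the residue observed at each position."""
    return sum(freqs[i].get(aa, 0.0) for i, aa in enumerate(seq)) / len(seq)


def gl_humanize(seq: str, freqs: list[dict[str, float]],
                mutable: list[int], target: float = 0.40) -> str:
    """Greedily apply the substitution with the largest GL gain until target is met."""
    residues = list(seq)
    while gl_score("".join(residues), freqs) < target:
        best_gain, best_pos, best_aa = 0.0, None, None
        for i in mutable:
            if not freqs[i]:
                continue  # no frequency data for this position
            current = freqs[i].get(residues[i], 0.0)
            aa, freq = max(freqs[i].items(), key=lambda kv: kv[1])
            if freq - current > best_gain:
                best_gain, best_pos, best_aa = freq - current, i, aa
        if best_pos is None:
            break  # no remaining mutation improves the score
        residues[best_pos] = best_aa
    return "".join(residues)


# Toy frequency table for a 4-residue "chain" (illustrative numbers only).
freqs = [
    {"E": 0.9, "Q": 0.1},
    {"V": 0.8, "L": 0.2},
    {"Q": 0.7, "K": 0.3},
    {"L": 0.6, "M": 0.4},
]
# Index 2 is treated as a CDR position here and is therefore not mutable.
print(gl_humanize("QLKM", freqs, mutable=[0, 1, 3]))  # ELKM
```

In the toy run a single substitution at the first position is enough to lift the mean frequency above 0.40, so the loop stops after one edit.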

Phase 2: CNN-guided mutation selection. The algorithm then generates all possible single-point variants at non-CDR positions, scores each with all three CNNs, and selects the mutation that best improves CNN scores toward their targets. The selection formula accounts for:

  1. Net change in CNN prediction relative to the unmutated sequence
  2. Germline-specific amino acid frequency scaling at that position
  3. Distance of each CNN score from its target threshold (scores already above target contribute zero weight)
  4. Combined heavy, light, and paired objectives through element-wise score addition

The process repeats until all three CNN scores reach their target thresholds or the maximum number of mutations is exhausted.
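The selection loop can be sketched with stand-in scoring functions. Everything here is an assumption made for illustration: the real tool scores heavy and light chains with trained CNNs, whereas this sketch uses a single string, hypothetical callables, and one plausible reading of the weighting (score change, times distance from target, times a frequency factor). It also does not guard against an already-satisfied score dropping.

```python
from typing import Callable, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Scorer = Callable[[str], float]


def best_mutation(seq: str, scorers: list[Scorer], targets: list[float],
                  framework: list[int], freq: Callable[[int, str], float]
                  ) -> tuple[Optional[int], Optional[str]]:
    """Return the single substitution with the largest weighted score gain."""
    base = [s(seq) for s in scorers]
    # Objectives already at or above their target contribute zero weight.
    weights = [max(0.0, t - b) for t, b in zip(targets, base)]
    best_gain, best_pos, best_aa = 0.0, None, None
    for pos in framework:
        for aa in AMINO_ACIDS:
            if aa == seq[pos]:
                continue
            variant = seq[:pos] + aa + seq[pos + 1:]
            gain = sum(w * (s(variant) - b)
                       for w, s, b in zip(weights, scorers, base))
            gain *= freq(pos, aa)  # germline frequency scaling
            if gain > best_gain:
                best_gain, best_pos, best_aa = gain, pos, aa
    return best_pos, best_aa


def humanize(seq: str, scorers: list[Scorer], targets: list[float],
             framework: list[int], freq: Callable[[int, str], float],
             max_edits: int = 20) -> tuple[str, int]:
    """Repeat until every score meets its target or the edit budget is spent."""
    edits = 0
    while edits < max_edits and any(s(seq) < t for s, t in zip(scorers, targets)):
        pos, aa = best_mutation(seq, scorers, targets, framework, freq)
        if pos is None:
            break  # no variant improves the weighted objective
        seq = seq[:pos] + aa + seq[pos + 1:]
        edits += 1
    return seq, edits


# Stand-in scorers (NOT real CNNs): fraction of alanine, and a motif check.
scorers = [lambda s: s.count("A") / len(s),
           lambda s: 1.0 if s[2] == "E" else 0.0]
result = humanize("QCCCC", scorers, targets=[0.4, 0.5],
                  framework=[0, 1, 2], freq=lambda pos, aa: 1.0)
print(result)  # ('AAECC', 3)
```

Positions 3 and 4 play the role of CDR residues in this toy run and are never offered as mutation candidates, which mirrors the restriction to non-CDR positions.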

Why Kidera factors?

Rather than one-hot amino acid encodings (which allowed spurious mutations in testing) or protein language model embeddings (which would bloat model size), Humatch uses 10-dimensional Kidera factor vectors that capture physicochemical properties of each amino acid. Combined with early stopping during training, this produces classifiers that generalize smoothly across sequence space rather than memorizing sharp decision boundaries.

How to use Humatch online

ProteinIQ provides browser-based access to Humatch, running the full humanization pipeline on cloud infrastructure without requiring Python, TensorFlow, or ANARCI installation.

Inputs

| Input | Description |
| --- | --- |
| Heavy Chain (VH) | Antibody heavy chain variable region sequence. Raw amino acid sequence or FASTA format. Typically ~120 residues. |
| Light Chain (VL) | Antibody light chain variable region sequence. Raw amino acid sequence or FASTA format. Typically ~110 residues. |

Both chains are required. Sequences must contain only standard amino acids (20 canonical residues) and must be recognizable as antibody variable domains for IMGT numbering to succeed.
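Checking a sequence locally before submitting can catch the most common input errors. The helper below is a hypothetical pre-check, not part of Humatch or ProteinIQ: it accepts raw or single-record FASTA text, and it assumes lowercase letters and internal whitespace should be normalized rather than rejected. It cannot tell whether the sequence is a recognizable antibody variable domain, which requires a numbering tool such as ANARCI.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def parse_sequence(text: str) -> str:
    """Return an uppercase sequence from raw or single-record FASTA text."""
    lines = [line.strip() for line in text.strip().splitlines()]
    body = [line for line in lines if line and not line.startswith(">")]
    seq = "".join("".join(body).split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    invalid = sorted(set(seq) - CANONICAL)
    if invalid:
        raise ValueError(f"non-canonical residues: {''.join(invalid)}")
    return seq


print(parse_sequence(">VH\nevql\nVESG"))  # EVQLVESG
```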

Settings

Humanization options

| Setting | Description |
| --- | --- |
| Minimum humanness score | CNN score threshold for accepting a humanized sequence (0.5-0.95, default 0.7). Higher values produce more human-like sequences but may require more mutations. |
| Maximum edits per chain | Upper bound on amino acid substitutions per chain (5-50, default 20). Lower values preserve more of the original sequence. |
| Preserve CDR regions | When enabled (default), CDR residues are excluded from mutation candidates to maintain antigen-binding specificity. |

Output options

| Setting | Description |
| --- | --- |
| Include sequence alignment | Show alignment between original and humanized sequences. |
| Include V-gene details | Report the predicted target human V-gene (IGHV/IGLV/IGKV) for each chain. |
| Output format | Download format: CSV (default), TSV, or JSON. |

Results

The output table summarizes the humanization outcome for each chain:

| Property | Description |
| --- | --- |
| Original sequence | The input VH/VL sequence before humanization |
| Humanized sequence | The modified sequence with framework mutations applied |
| Predicted V-gene | The human V-gene family the CNN targets (e.g., HV3, KV1) |
| Humanness score | CNN probability that the sequence belongs to the predicted human V-gene class (0-1) |
| Paired score | CNN-P probability that the VH/VL pair resembles a naturally occurring human pair |
| Edit count | Number of amino acid substitutions relative to the input |
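The edit count can be reproduced from the original and humanized sequences, which is a useful sanity check on downloaded results. The sketch below assumes the two sequences have equal length (substitutions only, no insertions or deletions) and reports 1-based sequential positions rather than IMGT numbers; the function name is illustrative.

```python
def substitutions(original: str, humanized: str) -> list[tuple[int, str, str]]:
    """List (position, original residue, new residue) for every substitution."""
    if len(original) != len(humanized):
        raise ValueError("sequences must have equal length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(original, humanized))
            if a != b]


edits = substitutions("QVKLQ", "EVQLQ")
print(len(edits), edits)  # 2 [(1, 'Q', 'E'), (3, 'K', 'Q')]
```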

Interpreting scores

| Score | Range | Interpretation |
| --- | --- | --- |
| Humanness (CNN-H/L) | > 0.95 | High confidence the sequence resembles the target human V-gene |
| Humanness (CNN-H/L) | 0.7 - 0.95 | Moderately human-like; may benefit from additional framework optimization |
| Humanness (CNN-H/L) | < 0.7 | Substantially non-human character remains |
| Paired (CNN-P) | > 0.5 | Pairing resembles natural human VH/VL combinations |
| Paired (CNN-P) | < 0.5 | Pairing may have stability or immunogenicity concerns |

High CNN-P scores correlate with higher melting temperatures in therapeutic antibodies, suggesting paired optimization contributes to developability beyond just immunogenicity.
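When triaging many results, the bands above can be applied programmatically. This is a minimal sketch with hypothetical function names and labels. The table does not say how a paired score of exactly 0.5 should be read, so grouping it with the lower band is an assumption.

```python
def interpret_humanness(score: float) -> str:
    """Map a CNN-H or CNN-L score to the bands in the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score > 0.95:
        return "high confidence"
    if score >= 0.7:
        return "moderately human-like"
    return "substantially non-human"


def interpret_pairing(score: float) -> str:
    """Map a CNN-P score to a pairing label (0.5 itself treated as a concern)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return "natural-like pairing" if score > 0.5 else "possible pairing concern"


print(interpret_humanness(0.82), "|", interpret_pairing(0.64))
# moderately human-like | natural-like pairing
```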

Limitations

  • CDR preservation is a tradeoff: Excluding CDR residues from humanization protects binding affinity but may leave immunogenic non-human residues in the CDRs themselves. Manual inspection of CDR-adjacent mutations is still recommended.
  • V-gene coverage: Performance varies by V-gene family. Classes with sparse training data (e.g., KV7 with ~4,000 sequences) have lower classification accuracy than well-represented families (HV3, KV1 with millions of sequences).
  • No structural modeling: Humatch operates purely on sequence. It does not verify that mutations are structurally compatible. Pairing with structure prediction tools like ImmuneBuilder can help assess structural impact.
  • Single-pair input: Each run humanizes one VH/VL pair. For batch processing of antibody panels, CSV input through the command-line tool may be more practical.