IgBLAST

Analyze antibody and T cell receptor variable domain sequences


What is IgBLAST?

IgBLAST (Immunoglobulin BLAST) is a specialized sequence analysis tool designed to analyze immunoglobulin (antibody) and T cell receptor variable domain sequences. Developed by researchers at the National Center for Biotechnology Information (NCBI), IgBLAST automates the identification of V (Variable), D (Diversity), and J (Joining) gene segments, delineates complementarity determining regions (CDRs), and analyzes rearrangement junctions in antigen receptor sequences.

The tool was published in 2013 in Nucleic Acids Research by Jian Ye, Ning Ma, Thomas L. Madden, and James M. Ostell to address limitations of general-purpose sequence alignment tools like standard BLAST. Unlike conventional BLAST, which cannot efficiently handle the rearranged nature and variable segment lengths characteristic of immunoglobulin sequences, IgBLAST performs specialized sequential searches optimized for antibody and T cell receptor analysis.

The most common applications include:

Therapeutic antibody discovery: Identifying naturally occurring antibodies with desired specificities from immune repertoires for drug development. Companies developing antibody therapeutics against cancer, infectious diseases, and autoimmune conditions rely on V/D/J analysis to characterize candidate antibodies.
Vaccine response monitoring: Tracking antibody gene usage patterns following vaccination. Analyzing V gene family bias and CDR3 sequences reveals immune response dynamics and identifies vaccine-elicited antibody lineages.
B cell repertoire sequencing: Analyzing millions of antibody sequences from high-throughput sequencing to map the complete antibody repertoire. IgBLAST enables large-scale annotation of next-generation sequencing data for repertoire studies.
Clonal lineage tracing: Reconstructing B cell evolutionary histories by identifying antibodies sharing common V(D)J rearrangements. Antibodies with identical V and J gene usage and similar CDR3 sequences are typically inferred to descend from a common ancestor.
Autoimmune disease research: Characterizing autoreactive antibody repertoires in diseases like systemic lupus erythematosus and rheumatoid arthritis. Certain V gene families show biased usage in autoimmune conditions.
Cancer immunotherapy: Analyzing tumor-infiltrating B cells and identifying tumor-reactive antibodies. V/D/J profiling reveals clonal expansions associated with anti-tumor immune responses.
Antibody engineering: Guiding humanization of mouse antibodies and optimization of therapeutic candidates. Understanding V gene identity and CDR3 composition informs rational antibody design strategies.

How to use IgBLAST online

ProteinIQ provides a web-based interface for running IgBLAST without command-line installation or local database setup. Upload antibody or TCR sequences in FASTA format, configure organism and receptor settings, and receive comprehensive V/D/J gene assignments with CDR region annotations in a sortable spreadsheet format.

Inputs

Input	Description
`Antibody/TCR sequences`	FASTA-formatted nucleotide or protein sequences from immunoglobulin or T cell receptor variable regions. Accepts batch submissions with multiple sequences. Maximum file size: 50 MB.

Settings

Sequence configuration

Setting	Description
`Organism`	Species source of germline gene database. Options: `Human` (default), `Mouse`, `Rabbit`, `Rat`, `Rhesus monkey`. Selects appropriate V/D/J gene reference libraries.
`Receptor type`	Antigen receptor class. `Ig` (immunoglobulin/antibody, default) or `TCR` (T cell receptor). Determines which gene segment databases are searched.
`Sequence type`	Input sequence format. `Auto-detect` (default) determines whether sequences are nucleotide or protein based on character composition. `Nucleotide` (DNA/RNA) provides complete V/D/J analysis. `Protein` (amino acid) analyzes V gene only.
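
Auto-detection by character composition can be approximated as follows. This is a minimal sketch, not ProteinIQ's actual implementation: the function name `guess_sequence_type` and the 90% threshold are assumptions chosen for illustration.

```python
def guess_sequence_type(sequence: str, threshold: float = 0.9) -> str:
    """Guess 'nucleotide' or 'protein' from character composition.

    Hypothetical heuristic: if at least `threshold` of the letters are
    A, C, G, T, U, or N, the sequence is treated as nucleotide.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    if nucleotide_like / len(letters) >= threshold:
        return "nucleotide"
    return "protein"


print(guess_sequence_type("GAGGTGCAGCTGGTGGAGTCTGGG"))   # nucleotide
print(guess_sequence_type("EVQLVESGGGLVQPGGSLRLSCAAS"))  # protein
```

Short, low-complexity protein fragments can fool a heuristic like this, which is one reason to set the sequence type explicitly when it is known.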

Germline database

Setting	Description
`Database source`	Reference germline gene library. `IMGT` (default, recommended) provides standardized international nomenclature. `NCBI` includes additional species coverage.
`Domain system`	CDR numbering scheme. `IMGT` (default) is the international standard for antibody analysis. `Kabat` is an alternative legacy numbering system.

Output options

Setting	Description
`Max alignments per gene`	Number of top-scoring V/D/J gene matches to report per sequence (1–10, default 3). Higher values reveal alternative gene assignments when sequences match multiple alleles.
`Show translation`	Include amino acid translation for nucleotide input sequences (default enabled).
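
For readers who run the standalone NCBI `igblastn` program rather than the web form, the settings above correspond roughly to command-line options. The sketch below only builds the argument list and does not run anything. The helper `build_igblastn_command`, the database directory, and the database file names are hypothetical, and the mapping from web settings to flags is an assumption rather than a description of how ProteinIQ invokes the tool.

```python
def build_igblastn_command(query, organism="human", receptor="Ig",
                           domain_system="imgt", max_alignments=3,
                           db_dir="database"):
    """Return an argv list for NCBI igblastn (nucleotide input).

    `db_dir` and the database names are placeholders; they must point at
    germline databases prepared locally.
    """
    cmd = [
        "igblastn",
        "-query", query,
        "-organism", organism,
        "-ig_seqtype", receptor,           # "Ig" or "TCR"
        "-domain_system", domain_system,   # "imgt" or "kabat"
        "-germline_db_V", f"{db_dir}/{organism}_V",
        "-germline_db_D", f"{db_dir}/{organism}_D",
        "-germline_db_J", f"{db_dir}/{organism}_J",
        "-outfmt", "19",                   # AIRR tab-separated output
    ]
    # One alignment-count option per segment type.
    for segment in "VDJ":
        cmd += [f"-num_alignments_{segment}", str(max_alignments)]
    return cmd


print(" ".join(build_igblastn_command("antibodies.fasta")))
```

The returned list can be passed to `subprocess.run` once the germline databases exist at the chosen paths.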

Results

The output consists of a spreadsheet with one row per analyzed sequence, showing gene segment assignments, identity scores, CDR3 sequences, and productivity status.

Column	Description
`Query ID`	Sequence identifier from FASTA header.
`Chain`	Antibody or TCR chain classification. `VH` (heavy chain), `VL-KAPPA` (kappa light chain), `VL-LAMBDA` (lambda light chain), or TCR chains (`VA`, `VB`, `VG`, `VD`).
`V Gene`	Variable gene segment assignment with allele designation (e.g., `IGHV3-23*01`). Multiple matches separated by commas indicate ambiguous assignments.
`V Identity %`	Percentage sequence identity to assigned V gene germline sequence. Reflects degree of somatic hypermutation.
`D Gene`	Diversity gene segment assignment (heavy chains and TCR beta/delta only). Light chains show `N/A` as they lack D segments.
`J Gene`	Joining gene segment assignment.
`CDR3 (AA)`	Complementarity determining region 3 amino acid sequence. CDR3 is the most variable region of the receptor and the main determinant of antigen specificity.
`CDR3 (NT)`	CDR3 nucleotide sequence.
`Junction`	Complete nucleotide sequence spanning the V-D-J junction, including the germline bases that remain after trimming and any non-templated insertions.
`Productive`	Rearrangement functionality. `Yes` indicates in-frame junction without premature stop codons. `No` indicates non-functional rearrangement.
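
The idea behind the `Productive` column can be illustrated with a simplified check. This sketch assumes the input is a coding sequence that already starts in the correct reading frame, and it treats a rearrangement as productive when its length is a multiple of three and no codon is a stop codon. IgBLAST's own determination relies on the V and J reading frames, so this is an approximation, and `is_productive` is a hypothetical helper name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(coding_nt: str) -> bool:
    """Return True if the sequence is in frame and has no stop codon."""
    seq = coding_nt.upper().replace("U", "T")
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False  # empty or out of frame
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


print(is_productive("TGTGCGAGAGATTACTGG"))  # True
print(is_productive("TGTGCGAGATGATACTGG"))  # False: in-frame TGA
print(is_productive("TGTGCGAGAGATTACTG"))   # False: length not a multiple of 3
```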

Interpreting V identity scores

V gene identity percentages reveal the extent of somatic hypermutation:

95–100%: Minimal mutation, characteristic of naive B cells or germline-like antibodies
85% to just under 95%: Moderate mutation, typical of early immune responses or IgM antibodies
<85%: Extensive mutation, indicative of affinity-matured antibodies from memory B cells

These ranges are rough guides rather than strict cutoffs.

Lower identity scores suggest prolonged antigen-driven selection and affinity maturation.
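
The ranges above can be applied to a results table with a small helper. This is a sketch: the function name `mutation_level` is hypothetical, and assigning exactly 95% to the minimal bin and exactly 85% to the moderate bin is an assumption about how to treat the boundaries.

```python
def mutation_level(v_identity_percent: float) -> str:
    """Map a V identity percentage to a coarse mutation category."""
    if not 0 <= v_identity_percent <= 100:
        raise ValueError("identity must be between 0 and 100")
    if v_identity_percent >= 95:
        return "minimal"
    if v_identity_percent >= 85:
        return "moderate"
    return "extensive"


for identity in (99.3, 91.0, 78.5):
    print(identity, mutation_level(identity))
```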

Understanding CDR3 sequences

CDR3 represents the antibody's primary antigen contact region and exhibits the highest sequence diversity due to junctional diversity during V(D)J recombination. The CDR3 amino acid sequence serves as a molecular fingerprint for clonal B cell populations. Identical CDR3 sequences across multiple antibodies indicate they likely originated from the same B cell ancestor.
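
Finding sequences that share a CDR3 is a simple grouping operation. The sketch below assumes each record is a `(query_id, cdr3_aa)` pair taken from the `Query ID` and `CDR3 (AA)` columns; the function name and the example sequences are made up for illustration.

```python
from collections import defaultdict


def group_by_cdr3(records):
    """Return {cdr3_aa: [query_ids]} for CDR3s seen more than once."""
    groups = defaultdict(list)
    for query_id, cdr3_aa in records:
        groups[cdr3_aa].append(query_id)
    return {cdr3: ids for cdr3, ids in groups.items() if len(ids) > 1}


records = [
    ("seq1", "ARDYYGSGSYFDY"),
    ("seq2", "ARGGWELLRFDP"),
    ("seq3", "ARDYYGSGSYFDY"),
]
print(group_by_cdr3(records))  # {'ARDYYGSGSYFDY': ['seq1', 'seq3']}
```

Exact matching is the strictest criterion. Clonal grouping in practice often also requires matching V and J genes and tolerates a few CDR3 mismatches.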

Interpreting D gene assignments

D (Diversity) genes are particularly short (10–30 nucleotides) and undergo extensive nucleotide deletions and additions during recombination. Multiple D gene matches (e.g., IGHD3-10*01,IGHD3-10*02,IGHD3-16*01) indicate ambiguous assignments due to:

Junctional modifications: Terminal nucleotides are trimmed during recombination
Non-templated additions: Random nucleotides inserted at junctions
Allelic similarity: Different alleles or genes share highly similar sequences

When multiple D genes are reported, the first listed represents the top-scoring match.
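
A comma-separated `D Gene` value can be split to see whether the ambiguity is only between alleles of one gene or between different genes. This is a sketch that assumes the field format shown in the example above; `summarize_d_assignment` is a hypothetical helper.

```python
def summarize_d_assignment(d_gene_field: str):
    """Summarize a comma-separated D gene field, or return None for N/A."""
    calls = [c.strip() for c in d_gene_field.split(",") if c.strip()]
    if not calls or calls == ["N/A"]:
        return None
    genes = []
    for call in calls:
        gene = call.split("*")[0]  # drop the allele suffix, e.g. "*01"
        if gene not in genes:
            genes.append(gene)
    return {
        "top_call": calls[0],                 # first listed = top-scoring
        "genes": genes,                       # distinct genes, order kept
        "allele_level_only": len(genes) == 1,
    }


print(summarize_d_assignment("IGHD3-10*01,IGHD3-10*02,IGHD3-16*01"))
```

For this example the top call is `IGHD3-10*01` and two distinct genes are involved, so the ambiguity is not limited to alleles.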

How does IgBLAST work?

IgBLAST employs a specialized multi-stage BLAST search strategy with optimized parameters for each gene segment type, combined with biological constraint enforcement and automated CDR/framework region annotation.

Sequential gene segment identification

Unlike standard BLAST, which performs a single search, IgBLAST executes three sequential searches with segment-specific parameters:

V gene search: Word size 9, mismatch penalty −1, identifying the ~290-base variable region
J gene search: Word size 7, mismatch penalty −3, expect cutoff 1000 to accommodate short J segments
D gene search: Adjustable word size (default 5), mismatch penalty −4 to detect highly modified diversity regions

After identifying the top V gene match, that region is masked before searching for D and J genes, preventing spurious alignments to the already-identified V segment.
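
The masking step can be pictured as blanking out the V-aligned span so that later searches can only match the remainder of the query. This toy sketch uses `N` as the mask character and 0-based, end-exclusive coordinates; both are assumptions for illustration and not a description of IgBLAST internals, and the query is a made-up sequence.

```python
def mask_v_region(query: str, v_start: int, v_end: int,
                  mask_char: str = "N") -> str:
    """Replace query[v_start:v_end] with mask_char, keeping the length."""
    if not 0 <= v_start <= v_end <= len(query):
        raise ValueError("V coordinates fall outside the query")
    return query[:v_start] + mask_char * (v_end - v_start) + query[v_end:]


query = "ACGTACGTACGGATTACATTTGACTACTGG"
masked = mask_v_region(query, 0, 11)
print(masked)  # the first 11 bases are replaced by N; the rest is unchanged
```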

Biological constraint enforcement

IgBLAST applies immunological rules during analysis:

Positional constraints: D genes must lie between V and J genes in the rearranged sequence
Locus specificity: All segments (V, D, J) must originate from the same immunoglobulin locus (IGH, IGK, or IGL for antibodies; TRA, TRB, TRG, or TRD for TCRs)
Chain compatibility: Heavy chains require V-D-J rearrangement while light chains use V-J only
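
These rules can be expressed as a consistency check on a set of segment calls. The sketch below is an illustration, not IgBLAST's code: it takes the locus from the first three letters of each gene name, uses 0-based end-exclusive coordinates, and does not allow segments to overlap, all of which are simplifying assumptions.

```python
D_LOCI = {"IGH", "TRB", "TRD"}  # loci whose rearrangements include a D segment


def check_rearrangement(v, j, d=None):
    """Return a list of rule violations.

    v, j, and d are (gene_name, start, end) tuples; d may be None.
    """
    problems = []
    locus = v[0][:3]
    segments = [v, j] + ([d] if d else [])
    if any(name[:3] != locus for name, _, _ in segments):
        problems.append("segments come from different loci")
    if d:
        if locus not in D_LOCI:
            problems.append(f"{locus} rearrangements do not use D segments")
        if not (v[2] <= d[1] and d[2] <= j[1]):
            problems.append("D segment is not between V and J")
    return problems


heavy = check_rearrangement(("IGHV3-23*01", 0, 296), ("IGHJ4*02", 320, 368),
                            ("IGHD3-10*01", 300, 315))
mixed = check_rearrangement(("IGKV1-39*01", 0, 287), ("IGKJ1*01", 290, 328),
                            ("IGHD3-10*01", 288, 290))
print(heavy)  # []
print(mixed)  # two violations: mixed loci, and a D segment on a kappa chain
```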

CDR and framework region delineation

The tool maps complementarity determining regions (CDRs) and framework regions (FRs) by:

Identifying the top V gene germline match
Transferring pre-annotated FR/CDR boundary positions from the germline database
Aligning these boundaries to the query sequence accounting for insertions/deletions

This automated annotation eliminates manual boundary identification required when using standard sequence alignment tools.
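
The boundary transfer described in the steps above can be sketched with a gapped pairwise alignment. The code assumes two equal-length alignment strings with `-` for gaps and germline region boundaries given as 0-based, end-exclusive positions in the ungapped germline sequence. The function name, the region coordinates, and the toy sequences are illustrative assumptions.

```python
def transfer_boundaries(germline_aln, query_aln, germline_regions):
    """Map germline region boundaries onto ungapped query coordinates."""
    if len(germline_aln) != len(query_aln):
        raise ValueError("alignment strings must have equal length")
    g2q = []   # g2q[i] = number of query bases before germline position i
    q_pos = 0
    for g_char, q_char in zip(germline_aln, query_aln):
        if g_char != "-":
            g2q.append(q_pos)
        if q_char != "-":
            q_pos += 1
    g2q.append(q_pos)  # sentinel so an end-exclusive boundary can be mapped
    return {name: (g2q[start], g2q[end])
            for name, (start, end) in germline_regions.items()}


germline_aln = "AAACCC-GGGTTT"
query_aln    = "AAACCCTGGGTTT"   # one inserted base relative to germline
regions = {"FR1": (0, 3), "CDR1": (3, 9), "FR2": (9, 12)}
print(transfer_boundaries(germline_aln, query_aln, regions))
# {'FR1': (0, 3), 'CDR1': (3, 10), 'FR2': (10, 13)}
```

The inserted base lengthens CDR1 in the query by one position and shifts FR2 accordingly, which is the adjustment for insertions and deletions that the last step refers to.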

Germline database architecture

IgBLAST searches against curated germline gene databases containing:

V gene databases: Complete sets of functional and pseudogene variable segments with FR/CDR annotations
D gene databases: Diversity gene segments (IGH, TRB, TRD loci only)
J gene databases: Joining gene segments for all loci

The IMGT database provides standardized international nomenclature, while NCBI databases offer broader species coverage. Both databases undergo regular updates as new germline gene sequences are characterized.

V(D)J recombination background

Understanding V(D)J recombination is essential for interpreting IgBLAST results. During their development, B cells and T cells undergo V(D)J recombination, a somatic DNA rearrangement process that generates antigen receptor diversity.

Gene segment organization

Immunoglobulin and TCR genes are organized as arrays of multiple gene segments in germline DNA:

Heavy chains: Separate V, D, and J gene segment clusters
Light chains: V and J gene segment clusters (no D segments)

The human immunoglobulin heavy chain locus contains approximately 50 functional V genes, 23 D genes, and 6 J genes, enabling combinatorial diversity.
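
As a worked example of the combinatorial contribution alone, the approximate segment counts quoted above can be multiplied together. The function name is hypothetical, and the result ignores junctional diversity and heavy/light chain pairing, which increase the total much further.

```python
def combinatorial_diversity(v_genes: int, d_genes: int, j_genes: int) -> int:
    """Number of distinct V-D-J segment combinations."""
    return v_genes * d_genes * j_genes


print(combinatorial_diversity(50, 23, 6))  # 6900
```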

Recombination process

During lymphocyte development:

Random selection: One V, one D (heavy chains and TCR beta/delta chains only), and one J segment are randomly chosen
DNA excision: Intervening DNA sequences between selected segments are permanently deleted
Segment joining: Selected segments are fused together with nucleotide modifications at junctions

This process creates a continuous coding sequence for the antibody variable region.

Junctional diversity mechanisms

Additional diversity is generated at V-D-J junctions through:

Exonuclease trimming: Terminal nucleotides are removed from gene segment ends
N-nucleotide addition: Terminal deoxynucleotidyl transferase (TdT) adds random nucleotides at junctions
P-nucleotide addition: Palindromic sequences created during DNA hairpin resolution

These modifications create the hypervariable CDR3 region, which IgBLAST identifies in the junction analysis.
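
A toy simulation shows how trimming and N-addition produce many different junctions from the same three segments. This is a sketch only: P-nucleotides are omitted, the trim and insertion lengths are drawn uniformly (real distributions differ), and the segment sequences are arbitrary illustrative strings rather than germline references.

```python
import random


def simulate_junction(v_end, d_seg, j_start, rng, max_trim=3, max_n=4):
    """Join V, D, and J fragments with random trimming and N-additions."""
    def trim_3prime(seg):
        return seg[:len(seg) - rng.randint(0, min(max_trim, len(seg)))]

    def trim_5prime(seg):
        return seg[rng.randint(0, min(max_trim, len(seg))):]

    def n_nucleotides():
        return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_n)))

    return (trim_3prime(v_end) + n_nucleotides()
            + trim_5prime(trim_3prime(d_seg)) + n_nucleotides()
            + trim_5prime(j_start))


rng = random.Random(0)  # fixed seed so runs are repeatable
junctions = {
    simulate_junction("TGTGCGAGA", "GGTATAGCAGCAGCTGGTAC",
                      "ACTACTTTGACTACTGG", rng)
    for _ in range(50)
}
print(len(junctions), "distinct junctions from 50 simulated rearrangements")
```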

Somatic hypermutation

After initial V(D)J recombination, activated B cells undergo somatic hypermutation in germinal centers, introducing point mutations into V region genes. This process enables affinity maturation—progressive improvement of antibody binding affinity through iterative mutation and antigen-driven selection. The V gene identity percentage reported by IgBLAST directly reflects the extent of somatic hypermutation.

Limitations

Rigid germline reference requirement: Analysis accuracy depends on comprehensive germline gene databases. Novel or poorly characterized species may lack complete reference libraries.
D gene assignment ambiguity: Extensive junctional modifications often prevent definitive D gene identification, particularly in highly mutated sequences. Multiple equally plausible D gene assignments are common.
Protein sequence limitations: Protein input sequences receive V gene assignment only. D and J gene identification, junction analysis, and productivity determination require nucleotide sequences.
Chimeric sequences: Sequences containing artifacts from PCR recombination between different templates may produce biologically implausible gene assignments.
Non-standard species: Official support is limited to human, mouse, rabbit, rat, and rhesus monkey. Other species require custom germline database preparation.

