ProteinIQ

IgBLAST

Analyze antibody and T cell receptor variable domain sequences

What is IgBLAST?

IgBLAST (Immunoglobulin BLAST) is a specialized tool for analyzing immunoglobulin (antibody) and T cell receptor variable domain sequences. Developed at the National Center for Biotechnology Information (NCBI), IgBLAST automates the identification of V (Variable), D (Diversity), and J (Joining) gene segments, delineates framework regions and complementarity determining regions (CDRs), and analyzes rearrangement junctions in antigen receptor sequences.

The tool was published in 2013 in Nucleic Acids Research by Jian Ye, Ning Ma, Thomas L. Madden, and James M. Ostell to address limitations of general-purpose sequence alignment tools like standard BLAST. Unlike conventional BLAST, which cannot efficiently handle the rearranged nature and variable segment lengths characteristic of immunoglobulin sequences, IgBLAST performs specialized sequential searches optimized for antibody and T cell receptor analysis.

The most common applications include:

  • Therapeutic antibody discovery: Identifying naturally occurring antibodies with desired specificities from immune repertoires for drug development. Companies developing antibody therapeutics against cancer, infectious diseases, and autoimmune conditions rely on V/D/J analysis to characterize candidate antibodies.
  • Vaccine response monitoring: Tracking antibody gene usage patterns following vaccination. Analyzing V gene family bias and CDR3 sequences reveals immune response dynamics and identifies vaccine-elicited antibody lineages.
  • B cell repertoire sequencing: Analyzing millions of antibody sequences from high-throughput sequencing to map the complete antibody repertoire. IgBLAST enables large-scale annotation of next-generation sequencing data for repertoire studies.
  • Clonal lineage tracing: Reconstructing B cell evolutionary histories by identifying antibodies sharing common V(D)J rearrangements. Antibodies that use the same V and J genes and carry similar CDR3 sequences of equal length are likely to descend from a common ancestor.
  • Autoimmune disease research: Characterizing autoreactive antibody repertoires in diseases like systemic lupus erythematosus and rheumatoid arthritis. Certain V gene families show biased usage in autoimmune conditions.
  • Cancer immunotherapy: Analyzing tumor-infiltrating B cells and identifying tumor-reactive antibodies. V/D/J profiling reveals clonal expansions associated with anti-tumor immune responses.
  • Antibody engineering: Guiding humanization of mouse antibodies and optimization of therapeutic candidates. Understanding V gene identity and CDR3 composition informs rational antibody design strategies.

How to use IgBLAST online

ProteinIQ provides a web-based interface for running IgBLAST without command-line installation or local database setup. Upload antibody or TCR sequences in FASTA format, configure organism and receptor settings, and receive comprehensive V/D/J gene assignments with CDR region annotations in a sortable spreadsheet format.

Inputs

  • Antibody/TCR sequences: FASTA-formatted nucleotide or protein sequences from immunoglobulin or T cell receptor variable regions. Accepts batch submissions with multiple sequences. Maximum file size: 50 MB.

Settings

Sequence configuration

  • Organism: Species source of the germline gene database. Options: Human (default), Mouse, Rabbit, Rat, Rhesus monkey. Selects the appropriate V/D/J gene reference libraries.
  • Receptor type: Antigen receptor class. Ig (immunoglobulin/antibody, default) or TCR (T cell receptor). Determines which gene segment databases are searched.
  • Sequence type: Input sequence format. Auto-detect (default) determines whether sequences are nucleotide or protein based on character composition. Nucleotide (DNA/RNA) provides complete V/D/J analysis. Protein (amino acid) analyzes the V gene only.
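ProteinIQ does not document its exact auto-detect rule. A minimal sketch of composition-based detection, assuming a hypothetical threshold where a sequence counts as nucleotide when at least 90% of its letters are A, C, G, T, U, or N:

```python
NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_sequence_type(seq: str, threshold: float = 0.9) -> str:
    """Return 'nucleotide' or 'protein' from character composition.

    Hypothetical heuristic: if at least `threshold` of the letters are
    nucleotide codes (A, C, G, T, U, N), treat the sequence as nucleotide.
    """
    letters = [c for c in seq.upper() if c.isalpha()]  # ignore gaps, digits, spaces
    if not letters:
        raise ValueError("sequence contains no residues")
    nt_fraction = sum(c in NUCLEOTIDE_CHARS for c in letters) / len(letters)
    return "nucleotide" if nt_fraction >= threshold else "protein"


print(guess_sequence_type("GAGGTGCAGCTGGTGGAGTCTGGG"))  # nucleotide
print(guess_sequence_type("EVQLVESGGGLVQPGGSLRLSCAAS"))  # protein
```

Protein sequences still contain A, C and G residues, which is why a high threshold rather than a simple "any non-ACGT letter" rule is used here.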

Germline database

  • Database source: Reference germline gene library. IMGT (default, recommended) provides standardized international nomenclature. NCBI includes additional species coverage.
  • Domain system: Scheme used to define framework region (FR) and CDR boundaries. IMGT (default) is the international standard for antibody analysis. Kabat is an older, alternative scheme.

Output options

  • Max alignments per gene: Number of top-scoring V/D/J gene matches to report per sequence (1–10, default 3). Higher values reveal alternative gene assignments when sequences match multiple alleles.
  • Show translation: Include amino acid translation for nucleotide input sequences (default enabled).
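For readers reproducing a run with a local NCBI installation, the settings above map roughly onto standalone igblastn options. This is a sketch, not ProteinIQ's actual backend: the helper name, the label-to-value dictionary, and the database directory layout are hypothetical, and a real run typically also needs -auxiliary_data pointing at the organism's auxiliary file for complete CDR3/J-frame annotation. Protein input would use igblastp instead.

```python
from pathlib import Path

# Hypothetical mapping from the web form's organism labels to igblastn values.
ORGANISMS = {"Human": "human", "Mouse": "mouse", "Rabbit": "rabbit",
             "Rat": "rat", "Rhesus monkey": "rhesus_monkey"}


def build_igblastn_args(query: Path, db_dir: Path, organism: str = "Human",
                        receptor: str = "Ig", domain_system: str = "imgt",
                        max_alignments: int = 3,
                        show_translation: bool = True) -> list[str]:
    """Assemble an igblastn argument list mirroring the web settings.

    Assumes db_dir holds BLAST-formatted germline databases named V, D and J
    (a hypothetical layout; adjust the paths to your installation).
    """
    if receptor not in ("Ig", "TCR"):
        raise ValueError("receptor must be 'Ig' or 'TCR'")
    if not 1 <= max_alignments <= 10:
        raise ValueError("max_alignments must be between 1 and 10")
    n = str(max_alignments)
    args = [
        "igblastn",
        "-query", str(query),
        "-organism", ORGANISMS[organism],
        "-ig_seqtype", receptor,
        "-domain_system", domain_system,
        "-germline_db_V", str(db_dir / "V"),
        "-germline_db_D", str(db_dir / "D"),
        "-germline_db_J", str(db_dir / "J"),
        "-num_alignments_V", n,
        "-num_alignments_D", n,
        "-num_alignments_J", n,
    ]
    if show_translation:
        args.append("-show_translation")  # translated alignments in the default report
    return args
```

The list can be passed to `subprocess.run(args, check=True)`. Recent IgBLAST releases also offer `-outfmt 19` for AIRR-format tabular output, which is easier to parse than the default report.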

Results

The output consists of a spreadsheet with one row per analyzed sequence, showing gene segment assignments, identity scores, CDR3 sequences, and productivity status.

  • Query ID: Sequence identifier from the FASTA header.
  • Chain: Antibody or TCR chain classification. VH (heavy chain), VL-KAPPA (kappa light chain), VL-LAMBDA (lambda light chain), or TCR chains (VA, VB, VG, VD).
  • V Gene: Variable gene segment assignment with allele designation (e.g., IGHV3-23*01). Multiple matches separated by commas indicate ambiguous assignments.
  • V Identity %: Percentage sequence identity to the assigned V gene germline sequence. Largely reflects the degree of somatic hypermutation.
  • D Gene: Diversity gene segment assignment (heavy chains and TCR beta/delta chains only). Light chains and TCR alpha/gamma chains show N/A because they lack D segments.
  • J Gene: Joining gene segment assignment.
  • CDR3 (AA): Complementarity determining region 3 amino acid sequence. CDR3 is the most variable region and usually contributes most to antigen specificity.
  • CDR3 (NT): CDR3 nucleotide sequence.
  • Junction: Nucleotide sequence of the rearranged junction: the trimmed 3′ end of the V gene, any D gene, non-templated (N/P) nucleotides, and the trimmed 5′ end of the J gene. Under IMGT definitions, the junction is the CDR3 plus the conserved cysteine and tryptophan/phenylalanine codons flanking it.
  • Productive: Rearrangement functionality. Yes indicates an in-frame V-J junction without premature stop codons. No indicates a non-functional rearrangement (frameshift or stop codon).
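The productivity rule described above can be sketched as two checks: the junction length must be a multiple of three (so the J gene stays in the V gene's reading frame), and the sequence must contain no stop codon in that frame. This is a simplified illustration, not IgBLAST's exact logic; the function name and the assumption that the V reading frame offset is already known are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(vdj_nt: str, junction_nt: str, frame_offset: int = 0) -> bool:
    """Simplified productivity call (a sketch, not IgBLAST's exact logic).

    vdj_nt: rearranged nucleotide sequence.
    junction_nt: junction sequence; its length must be a multiple of 3
        for the J gene to stay in the V gene's reading frame.
    frame_offset: index of the first base of the first V-gene codon,
        assumed known from the V alignment.
    """
    if len(junction_nt) % 3 != 0:
        return False  # frameshift: J is out of frame relative to V
    seq = vdj_nt.upper()
    for i in range(frame_offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False  # premature stop codon in the V reading frame
    return True
```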

Interpreting V identity scores

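V gene identity percentages give an approximate measure of somatic hypermutation:

  • ≥95%: Minimal mutation, characteristic of naive B cells or germline-like antibodies
  • 85–95%: Moderate mutation, often seen in early immune responses or IgM antibodies
  • <85%: Extensive mutation, indicative of affinity-matured antibodies from memory B cells

Lower identity scores generally suggest more extensive antigen-driven selection and affinity maturation. These bands are rough guides rather than fixed cutoffs: studies use different thresholds, and an allele missing from the germline database can make an unmutated sequence appear mutated.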
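A minimal sketch of binning V identity scores into these rough bands, for example when summarizing a large results table. The function name and the choice to place exactly 95% and 85% in the higher band are assumptions of this example.

```python
def shm_category(v_identity: float) -> str:
    """Map V identity (%) to a rough somatic hypermutation band.

    Thresholds follow the heuristic bands in the text; 95% counts as
    minimal and 85% as moderate (an arbitrary boundary choice).
    """
    if not 0 <= v_identity <= 100:
        raise ValueError("identity must be a percentage between 0 and 100")
    if v_identity >= 95:
        return "minimal"
    if v_identity >= 85:
        return "moderate"
    return "extensive"


for score in (99.0, 91.3, 78.5):
    print(score, shm_category(score))
```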

Understanding CDR3 sequences

CDR3, particularly the heavy chain CDR3, is usually the antibody's main antigen contact region and exhibits the highest sequence diversity because it spans the junctions created during V(D)J recombination. The CDR3 amino acid sequence serves as a molecular fingerprint for clonal B cell populations. Antibodies that share the same V and J genes and an identical CDR3 sequence most likely originated from the same B cell ancestor, although unrelated B cells occasionally converge on similar CDR3 sequences.
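A minimal sketch of grouping results into putative clones by V gene, J gene, and CDR3 amino acid sequence. The record field names are hypothetical, only the first (top-scoring) call is used when several are listed, and matching is exact; published clonal-grouping methods usually also tolerate CDR3 mismatches.

```python
from collections import defaultdict


def group_by_clonotype(records):
    """Group query IDs that share V gene, J gene and CDR3 (AA).

    Allele suffixes (e.g. *01) are stripped so IGHV3-23*01 and
    IGHV3-23*04 fall into the same group.
    """
    groups = defaultdict(list)
    for rec in records:
        v = rec["v_gene"].split(",")[0].split("*")[0]  # top call, gene level
        j = rec["j_gene"].split(",")[0].split("*")[0]
        groups[(v, j, rec["cdr3_aa"])].append(rec["query_id"])
    return dict(groups)


records = [
    {"query_id": "s1", "v_gene": "IGHV3-23*01", "j_gene": "IGHJ4*02", "cdr3_aa": "ARDRGYFDY"},
    {"query_id": "s2", "v_gene": "IGHV3-23*04", "j_gene": "IGHJ4*02", "cdr3_aa": "ARDRGYFDY"},
    {"query_id": "s3", "v_gene": "IGHV1-69*01", "j_gene": "IGHJ6*02", "cdr3_aa": "ARGGYYYGMDV"},
]
print(group_by_clonotype(records))
```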

Interpreting D gene assignments

D (Diversity) genes are short (roughly 10–40 nucleotides for human IGHD genes) and undergo extensive nucleotide deletions and additions during recombination. Multiple D gene matches (e.g., IGHD3-10*01,IGHD3-10*02,IGHD3-16*01) indicate ambiguous assignments due to:

  • Junctional modifications: Terminal nucleotides are trimmed during recombination
  • Non-templated additions: Random nucleotides inserted at junctions
  • Allelic similarity: Different alleles or genes share highly similar sequences

When multiple D genes are reported, the first listed represents the top-scoring match.
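A small sketch for working with such multi-match calls: it splits the comma-separated field into (gene, allele) pairs in reported order, takes the first as the top hit, and checks whether the ambiguity is only between alleles of one gene or spans different genes. Function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_gene_calls(field: str) -> List[Tuple[str, str]]:
    """Split a comma-separated call into (gene, allele) pairs, keeping the
    reported order so the first pair is the top-scoring match."""
    calls = []
    for call in field.split(","):
        call = call.strip()
        if not call or call == "N/A":
            continue  # light chains report N/A for D
        gene, _, allele = call.partition("*")
        calls.append((gene, allele))
    return calls


def top_gene(field: str) -> Optional[str]:
    """Gene name (allele dropped) of the first, top-scoring call, if any."""
    calls = parse_gene_calls(field)
    return calls[0][0] if calls else None


def distinct_genes(field: str) -> List[str]:
    """Unique gene names in reported order; more than one means the
    ambiguity extends beyond the allele level."""
    seen: List[str] = []
    for gene, _ in parse_gene_calls(field):
        if gene not in seen:
            seen.append(gene)
    return seen


calls = "IGHD3-10*01,IGHD3-10*02,IGHD3-16*01"
print(top_gene(calls), distinct_genes(calls))
```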

How does IgBLAST work?

IgBLAST employs a specialized multi-stage BLAST search strategy with optimized parameters for each gene segment type, combined with biological constraint enforcement and automated CDR/framework region annotation.

Sequential gene segment identification

Unlike standard BLAST, which performs a single search, IgBLAST executes three sequential searches with segment-specific parameters:

  1. V gene search: Word size 9, mismatch penalty −1, identifying the ~290-base variable region
  2. J gene search: Word size 7, mismatch penalty −3, expect cutoff 1000 to accommodate short J segments
  3. D gene search: Adjustable word size (default 5), mismatch penalty −4 to detect highly modified diversity regions

After identifying the top V gene match, that region is masked before searching for D and J genes, preventing spurious alignments to the already-identified V segment.
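IgBLAST's internal masking is not exposed as an API. One simple way to emulate the idea in your own pipeline is to replace the V-aligned query bases with N before running a separate D/J search; the sketch below does that, and its function name is hypothetical.

```python
def mask_region(query: str, start: int, end: int, mask_char: str = "N") -> str:
    """Replace query[start:end] (0-based, end-exclusive) with mask_char.

    Emulates masking the V-aligned region so that a later D/J search
    cannot align to the V segment. Sketch only; not IgBLAST's internals.
    """
    if not 0 <= start <= end <= len(query):
        raise ValueError("invalid region")
    return query[:start] + mask_char * (end - start) + query[end:]


print(mask_region("ACGTACGTAC", 0, 6))  # NNNNNNGTAC
```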

Biological constraint enforcement

IgBLAST applies immunological rules during analysis:

  • Positional constraints: D genes must lie between V and J genes in the rearranged sequence
  • Locus specificity: All segments (V, D, J) must originate from the same immunoglobulin locus (IGH, IGK, or IGL for antibodies; TRA, TRB, TRG, or TRD for TCRs)
  • Chain compatibility: Heavy chains and TCR beta/delta chains are built from V-D-J rearrangements, while light chains and TCR alpha/gamma chains use V-J only
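The constraints above can be expressed as simple checks on the top hits. This is a simplified sketch: the `Hit` type is hypothetical, the locus is read from the first three characters of the IMGT gene name, ordering is checked only by alignment start positions (tolerating small overlaps at junctions), and genes shared between TRA and TRD (such as TRAV14/DV4) are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional

LOCI_WITH_D = {"IGH", "TRB", "TRD"}


@dataclass
class Hit:
    gene: str      # e.g. "IGHV3-23*01"
    q_start: int   # 0-based start of the alignment on the query


def locus_of(gene: str) -> str:
    # IMGT gene names start with the locus: IGHV3-23 -> IGH, TRBJ2-1 -> TRB
    return gene[:3]


def check_rearrangement(v: Hit, j: Hit, d: Optional[Hit] = None) -> List[str]:
    """Return a list of constraint violations (empty if none found)."""
    problems = []
    locus = locus_of(v.gene)
    if locus_of(j.gene) != locus:
        problems.append("J gene comes from a different locus than V")
    if not v.q_start < j.q_start:
        problems.append("J gene does not follow V")
    if d is not None:
        if locus not in LOCI_WITH_D:
            problems.append(f"{locus} has no D segments")
        elif locus_of(d.gene) != locus:
            problems.append("D gene comes from a different locus than V")
        if not v.q_start < d.q_start < j.q_start:
            problems.append("D gene does not lie between V and J")
    return problems
```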

CDR and framework region delineation

The tool maps complementarity determining regions (CDRs) and framework regions (FRs) by:

  1. Identifying the top V gene germline match
  2. Transferring pre-annotated FR/CDR boundary positions from the germline database
  3. Aligning these boundaries to the query sequence accounting for insertions/deletions

This automated annotation eliminates manual boundary identification required when using standard sequence alignment tools.
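Step 3 amounts to walking the pairwise alignment column by column and translating germline coordinates into query coordinates. A minimal sketch, assuming the alignment is available as two gapped strings and that region start positions on the germline are known (the example region positions are hypothetical):

```python
def map_germline_to_query(germ_aln: str, query_aln: str, germ_pos: int,
                          germ_start: int = 0, query_start: int = 0) -> int:
    """Return the 0-based query position aligned to germline position germ_pos.

    germ_aln/query_aln are the gapped rows of one alignment that begins at
    germ_start on the germline and query_start on the query. If germ_pos
    falls opposite a query gap (a deletion), the next query position is
    returned.
    """
    g, q = germ_start, query_start
    for gc, qc in zip(germ_aln, query_aln):
        if gc != "-" and g == germ_pos:
            return q  # index of qc, or of the next query base if qc is a gap
        if gc != "-":
            g += 1
        if qc != "-":
            q += 1
    raise ValueError("germline position outside the aligned region")


def transfer_region_starts(region_starts: dict, germ_aln: str, query_aln: str) -> dict:
    """Map germline FR/CDR start positions onto the query."""
    return {name: map_germline_to_query(germ_aln, query_aln, pos)
            for name, pos in region_starts.items()}


# One inserted query base (column 3) shifts every downstream boundary by one.
print(transfer_region_starts({"FR1": 0, "CDR1": 3}, "ACG-TACGT", "ACGGTACGT"))
```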

Germline database architecture

IgBLAST searches against curated germline gene databases containing:

  • V gene databases: Complete sets of functional and pseudogene variable segments with FR/CDR annotations
  • D gene databases: Diversity gene segments (IGH, TRB, TRD loci only)
  • J gene databases: Joining gene segments for all loci

The IMGT database provides standardized international nomenclature, while NCBI databases offer broader species coverage. Both databases undergo regular updates as new germline gene sequences are characterized.

V(D)J recombination background

Understanding V(D)J recombination is essential for interpreting IgBLAST results. During development, B and T lymphocytes undergo V(D)J recombination, a somatic DNA rearrangement process that generates antigen receptor diversity.

Gene segment organization

Immunoglobulin and TCR genes are organized as arrays of multiple gene segments in germline DNA:

  • Heavy chains (and TCR beta and delta chains): Separate V, D, and J gene segment clusters
  • Light chains (and TCR alpha and gamma chains): V and J gene segment clusters (no D segments)

The human immunoglobulin heavy chain locus contains roughly 40–50 functional V genes (the exact number varies between haplotypes), 23 functional D genes, and 6 functional J genes, enabling combinatorial diversity.
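As a worked example, taking about 50 functional V genes, 23 D genes, and 6 J genes (the approximate human figures), the number of heavy chain segment combinations before any junctional diversity or light chain pairing is roughly:

```latex
N_{\text{VDJ}} \approx N_V \times N_D \times N_J \approx 50 \times 23 \times 6 = 6{,}900
```

Pairing with a light chain and the junctional mechanisms described below multiply this number further.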

Recombination process

During lymphocyte development:

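  1. Segment selection: One V, one D (heavy chains and TCR beta/delta chains only), and one J segment are chosen, largely at random, although gene usage is not uniform. In heavy chains, D-to-J joining occurs before V-to-DJ joining.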
  2. DNA excision: Intervening DNA sequences between selected segments are permanently deleted
  3. Segment joining: Selected segments are fused together with nucleotide modifications at junctions

This process creates a continuous coding sequence for the antibody variable region.
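A toy simulation of the selection and joining steps, with D-to-J joining performed before V-to-DJ joining. The gene names and sequences are hypothetical placeholders, segment choice is uniform (unlike real gene usage), and junctional trimming and insertions are ignored here.

```python
import random


def recombine(v_genes: dict, d_genes: dict, j_genes: dict, rng: random.Random):
    """Toy V(D)J recombination: pick one segment of each type and join them.

    Uniform selection is a simplification; junctional diversity is ignored.
    Returns ((v_name, d_name, j_name), joined_sequence).
    """
    d_name = rng.choice(sorted(d_genes))
    j_name = rng.choice(sorted(j_genes))
    dj = d_genes[d_name] + j_genes[j_name]   # D-to-J joining happens first
    v_name = rng.choice(sorted(v_genes))
    vdj = v_genes[v_name] + dj               # then V-to-DJ joining
    return (v_name, d_name, j_name), vdj


# Hypothetical toy segments, far shorter than real genes.
v = {"V1": "GAGGTG", "V2": "CAGGTC"}
d = {"D1": "GGTAT", "D2": "AGCAG"}
j = {"J1": "TACTTT", "J2": "ACTACT"}
print(recombine(v, d, j, random.Random(42)))
```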

Junctional diversity mechanisms

Additional diversity is generated at V-D-J junctions through:

  • Exonuclease trimming: Terminal nucleotides are removed from gene segment ends
  • N-nucleotide addition: Terminal deoxynucleotidyl transferase (TdT) adds random nucleotides at junctions
  • P-nucleotide addition: Palindromic sequences created during DNA hairpin resolution

These modifications create the hypervariable CDR3 region, which IgBLAST identifies in the junction analysis.
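A toy model of one junction, illustrating exonuclease trimming and N-nucleotide addition. P-nucleotides are omitted, and the trim and insertion ranges are arbitrary illustration values rather than measured distributions; the function name is hypothetical.

```python
import random


def join_with_junctional_diversity(left: str, right: str, rng: random.Random,
                                   max_trim: int = 3, max_n: int = 4) -> str:
    """Join two segments with toy junctional diversity.

    Trims up to max_trim bases from the 3' end of `left` and the 5' end of
    `right` (exonuclease trimming), then inserts 0..max_n random bases
    between them (TdT N-addition). P-nucleotides are not modeled.
    """
    trim_left = rng.randint(0, min(max_trim, len(left)))
    trim_right = rng.randint(0, min(max_trim, len(right)))
    n_region = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_n)))
    return left[:len(left) - trim_left] + n_region + right[trim_right:]


rng = random.Random(3)
for _ in range(3):
    print(join_with_junctional_diversity("TGTGCGAGA", "GGTATAGC", rng))
```

Running the same join repeatedly yields different junctions from identical germline segments, which is why CDR3 is so diverse and why D gene calls become ambiguous.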

Somatic hypermutation

After initial V(D)J recombination, activated B cells undergo somatic hypermutation in germinal centers, introducing point mutations into V region genes. This process enables affinity maturation: progressive improvement of antibody binding affinity through iterative mutation and antigen-driven selection. The V gene identity percentage reported by IgBLAST largely reflects the extent of somatic hypermutation, although an allele missing from the germline database can also lower it.
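A minimal sketch of how a V identity percentage can be computed from a query-versus-germline alignment. It counts gap columns in the denominator, the convention BLAST-style tools commonly use; IgBLAST's exact computation may differ, and the function name is hypothetical.

```python
def percent_identity(query_aln: str, germ_aln: str) -> float:
    """Percent identity over all alignment columns.

    Gap columns count toward the length but never as matches.
    """
    if len(query_aln) != len(germ_aln) or not query_aln:
        raise ValueError("alignment rows must be non-empty and equal length")
    matches = sum(q == g and q != "-"
                  for q, g in zip(query_aln.upper(), germ_aln.upper()))
    return 100.0 * matches / len(query_aln)


print(percent_identity("ACGTACGTAC", "ACGAACGTAC"))  # 90.0: one point mutation in 10 bases
```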

Limitations

  • Rigid germline reference requirement: Analysis accuracy depends on comprehensive germline gene databases. Novel or poorly characterized species may lack complete reference libraries.
  • D gene assignment ambiguity: Extensive junctional modifications often prevent definitive D gene identification, particularly in highly mutated sequences. Multiple equally plausible D gene assignments are common.
  • Protein sequence limitations: Protein input sequences receive V gene assignment only. D and J gene identification, junction analysis, and productivity determination require nucleotide sequences.
  • Chimeric sequences: Sequences containing artifacts from PCR recombination between different templates may produce biologically implausible gene assignments.
  • Non-standard species: Official support is limited to human, mouse, rabbit, rat, and rhesus monkey. Other species require custom germline database preparation.

Related tools

  • HMMER: Profile HMM-based sequence searching for identifying conserved protein domains and motifs
  • MMseqs2: Ultra-fast sequence clustering and similarity searching for large antibody repertoire analysis
  • Clustal Omega: Multiple sequence alignment for comparing related antibody sequences
  • MAFFT: Rapid multiple sequence alignment suitable for large antibody repertoire datasets