What is IgBLAST?
IgBLAST (Immunoglobulin BLAST) is a specialized sequence analysis tool designed to analyze immunoglobulin (antibody) and T cell receptor variable domain sequences. Developed by researchers at the National Center for Biotechnology Information (NCBI), IgBLAST automates the identification of V (Variable), D (Diversity), and J (Joining) gene segments, delineates complementarity determining regions (CDRs), and analyzes rearrangement junctions in antigen receptor sequences.
The tool was published in 2013 in Nucleic Acids Research by Jian Ye, Ning Ma, Thomas L. Madden, and James M. Ostell to address limitations of general-purpose sequence alignment tools like standard BLAST. Unlike conventional BLAST, which cannot efficiently handle the rearranged nature and variable segment lengths characteristic of immunoglobulin sequences, IgBLAST performs specialized sequential searches optimized for antibody and T cell receptor analysis.
The most common applications include:
- Therapeutic antibody discovery: Identifying naturally occurring antibodies with desired specificities from immune repertoires for drug development. Companies developing antibody therapeutics against cancer, infectious diseases, and autoimmune conditions rely on V/D/J analysis to characterize candidate antibodies.
- Vaccine response monitoring: Tracking antibody gene usage patterns following vaccination. Analyzing V gene family bias and CDR3 sequences reveals immune response dynamics and identifies vaccine-elicited antibody lineages.
- B cell repertoire sequencing: Analyzing millions of antibody sequences from high-throughput sequencing to map the complete antibody repertoire. IgBLAST enables large-scale annotation of next-generation sequencing data for repertoire studies.
- Clonal lineage tracing: Reconstructing B cell evolutionary histories by identifying antibodies sharing common V(D)J rearrangements. Antibodies with identical V and J gene usage and similar CDR3 sequences are likely to descend from a common ancestor.
- Autoimmune disease research: Characterizing autoreactive antibody repertoires in diseases like systemic lupus erythematosus and rheumatoid arthritis. Certain V gene families show biased usage in autoimmune conditions.
- Cancer immunotherapy: Analyzing tumor-infiltrating B cells and identifying tumor-reactive antibodies. V/D/J profiling reveals clonal expansions associated with anti-tumor immune responses.
- Antibody engineering: Guiding humanization of mouse antibodies and optimization of therapeutic candidates. Understanding V gene identity and CDR3 composition informs rational antibody design strategies.
How to use IgBLAST online
ProteinIQ provides a web-based interface for running IgBLAST without command-line installation or local database setup. Upload antibody or TCR sequences in FASTA format, configure organism and receptor settings, and receive comprehensive V/D/J gene assignments with CDR region annotations in a sortable spreadsheet format.
Inputs
| Input | Description |
|---|---|
| Antibody/TCR sequences | FASTA-formatted nucleotide or protein sequences from immunoglobulin or T cell receptor variable regions. Accepts batch submissions with multiple sequences. Maximum file size: 50 MB. |
Settings
Sequence configuration
| Setting | Description |
|---|---|
| Organism | Species source of germline gene database. Options: Human (default), Mouse, Rabbit, Rat, Rhesus monkey. Selects appropriate V/D/J gene reference libraries. |
| Receptor type | Antigen receptor class. Ig (immunoglobulin/antibody, default) or TCR (T cell receptor). Determines which gene segment databases are searched. |
| Sequence type | Input sequence format. Auto-detect (default) determines whether sequences are nucleotide or protein based on character composition. Nucleotide (DNA/RNA) provides complete V/D/J analysis. Protein (amino acid) analyzes V gene only. |
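
The auto-detect option is described only as working from character composition. A minimal sketch of one such heuristic follows; the `guess_sequence_type` helper and its 90% threshold are assumptions made for illustration, not the rule the tool actually applies.

```python
NUCLEOTIDE_CHARS = set("ACGTUN")  # DNA/RNA letters plus the ambiguity code N


def guess_sequence_type(sequence: str, threshold: float = 0.9) -> str:
    """Return 'nucleotide' or 'protein' based on character composition.

    Hypothetical heuristic: the 0.9 threshold is an illustrative
    assumption, not a documented value.
    """
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no letters")
    nucleotide_fraction = sum(c in NUCLEOTIDE_CHARS for c in residues) / len(residues)
    return "nucleotide" if nucleotide_fraction >= threshold else "protein"
```

Short peptides made up mostly of A, C, G, and T would be misclassified by any composition-based rule, which is a reason to set the sequence type explicitly when it is known.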
Germline database
| Setting | Description |
|---|---|
| Database source | Reference germline gene library. IMGT (default, recommended) provides standardized international nomenclature. NCBI includes additional species coverage. |
| Domain system | CDR numbering scheme. IMGT (default) is the international standard for antibody analysis. Kabat is an alternative legacy numbering system. |
Output options
| Setting | Description |
|---|---|
| Max alignments per gene | Number of top-scoring V/D/J gene matches to report per sequence (1–10, default 3). Higher values reveal alternative gene assignments when sequences match multiple alleles. |
| Show translation | Include amino acid translation for nucleotide input sequences (default enabled). |
Results
The output consists of a spreadsheet with one row per analyzed sequence, showing gene segment assignments, identity scores, CDR3 sequences, and productivity status.
| Column | Description |
|---|---|
| Query ID | Sequence identifier from FASTA header. |
| Chain | Antibody or TCR chain classification. VH (heavy chain), VL-KAPPA (kappa light chain), VL-LAMBDA (lambda light chain), or TCR chains (VA, VB, VG, VD). |
| V Gene | Variable gene segment assignment with allele designation (e.g., IGHV3-23*01). Multiple matches separated by commas indicate ambiguous assignments. |
| V Identity % | Percentage sequence identity to assigned V gene germline sequence. Reflects degree of somatic hypermutation. |
| D Gene | Diversity gene segment assignment (heavy chains and TCR beta/delta only). Light chains show N/A as they lack D segments. |
| J Gene | Joining gene segment assignment. |
| CDR3 (AA) | Complementarity determining region 3 amino acid sequence. The most variable antibody region responsible for antigen specificity. |
| CDR3 (NT) | CDR3 nucleotide sequence. |
| Junction | Complete nucleotide sequence spanning the V-D-J junction, including trimmed germline bases and non-templated insertions. |
| Productive | Rearrangement functionality. Yes indicates in-frame junction without premature stop codons. No indicates non-functional rearrangement. |
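
The Productive column combines two conditions: an in-frame junction and no premature stop codon. The sketch below is a simplified illustration of that definition, not IgBLAST's own logic. The `is_productive` helper is hypothetical, and it assumes the coding sequence starts at codon position 1 and that a junction length divisible by three stands in for "in frame".

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(coding_nt: str, junction_nt: str) -> bool:
    """Simplified productivity check (illustrative assumption only).

    coding_nt   -- rearranged coding sequence, assumed to start in frame
    junction_nt -- junction nucleotides; a length divisible by 3 is
                   treated here as an in-frame junction
    """
    coding_nt = coding_nt.upper().replace("U", "T")
    in_frame = len(junction_nt) % 3 == 0
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [coding_nt[i:i + 3] for i in range(0, len(coding_nt) - 2, 3)]
    has_stop = any(codon in STOP_CODONS for codon in codons)
    return in_frame and not has_stop
```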
Interpreting V identity scores
V gene identity percentages reveal the extent of somatic hypermutation:
- 95–100%: Minimal mutation, characteristic of naive B cells or germline-like antibodies
- 85% to <95%: Moderate mutation, typical of early immune responses or IgM antibodies
- <85%: Extensive mutation, indicative of affinity-matured antibodies from memory B cells
Lower identity scores suggest prolonged antigen-driven selection and affinity maturation.
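The bands above can be applied to a results column with a small helper. This is a sketch: `classify_v_identity` is a hypothetical name, and placing exactly 95% in the minimal band and exactly 85% in the moderate band is an assumption about how to treat the boundary values.

```python
def classify_v_identity(identity_percent: float) -> str:
    """Map a V identity percentage to a mutation band.

    Boundary handling (95 -> minimal, 85 -> moderate) is an assumption.
    """
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity must be between 0 and 100")
    if identity_percent >= 95:
        return "minimal"
    if identity_percent >= 85:
        return "moderate"
    return "extensive"
```

These thresholds are rules of thumb for interpretation rather than hard biological cutoffs.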
Understanding CDR3 sequences
CDR3 represents the antibody's primary antigen contact region and exhibits the highest sequence diversity due to junctional diversity during V(D)J recombination. The CDR3 amino acid sequence serves as a molecular fingerprint for clonal B cell populations. Identical CDR3 sequences across multiple antibodies indicate they likely originated from the same B cell ancestor.
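As a rough illustration of using CDR3 as a fingerprint, the sketch below groups result rows by identical CDR3 amino acid sequence. It assumes the rows have been loaded as dictionaries keyed by the column names from the results table; the `group_by_cdr3` helper is hypothetical.

```python
from collections import defaultdict


def group_by_cdr3(rows):
    """Return {cdr3: [query ids]} for CDR3 sequences seen more than once.

    rows -- iterable of dicts with 'Query ID' and 'CDR3 (AA)' keys
            (assumed to mirror the results table columns)
    """
    groups = defaultdict(list)
    for row in rows:
        cdr3 = row["CDR3 (AA)"]
        if cdr3:  # skip rows where no CDR3 was identified
            groups[cdr3].append(row["Query ID"])
    return {cdr3: ids for cdr3, ids in groups.items() if len(ids) > 1}
```

Grouping on CDR3 alone is deliberately crude. Clonotype definitions used in practice usually also require matching V and J genes and often tolerate a few CDR3 mismatches.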
Interpreting D gene assignments
D (Diversity) genes are particularly short (10–30 nucleotides) and undergo extensive nucleotide deletions and additions during recombination. Multiple D gene matches (e.g., IGHD3-10*01,IGHD3-10*02,IGHD3-16*01) indicate ambiguous assignments due to:
- Junctional modifications: Terminal nucleotides are trimmed during recombination
- Non-templated additions: Random nucleotides inserted at junctions
- Allelic similarity: Different alleles or genes share highly similar sequences
When multiple D genes are reported, the first listed represents the top-scoring match.
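A comma-separated D gene cell can be split into the top match and its alternatives, and the alternatives checked for whether they differ only by allele. The helpers below are a minimal sketch with hypothetical names; the `N/A` handling follows the value shown for light chains in the results table.

```python
def parse_gene_matches(cell: str):
    """Split a gene cell into (top_match, alternatives)."""
    if cell.strip() in ("", "N/A"):
        return None, []
    matches = [m.strip() for m in cell.split(",") if m.strip()]
    return matches[0], matches[1:]


def distinct_genes(cell: str):
    """Return the set of gene names with allele suffixes (*01, *02) removed."""
    top, alternatives = parse_gene_matches(cell)
    if top is None:
        return set()
    return {match.split("*")[0] for match in [top, *alternatives]}
```

For the example cell `IGHD3-10*01,IGHD3-10*02,IGHD3-16*01`, the top match is `IGHD3-10*01` and two distinct genes remain after dropping alleles, so the ambiguity is at the gene level and not just the allele level.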
How does IgBLAST work?
IgBLAST employs a specialized multi-stage BLAST search strategy with optimized parameters for each gene segment type, combined with biological constraint enforcement and automated CDR/framework region annotation.
Sequential gene segment identification
Unlike standard BLAST, which performs a single search, IgBLAST executes three sequential searches with segment-specific parameters:
- V gene search: Word size 9, mismatch penalty −1, identifying the ~290-base variable region
- J gene search: Word size 7, mismatch penalty −3, expect cutoff 1000 to accommodate short J segments
- D gene search: Adjustable word size (default 5), mismatch penalty −4 to detect highly modified diversity regions
After identifying the top V gene match, that region is masked before searching for D and J genes, preventing spurious alignments to the already-identified V segment.
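The effect of masking can be shown with a toy seed search. This is not IgBLAST's implementation: `mask_region` and `find_seed_hits` are hypothetical helpers, the seed search is a plain exact word match, and the sequences in the usage note are invented for illustration.

```python
def mask_region(query: str, start: int, end: int, mask_char: str = "N") -> str:
    """Replace query[start:end] with mask characters so it cannot match."""
    return query[:start] + mask_char * (end - start) + query[end:]


def find_seed_hits(query: str, germline: str, word_size: int):
    """Return (query_pos, germline_pos) for every exact word match."""
    words = {}
    for j in range(len(germline) - word_size + 1):
        words.setdefault(germline[j:j + word_size], []).append(j)
    hits = []
    for i in range(len(query) - word_size + 1):
        for j in words.get(query[i:i + word_size], []):
            hits.append((i, j))
    return hits
```

With a query built as `"CAGGTGCAGCTGGTGGAG" + "GCGAGA" + "TGGGGCCAGGTGCACC"` and the last part used as the J germline, an unmasked search with word size 7 also reports a seed inside the first 18 bases, because the V-like region happens to share a 7-mer with the J segment. After `mask_region(query, 0, 18)`, only hits in the true J region remain.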
Biological constraint enforcement
IgBLAST applies immunological rules during analysis:
- Positional constraints: D genes must lie between V and J genes in the rearranged sequence
- Locus specificity: All segments (V, D, J) must originate from the same locus (IGH, IGK, or IGL for antibodies; TRA, TRB, TRG, or TRD for TCRs)
- Chain compatibility: Heavy chains require V-D-J rearrangement while light chains use V-J only
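The three rules can be expressed as a small consistency check on a set of assignments. This is a sketch under stated assumptions: `check_constraints` is a hypothetical helper, each segment is a `(gene_name, start, end)` tuple in query coordinates with an exclusive end, the locus is read from the first three letters of the gene name, and segments are required not to overlap, which is stricter than real alignments need to be.

```python
D_LOCI = {"IGH", "TRB", "TRD"}  # loci that carry D segments


def locus_of(gene_name: str) -> str:
    """'IGHV3-23*01' -> 'IGH', 'TRBJ2-1*01' -> 'TRB'."""
    return gene_name[:3].upper()


def check_constraints(v, j, d=None):
    """Return a list of rule violations (empty if the assignment is consistent)."""
    problems = []
    segments = [seg for seg in (v, d, j) if seg is not None]
    loci = {locus_of(name) for name, _, _ in segments}
    if len(loci) > 1:
        problems.append(f"segments come from different loci: {sorted(loci)}")
    if v[1] >= j[1]:
        problems.append("V segment does not precede J segment")
    if d is not None:
        if locus_of(v[0]) not in D_LOCI:
            problems.append(f"{locus_of(v[0])} rearrangements have no D segment")
        if not (v[2] <= d[1] and d[2] <= j[1]):
            problems.append("D segment does not lie between V and J")
    return problems
```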
CDR and framework region delineation
The tool maps complementarity determining regions (CDRs) and framework regions (FRs) by:
- Identifying the top V gene germline match
- Transferring pre-annotated FR/CDR boundary positions from the germline database
- Aligning these boundaries to the query sequence accounting for insertions/deletions
This automated annotation eliminates manual boundary identification required when using standard sequence alignment tools.
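Boundary transfer amounts to mapping a germline coordinate onto the query through a gapped alignment. The sketch below shows that mapping for a single position; `map_germline_to_query` is a hypothetical helper, positions are 0-based and ungapped, and the alignment is given as two equal-length strings with `-` for gaps.

```python
def map_germline_to_query(aligned_germline: str, aligned_query: str, germline_pos: int):
    """Map an ungapped germline position to the ungapped query position.

    Returns None when the germline base is deleted in the query.
    """
    if len(aligned_germline) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    g_index = q_index = 0
    for g_char, q_char in zip(aligned_germline, aligned_query):
        if g_char != "-" and g_index == germline_pos:
            return q_index if q_char != "-" else None
        if g_char != "-":
            g_index += 1
        if q_char != "-":
            q_index += 1
    raise IndexError("germline position is outside the alignment")
```

An insertion in the query shifts every downstream boundary to the right, and a deletion shifts it to the left, which is why the boundaries cannot simply be copied by offset.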
Germline database architecture
IgBLAST searches against curated germline gene databases containing:
- V gene databases: Complete sets of functional and pseudogene variable segments with FR/CDR annotations
- D gene databases: Diversity gene segments (IGH, TRB, TRD loci only)
- J gene databases: Joining gene segments for all loci
The IMGT database provides standardized international nomenclature, while NCBI databases offer broader species coverage. Both databases undergo regular updates as new germline gene sequences are characterized.
V(D)J recombination background
Understanding V(D)J recombination is essential for interpreting IgBLAST results. During B cell and T cell development, developing lymphocytes undergo V(D)J recombination, a somatic DNA rearrangement process that generates antigen receptor diversity.
Gene segment organization
Immunoglobulin and TCR genes are organized as arrays of multiple gene segments in germline DNA:
- Heavy chains: Separate V, D, and J gene segment clusters
- Light chains: V and J gene segment clusters (no D segments)
The human immunoglobulin heavy chain locus contains approximately 50 functional V genes, 23 D genes, and 6 J genes, enabling combinatorial diversity.
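As a quick worked example using the approximate counts quoted above, the number of distinct heavy chain V-D-J combinations is the product of the segment counts. The figure covers segment choice only and ignores junctional diversity, somatic hypermutation, and light chain pairing.

```python
from math import prod

# Approximate functional segment counts quoted in the text.
heavy_segments = {"V": 50, "D": 23, "J": 6}

heavy_combinations = prod(heavy_segments.values())
print(heavy_combinations)  # 6900
```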
Recombination process
During lymphocyte development:
- Random selection: One V, one D (heavy chains and TCR beta/delta chains only), and one J segment are randomly chosen
- DNA excision: Intervening DNA sequences between selected segments are permanently deleted
- Segment joining: Selected segments are fused together with nucleotide modifications at junctions
This process creates a continuous coding sequence for the antibody variable region.
Junctional diversity mechanisms
Additional diversity is generated at V-D-J junctions through:
- Exonuclease trimming: Terminal nucleotides are removed from gene segment ends
- N-nucleotide addition: Terminal deoxynucleotidyl transferase (TdT) adds random nucleotides at junctions
- P-nucleotide addition: Palindromic sequences created during DNA hairpin resolution
These modifications create the hypervariable CDR3 region, which IgBLAST identifies in the junction analysis.
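A toy simulation makes the combined effect of trimming and N-addition concrete. This is a sketch only: `recombine` is a hypothetical helper, the trimming and insertion ranges are arbitrary assumptions, P-nucleotides are omitted, and any segments passed in are placeholders unless taken from a real germline database.

```python
import random


def recombine(v: str, d: str, j: str, rng: random.Random,
              max_trim: int = 3, max_n: int = 4) -> str:
    """Join V, D, and J with random end trimming and N-nucleotide insertion."""

    def trim_3prime(segment: str) -> str:
        return segment[: len(segment) - rng.randint(0, max_trim)]

    def trim_5prime(segment: str) -> str:
        return segment[rng.randint(0, max_trim):]

    def n_nucleotides() -> str:
        return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_n)))

    return (
        trim_3prime(v)
        + n_nucleotides()              # V-D junction insertion
        + trim_3prime(trim_5prime(d))  # D is trimmed at both ends
        + n_nucleotides()              # D-J junction insertion
        + trim_5prime(j)
    )
```

Calling `recombine` repeatedly with the same three segments and different random states yields junctions of different lengths and compositions, which is the source of CDR3 variability even among cells that chose identical gene segments.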
Somatic hypermutation
After initial V(D)J recombination, activated B cells undergo somatic hypermutation in germinal centers, introducing point mutations into V region genes. This process enables affinity maturation—progressive improvement of antibody binding affinity through iterative mutation and antigen-driven selection. The V gene identity percentage reported by IgBLAST directly reflects the extent of somatic hypermutation.
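To make the identity figure concrete, the sketch below computes percent identity from a pairwise alignment of a query V region against its germline. It follows the usual BLAST convention of dividing identical positions by the full alignment length, gaps included. The `percent_identity` helper is hypothetical, and whether the tool's reported value handles gaps in exactly this way is an assumption.

```python
def percent_identity(aligned_query: str, aligned_germline: str) -> float:
    """Identical positions divided by alignment length, as a percentage.

    Gap columns ('-') count toward the length but never as matches.
    """
    if len(aligned_query) != len(aligned_germline):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        q == g and q != "-"
        for q, g in zip(aligned_query.upper(), aligned_germline.upper())
    )
    return 100 * matches / len(aligned_query)
```

One substitution across a 10-base alignment gives 90% identity, so on a V region of roughly 290 bases each point mutation lowers the score by about a third of a percentage point.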
Limitations
- Rigid germline reference requirement: Analysis accuracy depends on comprehensive germline gene databases. Novel or poorly characterized species may lack complete reference libraries.
- D gene assignment ambiguity: Extensive junctional modifications often prevent definitive D gene identification, particularly in highly mutated sequences. Multiple equally plausible D gene assignments are common.
- Protein sequence limitations: Protein input sequences receive V gene assignment only. D and J gene identification, junction analysis, and productivity determination require nucleotide sequences.
- Chimeric sequences: Sequences containing artifacts from PCR recombination between different templates may produce biologically implausible gene assignments.
- Non-standard species: Official support is limited to human, mouse, rabbit, rat, and rhesus monkey. Other species require custom germline database preparation.
Related tools
- HMMER — Profile HMM-based sequence searching for identifying conserved protein domains and motifs
- MMseqs2 — Ultra-fast sequence clustering and similarity searching for large antibody repertoire analysis
- Clustal Omega — Multiple sequence alignment for comparing related antibody sequences
- MAFFT — Rapid multiple sequence alignment suitable for large antibody repertoire datasets
