What is BioPhi?
BioPhi is an open-source antibody engineering platform developed by Merck that combines deep learning humanization with repertoire-based humanness evaluation. The platform features two complementary systems trained on the Observed Antibody Space (OAS) database: Sapiens for automated humanization and OASis for humanness scoring.
BioPhi addresses a critical challenge in therapeutic antibody development. Antibodies derived from mice, rabbits, or other non-human species often trigger immune responses in patients. Humanization—the process of modifying non-human antibodies to resemble human sequences—reduces immunogenicity while preserving antigen binding. Traditional methods like CDR grafting require manual sequence engineering, while BioPhi automates this process using patterns learned from millions of human antibody sequences.
How to use BioPhi
Inputs
BioPhi accepts antibody variable domain sequences in FASTA format. Both heavy (VH) and light (VL) chains can be processed individually or in batches. The platform auto-detects chain types based on sequence characteristics—heavy chains typically start with EVQ/QVQ/DVQ motifs and are longer (~120 residues) than light chains (~110 residues), which often begin with DIQ/EIV/SSE motifs.
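The motif and length cues above can be sketched as a small rule-based classifier. This is a hypothetical illustration, not BioPhi's actual detection code. The names `read_fasta` and `guess_chain_type`, the motif sets, and the 115-residue fallback cutoff (a midpoint between the typical lengths quoted above) are assumptions. The second record is a truncated light-chain fragment used only for demonstration.

```python
from typing import Dict, Optional

# Hypothetical N-terminal motifs, taken from the typical starts described above.
HEAVY_STARTS = {"EVQ", "QVQ", "DVQ"}
LIGHT_STARTS = {"DIQ", "EIV", "SSE"}
LENGTH_CUTOFF = 115  # assumed midpoint between ~120 (VH) and ~110 (VL) residues


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else f"record_{len(records) + 1}"
            records[current] = ""
        elif current is not None:
            records[current] += line.upper()
    return records


def guess_chain_type(seq: str) -> str:
    """Return 'heavy' or 'light' from the N-terminal motif, falling back to length."""
    start = seq[:3].upper()
    if start in HEAVY_STARTS:
        return "heavy"
    if start in LIGHT_STARTS:
        return "light"
    return "heavy" if len(seq) >= LENGTH_CUTOFF else "light"


fasta = """>antibody_1
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR
>antibody_2
DIQMTQSPSSLSASVGDRVTITCRASQSISSYLN
"""
for name, seq in read_fasta(fasta).items():
    print(name, guess_chain_type(seq))  # antibody_1 heavy, then antibody_2 light
```

A motif rule like this is brittle for engineered or unusual sequences. Numbering-based tools that align sequences against germline profiles are generally more robust.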
```text
>antibody_1
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR
```
Settings
Processing mode
| Mode | Function |
|---|---|
| Sapiens | Generates humanized variants using deep learning |
| OASis | Evaluates humanness without sequence modification |
| Both | Performs humanization and scoring (recommended) |
Sapiens humanization parameters
| Setting | Range | Default | Purpose |
|---|---|---|---|
| Number of designs | 1–20 | 5 | Variants generated per input sequence |
| Sampling temperature | 0.1–1.0 | 0.3 | Controls sequence diversity (lower = conservative) |
| Humanize CDRs | On/Off | Off | Enables CDR modification (risks altering binding) |
| Backmutate Vernier positions | On/Off | Off | Reverts structural support residues to original |
| CDR numbering scheme | Kabat/IMGT/Chothia/North | Kabat | Defines CDR boundaries |
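For scripted batch runs, it can help to check settings against the ranges in the table before submitting a job. The `SapiensSettings` class below is a hypothetical client-side check whose field names are assumptions. It does not reflect BioPhi's own configuration API.

```python
from dataclasses import dataclass

# Allowed schemes copied from the settings table; the class itself is hypothetical.
SCHEMES = {"kabat", "imgt", "chothia", "north"}


@dataclass(frozen=True)
class SapiensSettings:
    n_designs: int = 5
    temperature: float = 0.3
    humanize_cdrs: bool = False
    backmutate_vernier: bool = False
    cdr_scheme: str = "kabat"

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges early, before any job is run.
        if not 1 <= self.n_designs <= 20:
            raise ValueError("n_designs must be between 1 and 20")
        if not 0.1 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.1 and 1.0")
        if self.cdr_scheme.lower() not in SCHEMES:
            raise ValueError(f"unknown CDR numbering scheme: {self.cdr_scheme}")


settings = SapiensSettings(n_designs=10, temperature=0.2, cdr_scheme="IMGT")
print(settings)
```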
Temperature determines sampling behavior: values below 0.3 produce conservative humanization with minimal structural risk, while values above 0.5 generate more diverse sequences that may sacrifice stability or humanness.
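A common way to implement temperature is to divide the model's logits by T before applying a softmax. Values of T below 1 sharpen the distribution toward the top residue, and larger values flatten it. The sketch assumes that standard formulation. It is not confirmed here that Sapiens or this platform applies temperature exactly this way, and the logits are invented for a single position.

```python
import math
from typing import Dict


def temperature_softmax(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Turn per-residue logits into probabilities, scaled by 1 / temperature."""
    scaled = {aa: value / temperature for aa, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exp = {aa: math.exp(v - peak) for aa, v in scaled.items()}
    total = sum(exp.values())
    return {aa: v / total for aa, v in exp.items()}


# Hypothetical logits for one framework position.
logits = {"S": 2.0, "T": 1.0, "A": 0.5}
for t in (0.1, 0.3, 1.0):
    probs = temperature_softmax(logits, t)
    print(t, {aa: round(p, 3) for aa, p in probs.items()})
```

The printout shows the top residue (S) losing probability mass as temperature rises, which is the diversity trade-off described above.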
OASis evaluation parameters
| Setting | Options | Default | Purpose |
|---|---|---|---|
| Reference species | Human/Mouse | Human | Database for comparison |
| Prevalence threshold | Loose/Relaxed/Medium/Strict | Medium | Minimum subject frequency (1%/10%/50%/90%) |
Prevalence threshold controls stringency—stricter thresholds require 9-mer peptides to appear in a higher percentage of human subjects to be considered "human-like."
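The effect of the threshold shows up when counting which 9-mers qualify as human-like at each level. The cutoffs come from the table above. The prevalence values are hypothetical, and counting a prevalence exactly equal to the cutoff as passing is an assumption.

```python
from typing import List

# Minimum fraction of human subjects per level, from the prevalence threshold table.
PREVALENCE_THRESHOLDS = {"loose": 0.01, "relaxed": 0.10, "medium": 0.50, "strict": 0.90}


def human_like_count(prevalences: List[float], level: str) -> int:
    """Count 9-mers whose subject prevalence meets the chosen threshold."""
    cutoff = PREVALENCE_THRESHOLDS[level]
    return sum(1 for p in prevalences if p >= cutoff)


# Hypothetical prevalence (fraction of subjects) for five 9-mers of one sequence.
prevalences = [0.95, 0.60, 0.30, 0.05, 0.0]
for level in PREVALENCE_THRESHOLDS:
    print(level, human_like_count(prevalences, level))
# loose 4, relaxed 3, medium 2, strict 1
```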
Output
Results are returned in a spreadsheet with the following columns:
| Column | Description |
|---|---|
| Sequence ID | Original input identifier |
| Design # | Variant number for Sapiens mode |
| Chain type | Heavy or light chain |
| Identity % | Sequence identity to original input |
| Humanness score | OASis identity score (0–100%) |
| OASis percentile | Percentile rank in human antibody database |
| Mutations | Number of amino acid changes |
| Mutation details | Specific substitutions (e.g., "A23G, T45S") |
| V germline | Closest V gene match |
| J germline | Closest J gene match |
| Germline % | Germline content percentage |
| Humanized sequence | Output sequence in FASTA format |
| Length | Residue count |
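The identity, mutation count, and mutation-detail columns can be derived from a pairwise comparison of the input and a design. This sketch assumes substitution-only designs of equal length and reports 1-based sequential positions. The actual output may instead use positions from the selected numbering scheme. The function name and both sequences are hypothetical.

```python
from typing import List, Tuple


def compare(original: str, design: str) -> Tuple[float, int, str]:
    """Return (identity %, mutation count, mutation details) for two aligned sequences."""
    if len(original) != len(design):
        raise ValueError("this sketch assumes substitution-only designs of equal length")
    mutations: List[str] = [
        f"{before}{pos}{after}"  # e.g. "Q5V": original residue, 1-based position, new residue
        for pos, (before, after) in enumerate(zip(original, design), start=1)
        if before != after
    ]
    identity = 100.0 * (len(original) - len(mutations)) / len(original)
    return identity, len(mutations), ", ".join(mutations)


parent = "QVQLQQSGAELVKPGASVKLSC"
variant = "QVQLVQSGAEVKKPGASVKVSC"
identity, n_mut, details = compare(parent, variant)
print(f"{identity:.1f}% identity, {n_mut} mutations: {details}")
# 81.8% identity, 4 mutations: Q5V, L11V, V12K, L20V
```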
How BioPhi works
Sapiens: BERT-based humanization
Sapiens is a BERT-style language model trained on variable domain sequences from 266 human subjects in the OAS database. The model learns probability distributions for each amino acid at each position by training to predict masked or mutated residues in unaligned sequences.
During humanization, Sapiens evaluates the input sequence and computes likelihood scores for all 20 amino acids at every position. Non-human residues—those with low probability in human antibody space—are identified and replaced by sampling from the model's probability distribution. This approach captures complex sequence dependencies that simple germline-matching methods cannot detect.
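The replace-unlikely-residues step can be sketched with a toy probability table standing in for the model's output. In Sapiens the probabilities come from the trained network and depend on the whole sequence. Here they are invented and position-independent. The `min_prob` cutoff, the `frozen` set (which mimics leaving CDRs untouched), and the greedy mode are illustrative assumptions.

```python
import random
from typing import Dict, List, Optional, Set

# Toy per-position probabilities standing in for a language model's output.
Probs = List[Dict[str, float]]


def humanize(
    seq: str,
    probs: Probs,
    min_prob: float = 0.1,
    frozen: Optional[Set[int]] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Replace residues the model finds unlikely, skipping frozen (e.g. CDR) positions.

    Positions are 0-based. With rng=None the most probable residue is chosen;
    otherwise a replacement is sampled in proportion to the probabilities.
    """
    frozen = frozen or set()
    out = list(seq)
    for i, (residue, dist) in enumerate(zip(seq, probs)):
        if i in frozen or dist.get(residue, 0.0) >= min_prob:
            continue  # already plausible in human space, or protected
        if rng is None:
            out[i] = max(dist, key=dist.get)
        else:
            residues = list(dist)
            out[i] = rng.choices(residues, weights=[dist[r] for r in residues], k=1)[0]
    return "".join(out)


seq = "QVKL"
probs = [
    {"Q": 0.60, "E": 0.40},
    {"V": 0.90, "L": 0.10},
    {"K": 0.05, "Q": 0.85, "R": 0.10},
    {"L": 0.95, "M": 0.05},
]
print(humanize(seq, probs))              # QVQL: K at position 2 falls below min_prob
print(humanize(seq, probs, frozen={2}))  # QVKL: position 2 treated as CDR and left alone
print(humanize(seq, probs, rng=random.Random(0)))  # sampled variant
```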
The model's attention mechanism allows it to recognize context-dependent humanness patterns. A residue considered non-human in one sequence context may be perfectly human in another, depending on surrounding amino acids and structural constraints.
OASis: 9-mer peptide search
OASis evaluates humanness by extracting all overlapping 9-amino-acid peptides (9-mers) from the input sequence and searching for exact matches in the OAS database. For each 9-mer, the algorithm calculates prevalence—the percentage of human subjects containing that peptide.
The overall humanness score (OASis identity) is the fraction of the sequence's 9-mers whose prevalence meets the selected threshold. High scores indicate sequences composed primarily of peptides commonly found in human antibody repertoires. The percentile metric compares the input against all sequences in OAS, providing a rank-based assessment.
This granular approach produces interpretable results. Unlike black-box scoring methods, OASis identifies specific regions that deviate from human norms, enabling targeted engineering. The 9-mer window captures sufficient context for immunogenicity assessment while remaining computationally tractable.
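The 9-mer scoring can be sketched in a few functions. The `prevalence` dictionary is a hypothetical stand-in for the OAS-derived peptide database. Identity is computed as the fraction of 9-mers meeting the prevalence threshold, and the percentile is taken as the share of reference scores strictly below the input. The reference list and function names are assumptions.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

K = 9  # OASis peptide length


def kmers(seq: str, k: int = K) -> List[Tuple[int, str]]:
    """All overlapping k-mers with their 0-based start positions."""
    return [(i, seq[i : i + k]) for i in range(len(seq) - k + 1)]


def oasis_identity(seq: str, prevalence: Dict[str, float], cutoff: float) -> Tuple[float, List[int]]:
    """Fraction of 9-mers seen in at least `cutoff` of subjects, plus starts of failing 9-mers."""
    peptides = kmers(seq)
    if not peptides:
        raise ValueError("sequence is shorter than one 9-mer")
    flagged = [i for i, pep in peptides if prevalence.get(pep, 0.0) < cutoff]
    return 1 - len(flagged) / len(peptides), flagged


def percentile_rank(score: float, reference_scores: List[float]) -> float:
    """Percentage of reference scores strictly below `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


# Hypothetical prevalence table: fraction of human subjects carrying each 9-mer.
prevalence = {"EVQLVESGG": 0.97, "VQLVESGGG": 0.96}
score, flagged = oasis_identity("EVQLVESGGGL", prevalence, cutoff=0.5)
print(round(score, 3), flagged)  # 0.667 [2]
print(percentile_rank(score, [0.5, 0.6, 0.7, 0.8]))  # 50.0
```

Each flagged start marks a nine-residue window that pulls the score down, which is the kind of region-level feedback described above.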
Validation and performance
In a benchmark of 177 therapeutic antibodies, Sapiens generated humanized sequences comparable in quality to those produced by human experts, achieving high humanness scores while preserving sequence identity to the parental antibodies. OASis separated human from non-human sequences with high accuracy, and its scores correlated with clinical immunogenicity data.
Interpreting results
Humanness metrics
OASis percentile indicates relative humanness within the database. Sequences above the 90th percentile exhibit strongly human-like characteristics, suggesting low immunogenicity risk. Values between 70 and 89 are acceptable for most therapeutic applications. Sequences below the 50th percentile warrant additional optimization.
The humanness score (OASis identity) provides an absolute measure. Values above 95% indicate highly human-like sequences, while scores below 70% suggest non-human origin with elevated immunogenicity potential.
Selection criteria for humanized variants
When evaluating multiple Sapiens designs, prioritize candidates with high OASis percentiles while keeping identity to the original sequence above 85%. Identity preservation correlates with retained binding affinity: lower identity increases the likelihood of disrupted antigen recognition.
Mutation count is a practical consideration. Fewer changes reduce synthesis cost and simplify validation. Germline content percentage indicates proximity to natural human sequences; higher values generally predict lower immunogenicity.
A recommended selection workflow:
- Filter for OASis percentile above 70
- Require sequence identity above 85%
- Select designs with fewest mutations among remaining candidates
- Validate binding experimentally before advancing
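The first three steps of this workflow translate directly into a filter and a sort. The fourth step, experimental validation, cannot be automated. The dictionary keys mirror the output columns but are hypothetical, as are the example designs. Breaking ties in mutation count by higher OASis percentile is an added assumption.

```python
from typing import Any, Dict, List, Optional

Design = Dict[str, Any]


def select_design(
    designs: List[Design], min_percentile: float = 70, min_identity: float = 85
) -> Optional[Design]:
    """Filter on OASis percentile and identity, then pick the design with fewest mutations."""
    passing = [
        d for d in designs
        if d["oasis_percentile"] > min_percentile and d["identity"] > min_identity
    ]
    if not passing:
        return None
    # Fewest mutations first; ties go to the higher OASis percentile (an added assumption).
    return min(passing, key=lambda d: (d["mutations"], -d["oasis_percentile"]))


designs = [
    {"design": 1, "oasis_percentile": 92, "identity": 83.0, "mutations": 19},
    {"design": 2, "oasis_percentile": 78, "identity": 88.5, "mutations": 13},
    {"design": 3, "oasis_percentile": 85, "identity": 87.0, "mutations": 13},
    {"design": 4, "oasis_percentile": 65, "identity": 93.0, "mutations": 8},
]
best = select_design(designs)
print(best["design"] if best else "no design passes")  # 3
```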
CDR considerations
Complementarity Determining Regions (CDRs) mediate antigen binding and are typically preserved during humanization. The default setting excludes CDRs from modification, changing only framework regions. Enabling CDR humanization may improve humanness scores but risks altering binding specificity or affinity. Such modifications necessitate thorough experimental validation through surface plasmon resonance, ELISA, or functional assays.
Backmutating Vernier positions—framework residues that structurally support CDR conformation—can help preserve binding when framework changes inadvertently affect CDR geometry.
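Once the Vernier positions are known, backmutation is a simple per-position reversion. Real Vernier positions are defined in a numbering scheme such as Kabat and must first be mapped to sequence indices. The 0-based indices, the function name, and the sequences below are hypothetical.

```python
from typing import Iterable


def backmutate(original: str, humanized: str, positions: Iterable[int]) -> str:
    """Restore the original residue at each given 0-based sequence index."""
    if len(original) != len(humanized):
        raise ValueError("this sketch assumes equal-length, position-aligned sequences")
    out = list(humanized)
    for i in positions:
        out[i] = original[i]  # revert framework change at a structural support position
    return "".join(out)


original = "QVKLQESGAEL"
humanized = "EVQLVESGGGL"
vernier = [0, 2]  # hypothetical indices standing in for scheme-defined Vernier positions
print(backmutate(original, humanized, vernier))  # QVKLVESGGGL
```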
Limitations
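BioPhi operates on variable domain sequences only. Inputs that include constant regions, such as full-length heavy or light chains, must be trimmed to the variable domain before submission, and single-domain antibody constructs likewise require preprocessing to extract the variable domain.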
Sapiens predicts humanness based on sequence patterns in the training data. The model cannot account for factors outside its training scope, such as rare post-translational modifications, unusual structural constraints, or context-specific immunogenicity in particular patient populations.
OASis identifies sequences similar to those in the OAS database but cannot guarantee lack of immunogenicity. Clinical immune responses depend on multiple factors including HLA haplotype, dosing regimen, and epitope formation. BioPhi reduces risk but does not eliminate the need for preclinical and clinical validation.
Chain type auto-detection works reliably for standard antibody formats but may fail for engineered variants with atypical sequence characteristics. Manual specification resolves such cases.
