ProteinIQ

BioPhi

Humanize and evaluate antibody sequences for therapeutic development. Sapiens mode performs deep learning-based humanization; OASis mode scores humanness.

What is BioPhi?

BioPhi is an open-source antibody engineering platform developed by Merck that combines deep learning humanization with repertoire-based humanness evaluation. The platform features two complementary systems built on the Observed Antibody Space (OAS) database: Sapiens for automated humanization and OASis for humanness scoring.

BioPhi addresses a critical challenge in therapeutic antibody development. Antibodies derived from mice, rabbits, or other non-human species often trigger immune responses in patients. Humanization—the process of modifying non-human antibodies to resemble human sequences—reduces immunogenicity while preserving antigen binding. Traditional methods like CDR grafting require manual sequence engineering, while BioPhi automates this process using patterns learned from millions of human antibody sequences.

How to use BioPhi

Inputs

BioPhi accepts antibody variable domain sequences in FASTA format. Both heavy (VH) and light (VL) chains can be processed individually or in batches. The platform auto-detects chain types based on sequence characteristics—heavy chains typically start with EVQ/QVQ/DVQ motifs and are longer (~120 residues) than light chains (~110 residues), which often begin with DIQ/EIV/SSE motifs.

>antibody_1
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR
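The input handling described above can be illustrated with a minimal Python sketch. It parses FASTA text and guesses the chain type from the N-terminal motifs and approximate lengths mentioned in this section. The function names and the 115-residue cutoff are assumptions made for illustration; this is not BioPhi's actual detection logic.

```python
from typing import Iterator, Tuple

# Motifs taken from the description above; they are rough heuristics only.
HEAVY_STARTS = ("EVQ", "QVQ", "DVQ")
LIGHT_STARTS = ("DIQ", "EIV", "SSE")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def guess_chain_type(seq: str) -> str:
    """Guess 'heavy' or 'light' from the N-terminal motif, else from length."""
    if seq.startswith(HEAVY_STARTS):
        return "heavy"
    if seq.startswith(LIGHT_STARTS):
        return "light"
    # Assumed fallback: midpoint between ~120 (heavy) and ~110 (light) residues.
    return "heavy" if len(seq) >= 115 else "light"


example = """>antibody_1
EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAR
"""
records = list(parse_fasta(example))
for name, seq in records:
    print(name, len(seq), guess_chain_type(seq))
```

A motif check like this will misclassify engineered or truncated sequences, which is why manual chain specification remains useful.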

Settings

Processing mode

| Mode | Function |
|---|---|
| Sapiens | Generates humanized variants using deep learning |
| OASis | Evaluates humanness without sequence modification |
| Both | Performs humanization and scoring (recommended) |

Sapiens humanization parameters

| Setting | Range | Default | Purpose |
|---|---|---|---|
| Number of designs | 1–20 | 5 | Variants generated per input sequence |
| Sampling temperature | 0.1–1.0 | 0.3 | Controls sequence diversity (lower = conservative) |
| Humanize CDRs | On/Off | Off | Enables CDR modification (risks altering binding) |
| Backmutate Vernier positions | On/Off | Off | Reverts structural support residues to original |
| CDR numbering scheme | Kabat/IMGT/Chothia/North | Kabat | Defines CDR boundaries |

Temperature determines sampling behavior: values below 0.3 produce conservative humanization with minimal structural risk, while values above 0.5 generate more diverse sequences that may sacrifice stability or humanness.
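To make the effect of temperature concrete, the sketch below rescales a per-position probability distribution using the standard temperature formula (raise each probability to the power 1/T, then renormalize). Whether Sapiens applies temperature in exactly this way is an assumption, and the probabilities shown are invented for illustration.

```python
import math
from typing import Dict


def apply_temperature(probs: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Rescale a distribution as p_i ** (1 / T), then renormalize to sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {
        aa: math.exp(math.log(p) / temperature)
        for aa, p in probs.items()
        if p > 0
    }
    total = sum(scaled.values())
    return {aa: value / total for aa, value in scaled.items()}


# Hypothetical model output for one position (not real Sapiens numbers).
position_probs = {"S": 0.6, "T": 0.3, "A": 0.1}

for t in (0.1, 0.3, 1.0):
    rescaled = apply_temperature(position_probs, t)
    print(t, {aa: round(p, 3) for aa, p in rescaled.items()})
```

At low temperature nearly all probability mass collapses onto the most likely residue, which is why low settings yield conservative designs; at 1.0 the distribution is left unchanged.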

OASis evaluation parameters

| Setting | Options | Default | Description |
|---|---|---|---|
| Reference species | Human/Mouse | Human | Database for comparison |
| Prevalence threshold | Loose/Relaxed/Medium/Strict | Medium | Minimum subject frequency (1%/10%/50%/90%, respectively) |

Prevalence threshold controls stringency—stricter thresholds require 9-mer peptides to appear in a higher percentage of human subjects to be considered "human-like."

Output

Results are returned in a spreadsheet with the following columns:

| Column | Description |
|---|---|
| Sequence ID | Original input identifier |
| Design # | Variant number for Sapiens mode |
| Chain type | Heavy or light chain |
| Identity % | Sequence identity to original input |
| Humanness score | OASis identity score (0–100%) |
| OASis percentile | Percentile rank in human antibody database |
| Mutations | Number of amino acid changes |
| Mutation details | Specific substitutions (e.g., "A23G, T45S") |
| V germline | Closest V gene match |
| J germline | Closest J gene match |
| Germline % | Germline content percentage |
| Humanized sequence | Output sequence in FASTA format |
| Length | Residue count |
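As an illustration of how the Identity %, Mutations, and Mutation details columns relate to each other, the following sketch derives all three from an original and a humanized sequence of equal length. It assumes simple 1-based sequential position numbering; the tool may instead report positions in the selected numbering scheme. The function name and example sequences are hypothetical.

```python
from typing import List, Tuple


def describe_mutations(original: str, variant: str) -> Tuple[float, List[str]]:
    """Return (identity %, substitutions) for two equal-length sequences."""
    if not original:
        raise ValueError("sequences must not be empty")
    if len(original) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    subs = [
        f"{a}{i}{b}"  # e.g. "A23G": A at position 23 replaced by G
        for i, (a, b) in enumerate(zip(original, variant), start=1)
        if a != b
    ]
    identity = 100.0 * (len(original) - len(subs)) / len(original)
    return identity, subs


# Invented 10-residue fragments, used only to exercise the function.
identity, subs = describe_mutations("EVKLVESGGG", "EVQLVESGGA")
print(f"{identity:.1f}%", len(subs), ", ".join(subs))
```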

How BioPhi works

Sapiens: BERT-based humanization

Sapiens is a BERT-style language model trained on variable domain sequences from 266 human subjects in the OAS database. The model learns probability distributions for each amino acid at each position by training to predict masked or mutated residues in unaligned sequences.

During humanization, Sapiens evaluates the input sequence and computes likelihood scores for all 20 amino acids at every position. Non-human residues—those with low probability in human antibody space—are identified and replaced by sampling from the model's probability distribution. This approach captures complex sequence dependencies that simple germline-matching methods cannot detect.
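The replacement step described above can be sketched roughly as follows. The per-position probability tables stand in for the model's output and are invented; the `min_prob` cutoff for calling a residue "non-human" and the `frozen` set of protected positions are hypothetical parameters, not documented Sapiens settings.

```python
import random
from typing import Dict, List, Optional, Set


def humanize_by_sampling(
    seq: str,
    position_probs: List[Dict[str, float]],
    frozen: Optional[Set[int]] = None,
    min_prob: float = 0.05,
    rng: Optional[random.Random] = None,
) -> str:
    """Resample residues whose probability under the model is below min_prob.

    `frozen` holds 0-based positions (for example CDR residues) left untouched.
    """
    if len(position_probs) != len(seq):
        raise ValueError("need one probability table per residue")
    frozen = frozen or set()
    rng = rng or random.Random()
    out = []
    for i, (residue, probs) in enumerate(zip(seq, position_probs)):
        if i in frozen or probs.get(residue, 0.0) >= min_prob:
            out.append(residue)  # protected, or already human-like: keep
            continue
        candidates = list(probs)
        weights = [probs[aa] for aa in candidates]
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return "".join(out)


# Invented probabilities for a 3-residue toy sequence.
toy_probs = [
    {"E": 0.90, "Q": 0.09, "K": 0.01},  # K is rare here, so it gets resampled
    {"V": 1.0},                         # V is fully human-like, so it is kept
    {"A": 0.5, "S": 0.5},               # frozen below, so it is kept
]
print(humanize_by_sampling("KVA", toy_probs, frozen={2}, rng=random.Random(0)))
```

In the real model the probabilities at each position depend on the whole sequence context, which a fixed table like this cannot capture.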

The model's attention mechanism allows it to recognize context-dependent humanness patterns. A residue considered non-human in one sequence context may be perfectly human in another, depending on surrounding amino acids and structural constraints.

OASis: 9-mer peptide search

OASis evaluates humanness by extracting all overlapping 9-amino-acid peptides (9-mers) from the input sequence and searching for exact matches in the OAS database. For each 9-mer, the algorithm calculates prevalence—the percentage of human subjects containing that peptide.

The overall humanness score aggregates these prevalence values across the sequence. High scores indicate sequences composed primarily of peptides commonly found in human antibody repertoires. The percentile metric compares the input against all sequences in OAS, providing a rank-based assessment.

This granular approach produces interpretable results. Unlike black-box scoring methods, OASis identifies specific regions that deviate from human norms, enabling targeted engineering. The 9-mer window captures sufficient context for immunogenicity assessment while remaining computationally tractable.
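The 9-mer lookup described above can be sketched in a few lines. The prevalence table here is a toy dictionary standing in for the OAS-derived database, and the aggregation (the share of 9-mers whose prevalence meets the chosen threshold) is one plausible reading of the score rather than a confirmed implementation. The threshold values come from the settings table.

```python
from typing import Dict, List

# Prevalence cutoffs for each threshold name, as listed in the settings.
THRESHOLDS = {"loose": 0.01, "relaxed": 0.10, "medium": 0.50, "strict": 0.90}
PEPTIDE_LENGTH = 9


def kmers(seq: str, k: int = PEPTIDE_LENGTH) -> List[str]:
    """Return all overlapping k-residue peptides of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def oasis_identity(seq: str, prevalence: Dict[str, float], threshold: str = "medium") -> float:
    """Percentage of 9-mers whose subject prevalence meets the cutoff (assumed aggregation)."""
    cutoff = THRESHOLDS[threshold]
    peptides = kmers(seq)
    if not peptides:
        raise ValueError("sequence is shorter than one peptide")
    human = sum(1 for p in peptides if prevalence.get(p, 0.0) >= cutoff)
    return 100.0 * human / len(peptides)


# Toy prevalence values (fraction of subjects carrying each peptide); invented.
toy_prevalence = {"EVQLVESGG": 0.95, "VQLVESGGG": 0.60}
fragment = "EVQLVESGGGL"  # 11 residues, so 3 overlapping 9-mers

for name in THRESHOLDS:
    print(name, round(oasis_identity(fragment, toy_prevalence, name), 1))
```

Peptides that fall below the cutoff are the ones to inspect when looking for regions that deviate from human repertoires.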

Validation and performance

In a benchmark of 177 therapeutic antibodies, Sapiens generated humanized sequences comparable in quality to those produced by human experts, achieving high humanness scores while maintaining sequence identity. OASis separated human from non-human sequences with high accuracy and showed correlation with clinical immunogenicity data.

Interpreting results

Humanness metrics

OASis percentile indicates relative humanness within the database. Sequences above the 90th percentile exhibit excellent human-like characteristics, suggesting low immunogenicity risk. Percentiles from 70 to 89 are acceptable for most therapeutic applications. Sequences below the 50th percentile warrant additional optimization.

The humanness score (OASis identity) provides an absolute measure. Values above 95% indicate highly human-like sequences, while scores below 70% suggest non-human origin with elevated immunogenicity potential.

Selection criteria for humanized variants

When evaluating multiple Sapiens designs, prioritize candidates with high OASis percentiles that retain more than 85% identity to the original sequence. Identity preservation correlates with retained binding affinity; lower identity increases the likelihood of disrupted antigen recognition.

Mutation count serves as a practical consideration. Fewer changes reduce synthesis cost and simplify validation. Germline content percentage indicates proximity to natural human sequences; higher values generally predict lower immunogenicity.

A recommended selection workflow:

  1. Filter for OASis percentile above 70
  2. Require sequence identity above 85%
  3. Select designs with fewest mutations among remaining candidates
  4. Validate binding experimentally before advancing
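The first three steps of this workflow can be expressed as a small filter over the result rows. The dictionary keys below are hypothetical stand-ins for the spreadsheet columns, the rows are invented, and breaking ties by higher percentile is an added assumption. Experimental validation (step 4) naturally falls outside the code.

```python
from typing import Dict, List


def select_designs(
    rows: List[Dict],
    min_percentile: float = 70.0,
    min_identity: float = 85.0,
) -> List[Dict]:
    """Filter by percentile and identity, then rank by fewest mutations."""
    passing = [
        row for row in rows
        if row["oasis_percentile"] > min_percentile
        and row["identity_pct"] > min_identity
    ]
    # Fewest mutations first; ties broken by higher percentile (assumption).
    return sorted(passing, key=lambda r: (r["mutations"], -r["oasis_percentile"]))


# Invented example rows; key names are placeholders for the output columns.
designs = [
    {"design": 1, "oasis_percentile": 92.0, "identity_pct": 88.0, "mutations": 14},
    {"design": 2, "oasis_percentile": 75.0, "identity_pct": 91.0, "mutations": 10},
    {"design": 3, "oasis_percentile": 65.0, "identity_pct": 95.0, "mutations": 6},
    {"design": 4, "oasis_percentile": 95.0, "identity_pct": 82.0, "mutations": 20},
]

ranked = select_designs(designs)
print([row["design"] for row in ranked])
```

Design 3 is dropped for its low percentile and design 4 for its low identity, leaving designs 2 and 1 as candidates for binding assays.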

CDR considerations

Complementarity Determining Regions (CDRs) mediate antigen binding and are typically preserved during humanization. The default setting excludes CDRs from modification, changing only framework regions. Enabling CDR humanization may improve humanness scores but risks altering binding specificity or affinity. Such modifications necessitate thorough experimental validation through surface plasmon resonance, ELISA, or functional assays.

Backmutating Vernier positions—framework residues that structurally support CDR conformation—can help preserve binding when framework changes inadvertently affect CDR geometry.
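Conceptually, backmutation restores the original residue at selected positions of a humanized sequence. The sketch below assumes equal-length sequences and 1-based sequential positions. The position list is a placeholder: real Vernier positions are defined in a numbering scheme such as Kabat and are not listed here.

```python
from typing import Iterable


def backmutate(original: str, humanized: str, positions: Iterable[int]) -> str:
    """Revert the given 1-based positions of `humanized` to the original residue."""
    if len(original) != len(humanized):
        raise ValueError("sequences must have equal length")
    out = list(humanized)
    for pos in positions:
        if not 1 <= pos <= len(out):
            raise ValueError(f"position {pos} is outside the sequence")
        out[pos - 1] = original[pos - 1]
    return "".join(out)


# Invented fragments; position 3 is a placeholder, not a real Vernier position.
parental = "EVKLVESGGG"
humanized = "EVQLVESGGA"
print(backmutate(parental, humanized, positions=[3]))
```

Each reverted position trades a little humanness for a better chance of keeping the original CDR conformation, so it is worth rescoring the result with OASis afterwards.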

Limitations

BioPhi operates on variable domain sequences only. Full-length antibodies, constant regions, or single-domain antibodies require preprocessing to extract the variable domain.

Sapiens predicts humanness based on sequence patterns in the training data. The model cannot account for factors outside its training scope, such as rare post-translational modifications, unusual structural constraints, or context-specific immunogenicity in particular patient populations.

OASis identifies sequences similar to those in the OAS database but cannot guarantee lack of immunogenicity. Clinical immune responses depend on multiple factors including HLA haplotype, dosing regimen, and epitope formation. BioPhi reduces risk but does not eliminate the need for preclinical and clinical validation.

Chain type auto-detection works reliably for standard antibody formats but may fail for engineered variants with atypical sequence characteristics. Manual specification resolves such cases.

  • IgBLAST: Analyze V/D/J gene segments and identify CDR boundaries before humanization
  • AbLang: Restore missing residues in antibody sequences with an OAS-trained language model
  • AbLang-2: Generate antibody embeddings and predict non-germline residues with germline bias correction