ProteinIQ

BoltzGen

Design protein and peptide binders against any biomolecular target using state-of-the-art generative AI. Create novel therapeutics with nanomolar-level binding affinity.

What is BoltzGen?

BoltzGen is a generative AI model for designing protein and peptide binders against any biomolecular target. Developed by a team led by Hannes Stärk at MIT and released in November 2025, BoltzGen combines structure prediction and design into a single all-atom diffusion model that generates novel binders from scratch.

Traditional binder design methods modify existing sequences or use physics-based optimization. BoltzGen instead uses generative diffusion—the same technology behind image generation models like DALL-E—to sample entirely new binder sequences and structures. The model supports multiple binder modalities:

  • Peptides: Linear or cyclic peptides (8–30 residues) for small binding pockets
  • Proteins: Miniproteins (50–150 residues) for larger interfaces and complex geometries
  • Nanobodies: Single-domain antibody fragments (~130 residues) with natural stability
  • Fab antibodies: Larger therapeutic antibody fragments (~450 residues) with proven clinical scaffolds

BoltzGen was validated experimentally across 26 targets in eight academic and industry wetlab campaigns. On nine novel targets with less than 30% sequence similarity to any known bound structure, the model achieved a 66% success rate in producing nanomolar-affinity binders—substantially exceeding earlier generative design tools.

The model and code are released under the MIT license, enabling both academic research and commercial drug development.

How to use BoltzGen online

ProteinIQ provides a web-based interface for running BoltzGen without command-line installation. Users upload a target structure, select a binder modality, adjust design parameters, and receive ranked candidate binders with quality metrics.

Inputs

Input | Description
Target protein | The structure to design binders against. Upload a PDB/CIF file or enter an RCSB PDB ID (e.g., 8JJS) to fetch directly. Structures up to ~500 residues work best.
Target ligand | For protein-small molecule mode only. Enter a SMILES string (e.g., CCO) or CCD code (e.g., TSA), or fetch from PubChem by compound ID.

Settings

Core settings

Setting | Description
Protocol | Binder modality. Peptide (8–30 residues, 1× cost), Protein (50–150 residues, 4.9× cost with refolding), Nanobody (~130 residues, 1× cost), Fab (~450 residues, 2.5× cost), or Protein-small molecule (6.8× cost).
Fab scaffold | For Fab mode. Select a specific FDA-approved antibody scaffold (adalimumab, dupilumab, etc.) or use All for diverse sampling across 14 therapeutic scaffolds.
Number of designs | Candidate binders to generate (10–100). Cost and runtime scale linearly. Start with 10–20 for exploration, 50–100 for production screening.
Budget | Final designs after diversity filtering. Typically equal to number of designs. Lower this to select only the most structurally diverse candidates.
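
The multipliers and the linear scaling with design count can be combined into a rough relative-cost estimate. The sketch below is illustrative only: `relative_cost` and the idea of a single "cost unit" (one peptide design) are assumptions made for this example, not ProteinIQ's actual pricing formula.

```python
# Relative cost multipliers as listed for the Protocol setting.
PROTOCOL_MULTIPLIER = {
    "peptide": 1.0,
    "protein": 4.9,  # includes refolding
    "nanobody": 1.0,
    "fab": 2.5,
    "protein-small-molecule": 6.8,
}


def relative_cost(protocol: str, num_designs: int) -> float:
    """Hypothetical helper: cost in units of one peptide design.

    Assumes cost is simply multiplier x number of designs (linear scaling).
    """
    if not 10 <= num_designs <= 100:
        raise ValueError("number of designs must be between 10 and 100")
    return PROTOCOL_MULTIPLIER[protocol] * num_designs


if __name__ == "__main__":
    for name in PROTOCOL_MULTIPLIER:
        print(f"{name}: {relative_cost(name, 20):.1f} units for 20 designs")
```

Under these assumptions, a 20-design Fab run costs about as much as 50 peptide designs, which is one reason to explore with a small design count first.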

Binder configuration

Setting | Description
Uniform binder size | When enabled, all designs use the exact specified length (faster). When disabled, length varies within a min–max range for diversity (+15% runtime).
Binder length | Target residue count. Peptides: 8–30. Proteins: 50–150. Nanobodies: 110–130. Longer sequences increase compute cost quadratically.
Binding site | Optional. Residues where the binder should interact, in the format chain:residues (e.g., A:12,14,61). Leave empty for automatic binding site detection.
Cyclic peptide | Enable backbone cyclization (N- to C-terminus bond) for improved proteolytic stability. Only applies to peptides. Adds ~5% cost.
Target chains | Optional. Comma-separated chain IDs to include from multi-chain structures (e.g., A,B). Leave empty to use all chains.
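
The chain:residues format for the binding site can be checked before submitting a job. This is a minimal sketch of such a check; `parse_binding_site` is a hypothetical helper written for this example and is not part of BoltzGen or ProteinIQ.

```python
from typing import List, Optional, Tuple


def parse_binding_site(spec: str) -> Optional[Tuple[str, List[int]]]:
    """Parse a 'chain:residues' string such as 'A:12,14,61'.

    Returns None for an empty field, which corresponds to leaving the
    setting blank (automatic binding site detection).
    """
    spec = spec.strip()
    if not spec:
        return None
    chain, sep, residues = spec.partition(":")
    if not sep or not chain or not residues:
        raise ValueError(f"expected chain:residues, got {spec!r}")
    # int() raises ValueError on anything that is not a residue number.
    return chain.strip(), [int(r) for r in residues.split(",")]


if __name__ == "__main__":
    print(parse_binding_site("A:12,14,61"))
```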

Inverse folding

Setting | Description
Skip inverse folding | Skip sequence redesign step. Output backbone-only designs without optimized sequences. Faster but may produce less stable designs.
Sequences per backbone | Number of sequence variants per backbone (1–10). More sequences increase diversity but multiply compute time linearly.
Avoid amino acids | Single-letter codes to exclude from designed sequences. Common: C (prevents unwanted disulfides), M (oxidation-sensitive), W (synthesis issues).
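
Two of these settings are easy to sanity-check on downloaded results. The sketch below uses hypothetical helper names: it counts how many sequences a run will produce and verifies after the fact that an exclusion list was honored. It is a post-hoc check only; how the exclusion is enforced during inverse folding is not shown here.

```python
from typing import List


def total_sequences(num_backbones: int, sequences_per_backbone: int) -> int:
    """Number of sequences produced when each backbone gets several variants."""
    if not 1 <= sequences_per_backbone <= 10:
        raise ValueError("sequences per backbone must be between 1 and 10")
    return num_backbones * sequences_per_backbone


def excluded_residues_found(sequence: str, avoid: str) -> List[str]:
    """Return the avoided one-letter codes that still occur in `sequence`."""
    present = set(sequence.upper())
    return sorted(present & set(avoid.upper()))


if __name__ == "__main__":
    print(total_sequences(20, 4))
    print(excluded_residues_found("GSWKECLM", "CMW"))
```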

Filtering and ranking

Setting | Description
Quality vs diversity (alpha) | Trade-off parameter (0.0–1.0). 0.0 = pure quality ranking. 1.0 = maximum diversity. Default 0.5 balances both.
Filter biased compositions | Remove designs with unusual amino acid frequencies (e.g., excessive alanines).
Refolding RMSD threshold | Maximum backbone deviation in Ångströms between designed and refolded structure (0.5–5.0). Lower values ensure designs fold as intended. Only applies to protein protocol.
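
To make the alpha trade-off concrete, here is one common way such a parameter can be used: a greedy selection that scores each candidate as a weighted sum of its quality and its distance to designs already chosen. This is a sketch under stated assumptions, not BoltzGen's actual filtering code; the scoring formula, the sequence-mismatch distance, and the function names are all illustrative.

```python
from typing import List, Tuple

Design = Tuple[str, float]  # (sequence, quality score in [0, 1])


def sequence_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions; a length difference counts as mismatch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 1.0 - matches / longest


def select_designs(designs: List[Design], budget: int, alpha: float = 0.5) -> List[Design]:
    """Greedily pick `budget` designs, trading quality against diversity."""
    remaining = list(designs)
    selected: List[Design] = []

    def score(design: Design) -> float:
        seq, quality = design
        # Diversity = distance to the closest already-selected design.
        diversity = min((sequence_distance(seq, s) for s, _ in selected), default=1.0)
        return (1.0 - alpha) * quality + alpha * diversity

    while remaining and len(selected) < budget:
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
    return selected


if __name__ == "__main__":
    pool = [("AAAA", 0.9), ("AAAB", 0.85), ("CCCC", 0.6)]
    print(select_designs(pool, budget=2, alpha=0.0))
    print(select_designs(pool, budget=2, alpha=0.5))
```

With alpha at 0.0 the two highest-quality (and nearly identical) sequences are kept; at 0.5 the second pick switches to the dissimilar, lower-quality design.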

Diffusion sampling

Setting | Description
Step scale | Diffusion step size (1.0–3.0, default 1.8). Higher values increase diversity but may reduce quality.
Noise scale | Noise level during generation (0.8–1.0, default 0.98). Lower values produce more deterministic outputs.
Model checkpoint | Both runs the diverse and adherence checkpoints for maximum coverage. Diverse prioritizes novelty. Adherence prioritizes structural accuracy.

Structure constraints

Setting | Description
Secondary structure | Force specific secondary structure. Format: chain:start-end:type per line. Types: HELIX, SHEET, LOOP. Example: B:1-5:HELIX.
Disulfide bonds | Define cysteine-cysteine bridges. Format: chain:residue,chain:residue per line. Example: B:3,B:12.
Staple bonds | Non-natural crosslinks for peptide stabilization. Format: chain:residue:atom,chain:residue:atom. Common for i,i+4 or i,i+7 spacing.
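
The three line formats above are simple enough to validate locally before a run. The parsers below are a hypothetical sketch, not ProteinIQ code; the staple example string `B:2:CB,B:6:CB` uses atom names chosen purely for illustration, and negative residue numbers are not handled.

```python
SS_TYPES = {"HELIX", "SHEET", "LOOP"}


def parse_secondary_structure(line: str):
    """Parse 'chain:start-end:type', e.g. 'B:1-5:HELIX'."""
    chain, span, ss_type = line.strip().split(":")
    start, end = (int(x) for x in span.split("-"))
    if ss_type not in SS_TYPES:
        raise ValueError(f"unknown secondary structure type: {ss_type}")
    if start > end:
        raise ValueError("start must not exceed end")
    return chain, start, end, ss_type


def parse_disulfide(line: str):
    """Parse 'chain:residue,chain:residue', e.g. 'B:3,B:12'."""
    ends = []
    for endpoint in line.strip().split(","):
        chain, residue = endpoint.split(":")
        ends.append((chain, int(residue)))
    if len(ends) != 2:
        raise ValueError("a disulfide needs exactly two residues")
    return ends[0], ends[1]


def parse_staple(line: str):
    """Parse 'chain:residue:atom,chain:residue:atom'."""
    ends = []
    for endpoint in line.strip().split(","):
        chain, residue, atom = endpoint.split(":")
        ends.append((chain, int(residue), atom))
    if len(ends) != 2:
        raise ValueError("a staple needs exactly two endpoints")
    return ends[0], ends[1]


if __name__ == "__main__":
    print(parse_secondary_structure("B:1-5:HELIX"))
    print(parse_disulfide("B:3,B:12"))
    print(parse_staple("B:2:CB,B:6:CB"))  # i,i+4 spacing; atom names illustrative
```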

Advanced design

Setting | Description
Fixed sequence regions | Lock positions to specific sequences. Format: chain:start-end:sequence per line. Example: B:1-5:AAAAA.
Binding residues | Binder positions that MUST contact the target. Format: chain:residue1,residue2,....
Non-binding residues | Binder positions that must NOT contact the target. Useful for designing specificity.
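
Two mistakes are easy to make with these settings: a fixed region whose sequence length does not match its residue range, and a position listed as both binding and non-binding. The sketch below checks for both. The helper names are hypothetical, and it assumes non-binding residues use the same chain:residue1,residue2 format as binding residues, which the table does not state explicitly.

```python
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fixed_region(line: str):
    """Parse 'chain:start-end:sequence', e.g. 'B:1-5:AAAAA'."""
    chain, span, sequence = line.strip().split(":")
    start, end = (int(x) for x in span.split("-"))
    if end - start + 1 != len(sequence):
        raise ValueError("sequence length does not match the residue range")
    unknown = set(sequence) - CANONICAL_AA
    if unknown:
        raise ValueError(f"non-canonical residue codes: {sorted(unknown)}")
    return chain, start, end, sequence


def parse_residue_list(spec: str):
    """Parse 'chain:residue1,residue2' into a set of (chain, residue) pairs."""
    chain, residues = spec.strip().split(":")
    return {(chain, int(r)) for r in residues.split(",")}


def conflicting_positions(binding: str, non_binding: str):
    """Positions requested as both binding and non-binding (should be empty)."""
    return sorted(parse_residue_list(binding) & parse_residue_list(non_binding))


if __name__ == "__main__":
    print(parse_fixed_region("B:1-5:AAAAA"))
    print(conflicting_positions("B:3,7,9", "B:9,12"))
```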

Results

BoltzGen returns a ranked list of candidate binders with an interactive 3D viewer. Each design includes structure files (CIF format), sequences (FASTA), and quality metrics.

Column | Description
Rank | Position in quality-ranked list. Lower is better.
Quality (pTM) | Predicted TM-score (0–1). Measures structural quality of the binder-target complex. Higher is better.
Error (Å) | Predicted Aligned Error (PAE) at the interface. Lower indicates more confident binding predictions.
Interface (Ų) | Buried surface area. Typical ranges: peptides 500–2,000 Ų, proteins 1,000–3,000 Ų, nanobodies 1,500–2,500 Ų.
Sequence | Designed amino acid sequence in single-letter code.

Interpreting quality metrics

  • pTM ≥ 0.8: Excellent structural quality with high confidence. Prioritize for experimental synthesis.
  • pTM 0.6–0.8: Good quality suitable for most applications. Visual inspection recommended.
  • pTM < 0.6: Lower confidence. Consider generating additional designs or adjusting parameters.
  • PAE < 3 Å at interface: High-confidence binding prediction.
  • PAE 3–5 Å: Moderate confidence. Check interface geometry in 3D viewer.
  • PAE > 5 Å: Lower confidence or flexible regions. May indicate a challenging target.
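
The thresholds above translate directly into a triage rule for a results table. The following sketch encodes them as two small functions; the tier labels and function names are chosen for this example.

```python
def ptm_tier(ptm: float) -> str:
    """Map a pTM score (0-1) onto the three bands listed above."""
    if ptm >= 0.8:
        return "excellent"
    if ptm >= 0.6:
        return "good"
    return "low"


def pae_tier(interface_pae: float) -> str:
    """Map interface PAE (in Angstroms) onto the three bands listed above."""
    if interface_pae < 3.0:
        return "high"
    if interface_pae <= 5.0:
        return "moderate"
    return "low"


if __name__ == "__main__":
    # Hypothetical result rows: (rank, pTM, interface PAE in Angstroms).
    for rank, ptm, pae in [(1, 0.86, 2.4), (2, 0.71, 4.1), (3, 0.52, 6.3)]:
        print(rank, ptm_tier(ptm), pae_tier(pae))
```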

How does BoltzGen work?

BoltzGen generates binders through an all-atom diffusion process guided by learned patterns from the Protein Data Bank. The approach combines several technical innovations that enable universal binder design.

All-atom diffusion model

Unlike coarse-grained methods that model only backbone atoms, BoltzGen represents every heavy atom in 3D space. This atomic-level precision enables design of specific side-chain interactions, disulfide bonds, and other features critical for binding specificity.

The generation process starts from random coordinates and iteratively refines them through learned denoising steps, guided by the target structure and any user-specified constraints. This generative approach naturally produces diverse binding modes without manual parameter tuning.
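
The idea of starting from random coordinates and refining them step by step can be illustrated with a toy loop. This is deliberately not BoltzGen's sampler: the real denoiser is a neural network conditioned on the target, whereas the stand-in below simply knows the clean coordinates. The `noise_scale` and `rate` parameters are illustrative and only loosely analogous to the sampling settings exposed in the interface.

```python
import random
from typing import List


def toy_denoise(clean: List[float], steps: int = 50, noise_scale: float = 0.98,
                rate: float = 0.2, seed: int = 0) -> List[float]:
    """Toy reverse-diffusion loop over a flat list of coordinates."""
    rng = random.Random(seed)
    # Start from pure noise, unrelated to the clean coordinates.
    coords = [rng.gauss(0.0, 10.0) for _ in clean]
    for step in range(steps):
        # Noise level decays linearly and reaches zero at the final step.
        sigma = 1.0 - (step + 1) / steps
        coords = [
            # Move toward the denoiser's estimate, then re-inject scaled noise.
            x + rate * (target - x) + noise_scale * sigma * rng.gauss(0.0, 1.0)
            for x, target in zip(coords, clean)
        ]
    return coords


if __name__ == "__main__":
    clean = [1.5, -2.0, 0.3]
    print(toy_denoise(clean))
```

Running the loop with different seeds gives slightly different end points near the clean coordinates, which is the toy analogue of a diffusion model producing diverse samples from the same conditioning.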

Geometry-based residue representation

BoltzGen encodes amino acid identity through geometry rather than discrete labels. Each residue is represented as a fixed set of virtual atoms: the backbone (N, Cα, C, O) plus atoms for the side chain. Amino acid type emerges from how these atoms are positioned relative to the backbone, a representation better suited to diffusion-based generation than a mix of discrete and continuous variables.

Multi-task training

BoltzGen combines structure prediction and design in a single model through multi-task learning. During training, each sample is randomly assigned one of three tasks (structure prediction, binder design, or structure completion), all drawn from the same data. The design task can therefore draw on structural knowledge from the entire PDB, while design data in turn improves prediction accuracy, a synergy that single-task models lack.

The training data was curated for diversity. Upsampled antibody and TCR data that could bias the model toward common therapeutics was removed, and random cropping with multi-task processing was applied to all samples.
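
One way to picture the random choice among the three training tasks is as a per-sample mask that marks which residues the model must generate. The sketch below is an assumption-heavy illustration, not BoltzGen's training code: the task names, the uniform task choice, and the masking rules (whole chain for binder design, a random contiguous segment for structure completion) are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

TASKS = ("structure_prediction", "binder_design", "structure_completion")


def sample_training_task(chain_lengths: Dict[str, int],
                         rng: random.Random) -> Tuple[str, Dict[str, List[bool]]]:
    """Pick a task and return a mask: True where a residue must be generated."""
    task = rng.choice(TASKS)
    mask = {chain: [False] * n for chain, n in chain_lengths.items()}
    if task == "binder_design":
        # Hide one whole chain and treat it as the binder to be designed.
        chain = rng.choice(sorted(chain_lengths))
        mask[chain] = [True] * chain_lengths[chain]
    elif task == "structure_completion":
        # Hide a random contiguous segment of one chain.
        chain = rng.choice(sorted(chain_lengths))
        n = chain_lengths[chain]
        start = rng.randrange(n)
        end = rng.randrange(start, n) + 1
        for i in range(start, end):
            mask[chain][i] = True
    # structure_prediction: nothing is designed, so the mask stays all False.
    return task, mask


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_training_task({"A": 8, "B": 5}, rng))
```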

Two-checkpoint system

BoltzGen provides two model checkpoints optimized for different objectives:

  • Diverse checkpoint: Prioritizes exploration of binding modes and structural novelty
  • Adherence checkpoint: Prioritizes fidelity to specified constraints and structural accuracy

The default Both setting uses both checkpoints for maximum coverage of the design space.

Pipeline architecture

BoltzGen runs a seven-step pipeline:

  1. Design: Diffusion model generates candidate backbones
  2. Inverse folding: Sequence design using ProteinMPNN-style methods
  3. Design folding: Refolds designed chain alone (protein protocol only)
  4. Complex folding: Refolds full binder-target complex via Boltz-2
  5. Affinity prediction: Binding strength estimation (small-molecule targets only)
  6. Analysis: Multi-metric evaluation (RMSD, hydrophobic packing, contacts)
  7. Filtering: Diversity-quality selection based on alpha parameter
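
Because two of the steps are conditional, the set of steps that actually run depends on the protocol. The sketch below derives that list from the conditions stated above, plus the Skip inverse folding setting. The step identifiers and the function are hypothetical, and it assumes design folding runs only when the protocol is exactly "protein".

```python
from typing import List


def pipeline_steps(protocol: str, skip_inverse_folding: bool = False) -> List[str]:
    """Return the pipeline steps that would run for a given protocol."""
    steps = ["design"]
    if not skip_inverse_folding:
        steps.append("inverse_folding")
    if protocol == "protein":
        steps.append("design_folding")  # protein protocol only
    steps.append("complex_folding")
    if protocol == "protein-small-molecule":
        steps.append("affinity_prediction")  # small-molecule targets only
    steps += ["analysis", "filtering"]
    return steps


if __name__ == "__main__":
    for protocol in ("peptide", "protein", "protein-small-molecule"):
        print(protocol, "->", ", ".join(pipeline_steps(protocol)))
```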

Design specification language

BoltzGen supports flexible constraints that guide generation without retraining:

  • Covalent bonds: Cyclic peptides, disulfide bridges, staple crosslinks
  • Structure constraints: Fix backbone regions via distance constraints
  • Binding site targeting: Specify target residues the binder must contact
  • Secondary structure: Force helical, sheet, or coil regions
  • Fixed sequences: Lock specific positions to known residues
  • Design masks: Control which positions are designed vs. unchanged

These constraints steer the diffusion process toward specific design goals without slowing generation.

Limitations

  • Rigid target backbone: The target protein is treated as rigid during design. Induced-fit effects are not modeled.
  • No explicit solvent: Water molecules and explicit solvation effects are not considered.
  • Canonical amino acids only: Non-natural amino acids and post-translational modifications are not directly supported.
  • Computational requirements: Protein-length binders (50+ residues) require substantial GPU time. Cost scales quadratically with binder length.
  • Small-molecule mode validation: The protein-small molecule protocol has less experimental validation data than peptide and nanobody modes.
  • No guarantee of expressibility: High computational scores do not ensure successful recombinant expression or solubility.
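
To get a feel for the quadratic scaling mentioned in the computational-requirements item, the ratio of costs between two binder lengths can be estimated as the square of the length ratio. This is a back-of-the-envelope sketch that assumes the quadratic relationship holds exactly; real runtimes also include fixed overheads and depend on target size.

```python
def length_cost_ratio(new_length: int, reference_length: int) -> float:
    """Estimated cost of `new_length` relative to `reference_length` (quadratic)."""
    if new_length <= 0 or reference_length <= 0:
        raise ValueError("lengths must be positive")
    return (new_length / reference_length) ** 2


if __name__ == "__main__":
    # A 150-residue binder versus a 50-residue one.
    print(length_cost_ratio(150, 50))
```

Under this assumption, tripling the binder length from 50 to 150 residues raises the estimated cost ninefold.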

Applications

  • Therapeutic binder development: Design peptide drugs, nanobody therapeutics, or antibody candidates against disease targets
  • Difficult target engagement: Access intrinsically disordered proteins, flat protein surfaces, and other "undruggable" targets
  • Protein-protein interaction modulators: Disrupt or stabilize specific protein interfaces
  • Biosensor development: Create recognition elements for diagnostic applications
  • Research tool generation: Design affinity reagents for pulldowns, co-IPs, and imaging

Related tools

  • Boltz-2: Structure prediction model used internally for refolding validation
  • RFdiffusion: Alternative scaffold-based protein design using diffusion
  • ProteinMPNN: Sequence optimization for designed backbones
  • DiffDock: Molecular docking using diffusion models
  • Chai-1: Independent structure validation for designed complexes
  • PDB Fixer: Structure preparation for targets with missing residues or artifacts