BoltzGen

(v0.3.2)

Design protein, peptide, and nanobody binders against biomolecular targets with configurable lengths and binding sites.

What is BoltzGen?

BoltzGen is an all-atom diffusion model for binder design. It generates new peptides, miniproteins, nanobodies, and antibody/Fab CDR binders directly against a supplied target structure, and it also includes a protein-small molecule mode for designing proteins around a ligand.

Rather than optimizing a fixed starting sequence, BoltzGen samples binder sequence and structure together. That makes it useful when no obvious scaffold exists or when several geometrically distinct binding solutions are worth exploring. The model was introduced by Hannes Stärk and was experimentally evaluated across multiple wet-lab campaigns, with especially strong results on novel targets where template-heavy approaches tend to struggle.

How to use BoltzGen online

ProteinIQ runs BoltzGen on hosted GPU infrastructure, so binder design jobs can be configured in the browser and submitted without setting up CUDA, PyTorch, or the BoltzGen CLI. The online interface supports the current v0.3.2 protocol family and the ranking controls that were added in newer BoltzGen releases.

Inputs

InputDescription
Target proteinUpload a .pdb, .ent, .cif, or .mmcif structure file, or fetch a structure from the RCSB PDB by ID. Used for Peptide, Protein, Nanobody, and Antibody / Fab CDR protocols.
Target ligandUsed for Protein-small molecule mode. Accepts a SMILES string, CCD code, or PubChem lookup.
Job nameOptional label for tracking runs in ProteinIQ job history.

Settings

Core settings

SettingDescription
ProtocolDesign mode. Peptide is intended for short binders, Protein for de novo miniproteins, Nanobody for single-domain antibody binders, Antibody / Fab CDR for scaffold-guided antibody CDR design, and Protein-small molecule for designing a protein around a ligand.
Antibody scaffoldAvailable for Antibody / Fab CDR. ProteinIQ includes therapeutic antibody scaffolds such as adalimumab, dupilumab, and ustekinumab, or All scaffolds for broader sampling.
Number of designsTotal number of candidates generated before filtering. Runtime and credit cost scale with this value.
BudgetNumber of final designs retained after BoltzGen applies quality and diversity filtering.

Binder settings

SettingDescription
Uniform binder sizeUses a single exact binder length when enabled. Disabling it allows length sampling between the configured minimum and maximum.
Binder lengthTarget binder length when uniform sizing is enabled. Appropriate ranges depend on protocol: peptides are short, protein binders are larger, and nanobody / antibody modes use scaffold-driven sizes.
Minimum lengthLower bound for sampled binder size when Uniform binder size is off.
Maximum lengthUpper bound for sampled binder size when Uniform binder size is off.
Binding siteOptional target-site constraint in chain:residues form, for example A:12,14,61. This biases designs toward a specified epitope or pocket.
Cyclic peptideAdds an N-to-C terminal cyclization constraint for peptide binders.
Target chainsOptional comma-separated subset of target chains to keep, for example A,B.

Inverse folding

SettingDescription
Skip inverse foldingStops after backbone generation and skips sequence redesign.
Sequences per backboneNumber of inverse-folded sequences produced for each designed backbone.
Avoid amino acidsOne-letter amino acid codes to exclude during inverse folding, such as C to avoid undesired disulfides.

Filtering and ranking

SettingDescription
Quality vs diversity (alpha)Controls the tradeoff between top-scoring designs and structural diversity in the final ranked set.
Filter biased compositionsRemoves amino-acid composition outliers. The current default is true, and ProteinIQ matches that behavior.
Refolding RMSD thresholdUpper RMSD cutoff used during refolding-based filtering. Most relevant for protein-sized binders.
Custom filtersExtra hard filters in metric>value or metric<value form, one per line.
Metrics weightsPer-metric ranking weights using the current metric=value syntax, one per line. Larger values down-weight a metric rank, and metric=none removes that metric from ranking.
Size bucketsOptional cap on how many retained designs may come from each size range. Format: min-max:count, one per line, for example 10-20:5. This is useful when sampling variable binder lengths.

Diffusion sampling

SettingDescription
Step scaleDiffusion step size. Higher values generally increase exploration at the cost of stability.
Noise scaleNoise level during sampling. Lower values make generations more deterministic.
Model checkpointBoth mixes BoltzGen's diverse and adherence checkpoints. Diverse favors novelty, while Adherence favors constraint fidelity.

Structure constraints

SettingDescription
Secondary structureBinder secondary-structure constraints in chain:start-end:type format, one per line. Supported types are HELIX, SHEET, and LOOP.
Disulfide bondsCysteine bridge constraints in chain:residue,chain:residue format.
Staple bondsNon-natural crosslinks in chain:residue:atom,chain:residue:atom format.

Advanced design

SettingDescription
Fixed sequence regionsLocks a binder segment to a specific sequence using chain:start-end:sequence.
Binding residuesDeclares binder positions that should contact the target.
Non-binding residuesDeclares binder positions that should avoid target contact.
Design insertionsVariable-length insertion syntax for protein redesign in chain:position:min..max:secondary_structure form. The secondary-structure value is optional.
Residue constraintsInverse-folding amino-acid constraints in chain:position:allowed|disallowed:amino_acids form, for example B:3..5:disallowed:CM.

Execution control

SettingDescription
Reuse existing resultsReuses compatible intermediate outputs if the same run directory is resumed.
Pipeline stepsRuns a specific stage such as design, inverse_folding, folding, analysis, or filtering instead of the full pipeline.
Redesign existing structureEquivalent to BoltzGen's inverse-fold-only mode. Requires a fully specified structure with the binder already positioned.

Results

ProteinIQ returns ranked designs in an interactive viewer together with downloadable structure and sequence files. Runs also include the original uploaded reference input so the designed binders can be compared against the starting target or ligand.

OutputDescription
ViewerInteractive structural view of ranked designs and reference inputs.
RankPosition in the final filtered list. Lower ranks are preferred.
Quality (pTM)Predicted complex quality score on a 0 to 1 scale. Higher values indicate stronger structural confidence.
Error (Å)Predicted aligned error for the complex. Lower values indicate a more confident model.
Interface (Ų)Buried surface area between binder and target. Larger interfaces often indicate more extensive contacts, though optimal values depend on binder class.
SequenceDesigned amino acid sequence for the retained candidate.
FilesDownloadable CIF structures, FASTA sequences, and reference inputs when available.

Interpreting results

Quality (pTM)

RangeInterpretation
0.8-1.0Strong structural confidence. Often the first tier to inspect experimentally.
0.6-0.79Usable designs with moderate confidence. Visual inspection and orthogonal validation are advisable.
<0.6Lower-confidence models that may still be interesting for difficult targets, but usually require more screening.

Error (Å)

RangeInterpretation
<3 ÅHigh-confidence binder-target geometry.
3-5 ÅIntermediate confidence. Interfaces may still be plausible but should be reviewed.
>5 ÅLower-confidence complexes or flexible interfaces.

How does BoltzGen work?

All-atom diffusion

BoltzGen models the binder and target at all-atom resolution instead of working only with a backbone trace. The design process begins from noisy coordinates and repeatedly denoises them while conditioning on the target geometry and any user-supplied constraints. Sequence identity is coupled to the structural representation, so side-chain arrangement and residue type are learned together rather than stitched together as separate steps.

Protocol families

The same framework supports several design regimes:

  • Peptide binders: Short linear or cyclic peptides for compact interfaces and pockets
  • Protein binders: De novo miniproteins for larger or more structured interaction surfaces
  • Nanobody binders: Single-domain antibody designs with nanobody-style geometry
  • Antibody / Fab CDR design: CDR generation on fixed therapeutic antibody scaffolds
  • Protein-small molecule design: Protein design around a ligand, with additional affinity-oriented analysis

Pipeline execution

In ProteinIQ, BoltzGen typically runs the same five major stages available from the command line:

  1. design: Generates candidate backbones
  2. inverse_folding: Assigns or redesigns sequences
  3. folding: Refolds designed candidates for structural validation
  4. analysis: Computes interface and quality metrics
  5. filtering: Selects the final set by balancing quality and diversity

Protein-small molecule runs can add ligand-specific analysis, and Redesign existing structure skips backbone generation and enters directly at inverse folding.

Constraint language

BoltzGen's design specification language makes it possible to bias generation without retraining the model:

  • Binding-site constraints: Focus contact formation on specific target residues
  • Secondary-structure constraints: Favor helices, sheets, or loops in selected binder segments
  • Covalent constraints: Encode cyclic peptides, disulfides, and staples
  • Sequence constraints: Keep selected residues fixed or bias interaction positions

These constraints do not guarantee success, but they substantially narrow the search space when a design hypothesis already exists.

Limitations

  • Target flexibility is limited: Large induced-fit rearrangements are not modeled explicitly during design.
  • Cost grows quickly with binder size: Protein-length binders are substantially more expensive than short peptides.
  • Experimental success is still context-dependent: High pTM and low PAE do not guarantee expression, solubility, or binding in vitro.
  • Protein-small molecule design is more specialized: It is useful for ligand-focused problems, but the main BoltzGen validation literature emphasizes peptide, nanobody, and antibody-style binder design.
  • Design insertions apply to uploaded-structure redesign: For de novo binder lengths, use the binder-length controls or fixed sequence constraints instead.
  • Inverse-fold-only mode requires a prepared structure: The binder must already be positioned in the uploaded complex if Redesign existing structure is enabled.

Related tools

PepMimic

PepMimic

PepMimic designs short peptides that mimic the binding interface of a known protein binder on its target. From a reference protein complex, a latent diffusion model generates peptide candidates constrained to the target interface, and each candidate is scored by interface-mimicry against the reference binder.

PepMLM

PepMLM

Design linear peptide binders for target proteins using a target sequence-conditioned masked language model. PepMLM generates peptide sequences optimized to bind specific protein targets based on ESM-2 protein language modeling.

PocketXMol

PocketXMol

PocketXMol is a pocket-interacting generative foundation model for docking, small-molecule design, and peptide design in protein binding pockets.

RFantibody

RFantibody

Structure-based de novo antibody and nanobody design pipeline combining antibody-tuned RFdiffusion, ProteinMPNN sequence design, and antibody-tuned RoseTTAFold2 filtering.

Proteo-R1

Proteo-R1

Reasoning-guided antibody CDR co-design for antibody-antigen complexes. Proteo-R1 identifies residue-level functional decisions and uses conditional diffusion to generate ranked designed structures with confidence metrics.

PocketFlow

PocketFlow

PocketFlow is a structure-based molecular generative model that designs novel drug-like molecules within protein binding pockets. It uses autoregressive flow modeling with chemical knowledge to generate 100% chemically valid, highly drug-like compounds.

Proteina-Complexa

Proteina-Complexa

Design protein binders against a target structure with NVIDIA BioNeMo's Proteina-Complexa generative pipeline.

EvoDiff

EvoDiff

EvoDiff is a diffusion-based protein sequence generation framework from Microsoft Research. ProteinIQ currently wraps the EvoDiff-Seq OA_DM_38M model for unconditional protein generation, motif scaffolding, and user-sequence inpainting.

Genie 3

Genie 3

Generate protein structures and scaffolds with Genie 3, an all-atom SE(3)-equivariant diffusion model. Genie 3 supports unconditional protein generation, motif scaffolding, and hotspot-targeted binder design.

ODesign

ODesign

All-atom generative AI for designing protein binders. Specify target binding sites and generate diverse binding proteins with fine-grained control over interaction parameters.