
ProGen2 is a family of autoregressive protein language models from Salesforce Research, trained to generate novel amino acid sequences by learning the statistical patterns of natural and metagenomic proteins. The family spans four sizes, from 151M to 6.4B parameters, and includes a domain-specific variant trained exclusively on antibody sequences.

Generation works by sampling from the model's learned distribution over protein sequence space, conditioned on an optional context string. Unlike structure-based design tools, ProGen2 operates entirely in sequence space: no input structure is required, and the model can generate proteins from scratch using only a starting token.

ProGen2 on ProteinIQ runs on cloud GPU infrastructure, so no local installation, checkpoint downloads, or Python environment is needed. Enter an optional context string (or leave the default `1` for unconditioned generation), configure the model size and sampling parameters, and receive generated sequences in FASTA format.
| Input | Description |
|---|---|
| Context / prompt | Starting token(s) for generation. Default `1` for general proteins; see Context and control tokens below. |
| Setting | Description |
|---|---|
| Model | Checkpoint to use. Larger models produce higher-quality sequences but are slower. See Choosing a model. |
| Random seed | Seeds the random number generator for reproducible runs. Change it to explore different outputs with identical settings. |
| Number of sequences | How many sequences to generate per run (1-50). |
| Setting | Description |
|---|---|
| Top-p | Nucleus sampling threshold (0.01-1.0, default 0.95). Restricts each sampling step to the smallest set of tokens whose cumulative probability reaches this value. Lower values constrain outputs to more probable amino acids. |
| Temperature | Scales the probability distribution before sampling (0.01-2.0, default 0.2). Lower values produce more conservative, natural-looking sequences; higher values increase diversity at the cost of coherence. |
| Max generated length | Maximum number of tokens the model generates (1-2048, default 256). Includes the context tokens. The model may stop earlier if it predicts a terminal token. |
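
Taken together, a run is fully specified by the inputs and settings above. The sketch below collects them in one place purely for illustration; the key names are hypothetical, since ProteinIQ exposes these as web form fields rather than a code-level API.

```python
# Hypothetical run configuration mirroring the form fields above.
# Key names are illustrative assumptions, not ProteinIQ's actual API.
run_config = {
    "context": "1",              # control token: UniRef/BFD-style generation
    "model": "progen2-large",    # default checkpoint
    "random_seed": 42,           # change to explore different outputs
    "num_sequences": 10,         # 1-50 per run
    "top_p": 0.95,               # nucleus sampling threshold (0.01-1.0)
    "temperature": 0.2,          # conservative sampling (0.01-2.0)
    "max_generated_length": 256, # includes context tokens (1-2048)
}
```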
Generated sequences are returned as a FASTA file. Each sequence header includes the model name and generated length. A JSON detail file with per-sequence metadata is also available for download.
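
Because the output is standard FASTA, downstream ranking and screening can start from a plain parser. Here is a minimal sketch; the filename and the exact header layout beyond model name and length are assumptions, only the FASTA format itself is stated above.

```python
from pathlib import Path

def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Example: inspect generated sequences before downstream screening.
for header, seq in read_fasta("progen2_generated.fasta"):  # hypothetical filename
    print(f"{header}: {len(seq)} residues")
```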
The context string is the seed passed to the tokenizer before generation begins. It conditions each subsequent token prediction on everything that came before, so it acts as both a prompt and a partial sequence prefix.

The default context `1` is a control token from the training data format. During training, sequences from the UniRef90 and BFD databases were prefixed with `1`, while metagenomic sequences were prefixed with `2`. Providing `1` as context tells the model to generate in the style of UniRef/BFD proteins; providing `2` biases generation toward metagenomic-style sequences. These tokens are stripped from the output before display, so what the user sees is a clean amino acid sequence.

Context can also be a partial protein sequence. Prefixing with `1MKTLL` generates continuations from that N-terminal fragment, which is useful for extending sequences in a particular evolutionary neighborhood. For progen2-oas, a context drawn from antibody framework regions (e.g., a heavy-chain variable-region prefix) guides generation toward biologically plausible antibody sequences.
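
To make the prompt formats concrete, a few example context strings are shown below. The sequence fragments are arbitrary illustrations, not recommended prompts.

```python
# Control token only: unconditioned, UniRef/BFD-style generation (the default).
ctx_general = "1"

# Metagenomic-style generation.
ctx_metagenomic = "2"

# Control token plus an N-terminal fragment: the model continues after MKTLL.
ctx_extension = "1MKTLL"

# For progen2-oas: an illustrative human heavy-chain framework-1 prefix.
ctx_antibody = "1EVQLVESGGGLVQPGGSLRLSCAAS"
```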
| Model | Parameters | Training data | Best for |
|---|---|---|---|
| progen2-small | 151M | UniRef90 + BFD30 | Fast iteration, large batch runs |
| progen2-medium | 764M | UniRef90 + BFD30 | Balanced quality/speed |
| progen2-base | 764M | UniRef90 + BFD30 | Same as medium (different fine-tuning) |
| progen2-large | 2.7B | UniRef90 + BFD30 | Default; good quality with reasonable runtime |
| progen2-xlarge | 6.4B | UniRef90 + BFD30 | Highest quality; significantly slower |
| progen2-BFD90 | 2.7B | BFD90 | Broader sequence diversity, metagenomic coverage |
| progen2-oas | 764M | Observed Antibody Space | Antibody and nanobody sequence generation |

For most general protein generation tasks, progen2-large (the default) is a reasonable starting point. If the goal is antibody sequence generation, progen2-oas is the appropriate choice: it was trained on immune repertoire data from the Observed Antibody Space (OAS) database rather than general protein databases.

Temperature and top-p both control how conservatively the model samples at each position. At the paper's default settings (temperature 0.2, top-p 0.95), ProGen2 tends to produce sequences that resemble natural proteins in amino acid composition and secondary structure propensity. Raising temperature above 0.5 increases sequence diversity but can produce unusual amino acid patterns; temperatures above 1.0 are rarely useful for most design applications.

Generating multiple sequences at different random seeds, rather than raising temperature, is often a better way to explore sequence space while maintaining output quality.
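
For readers who want the mechanics behind these two knobs, here is a minimal sketch of temperature scaling plus nucleus (top-p) filtering at a single sampling step, written with numpy. It illustrates the general technique these settings control, not ProGen2's internal implementation.

```python
import numpy as np

def sample_token(logits, temperature=0.2, top_p=0.95, rng=None):
    """Sample one token id with temperature scaling and nucleus (top-p) filtering.

    A generic sketch of the technique; not ProGen2's own code.
    """
    rng = rng or np.random.default_rng()
    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Nucleus filtering: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, zero out the rest, and renormalize.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    filtered /= filtered.sum()
    return rng.choice(len(probs), p=filtered)

# Toy example: a 25-token vocabulary (e.g., amino acids plus control tokens).
logits = np.random.default_rng(42).normal(size=25)
print(sample_token(logits, temperature=0.2, top_p=0.95))
```

At temperature 0.2 the softmax is sharply peaked, so the nucleus usually contains only a handful of amino acids; raising the temperature flattens the distribution and widens the nucleus, which is where the diversity/coherence trade-off comes from.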
ProGen2 generates sequences from nothing but a statistical model of protein sequence space. This makes it distinct from several related approaches:

- Structure-based tools such as RFdiffusion design protein backbones rather than sampling sequences, and motif-scaffolding tools such as RFdiffusion2 require an input PDB structure to build around.
- Family-conditioned models such as ProFam-1 require a protein family FASTA/MSA to steer generation.
- Diffusion-based sequence models such as EvoDiff generate in sequence space as well, but through a different generative framework geared toward inpainting and motif scaffolding.

The practical advantage of ProGen2 is simplicity: no receptor structure, no scaffold, no MSA. The limitation is the flip side of that simplicity: generated sequences are sampled from a distribution and carry no guarantee of folding into a functional structure without downstream validation.