
ProGen2 is Salesforce Research's protein language model suite for prompt-based de novo protein sequence generation. It samples novel amino acid sequences from a plain-text context string using top-p sampling and temperature control.

All-atom generative AI for designing protein binders. Specify target binding sites and generate diverse binding proteins with fine-grained control over interaction parameters.

Design linear peptide binders using a target sequence-conditioned masked language model. PepMLM generates peptide sequences optimized to bind specific protein targets, building on ESM-2 protein language modeling.

EvoDiff is a diffusion-based protein sequence generation framework from Microsoft Research. ProteinIQ currently wraps the EvoDiff-Seq OA_DM_38M model for unconditional protein generation, motif scaffolding, and user-sequence inpainting.

BoltzGen is a state-of-the-art AI model for designing protein and peptide binders against any biomolecular target. Using generative diffusion models, it creates novel binders (proteins, peptides, nanobodies) with nanomolar-level binding affinity.

PocketFlow is a structure-based molecular generative model that designs novel drug-like molecules within protein binding pockets. It uses autoregressive flow modeling with chemical knowledge to generate 100% chemically valid, highly drug-like compounds.
ProFam-1 is a protein family language model for generating new sequences from a set of related proteins. Instead of designing from structure alone, it conditions on family context, which makes it useful for proposing variants that remain consistent with an existing evolutionary neighborhood while still exploring new sequence space.

The model was introduced for two closely related tasks: family-conditioned design and family-aware fitness prediction. In practice, the same likelihood signal that helps rank generated sequences can also be used to compare candidate variants within a protein family.

ProteinIQ provides browser-based access to ProFam, so protein family-conditioned sequence generation can be run without installing the original repository, downloading checkpoints, or preparing a local Python environment.
The tool accepts pasted or uploaded family sequence sets and returns generated candidates in spreadsheet-ready output.

| Input | Description |
|---|---|
| Protein family sequences (FASTA/MSA) | One family prompt containing one or more related protein sequences. FASTA, A3M, ALN, MSA-style text, and plain text uploads are accepted. Headers are preserved when present. |
| Job name | Optional label for organizing runs in ProteinIQ job history. |

| Setting | Description |
|---|---|
| Number of sequences | Number of candidates to generate (1-200, default 16 in the ProteinIQ UI). Higher values broaden sampling but increase run time and output volume. |
| Sampling temperature | Diversity control (0.1-2.0, default 1.0). Lower values bias generation toward high-probability, family-like sequences; higher values increase novelty. |
| Nucleus sampling (top-p) | Restricts sampling to the smallest token set whose cumulative probability reaches the chosen threshold (0.5-1.0, default 0.95). Lower values are more conservative. |
| Maximum sequence length | Upper bound on generated length in residues (32-2048, default 512). Generated sequences may terminate earlier. |

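The nucleus (top-p) setting can be illustrated with a minimal pure-Python sketch. This is not ProteinIQ's actual implementation; it just shows how lowering the threshold shrinks the set of residues the sampler may pick from:

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize; everything else gets zero."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

# Toy next-residue distribution over four candidate amino acids
probs = [0.55, 0.25, 0.15, 0.05]

print(nucleus_filter(probs, 0.95))  # drops only the 0.05 tail
print(nucleus_filter(probs, 0.50))  # keeps just the single most likely residue
```

With top-p = 0.95 only the rare 0.05 tail is cut; with top-p = 0.50 the sampler becomes effectively greedy, which is why lower values are described as more conservative.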
ProteinIQ returns one row per generated sequence.

| Column | Description |
|---|---|
| sequence_id | FASTA-style identifier emitted by the generation run. The header can include the model score used for ranking. |
| generated_sequence | Generated amino acid sequence. |
| length | Sequence length in residues. |
| log_likelihood | Sequence-level log-likelihood parsed from the generated FASTA header when available. Higher values are generally more consistent with the model's learned family distribution. |

Downloadable output makes it straightforward to sort candidates by score, length, or downstream screening results.
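Downloaded FASTA output can also be post-processed locally. The sketch below parses records into the output columns above and sorts by score; the `log_likelihood=` header format is an assumption for illustration, so match the regex to the headers your run actually emits:

```python
import re

def parse_scored_fasta(text):
    """Parse FASTA records and pull a log-likelihood out of each header
    when one is present (hypothetical 'log_likelihood=' header format)."""
    records, header, seq = [], None, []
    for line in text.splitlines() + [">"]:   # sentinel flushes the last record
        if line.startswith(">"):
            if header is not None:
                m = re.search(r"log_likelihood=(-?\d+(?:\.\d+)?)", header)
                records.append({
                    "sequence_id": header[1:].split()[0],
                    "generated_sequence": "".join(seq),
                    "length": len("".join(seq)),
                    "log_likelihood": float(m.group(1)) if m else None,
                })
            header, seq = line, []
        elif line.strip():
            seq.append(line.strip())
    return records

fasta = """>gen_1 log_likelihood=-1.82
MKTAYIAKQR
>gen_2 log_likelihood=-2.45
MKLVFAGQRT
"""
rows = sorted(parse_scored_fasta(fasta),
              key=lambda r: r["log_likelihood"], reverse=True)
print(rows[0]["sequence_id"])  # highest-scoring candidate first
```

From here the rows can be written to CSV or merged with downstream screening results.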

ProFam belongs to the class of protein language models, but it is trained at the level of protein families rather than isolated sequences. The input prompt is a set of homologous proteins, and the model learns to generate sequences that are plausible under that shared family context.

The central idea is that protein function and fitness are often easier to model when a sequence is viewed relative to its family. Conserved motifs, tolerated substitutions, and family-specific residue patterns are represented implicitly in the prompt. That gives ProFam a different operating point from unconditional generators, which sample from a broad protein distribution without a family anchor.

The released ProFam-1 model generates sequences autoregressively, predicting the next residue from the previously generated residues and the supplied family context. Sampling controls such as temperature and top-p govern the tradeoff between conservative generation and broader exploration.
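The autoregressive loop can be sketched as follows. The model call here is a stand-in returning a fixed toy distribution, not the real ProFam-1 forward pass; the point is the structure: each residue is sampled from a distribution conditioned on the context plus everything generated so far, with temperature reshaping that distribution:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STOP = "*"

def toy_next_residue_probs(context, generated):
    """Stand-in for the model's next-token distribution. A real run
    would condition on the family context plus the generated prefix;
    here we just return a fixed toy distribution."""
    probs = {aa: 0.9 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    probs[STOP] = 0.1
    return probs

def sample_sequence(context, temperature=1.0, max_length=512, seed=None):
    rng = random.Random(seed)
    generated = []
    while len(generated) < max_length:
        probs = toy_next_residue_probs(context, generated)
        # Temperature rescales probabilities before sampling:
        # <1.0 sharpens toward likely residues, >1.0 flattens.
        weights = {t: p ** (1.0 / temperature) for t, p in probs.items()}
        tokens = list(weights)
        token = rng.choices(tokens, weights=[weights[t] for t in tokens])[0]
        if token == STOP:
            break
        generated.append(token)
    return "".join(generated)

print(sample_sequence("MKTAYIAKQR", temperature=0.8, max_length=60, seed=0))
```

A top-p filter, when enabled, would be applied to `weights` before the `rng.choices` draw, restricting the candidate set as described in the settings table.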

ProteinIQ accepts FASTA, A3M, and related alignment-style text because many family datasets already exist in those formats. Even so, ProFam inference is ultimately performed on sequence text rather than alignment columns: gaps are useful for preparing the family prompt, but the generated outputs are ordinary ungapped amino acid sequences.

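For local preprocessing, aligned inputs can be normalized with a one-liner like the sketch below. It assumes common A3M/ALN conventions (`-` and `.` as gap symbols, lowercase letters as insertion-state residues); verify these against your own alignment format:

```python
def ungap(seq):
    """Strip alignment gap characters ('-' and '.') and upper-case
    lowercase insertion-state residues, leaving a plain sequence."""
    return seq.replace("-", "").replace(".", "").upper()

aligned = "MK-TAY..iaKQR-"
print(ungap(aligned))  # MKTAYIAKQR
```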
The log_likelihood output is a model score, not a direct experimental measurement of expression, stability, catalytic activity, or binding affinity.