FoldSeek

Search AlphaFold DB, compare structures, or cluster by 3D similarity

What is FoldSeek?

FoldSeek is a fast protein structure search tool that can search your structure against 200+ million predicted structures in the AlphaFold Database, compare structures in detail, or cluster multiple structures by similarity.

Traditional structure comparison methods like TM-align are accurate but slow, requiring seconds per comparison. FoldSeek achieves comparable sensitivity while being four to five orders of magnitude faster. This speed comes from a novel encoding approach that converts 3D coordinates into searchable sequences.

For sequence-based clustering, use MMseqs2. For detailed pairwise structure alignment with superposition, use USAlign.

Database search

Upload a single structure to search against massive structure databases. Database search uses the FoldSeek web server API, giving you access to the same search capabilities as search.foldseek.com.

Available databases:

Database	Contents	Size
AlphaFold DB 50	Clustered AlphaFold predictions	~50M representatives
PDB	Experimental structures from Protein Data Bank	~220K structures
AlphaFold Swiss-Prot	High-quality curated AlphaFold predictions	~500K structures
CATH 50	Protein domain database (Class, Architecture, Topology, Homology)	~30K domains

By default, AlphaFold DB 50 and PDB are searched. You can enable or disable individual databases in the settings. Database search typically completes in 1-5 minutes depending on server load.

How does FoldSeek work?

The 3Di structural alphabet

FoldSeek's speed comes from the 3Di (3D interaction) alphabet, which encodes protein structure as a sequence of 20 letters. Unlike traditional backbone structural alphabets, 3Di describes the geometric relationship between each residue and its spatially closest neighbor.

For each residue $i$, FoldSeek finds its nearest neighbor residue $j$ based on virtual center distance. Seven angles, the $C_\alpha$ distance, and two sequence distance features are extracted from the backbone coordinates of both residues. A neural network trained to maximize evolutionary conservation maps these 10 features to one of the 20 3Di states.
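
The neighbor-selection step can be illustrated with a minimal sketch. It is a simplification, not FoldSeek's implementation: it measures distance between plain $C_\alpha$ positions rather than virtual centers, and the hypothetical `nearest_neighbors` helper returns only the neighbor index, the distance, and the signed sequence offset instead of the full set of 10 features. The coordinates are toy values.

```python
import math

def nearest_neighbors(ca_coords):
    """For each residue i, return (j, distance, j - i) for its closest other residue.

    Simplification: uses C-alpha positions directly instead of virtual centers.
    """
    result = []
    for i, ci in enumerate(ca_coords):
        best_j, best_d = None, math.inf
        for j, cj in enumerate(ca_coords):
            if i == j:
                continue
            d = math.dist(ci, cj)
            if d < best_d:  # strict comparison keeps the first of any tied neighbors
                best_j, best_d = j, d
        result.append((best_j, best_d, best_j - i))
    return result

# Toy coordinates in angstroms, not taken from a real structure.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 3.0, 0.0)]
neighbors = nearest_neighbors(coords)
for i, (j, d, offset) in enumerate(neighbors):
    print(f"residue {i}: neighbor {j}, distance {d:.2f}, sequence offset {offset}")
```

In this toy chain, residue 0 pairs with residue 3 even though they are far apart in sequence, which is the kind of tertiary contact that 3Di is designed to capture.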

This encoding has three advantages over backbone alphabets: weaker dependency between consecutive letters, more evenly distributed state frequencies, and higher information density in conserved protein cores rather than loop regions.

Search algorithm

FoldSeek converts both query and target structures into 3Di sequences. It then applies the MMseqs2 prefilter to find candidate matches using spaced k-mer matching on diagonals of the alignment matrix. This prefilter reduces the search space by several orders of magnitude while maintaining high sensitivity.
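
The idea of a diagonal k-mer prefilter can be sketched as follows. This is an illustration under simplifying assumptions: it counts only exact, contiguous k-mer matches, whereas the MMseqs2 prefilter also scores similar k-mers. The function names, the `min_hits` cutoff, and the 3Di strings are hypothetical.

```python
from collections import defaultdict

def diagonal_kmer_hits(query, target, k=3):
    """Count exact shared k-mers per diagonal (i - j) between two 3Di strings."""
    positions = defaultdict(list)
    for j in range(len(target) - k + 1):
        positions[target[j:j + k]].append(j)
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for j in positions.get(query[i:i + k], []):
            hits[i - j] += 1
    return dict(hits)

def passes_prefilter(query, target, k=3, min_hits=2):
    """Keep a target only if some diagonal collects at least min_hits k-mer matches."""
    hits = diagonal_kmer_hits(query, target, k)
    return any(count >= min_hits for count in hits.values())

# Made-up 3Di strings for illustration.
query_3di = "DVLVVCCVVD"
target_3di = "PPDVLVVCQQ"
print(diagonal_kmer_hits(query_3di, target_3di))
print(passes_prefilter(query_3di, target_3di))
```

Matches that pile up on one diagonal suggest an ungapped region of similarity, so only those targets are passed on to the more expensive alignment stage.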

For hits that pass the prefilter, FoldSeek performs Smith-Waterman local alignment using a combination of 3Di and amino acid substitution scores. The resulting alignment is then used to calculate TM-score (after structural superposition of the aligned residues) and LDDT.
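
A toy version of the combined scoring is sketched below. It is not FoldSeek's scoring scheme: real searches use substitution matrices for both alphabets and affine gap penalties, while this sketch assumes simple match/mismatch scores, a linear gap penalty, and hypothetical weights `w_3di` and `w_aa`.

```python
def combined_local_score(q_3di, q_aa, t_3di, t_aa,
                         w_3di=2, w_aa=1, mismatch=-1, gap=-2):
    """Best Smith-Waterman local score with a combined 3Di + amino acid score.

    Each aligned column scores the 3Di letters and the amino acids separately
    and adds the two contributions together.
    """
    if len(q_3di) != len(q_aa) or len(t_3di) != len(t_aa):
        raise ValueError("3Di and amino acid strings must have equal length")
    n, m = len(q_3di), len(t_3di)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s_3di = w_3di if q_3di[i - 1] == t_3di[j - 1] else mismatch
            s_aa = w_aa if q_aa[i - 1] == t_aa[j - 1] else mismatch
            H[i][j] = max(0,                            # start a new local alignment
                          H[i - 1][j - 1] + s_3di + s_aa,  # align the two residues
                          H[i - 1][j] + gap,            # gap in the target
                          H[i][j - 1] + gap)            # gap in the query
            best = max(best, H[i][j])
    return best

# Identical 3Di strings, one amino acid substitution (K -> R); made-up sequences.
score = combined_local_score("DVLV", "MKTA", "DVLV", "MRTA")
print(score)
```

Because the 3Di term still matches at the substituted position, the alignment keeps a positive score there, which shows how structural letters can carry an alignment through regions where the sequence has diverged.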

Inputs & settings

Mode

FoldSeek automatically detects the appropriate mode based on how many structures you upload:

Structures	Mode	Description
1	Database search	Search against AlphaFold DB, PDB, and other databases
2	Pairwise comparison	Detailed comparison with TM-score, LDDT, alignment metrics
3+	Clustering	Group structures by similarity

You can also explicitly select a mode:

Auto-detect: Let FoldSeek choose based on structure count (recommended)
Database search: Force search against public databases even with multiple structures
Local: Force local comparison/clustering, skip database search
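
The selection rules above can be summarized in a short sketch. The mode identifiers `"auto"`, `"database"`, and `"local"` and the `resolve_mode` function are hypothetical names chosen for illustration, and the error for a single-structure local run is an assumption, since the behavior in that case is not specified here.

```python
def resolve_mode(n_structures, mode="auto"):
    """Return the run mode for a given upload count and mode setting."""
    if n_structures < 1:
        raise ValueError("upload at least one structure")
    if mode == "database":
        # Forced database search, even with multiple structures.
        return "database search"
    if mode not in ("auto", "local"):
        raise ValueError(f"unknown mode: {mode}")
    if n_structures == 1:
        if mode == "local":
            # Assumption: a local run needs at least two structures to compare.
            raise ValueError("local mode needs at least two structures")
        return "database search"
    return "pairwise comparison" if n_structures == 2 else "clustering"

for count in (1, 2, 5):
    print(count, resolve_mode(count))
```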

Alignment type

FoldSeek supports two alignment algorithms:

3Di + Sequence: Combines structural and sequence information. Recommended for most use cases as it balances speed and accuracy.
TMalign: Pure structural alignment using the TM-align algorithm. Slower but may find more distant structural similarities.

Database selection

When using database search mode, you can select which databases to search:

AlphaFold DB 50: Recommended for comprehensive coverage. Clustered at 50% sequence identity to balance speed and completeness.
PDB: Essential for finding experimentally validated structural matches.
AlphaFold Swiss-Prot: High-quality subset focusing on well-characterized proteins. Useful when you want curated predictions only.
CATH 50: Specialized domain-focused database organized by structural classification. Best for domain-level comparisons.

Local mode thresholds

These settings apply only to local comparison and clustering modes:

Sensitivity (E-value): Sets the maximum E-value a match may have to be reported. Lower values (e.g., 0.001) are more stringent and return only confident matches. Increase the value to find more distant structural relationships.
Min sequence identity: Minimum amino acid sequence identity required for clustering. Set to 0 to cluster purely by structure.
TM-score threshold: Minimum TM-score for clustering. A threshold of 0.5 groups structures with the same fold.
LDDT threshold: Minimum LDDT for clustering. Higher values require more similar local geometry.
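
To show how these thresholds could interact during clustering, here is a minimal greedy sketch. It is not FoldSeek's clustering code: the `greedy_cluster` function, the precomputed `pair_scores` dictionary, and all score values are made up for illustration.

```python
def greedy_cluster(names, pair_scores, min_tm=0.5, min_lddt=0.0, min_identity=0.0):
    """Assign each structure to the first representative that passes all thresholds.

    pair_scores maps frozenset({a, b}) -> (tm_score, lddt, identity).
    Missing pairs are treated as dissimilar.
    """
    clusters = {}  # representative -> list of members
    for name in names:
        for rep in clusters:
            tm, lddt, ident = pair_scores.get(frozenset((name, rep)), (0.0, 0.0, 0.0))
            if tm >= min_tm and lddt >= min_lddt and ident >= min_identity:
                clusters[rep].append(name)
                break
        else:
            # No representative was similar enough, so this structure starts a cluster.
            clusters[name] = [name]
    return clusters

# Made-up pairwise scores: (TM-score, LDDT, sequence identity).
scores = {
    frozenset(("A", "B")): (0.82, 0.75, 0.40),
    frozenset(("A", "C")): (0.31, 0.40, 0.10),
    frozenset(("B", "C")): (0.55, 0.60, 0.20),
}
print(greedy_cluster(["A", "B", "C"], scores))
print(greedy_cluster(["C", "B", "A"], scores))
```

The two calls produce different groupings from the same scores, because a greedy scheme compares each structure only against the representatives chosen so far.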

Understanding the results

FoldSeek returns different metrics depending on the mode. Database search provides probability scores and E-values, while local comparison provides detailed structural metrics.

Database search results

When searching any of the public databases, each hit includes:

Probability: Confidence score from 0 to 1 indicating match quality. Higher values represent more confident structural matches. This is distinct from TM-score and is calculated by the FoldSeek search algorithm.
E-value: Expectation value representing the number of hits with equal or better scores expected by chance. Lower E-values indicate more significant matches. Values below 0.001 are typically considered confident hits.
Identity %: Percentage of aligned residues with identical amino acids. This shows sequence conservation in addition to structural similarity.
Alignment length: Number of residues aligned between query and target structures.
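
A short sketch of filtering hits by significance follows. The record layout, field names, and values are hypothetical and do not reflect the actual export format; only the 0.001 E-value cutoff comes from the guidance above.

```python
# Hypothetical hit records; field names and values are made up.
hits = [
    {"target": "hit_1", "prob": 0.99, "evalue": 1e-12, "identity": 35.0, "aln_len": 210},
    {"target": "hit_2", "prob": 0.45, "evalue": 0.8, "identity": 12.0, "aln_len": 64},
    {"target": "hit_3", "prob": 0.90, "evalue": 2e-5, "identity": 18.0, "aln_len": 150},
]

def confident_hits(hits, max_evalue=1e-3):
    """Keep hits below the E-value cutoff, most significant first."""
    kept = [h for h in hits if h["evalue"] < max_evalue]
    return sorted(kept, key=lambda h: h["evalue"])

for hit in confident_hits(hits):
    print(hit["target"], hit["evalue"], hit["identity"])
```

Note that `hit_3` passes the cutoff despite low sequence identity, the typical signature of a distant structural relationship.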

Local comparison results

When comparing structures locally (pairwise or clustering mode), FoldSeek calculates:

TM-score (Template Modeling score) measures global structural similarity on a scale of 0 to 1:

TM-score	Interpretation
< 0.17	Random, unrelated structures
0.17 - 0.5	Some structural similarity
> 0.5	Same fold
1.0	Identical structures

A TM-score of 0.5 is the widely accepted threshold for determining whether two proteins share the same fold. Below 0.17, structures are statistically indistinguishable from random pairs.
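
For reference, the standard TM-score definition is $\mathrm{TM} = \frac{1}{L}\sum_i \frac{1}{1 + (d_i/d_0)^2}$ with $d_0 = 1.24\sqrt[3]{L - 15} - 1.8$, where $L$ is the length of the reference structure and $d_i$ is the distance between the $i$-th pair of aligned residues after superposition. The sketch below evaluates this formula for a given list of distances. It does not search for the optimal superposition, and the fixed $d_0$ of 0.5 for very short chains is a guard added here as an assumption.

```python
def tm_score(distances, ref_length):
    """TM-score from aligned-pair distances (angstroms), normalized by ref_length.

    Residues of the reference that are not aligned contribute nothing to the sum,
    so partial alignments are penalized through the 1 / ref_length factor.
    """
    if ref_length <= 21:
        d0 = 0.5  # guard for very short chains (assumption made for this sketch)
    else:
        d0 = 1.24 * (ref_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / ref_length

# Perfect superposition of all 100 residues.
print(tm_score([0.0] * 100, 100))
# Perfect superposition of only half of the reference.
print(tm_score([0.0] * 50, 100))
```

Since $d_0$ grows with $L$, the same deviation costs less in a long protein than in a short one, which is what makes TM-scores comparable across chain lengths.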

LDDT (Local Distance Difference Test) evaluates local structural accuracy without requiring superposition. It compares interatomic distances rather than absolute positions, making it robust to domain movements.

LDDT	Interpretation
> 0.9	Excellent local agreement
0.7 - 0.9	Good local structure
0.5 - 0.7	Moderate agreement
< 0.5	Poor local similarity

LDDT is particularly useful for multi-domain proteins where global superposition may be misleading.
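
The superposition-free nature of LDDT can be seen in a simplified sketch. This version is an assumption-laden reduction of the full metric: it uses only $C_\alpha$ atoms and returns a single global fraction, whereas the full definition works on all atoms and averages per residue. The 15 Å inclusion radius and the 0.5, 1, 2, and 4 Å tolerances are the commonly used defaults; the coordinates are toy values.

```python
import math

def lddt_ca(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha lDDT between two equal-length coordinate lists.

    For every residue pair closer than `cutoff` in the reference, check whether
    the model preserves that distance within each tolerance, then report the
    fraction of preserved checks.
    """
    preserved = total = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= cutoff:
                continue
            diff = abs(d_ref - math.dist(model[i], model[j]))
            total += len(thresholds)
            preserved += sum(diff < t for t in thresholds)
    return preserved / total if total else 0.0

ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 10.0, y, z) for x, y, z in ref]            # rigid translation
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]  # third residue moved
print(lddt_ca(ref, shifted))
print(lddt_ca(ref, bent))
```

The translated copy scores as a perfect match without any superposition, while the bent copy loses credit only for the residue pair whose distance changed.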

Sequence identity: The fraction of aligned positions with identical amino acids. High sequence identity with low structural similarity may indicate conformational changes. Low sequence identity with high TM-score indicates structural conservation despite sequence divergence.

Use cases

Database search

Functional annotation: Find proteins with similar folds to infer function
Evolutionary analysis: Discover distant homologs undetectable by sequence
Template identification: Find templates for homology modeling
Novel fold detection: Check if your structure represents a new fold

Local comparison & clustering

Fold classification: Group structures into families based on 3D similarity
Redundancy removal: Create non-redundant structure datasets for training ML models
Quality assessment: Compare predicted structures to known templates
Conformational analysis: Identify structural changes between states

Limitations

FoldSeek excels at finding structural similarity but has some constraints:

Requires atomic coordinates (PDB or mmCIF format)
3Di encoding may miss some similarities in highly flexible regions
Clustering is greedy and results depend on representative selection
Very short structures (< 30 residues) may produce unreliable scores
Database search depends on the FoldSeek web server and may take 1-5 minutes

Related tools

USAlign: Detailed pairwise structure alignment with superposition output
MMseqs2: Sequence-based search and clustering
Ramachandran Plot: Validate backbone geometry
PDB Viewer: Visualize protein structures
ESMFold, Boltz-2, Chai-1: Predict structures from sequence
