
Upload 2 structures for a detailed pairwise comparison, or 3 or more for clustering. The mode is detected automatically from the number of structures.
FoldSeek is a fast protein structure search and clustering tool that compares 3D structures to find structural similarities. It can cluster multiple structures by similarity or compare two structures in detail, producing metrics like TM-score and LDDT.
Traditional structure comparison methods like TM-align are accurate but slow, requiring seconds per comparison. FoldSeek achieves comparable sensitivity while being four to five orders of magnitude faster. This speed comes from a novel encoding approach that converts 3D coordinates into searchable sequences.
For sequence-based clustering, use MMseqs2. For detailed pairwise structure alignment with superposition, use USAlign.
FoldSeek's speed comes from the 3Di (3D interaction) alphabet, which encodes protein structure as a sequence of 20 letters. Unlike traditional backbone structural alphabets, 3Di describes the geometric relationship between each residue and its spatially closest neighbor.
For each residue i, FoldSeek finds its nearest neighbor residue j based on virtual center distance. Seven angles, the Cα distance, and two sequence distance features are extracted from the backbone coordinates of both residues. These 10 features define the 20 3Di states through a neural network trained to maximize evolutionary conservation.
This encoding has three advantages over backbone alphabets: weaker dependency between consecutive letters, more evenly distributed state frequencies, and higher information density in conserved protein cores rather than loop regions.
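The nearest-neighbor lookup that anchors each 3Di state can be sketched as follows. This is a minimal illustration, not FoldSeek's implementation: the function names are hypothetical, the input points stand in for virtual centers (whose exact construction is not given here), and the neural network that maps the 10 features to a 3Di letter is omitted.

```python
import math

def nearest_neighbors(centers):
    """For each residue i, return the index j != i of the closest center."""
    neighbors = []
    for i, ci in enumerate(centers):
        best_j, best_d = None, math.inf
        for j, cj in enumerate(centers):
            if i == j:
                continue
            d = math.dist(ci, cj)  # Euclidean distance between the two points
            if d < best_d:
                best_j, best_d = j, d
        neighbors.append(best_j)
    return neighbors

def sequence_separation(i, j):
    """Signed separation along the chain (an illustrative feature only)."""
    return j - i

# Toy coordinates standing in for virtual centers (assumption: made-up values).
centers = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.5, 1.0, 0.0)]
pairs = nearest_neighbors(centers)
print(pairs)
print([sequence_separation(i, j) for i, j in enumerate(pairs)])
```

The quadratic loop is fine for a single chain of a few hundred residues; a spatial index would be the natural replacement for larger inputs.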
FoldSeek converts both query and target structures into 3Di sequences. It then applies the MMseqs2 prefilter to find candidate matches using spaced k-mer matching on diagonals of the alignment matrix. This prefilter reduces the search space by several orders of magnitude while maintaining high sensitivity.
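The idea of counting k-mer matches per diagonal can be illustrated with a toy sketch. It is a simplification under stated assumptions: it uses exact spaced k-mer matches with a made-up mask (`1101`), whereas the real MMseqs2 prefilter is considerably more elaborate. The function names are hypothetical.

```python
from collections import defaultdict

def spaced_kmer(seq, start, mask):
    """Collect the letters at positions where the mask holds '1'."""
    return "".join(seq[start + k] for k, bit in enumerate(mask) if bit == "1")

def diagonal_hits(query, target, mask="1101"):
    """Count spaced k-mer matches per diagonal (query index minus target index)."""
    span = len(mask)
    # Index every spaced k-mer of the target by its start position.
    index = defaultdict(list)
    for j in range(len(target) - span + 1):
        index[spaced_kmer(target, j, mask)].append(j)
    # Look up each query k-mer and tally the diagonal of every match.
    hits = defaultdict(int)
    for i in range(len(query) - span + 1):
        for j in index.get(spaced_kmer(query, i, mask), ()):
            hits[i - j] += 1
    return dict(hits)

def passes_prefilter(query, target, min_hits=2):
    """Keep a target if any single diagonal collects at least min_hits matches."""
    hits = diagonal_hits(query, target)
    return any(count >= min_hits for count in hits.values())

# Toy 3Di-like strings (assumption: arbitrary letters, not real 3Di states).
print(diagonal_hits("ABCDEFG", "XXABCDEFG"))
```

Several matches piling up on one diagonal suggest an ungapped region of similarity, which is why only those candidates go on to the more expensive alignment step.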
For hits that pass the prefilter, FoldSeek performs Smith-Waterman local alignment with a score that combines 3Di and amino acid substitution scores. TM-score and LDDT are then computed from the resulting alignment, with TM-score evaluated after structural superposition.
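A combined-score local alignment can be sketched as below. The scoring values are placeholders chosen for illustration (match 2, mismatch -1, amino acid weight 0.5, linear gap penalty 2); they are assumptions, not FoldSeek's substitution matrices, weights, or gap model.

```python
def combined_score(a3di, a_aa, b3di, b_aa, w_aa=0.5):
    """Toy substitution score: 3Di term plus a weighted amino acid term."""
    s3di = 2 if a3di == b3di else -1
    s_aa = 2 if a_aa == b_aa else -1
    return s3di + w_aa * s_aa

def smith_waterman(q3di, q_aa, t3di, t_aa, gap=2):
    """Return the best local alignment score under a linear gap penalty."""
    n, m = len(q3di), len(t3di)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = combined_score(q3di[i - 1], q_aa[i - 1], t3di[j - 1], t_aa[j - 1])
            H[i][j] = max(
                0.0,                  # start a new local alignment
                H[i - 1][j - 1] + s,  # align residue i with residue j
                H[i - 1][j] - gap,    # gap in the target
                H[i][j - 1] - gap,    # gap in the query
            )
            best = max(best, H[i][j])
    return best

# Each structure is given as a 3Di string plus an amino acid string of equal length.
print(smith_waterman("ABCD", "MKLV", "ABCD", "MKLV"))
```

Because each cell sums both terms, two residues with matching structural states still score positively when their amino acids differ, which is how the alignment can follow structure through divergent sequence.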
FoldSeek supports three alignment algorithms:
- 3Di + Sequence: Combines structural and sequence information. We recommend this for most use cases as it balances speed and accuracy.
- TMalign: Pure structural alignment using the TM-align algorithm. Slower but may find more distant structural similarities.
- LoLalign: Long Loop alignment method, designed for structures with significant loop variations.

[…] 0.001) are more stringent and return only confident matches. Increase to find more distant structural relationships.
[…] 0 to cluster purely by structure.
[…] 0.5 groups structures with the same fold.
TM-score (Template Modeling score) measures global structural similarity on a scale of 0 to 1:
| TM-score | Interpretation |
|---|---|
| < 0.17 | Random, unrelated structures |
| 0.17 - 0.5 | Some structural similarity |
| > 0.5 | Same fold |
| 1.0 | Identical structures |
A TM-score of 0.5 is the widely accepted threshold for determining whether two proteins share the same fold. Below 0.17, structures are statistically indistinguishable from random pairs.
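For reference, the standard TM-score formula sums 1 / (1 + (d_i / d0)^2) over aligned residue pairs and divides by the length of the reference chain, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The sketch below evaluates that formula for one given superposition; the full score is the maximum over superpositions, which is not shown. The floor of 0.5 on d0 for short chains is an assumption, and the function names are hypothetical.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 in angstroms."""
    if target_length <= 15:
        return 0.5  # assumption: floor for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)

def tm_score(distances, target_length):
    """TM-score for one superposition.

    distances: distance in angstroms between each aligned residue pair.
    target_length: length of the chain used for normalization.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

def interpret_tm(score):
    """Map a TM-score onto the interpretation bands of the table above."""
    if score < 0.17:
        return "random, unrelated structures"
    if score <= 0.5:
        return "some structural similarity"
    return "same fold"

# 50 perfectly superposed residues out of a 100-residue reference chain.
score = tm_score([0.0] * 50, 100)
print(score, interpret_tm(score))
```

Normalizing by chain length rather than by the number of aligned residues is what makes a short, perfect partial match score lower than a full-length match.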
LDDT (Local Distance Difference Test) evaluates local structural accuracy without requiring superposition. It compares interatomic distances rather than absolute positions, making it robust to domain movements.
| LDDT | Interpretation |
|---|---|
| > 0.9 | Excellent local agreement |
| 0.7 - 0.9 | Good local structure |
| 0.5 - 0.7 | Moderate agreement |
| < 0.5 | Poor local similarity |
LDDT is particularly useful for multi-domain proteins where global superposition may be misleading.
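The superposition-free nature of LDDT is easiest to see in code. The sketch below is a Cα-only simplification using the commonly cited defaults (15 Å inclusion radius; tolerance thresholds of 0.5, 1, 2, and 4 Å); whether FoldSeek uses exactly these settings is an assumption, and the function name is hypothetical.

```python
import math

def lddt_ca(ref, model, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Fraction of reference distances preserved in the model, averaged over thresholds.

    ref, model: equal-length lists of (x, y, z) coordinates for aligned residues.
    """
    preserved = 0
    total = 0
    n = len(ref)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # only local contacts in the reference are scored
            diff = abs(d_ref - math.dist(model[i], model[j]))
            total += len(thresholds)
            preserved += sum(diff < t for t in thresholds)
    return preserved / total if total else 0.0

# A rigidly translated copy keeps every internal distance, so no superposition is needed.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 10.0, y + 5.0, z - 2.0) for x, y, z in ref]
print(lddt_ca(ref, model))
```

Since only internal distances enter the score, two domains that move relative to each other lose credit only for the contacts across the hinge, not for the whole structure.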
Sequence identity is the fraction of aligned positions with identical amino acids. High sequence identity with low structural similarity may indicate conformational changes. Low sequence identity with a high TM-score indicates structural conservation despite sequence divergence.
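As a small worked example, sequence identity can be computed from two gapped alignment strings. The sketch assumes `-` marks a gap and that only gap-free columns count toward the denominator; tools differ on that convention, so treat it as one reasonable choice. The function name is hypothetical.

```python
def sequence_identity(aligned_query, aligned_target):
    """Fraction of gap-free alignment columns with identical residues."""
    columns = [
        (a, b)
        for a, b in zip(aligned_query, aligned_target)
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    return sum(a == b for a, b in columns) / len(columns)

# Four gap-free columns, three of them identical.
print(sequence_identity("MK-LV", "MKALI"))  # 0.75
```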
Structure comparison serves many purposes in structural biology:
FoldSeek excels at finding structural similarity but has some constraints: