ProteinIQ

FoldSeek

Upload 2 structures for detailed comparison or 3+ for clustering. Automatic mode detection.

What is FoldSeek?

FoldSeek is a fast protein structure search and clustering tool that compares 3D structures to find structural similarities. It can cluster multiple structures by similarity or compare two structures in detail, producing metrics like TM-score and LDDT.

Traditional structure comparison methods like TM-align are accurate but slow, requiring seconds per comparison. FoldSeek achieves comparable sensitivity while being four to five orders of magnitude faster. This speed comes from a novel encoding approach that converts 3D coordinates into searchable sequences.

For sequence-based clustering, use MMseqs2. For detailed pairwise structure alignment with superposition, use USAlign.

How does FoldSeek work?

The 3Di structural alphabet

FoldSeek's speed comes from the 3Di (3D interaction) alphabet, which encodes protein structure as a sequence of 20 letters. Unlike traditional backbone structural alphabets, 3Di describes the geometric relationship between each residue and its spatially closest neighbor.

For each residue i, FoldSeek finds its nearest neighbor residue j based on virtual center distance. Seven angles, the Cα distance, and two sequence distance features are extracted from the backbone coordinates of both residues. These 10 features define the 20 3Di states through a neural network trained to maximize evolutionary conservation.

This encoding has three advantages over backbone alphabets: weaker dependency between consecutive letters, more evenly distributed state frequencies, and higher information density in conserved protein cores rather than loop regions.
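
The nearest-neighbor step can be illustrated with a minimal sketch. It assumes one 3D point per residue stands in for the virtual center, and it computes only three of the ten features: the neighbor distance and two sequence-separation values (a clipped offset and a log-scaled offset). The function name, the clipping at 4, and the sign convention are illustrative assumptions, not FoldSeek's actual code.

```python
import math

def nearest_neighbor_features(centers):
    """For each residue i, find the spatially closest other residue j.

    `centers` holds one (x, y, z) tuple per residue (at least two residues).
    Plain points stand in for FoldSeek's virtual centers (an assumption).
    Returns one (j, distance, clipped_offset, log_offset) tuple per residue.
    """
    features = []
    for i, ci in enumerate(centers):
        best_j, best_d = None, math.inf
        for j, cj in enumerate(centers):
            if i == j:
                continue
            d = math.dist(ci, cj)
            if d < best_d:  # strict comparison: the first of tied neighbors wins
                best_j, best_d = j, d
        offset = best_j - i  # signed distance along the sequence
        sign = 1 if offset > 0 else -1
        clipped = sign * min(abs(offset), 4)          # illustrative clipping value
        log_offset = sign * math.log(abs(offset) + 1)
        features.append((best_j, best_d, clipped, log_offset))
    return features
```

In the full method, the seven angles are added to these values and the resulting feature vector is mapped to one of the 20 states by the trained network.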

Search algorithm

FoldSeek converts both query and target structures into 3Di sequences. It then applies the MMseqs2 prefilter to find candidate matches using spaced k-mer matching on diagonals of the alignment matrix. This prefilter reduces the search space by several orders of magnitude while maintaining high sensitivity.
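
The idea of counting k-mer matches per diagonal can be sketched as follows. This is a simplified illustration that uses exact, contiguous k-mers on plain strings; the real MMseqs2 prefilter uses spaced, similarity-scored k-mers and is considerably more involved. The function name and the `min_hits` parameter are hypothetical.

```python
from collections import defaultdict

def diagonal_prefilter(query, target, k=3, min_hits=2):
    """Return {diagonal: hit_count} for diagonals with >= min_hits k-mer matches.

    A diagonal is the offset i - j between a query position i and a target
    position j. Several matches on one diagonal suggest an ungapped stretch
    of similarity worth aligning in full.
    """
    # Index every k-mer of the target by its start positions.
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)

    # Look up each query k-mer and count matches per diagonal.
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            hits[i - j] += 1

    return {diag: n for diag, n in hits.items() if n >= min_hits}
```

A target that yields no diagonal with enough hits is discarded without running the expensive alignment step, which is where most of the speedup comes from.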

For hits passing the prefilter, FoldSeek performs Smith-Waterman local alignment combining both 3Di and amino acid substitution scores. From the final alignment, it computes the TM-score after structural superposition and the LDDT from the distances between aligned residues.
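
A minimal sketch of local alignment with a combined score is shown below. It assumes simple match/mismatch scores in place of real 3Di and amino acid substitution matrices, and a linear gap penalty in place of affine gaps; all names and default values are illustrative.

```python
def combined_local_score(q_3di, q_aa, t_3di, t_aa, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with summed 3Di + amino acid scores.

    Each structure is given as two equal-length strings: its 3Di letters and
    its amino acids. Toy match/mismatch scores replace substitution matrices.
    """
    if len(q_3di) != len(q_aa) or len(t_3di) != len(t_aa):
        raise ValueError("3Di and amino acid strings must have equal length")

    def sub(a, b):
        return match if a == b else mismatch

    best = 0
    prev = [0] * (len(t_3di) + 1)  # previous row of the dynamic programming matrix
    for i in range(1, len(q_3di) + 1):
        curr = [0]
        for j in range(1, len(t_3di) + 1):
            # The score of aligning two residues is the sum of both alphabets.
            s = sub(q_3di[i - 1], t_3di[j - 1]) + sub(q_aa[i - 1], t_aa[j - 1])
            cell = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

Because both alphabets contribute to every cell, a pair of residues that agrees in structure but not in sequence can still extend the alignment.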

Inputs & settings

Mode

  • Cluster: Groups multiple structures by similarity. Requires 2+ structures; automatic mode detection selects it when 3 or more are uploaded. Produces clusters where each structure is assigned to a representative.
  • Pairwise: Compares exactly two structures with detailed alignment metrics. Use this when you want comprehensive comparison statistics.

Alignment type

FoldSeek supports three alignment algorithms:

  • 3Di + Sequence: Combines structural and sequence information. We recommend this for most use cases as it balances speed and accuracy.
  • TMalign: Pure structural alignment using the TM-align algorithm. Slower but may find more distant structural similarities.
  • LoLalign: Long Loop alignment method, designed for structures with significant loop variations.

Thresholds

  • Sensitivity (E-value): Controls how stringent the search is. Lower values (e.g., 0.001) are more stringent and return only confident matches. Raise the value to find more distant structural relationships.
  • Min sequence identity: Minimum amino acid sequence identity required for clustering. Set to 0 to cluster purely by structure.
  • TM-score threshold: Minimum TM-score for clustering. A threshold of 0.5 groups structures with the same fold.
  • LDDT threshold: Minimum LDDT for clustering. Higher values require more similar local geometry.
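
For readers who run the open-source Foldseek command-line tool themselves, these settings correspond roughly to the options in the sketch below. How this page maps its form fields to the command line is an assumption, as are the default values and the helper name. LoLalign is left out because its numeric code may differ between releases; check `foldseek easy-cluster -h` for the values your version supports.

```python
# Numeric codes for Foldseek's --alignment-type option.
ALIGNMENT_TYPES = {"tmalign": 1, "3di+sequence": 2}

def build_cluster_command(input_dir, out_prefix, tmp_dir,
                          alignment="3di+sequence", evalue=0.001,
                          min_seq_id=0.0, tm_threshold=0.5, lddt_threshold=0.0):
    """Build an argument list for `foldseek easy-cluster` (hypothetical helper).

    Default values are illustrative and are not this page's defaults.
    """
    return [
        "foldseek", "easy-cluster", input_dir, out_prefix, tmp_dir,
        "--alignment-type", str(ALIGNMENT_TYPES[alignment]),
        "-e", str(evalue),
        "--min-seq-id", str(min_seq_id),
        "--tmscore-threshold", str(tm_threshold),
        "--lddt-threshold", str(lddt_threshold),
    ]
```

The returned list can be passed to `subprocess.run` on a machine where Foldseek is installed.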

Understanding the results

TM-score

TM-score (Template Modeling score) measures global structural similarity on a scale of 0 to 1:

TM-score      Interpretation
< 0.17        Random, unrelated structures
0.17 - 0.5    Some structural similarity
> 0.5         Same fold
1.0           Identical structures

A TM-score of 0.5 is the widely accepted threshold for determining whether two proteins share the same fold. Below 0.17, structures are statistically indistinguishable from random pairs.
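
To make the scale concrete, the sketch below evaluates the standard TM-score formula for one fixed superposition. It assumes the distances between aligned residue pairs (in Å) are already known; a real implementation also searches for the superposition that maximizes the score. The 0.5 Å floor on d0 for short chains is a convention some implementations apply and is an assumption here.

```python
def tm_d0(length):
    """Length-dependent distance scale d0 in Å, floored at 0.5 for short chains."""
    if length <= 21:
        return 0.5
    return 1.24 * (length - 15) ** (1 / 3) - 1.8

def tm_score(distances, norm_length):
    """TM-score for a fixed superposition.

    `distances` are the distances between aligned residue pairs; `norm_length`
    is the length of the structure used for normalization. Unaligned residues
    contribute nothing, so partial alignments score below 1.
    """
    d0 = tm_d0(norm_length)
    return sum(1 / (1 + (d / d0) ** 2) for d in distances) / norm_length
```

A perfectly superposed alignment covering half of a 100-residue structure gives 0.5, which shows why coverage and accuracy both matter for reaching the same-fold threshold.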

LDDT

LDDT (Local Distance Difference Test) evaluates local structural accuracy without requiring superposition. It compares interatomic distances rather than absolute positions, making it robust to domain movements.

LDDT          Interpretation
> 0.9         Excellent local agreement
0.7 - 0.9     Good local structure
0.5 - 0.7     Moderate agreement
< 0.5         Poor local similarity

LDDT is particularly useful for multi-domain proteins where global superposition may be misleading.
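
The superposition-free idea can be sketched with a simplified, Cα-only variant. It assumes one point per residue, the commonly used 15 Å inclusion radius and 0.5, 1, 2, and 4 Å tolerance thresholds, and a single global fraction rather than per-residue averaging; FoldSeek's own calculation may differ in these details.

```python
import math

def lddt_ca(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global LDDT on one point per residue.

    `ref` and `model` are equal-length lists of (x, y, z) tuples for aligned
    residues. Only pairs closer than `cutoff` in the reference are scored.
    """
    preserved = 0
    total = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= cutoff:
                continue
            diff = abs(d_ref - math.dist(model[i], model[j]))
            total += len(thresholds)
            # Count how many tolerance thresholds this distance satisfies.
            preserved += sum(diff < t for t in thresholds)
    return preserved / total if total else 0.0
```

Because only internal distances are compared, translating or rotating the whole model leaves the score unchanged.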

Sequence identity

The fraction of aligned positions with identical amino acids. High sequence identity with low structural similarity may indicate conformational changes. Low sequence identity with high TM-score indicates structural conservation despite sequence divergence.
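
As a small illustration, sequence identity can be computed from two gapped alignment strings. This sketch assumes that columns containing a gap are excluded from the denominator; tools differ on that convention, so values may not match FoldSeek's output exactly.

```python
def sequence_identity(aligned_query, aligned_target, gap="-"):
    """Fraction of aligned (non-gap) columns with identical amino acids."""
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_query, aligned_target)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)
```

For example, aligning `MKT-A` with `MRTLA` gives four residue-to-residue columns, three of which are identical.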

Use cases

Structure comparison serves many purposes in structural biology:

  • Fold classification: Group structures into families based on 3D similarity
  • Redundancy removal: Create non-redundant structure datasets for training ML models
  • Evolutionary analysis: Find structurally conserved proteins across species
  • Quality assessment: Compare predicted structures to known templates
  • Conformational analysis: Identify structural changes between states

Limitations

FoldSeek excels at finding structural similarity but has some constraints:

  • Requires atomic coordinates (PDB or mmCIF format)
  • 3Di encoding may miss some similarities in highly flexible regions
  • Clustering is greedy and results depend on representative selection
  • Very short structures (< 30 residues) may produce unreliable scores
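
The dependence on representative selection can be seen in a minimal sketch of greedy incremental clustering. This is a simplified stand-in and not FoldSeek's clustering algorithm; the function name and the `similarity` callback are hypothetical.

```python
def greedy_cluster(names, similarity, threshold=0.5):
    """Assign each structure to the first representative it is similar enough to.

    `similarity(a, b)` returns a score such as a TM-score. Structures are
    processed in the given order; one that matches no existing representative
    becomes a new representative. Returns {representative: [members]}.
    """
    clusters = {}
    for name in names:
        for rep in clusters:
            if similarity(rep, name) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative was close enough: start a new cluster.
            clusters[name] = [name]
    return clusters
```

Suppose A and B score 0.6, B and C score 0.6, and A and C score 0.3. Processing in the order A, B, C yields two clusters (A with B, and C alone), while the order B, A, C yields a single cluster around B.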