
Align multiple protein or nucleotide sequences using the Clustal Omega algorithm with support for various output formats.
Clustal Omega is a multiple sequence alignment (MSA) program that aligns protein or nucleotide sequences to reveal conserved regions, evolutionary relationships, and functional motifs. MSA is a foundational step in many bioinformatics workflows—from phylogenetic analysis to structure prediction to primer design.
Clustal Omega can handle datasets ranging from a handful of sequences to tens of thousands, making it suitable for both focused studies and large-scale comparative genomics. Once you have an alignment, you can use FastTree to build a phylogenetic tree from it.
Clustal Omega uses a progressive alignment strategy: it first estimates how similar sequences are to each other, builds a guide tree from those similarities, then aligns sequences following the tree order. The key innovations are the mBed algorithm for scalability and HMM-based profile alignment for accuracy.
Traditional all-against-all distance calculation scales as $O(N^2)$ in the number of sequences $N$, which becomes prohibitive for large datasets. The mBed algorithm reduces this to $O(N \log N)$ by "embedding" each sequence into a low-dimensional space.
Instead of comparing every sequence to every other sequence, mBed selects a small set of reference sequences and represents each sequence as a vector of distances to these references. These vectors can be clustered rapidly using k-means, with clusters capped at 100 sequences. Full distance matrices are only computed within clusters, not across the entire dataset.
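Below is a minimal Python sketch of the embed-then-cluster idea. It is not Clustal Omega's implementation. The k-mer distance, the random choice of seed sequences, and the bisecting 2-means splitting are assumptions chosen for brevity. The names `kmer_distance`, `embed`, and `bisect_clusters` are hypothetical. The within-cluster distance matrices are not shown.

```python
import random


def kmer_counts(seq: str, k: int = 2) -> dict[str, int]:
    """Count overlapping k-mers in a sequence."""
    counts: dict[str, int] = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Stand-in alignment-free distance: 1 minus the fraction of shared k-mers."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum(min(n, cb[x]) for x, n in ca.items() if x in cb)
    denom = min(len(a), len(b)) - k + 1
    return 1.0 - shared / denom if denom > 0 else 1.0


def embed(seqs: list[str], n_seeds: int, rng_seed: int = 0) -> list[list[float]]:
    """Represent each sequence as its vector of distances to a few seed sequences."""
    seeds = random.Random(rng_seed).sample(range(len(seqs)), n_seeds)  # assumed: random seeds
    return [[kmer_distance(s, seqs[j]) for j in seeds] for s in seqs]


def _sq_dist(p: list[float], q: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def two_means(points, idx, iters: int = 10):
    """Split the points listed in idx into two groups with k-means (k = 2)."""
    c0, c1 = points[idx[0]], points[idx[-1]]
    g0, g1 = [], []
    for _ in range(iters):
        g0, g1 = [], []
        for i in idx:
            (g0 if _sq_dist(points[i], c0) <= _sq_dist(points[i], c1) else g1).append(i)
        if not g0 or not g1:
            break
        c0 = [sum(col) / len(g0) for col in zip(*(points[i] for i in g0))]
        c1 = [sum(col) / len(g1) for col in zip(*(points[i] for i in g1))]
    return g0, g1


def bisect_clusters(points, max_size: int = 100) -> list[list[int]]:
    """Keep splitting clusters until none holds more than max_size sequences."""
    pending, done = [list(range(len(points)))], []
    while pending:
        idx = pending.pop()
        if len(idx) <= max_size:
            done.append(idx)
            continue
        g0, g1 = two_means(points, idx)
        if not g0 or not g1:  # degenerate split: halve arbitrarily so the loop terminates
            g0, g1 = idx[: len(idx) // 2], idx[len(idx) // 2:]
        pending.extend([g0, g1])
    return done
```

Each sequence is compared only with the `n_seeds` seeds. Embedding N sequences therefore costs roughly N times `n_seeds` distance calculations, compared with N(N-1)/2 for a full matrix.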
When combining two groups of aligned sequences (profiles), Clustal Omega uses hidden Markov model alignment via the HHalign package. Each profile is converted to an HMM with match, insert, and delete states. Aligning two HMMs rather than simple position-specific scoring matrices improves sensitivity for distantly related sequences.
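The sketch below makes profile-to-HMM conversion concrete by showing only its first step, under stated assumptions. It decides which alignment columns become match states and estimates match-state emission probabilities with pseudocounts. The 50% gap rule and the pseudocount of 1 are illustrative choices, not HHalign's actual parameters. Transition probabilities, delete states, and the HMM-HMM alignment itself are omitted.

```python
from collections import Counter

GAP = "-"


def column_labels(profile: list[str], gap_threshold: float = 0.5) -> list[str]:
    """Label each column 'M' (match state) or 'I' (insert state).

    Assumed rule: a column is a match column when fewer than gap_threshold
    of its rows are gaps. This is a common profile-HMM convention, not HHalign's code.
    """
    n = len(profile)
    return ["M" if col.count(GAP) / n < gap_threshold else "I" for col in zip(*profile)]


def match_emissions(profile: list[str], alphabet: str, pseudocount: float = 1.0,
                    gap_threshold: float = 0.5) -> list[dict[str, float]]:
    """Emission probabilities for each match column, smoothed with additive pseudocounts."""
    emissions = []
    for label, col in zip(column_labels(profile, gap_threshold), zip(*profile)):
        if label != "M":
            continue  # residues in insert columns belong to insert states (not modeled here)
        counts = Counter(c for c in col if c in alphabet)  # gaps are not in the alphabet
        total = sum(counts.values()) + pseudocount * len(alphabet)
        emissions.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return emissions


profile = ["AC-GT", "AC-GA", "ACTGT"]
print(column_labels(profile))  # ['M', 'M', 'I', 'M', 'M']
```

The pseudocounts keep unseen residues from getting zero probability. This matters when matching distantly related profiles.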
The distance matrix (partial or full) is used to construct a guide tree via UPGMA. This tree determines the order of the progressive alignment steps. Closely related sequences are aligned first, and the resulting profiles are then progressively merged with more distant groups until all sequences are incorporated.
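UPGMA itself is short enough to sketch. The Python below is a generic textbook UPGMA (size-weighted average linkage) with the hypothetical function name `upgma`. It returns the guide tree as nested tuples along with the sequence of merges. That merge sequence corresponds to the order in which a progressive aligner would combine sequences and profiles.

```python
def upgma(names: list[str], dist: list[list[float]]):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns (tree, merge_order): tree is a nested tuple; merge_order lists each
    joined subtree in the order it was created.
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    merge_order = []
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        merged = (ta, tb)
        merge_order.append(merged)
        # Size-weighted average distance from the new cluster to every remaining one
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        del d[(a, b)]
        clusters[next_id] = (merged, na + nb)
        next_id += 1
    tree = next(iter(clusters.values()))[0]
    return tree, merge_order


names = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
tree, order = upgma(names, dist)
print(tree)  # ('D', ('C', ('A', 'B')))
```

Here A and B are joined first, then C, then D. A progressive aligner following this tree would align A with B before adding the more distant sequences.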
Clustal Omega auto-detects whether your sequences are protein or nucleotide by examining character composition. Manual selection (Protein, DNA, or RNA) is useful when auto-detection might be ambiguous—for example, with very short sequences or sequences containing unusual characters.
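Below is a sketch of the kind of composition check described here. The 90% threshold, the character set, and the function `guess_seqtype` are hypothetical. Clustal Omega's own detection rule may differ.

```python
NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_seqtype(seq: str, threshold: float = 0.9) -> str:
    """Guess 'DNA', 'RNA', or 'Protein' from character composition.

    Hypothetical heuristic: if at least `threshold` of the letters are A/C/G/T/U/N,
    call it nucleotide; U without T suggests RNA.
    """
    residues = [c for c in seq.upper() if c.isalpha()]  # ignore gaps and other symbols
    if not residues:
        raise ValueError("sequence contains no residues")
    nuc_frac = sum(c in NUCLEOTIDE_CHARS for c in residues) / len(residues)
    if nuc_frac < threshold:
        return "Protein"
    return "RNA" if "U" in residues and "T" not in residues else "DNA"


print(guess_seqtype("MKTAYIAKQR"))  # Protein
print(guess_seqtype("ACGUACGU"))    # RNA
print(guess_seqtype("ACGT"))        # DNA
```

The last call shows the ambiguity mentioned above. The peptide `ACGT` (Ala-Cys-Gly-Thr) consists entirely of letters that are also nucleotide codes, so composition alone classifies it as DNA.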
- …: most compatible with downstream tools, including FastTree.
After the initial alignment, Clustal Omega can refine it by rebuilding the guide tree from the alignment itself (rather than from pairwise distances) and realigning. Each iteration uses the improved alignment to construct a better guide tree.
We recommend 1-2 iterations for important alignments where accuracy matters more than speed. For exploratory work or very large datasets, skip refinement (0 iterations) to save time.
By default, mBed calculates a reduced distance matrix for scalability. Enabling the full distance matrix computes all pairwise distances, which produces more accurate guide trees at the cost of $O(N^2)$ complexity.
Use the full matrix for datasets under ~1,000 sequences where alignment quality is critical. For larger datasets, stick with mBed—the accuracy loss is typically minimal.
The output is a multiple sequence alignment where:
- Gap characters (-) indicate insertions or deletions relative to other sequences.
The reported alignment length is the number of columns. It is at least as long as the longest input sequence and usually longer, because of gaps. High-quality alignments have fewer scattered gaps and more continuous aligned blocks.
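These properties can be checked quickly on a FASTA-format alignment, as sketched below. The sketch assumes `-` is the only gap character and that the input is already aligned (all rows the same length). `parse_fasta` and `alignment_summary` are hypothetical helpers, and counting gap-free columns is just one simple proxy for continuous aligned blocks.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def alignment_summary(aln: dict[str, str], gap: str = "-") -> dict:
    """Report column count, longest ungapped row, gap fraction, and gap-free columns."""
    rows = list(aln.values())
    if not rows or not rows[0]:
        raise ValueError("empty alignment")
    n_cols = len(rows[0])
    if any(len(r) != n_cols for r in rows):
        raise ValueError("rows differ in length; input is not aligned")
    gaps = sum(r.count(gap) for r in rows)
    return {
        "columns": n_cols,
        "max_ungapped_length": max(len(r.replace(gap, "")) for r in rows),
        "gap_fraction": gaps / (n_cols * len(rows)),
        "gap_free_columns": sum(gap not in col for col in zip(*rows)),
    }


aln = parse_fasta(">s1\nACG-T\n>s2\nAC-GT\n>s3\nACGGT\n")
print(alignment_summary(aln))
```

By construction, `columns` is never smaller than `max_ungapped_length`.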
Clustal Omega is typically the first step in a multi-tool pipeline. For example, the alignment it produces can be passed to FastTree to build a phylogenetic tree.
Clustal Omega assumes sequences are homologous and alignable. It will produce an alignment even for unrelated sequences, but the result will be meaningless. Always verify that your sequences share evolutionary or functional relationships before aligning.
Very divergent sequences (below ~20% identity for proteins) may not align reliably with any progressive alignment method. Consider structure-based alignment for such cases.
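For a rough check of divergence after aligning, pairwise identity can be computed directly from the aligned rows. This sketch counts identity over columns where neither sequence has a gap, which is only one of several common definitions (others divide by the alignment length or the shorter sequence length). The helpers are illustrative, and the default 20% cutoff comes from the rule of thumb above.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned rows, over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    matches = sum(x.upper() == y.upper() for x, y in compared)
    return 100.0 * matches / len(compared)


def low_identity_pairs(aln: dict[str, str], cutoff: float = 20.0):
    """Return (name1, name2, identity) for every pair below the cutoff."""
    flagged = []
    for p, q in combinations(aln, 2):
        pid = percent_identity(aln[p], aln[q])
        if pid < cutoff:
            flagged.append((p, q, pid))
    return flagged


aln = {"x": "MKV-LA", "y": "MKVALA", "z": "WYHEDC"}
print(low_identity_pairs(aln))  # [('x', 'z', 0.0), ('y', 'z', 0.0)]
```

Pairs flagged this way are candidates for the structure-based approach mentioned above, or for removal if they turn out not to be homologous.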