Related tools

FastTree
Infer approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.

IQ-TREE
Build phylogenetic trees using maximum likelihood with automatic model selection (ModelFinder) and ultrafast bootstrap support.

MAFFT
Perform multiple sequence alignment using MAFFT (Multiple Alignment using Fast Fourier Transform). Supports multiple algorithms from fast progressive to highly accurate iterative methods.

MUSCLE5
Perform multiple sequence alignment using MUSCLE5 (MUltiple Sequence Comparison by Log-Expectation). Uses the PPP algorithm for high-quality alignments with support for ensemble generation.

USAlign
USAlign (Universal Structure Alignment) aligns protein, RNA, and DNA structures to compute TM-scores and generate superposed structures. Compare 3D structures to assess structural similarity.

MMseqs2
Ultra-fast sequence search and clustering. Reported to run up to 10,000x faster than BLAST for database searches (at reduced sensitivity), with powerful sequence clustering capabilities for proteins and nucleotides.

MSA Viewer
Interactive viewer for multiple sequence alignments with color-coded residues and a consensus sequence.

MUMmer4
Rapidly align and compare DNA sequences using MUMmer4 nucmer. Perform pairwise genome comparisons to identify SNPs, indels, and structural variants between reference and query genomes.

RNAalifold
RNAalifold computes consensus RNA secondary structure from a multiple sequence alignment. Uses covariation information to improve prediction accuracy for evolutionarily conserved structures.

Salmon
Quantify transcript abundance from RNA-seq reads with Salmon selective alignment. Upload a transcript FASTA reference plus single-end or paired-end FASTA/FASTQ reads to produce TPM and estimated read-count tables.
What is Clustal Omega?
Clustal Omega is a multiple sequence alignment (MSA) program that aligns protein or nucleotide sequences to reveal conserved regions, evolutionary relationships, and functional motifs. MSA is a foundational step in many bioinformatics workflows—from phylogenetic analysis to structure prediction to primer design.
Clustal Omega can handle datasets ranging from a handful of sequences to tens of thousands, making it suitable for both focused studies and large-scale comparative genomics. Once you have an alignment, you can use FastTree to build a phylogenetic tree from it.
How does Clustal Omega work?
Clustal Omega uses a progressive alignment strategy: it first estimates how similar sequences are to each other, builds a guide tree from those similarities, then aligns sequences following the tree order. The key innovations are the mBed algorithm for scalability and HMM-based profile alignment for accuracy.
The mBed algorithm
Traditional pairwise distance calculation scales as $O(N^2)$ in the number of sequences $N$, which becomes prohibitive for large datasets. The mBed algorithm reduces this to $O(N \log N)$ by "embedding" each sequence into a low-dimensional space.
Instead of comparing every sequence to every other sequence, mBed selects a small set of reference sequences and represents each sequence as a vector of distances to these references. These vectors can be clustered rapidly using k-means, with clusters capped at 100 sequences. Full distance matrices are only computed within clusters, not across the entire dataset.
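The embedding idea can be illustrated with a minimal sketch. It is not the mBed implementation: the k-mer Jaccard distance, the function names, and the hand-picked references are all assumptions made for illustration. The point is only that each sequence is compared to a few references rather than to every other sequence.

```python
def kmer_set(seq, k=2):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """Toy distance (an assumption): 1 minus the Jaccard similarity of k-mer sets."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka and not kb:
        return 0.0
    return 1.0 - len(ka & kb) / len(ka | kb)


def embed(sequences, references, k=2):
    """Represent each sequence as a vector of distances to the references."""
    return [[kmer_distance(s, r, k) for r in references] for s in sequences]


seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
refs = [seqs[0], seqs[2]]  # hypothetical choice of reference sequences
vectors = embed(seqs, refs)

for s, v in zip(seqs, vectors):
    print(s, [round(x, 2) for x in v])
```

With N sequences and R references this computes N × R distances instead of N(N-1)/2. Sequences with similar vectors (here, the first two and the last two) would end up in the same cluster.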
HMM-based profile alignment
When combining two groups of aligned sequences (profiles), Clustal Omega uses hidden Markov model alignment via the HHalign package. Each profile is converted to an HMM with match, insert, and delete states. Aligning two HMMs rather than simple position-specific scoring matrices improves sensitivity for distantly related sequences.
Guide tree and progressive alignment
The distance matrix (partial or full) is used to construct a guide tree via UPGMA. This tree determines the order of pairwise alignments: closely related sequences are aligned first, then progressively merged with more distant groups until all sequences are incorporated.
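As a rough illustration of how UPGMA turns distances into a merge order, the sketch below builds a topology-only Newick string from a small, made-up distance table. The function name, the input layout, and the distances are assumptions; branch lengths and Clustal Omega's own implementation details are omitted.

```python
def upgma(names, dist):
    """Return a topology-only Newick string built by UPGMA.

    names: list of sequence labels.
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    """
    clusters = {name: 1 for name in names}  # Newick label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters (ties broken by label).
        a, b = min(
            (tuple(sorted(pair)) for pair in d),
            key=lambda p: (d[frozenset(p)], p),
        )
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # average of the old distances, weighted by cluster size.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        # Drop every entry that still refers to the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


pairs = {
    ("A", "B"): 0.1, ("C", "D"): 0.2,
    ("A", "C"): 0.8, ("A", "D"): 0.8,
    ("B", "C"): 0.8, ("B", "D"): 0.8,
}
dist = {frozenset(p): v for p, v in pairs.items()}
tree = upgma(["A", "B", "C", "D"], dist)
print(tree)  # ((A,B),(C,D));
```

Reading the tree from the leaves inward gives the alignment order: A with B, then C with D, then the two resulting profiles with each other.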
Alignment settings
Sequence type
Clustal Omega auto-detects whether your sequences are protein or nucleotide by examining character composition. Manual selection (Protein, DNA, or RNA) is useful when auto-detection might be ambiguous—for example, with very short sequences or sequences containing unusual characters.
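To show why composition-based detection can be ambiguous, here is a simple heuristic sketch. It is not Clustal Omega's actual detection rule; the function name, the nucleotide alphabet, and the 0.9 threshold are assumptions chosen for illustration.

```python
def guess_sequence_type(seq, threshold=0.9):
    """Guess 'DNA', 'RNA', or 'Protein' from character composition.

    threshold is a hypothetical cutoff: the fraction of letters that must be
    nucleotide codes for the sequence to count as nucleic acid.
    """
    residues = [c for c in seq.upper() if c.isalpha()]  # ignores gaps, digits
    if not residues:
        raise ValueError("sequence contains no letters")
    nucleotide_like = sum(c in "ACGTUN" for c in residues) / len(residues)
    if nucleotide_like < threshold:
        return "Protein"
    return "RNA" if "U" in residues and "T" not in residues else "DNA"


print(guess_sequence_type("ACGTACGTTAGC"))      # DNA
print(guess_sequence_type("ACGUACGUUAGC"))      # RNA
print(guess_sequence_type("MKVLAAGIVGLLLAQW"))  # Protein
print(guess_sequence_type("GATTACA"))           # DNA, though it is also a valid peptide
```

The last call shows the failure mode: a short peptide made only of A, C, G, and T residues looks like DNA to any composition-based rule, which is when manual selection helps.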
Output format
- FASTA: Aligned sequences with gaps represented as -. Most compatible with downstream tools including FastTree.
- Clustal: Traditional format showing alignment blocks with conservation symbols. Good for visual inspection.
- Phylip: Fixed-width format used by phylogenetic programs.
- MSF: GCG format, useful for legacy software.
- Stockholm: Annotated format used by Pfam and Rfam databases.
Refinement iterations
After the initial alignment, Clustal Omega can refine it by rebuilding the guide tree from the alignment itself (rather than pairwise distances) and realigning. Each iteration uses the improved alignment to construct a better guide tree.
We recommend 1-2 iterations for important alignments where accuracy matters more than speed. For exploratory work or very large datasets, skip refinement (0 iterations) to save time.
Full distance matrix
By default, mBed calculates a reduced distance matrix for scalability. Enabling the full distance matrix computes all pairwise distances, which produces more accurate guide trees at the cost of $O(N^2)$ complexity in the number of sequences $N$.
Use the full matrix for datasets under ~1,000 sequences where alignment quality is critical. For larger datasets, stick with mBed—the accuracy loss is typically minimal.
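A quick count of sequence pairs shows why the full matrix is only practical for smaller inputs. The snippet below is plain arithmetic on N(N-1)/2; the function name is ours and the dataset sizes are arbitrary examples.

```python
def full_matrix_comparisons(n):
    """Number of distinct sequence pairs in a full distance matrix."""
    return n * (n - 1) // 2


for n in (100, 1_000, 10_000, 50_000):
    print(f"{n:>6} sequences -> {full_matrix_comparisons(n):>13,} pairwise distances")
```

Going from 1,000 to 50,000 sequences multiplies the input by 50 but the number of pairwise distances by roughly 2,500.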
Output order
Controls whether sequences in the output follow your original input order or are reordered by the guide tree. Tree order groups similar sequences together, which can make visual inspection easier and is the natural order for phylogenetic workflows. Input order (the default) preserves whatever order you provided.
Remove existing alignment
When enabled, Clustal Omega strips all gap characters (- and .) from your input before aligning. This is useful when you paste sequences that are already aligned—perhaps from a previous alignment or a database—and want Clustal Omega to start fresh rather than treating the existing gaps as meaningful.
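The effect of this option can be pictured with a few lines of Python. This is a sketch of the idea, not the tool's own code, and the function name is hypothetical. Header lines are left alone so that dots or hyphens in sequence identifiers survive.

```python
def strip_gaps(fasta_text):
    """Remove '-' and '.' from sequence lines, leaving header lines untouched."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # identifiers may legitimately contain '.' or '-'
        else:
            out.append(line.replace("-", "").replace(".", ""))
    return "\n".join(out)


aligned = ">seq1.1\nMK-VL..A\n>seq2\nMKAVL--A"
print(strip_gaps(aligned))
```

After stripping, seq1.1 becomes MKVLA and seq2 becomes MKAVLA, so the aligner sees raw sequences with no trace of the earlier alignment.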
Output distance matrix
Produces an additional output file containing the pairwise distance matrix computed during alignment. Each entry represents the estimated evolutionary distance between two sequences. This matrix can be used for hierarchical clustering, neighbor-joining trees, or as input to other phylogenetic tools.
Output guide tree
Produces an additional Newick-format tree file showing the guide tree used to order the progressive alignment. This is the UPGMA tree built from pairwise distances (or mBed-approximated distances). It is not a phylogenetic tree in the strict sense—it is an operational tree that determined alignment order—but it provides a quick overview of sequence relationships.
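For readers who also run the standalone clustalo command-line program, the settings above correspond roughly to the flags assembled in the sketch below. The mapping is our assumption about equivalent options, not a description of how this page invokes the program, and the helper function and file names are hypothetical. Check clustalo --help for the flags supported by your installed version.

```python
def build_clustalo_command(
    infile, outfile, seqtype=None, outfmt="fa", iterations=0,
    full_matrix=False, tree_order=False, dealign=False,
    distmat_out=None, guidetree_out=None,
):
    """Assemble an argument list for the clustalo CLI (hypothetical helper)."""
    cmd = ["clustalo", "-i", infile, "-o", outfile, f"--outfmt={outfmt}"]
    if seqtype:                      # "Protein", "DNA", or "RNA"; omit to auto-detect
        cmd.append(f"--seqtype={seqtype}")
    if iterations:                   # refinement iterations
        cmd.append(f"--iter={iterations}")
    if full_matrix:                  # full distance matrix instead of mBed
        cmd.append("--full")
    if tree_order:                   # default output order is input order
        cmd.append("--output-order=tree-order")
    if dealign:                      # remove existing alignment
        cmd.append("--dealign")
    if distmat_out:                  # some versions may also require --full here
        cmd.append(f"--distmat-out={distmat_out}")
    if guidetree_out:                # Newick guide tree
        cmd.append(f"--guidetree-out={guidetree_out}")
    return cmd


cmd = build_clustalo_command(
    "input.fasta", "aligned.fasta", iterations=2, full_matrix=True,
    tree_order=True, guidetree_out="guide.nwk",
)
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where clustalo is installed. Building the list separately keeps the option mapping easy to inspect before anything is executed.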
Understanding the results
The output is a multiple sequence alignment where:
- Columns represent homologous positions across sequences
- Gap characters (-) indicate insertions or deletions relative to other sequences
- Conserved columns (same residue across all sequences) suggest functional or structural importance
The alignment length reported is the number of columns. It is at least as long as the longest input sequence and usually longer, because gaps are inserted to bring homologous positions into the same column. High-quality alignments have fewer scattered gaps and more continuous aligned blocks.
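The column-wise reading described above can be sketched in a few lines. The helper name and the toy alignment are made up for illustration, and "conserved fraction" here simply means the share of sequences carrying the most common residue in a column.

```python
from collections import Counter


def column_summary(aligned):
    """Per column: (most common residue, conserved fraction, gap fraction)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    summary = []
    for column in zip(*aligned):
        gaps = sum(c in "-." for c in column)
        counts = Counter(c for c in column if c not in "-.")
        if counts:
            residue, top = counts.most_common(1)[0]
        else:
            residue, top = "-", 0  # column made only of gaps
        summary.append((residue, top / len(column), gaps / len(column)))
    return summary


alignment = ["MKV-LA", "MKVALA", "MRV-LA"]
cols = column_summary(alignment)
fully_conserved = [i for i, (_, frac, _) in enumerate(cols) if frac == 1.0]
print(len(cols))        # alignment length in columns: 6
print(fully_conserved)  # [0, 2, 4, 5]
```

The first and third sequences have five residues each, yet the alignment is six columns long because of the gap introduced opposite the extra A in the second sequence.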
Common workflows
Clustal Omega is typically the first step in a multi-tool pipeline:
- Phylogenetic analysis: Align sequences with Clustal Omega → Build tree with FastTree
- Conservation analysis: Align sequences → Identify conserved regions for mutagenesis targets
- Homology modeling: Align target to templates → Use alignment for structure prediction
- Primer design: Align variants → Design primers in conserved regions
Limitations
Clustal Omega assumes sequences are homologous and alignable. It will produce an alignment even for unrelated sequences, but the result will be meaningless. Always verify that your sequences share evolutionary or functional relationships before aligning.
Very divergent sequences (below ~20% identity for proteins) may not align reliably with any progressive alignment method. Consider structure-based alignment for such cases.
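One way to spot such cases after the fact is to compute pairwise percent identity from the finished alignment and flag pairs below a cutoff. The sketch below is illustrative only. Percent identity has several definitions; this one divides by the number of columns where both sequences have a residue, which is an assumption. The 20% cutoff simply mirrors the rough figure above, and the sequences are invented.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-." and y not in "-."]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def low_identity_pairs(aligned, cutoff=20.0):
    """Return (name_a, name_b, identity) for every pair below the cutoff."""
    flagged = []
    for (na, sa), (nb, sb) in combinations(aligned.items(), 2):
        pid = percent_identity(sa, sb)
        if pid < cutoff:
            flagged.append((na, nb, pid))
    return flagged


aligned = {
    "seq1": "MKVLAAGIVG",
    "seq2": "MKVLSAGIVG",
    "seq3": "WPY-DEHRNC",
}
for name_a, name_b, pid in low_identity_pairs(aligned):
    print(f"{name_a} vs {name_b}: {pid:.1f}% identity")
```

Here seq1 and seq2 are 90% identical, while seq3 shares no residues with either and is flagged. A flagged pair does not prove the sequences are unrelated, but it is a prompt to check the alignment by eye or against structural data.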
