MUMmer4

Align and compare whole genomes to detect SNPs, indels, and structural variants.

Related tools

MAFFT

Perform multiple sequence alignment using MAFFT (Multiple Alignment using Fast Fourier Transform). Supports multiple algorithms from fast progressive to highly accurate iterative methods.

MUSCLE5

Perform multiple sequence alignment using MUSCLE5 (MUltiple Sequence Comparison by Log-Expectation). Uses the PPP algorithm for high-quality alignments with support for ensemble generation.

USAlign

USAlign (Universal Structure Alignment) aligns protein, RNA, and DNA structures to compute TM-scores and generate superposed structures. Compare 3D structures to assess structural similarity.

Clustal Omega

Perform multiple sequence alignment on protein or nucleotide sequences using the Clustal Omega algorithm.

FastTree

Infer approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.

HMMER

Sensitive sequence homology search using profile hidden Markov models. More accurate than BLAST for detecting remote homologs, ideal for finding evolutionarily distant protein family members.

IgBLAST

Analyze immunoglobulin (antibody) and T cell receptor variable domain sequences. Identifies V/D/J gene segments, delineates CDR regions, and analyzes rearrangement junctions.

IQ-TREE

Build phylogenetic trees using maximum likelihood with automatic model selection (ModelFinder) and ultrafast bootstrap support.

MMseqs2

Ultra-fast sequence search and clustering. Up to 10,000x faster than BLAST for database searches (at reduced sensitivity), with powerful sequence clustering capabilities for proteins and nucleotides.

FoldSeek

Fast protein structure search, comparison, and clustering. Search your structure against 200M+ AlphaFold predictions, compare 2 structures, or cluster up to 2500.

What is MUMmer4?

MUMmer4 is a system for rapidly aligning large DNA sequences to one another. It excels at whole-genome comparisons, identifying structural differences, SNPs, and indels between a reference and query genome. The name "MUMmer" comes from Maximal Unique Matches (MUMs)—the exact sequence matches that anchor alignments.

MUMmer4 can find all maximal exact matches of 20 base pairs or longer between two bacterial genomes (~5 million base pairs each) in about 20 seconds on a typical desktop computer. It handles everything from draft assemblies with hundreds of contigs to complete chromosomes spanning gigabases.

How does MUMmer4 work?

MUMmer4 uses a seed-and-extend approach via the nucmer algorithm. It first finds exact matching subsequences (anchors), then extends these into longer alignments that tolerate mismatches and gaps.

Suffix arrays for anchor finding

The core data structure is a suffix array, which enables rapid identification of all maximal exact matches between two sequences. MUMmer4 replaced the 32-bit suffix tree of earlier versions with a 48-bit suffix array, which raises the input size limit far beyond any current genome. The theoretical limit is now about 141 trillion base pairs.
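As a quick arithmetic check, the quoted limit of 141 trillion base pairs is numerically very close to 2^47. Reading this as "one of the 48 bits is not available for addressing" is an assumption made here for illustration, not something this page states.

```python
# Pure arithmetic; no MUMmer code is involved.
limit_47_bits = 2 ** 47   # 140,737,488,355,328
limit_48_bits = 2 ** 48   # 281,474,976,710,656

print(f"2^47 = {limit_47_bits:,} bp (~{limit_47_bits / 1e12:.0f} trillion)")
print(f"2^48 = {limit_48_bits:,} bp (~{limit_48_bits / 1e12:.0f} trillion)")
```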

For a match to serve as an anchor, it must be:

  • Maximal: Cannot be extended in either direction
  • Unique: Appears exactly once in both reference and query (in MUM mode)
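The two conditions can be illustrated with a brute-force sketch. This is not how MUMmer4 finds anchors (it uses a suffix array and also searches the reverse strand); the function name `maximal_matches` and its parameters are hypothetical, and the quadratic search is only practical for short strings.

```python
def maximal_matches(ref, qry, min_len=20, unique_only=True):
    """Brute-force maximal exact matches as (ref_pos, qry_pos, length), 0-based, forward strand only."""
    matches = []
    for i in range(len(ref)):
        for j in range(len(qry)):
            # Not left-maximal if the preceding bases also match.
            if i > 0 and j > 0 and ref[i - 1] == qry[j - 1]:
                continue
            # Extend to the right until a mismatch or a sequence end (right-maximal).
            k = 0
            while i + k < len(ref) and j + k < len(qry) and ref[i + k] == qry[j + k]:
                k += 1
            if k >= min_len:
                matches.append((i, j, k))
    if not unique_only:
        return matches

    def count(text, sub):
        # Count occurrences of sub in text, including overlapping ones.
        return sum(1 for p in range(len(text) - len(sub) + 1) if text.startswith(sub, p))

    # Keep only matches whose sequence occurs exactly once in both inputs.
    return [(i, j, k) for i, j, k in matches
            if count(ref, ref[i:i + k]) == 1 and count(qry, ref[i:i + k]) == 1]


print(maximal_matches("GGACGTACCTT", "TTACGTACCAA", min_len=4))
```

Passing `unique_only=False` keeps every maximal match, which corresponds loosely to dropping the uniqueness requirement.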

Clustering and extension

Once anchors are found, MUMmer4 clusters nearby matches that appear in consistent order. Each cluster represents a potential alignment region. The algorithm then extends these clusters by allowing mismatches and gaps, producing the final alignments.
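A much simplified sketch of the clustering step is shown below. It assumes anchors are `(ref_pos, qry_pos, length)` tuples on the forward strand and only compares each anchor with the last anchor of the most recent cluster; the actual MUMmer4 clustering is more elaborate. The function name and the two parameters (mirroring the maximum gap and minimum cluster length settings) are illustrative.

```python
def cluster_matches(matches, max_gap=90, min_cluster=65):
    """Group (ref_pos, qry_pos, length) anchors into clusters of nearby, consistently ordered matches."""
    clusters = []
    for match in sorted(matches):
        if clusters:
            last = clusters[-1][-1]
            # Distance from the end of the previous anchor to the start of this one.
            ref_gap = match[0] - (last[0] + last[2])
            qry_gap = match[1] - (last[1] + last[2])
            # Same order in both sequences and close enough: extend the current cluster.
            if 0 <= ref_gap <= max_gap and 0 <= qry_gap <= max_gap:
                clusters[-1].append(match)
                continue
        clusters.append([match])
    # Discard clusters whose anchors sum to fewer than min_cluster matching bases.
    return [c for c in clusters if sum(length for _, _, length in c) >= min_cluster]


anchors = [(0, 0, 40), (50, 52, 30), (500, 20, 70), (1000, 1000, 20)]
print(cluster_matches(anchors))
```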

Match modes

  • MUM (default): Uses only matches that are unique in both reference and query. Fastest and most specific, but may miss alignments in repetitive regions.
  • MaxMatch: Uses all maximal matches regardless of uniqueness. Most sensitive for detecting all possible alignments, but slower and may produce spurious matches in repetitive sequences.
  • MUM Reference: Matches must be unique in the reference only. A middle ground that handles repetitive query sequences (like draft assemblies with duplicated contigs).

Alignment settings

Minimum match length

The minimum length of exact matches used as anchors (default: 20 bp). Lower values find more anchors and can detect alignments in divergent regions, but increase computation time and may produce false positive alignments. For closely related genomes (>95% identity), 20 bp works well. For more divergent comparisons, try 15 bp.

Minimum cluster length

Clusters of matches shorter than this threshold are discarded before extension (default: 65 bp). This removes spurious short alignments that may result from random sequence similarity. Increase this value when comparing genomes with many repetitive elements.

Break length

How far the extension algorithm will look through a region of differences before stopping (default: 200 bp). Larger values can bridge over transposons or other insertions to merge alignments that would otherwise be separate. This is useful for highly rearranged genomes.

Maximum gap

The maximum gap allowed between adjacent matches within a cluster (default: 90 bp). Matches separated by more than this distance start a new cluster. Increase this for genomes with many small indels.

Strand selection

  • Both strands: Align query in both orientations (default). Required for detecting inversions.
  • Forward only: Query is aligned only in the same orientation as reference.
  • Reverse complement only: Query is aligned only in reverse complement orientation.
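For readers who also use the command-line distribution, the settings above correspond to nucmer options. The sketch below assembles a command line from them; how this page maps its form fields to flags internally is an assumption, and the function name, argument names, and file names are hypothetical. The flags themselves (`--mum`, `--maxmatch`, `--mumreference`, `-l`, `-c`, `-b`, `-g`, `--forward`, `--reverse`, `-p`) are standard nucmer options.

```python
def build_nucmer_command(ref_fasta, qry_fasta, prefix="out", mode="mum",
                         min_match=20, min_cluster=65, break_len=200,
                         max_gap=90, strand="both"):
    """Return a nucmer argument list for the given settings (the command is not executed)."""
    mode_flags = {"mum": "--mum", "maxmatch": "--maxmatch", "mumreference": "--mumreference"}
    strand_flags = {"both": [], "forward": ["--forward"], "reverse": ["--reverse"]}
    command = [
        "nucmer",
        mode_flags[mode],
        "-l", str(min_match),    # minimum match length
        "-c", str(min_cluster),  # minimum cluster length
        "-b", str(break_len),    # break length
        "-g", str(max_gap),      # maximum gap
    ]
    command += strand_flags[strand]
    command += ["-p", prefix, ref_fasta, qry_fasta]
    return command


print(" ".join(build_nucmer_command("reference.fasta", "query.fasta")))
```

The resulting list could be passed to `subprocess.run` on a machine where MUMmer4 is installed.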

Output options

Show coordinates

Produces a table of all alignment regions with:

  • Reference and query start/end positions
  • Alignment lengths
  • Percent identity
  • Sequence names

This is the primary output for understanding genome structure and identifying rearrangements.
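If the same table is produced on the command line with `show-coords -T -H` (tab-delimited, no header), each line can be parsed as sketched below. The column order assumed here is start/end in reference, start/end in query, the two alignment lengths, percent identity, and finally the two sequence names; the `Alignment` type and `parse_coords_line` function are illustrative names, not part of MUMmer.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    ref_len: int
    qry_len: int
    identity: float
    ref_name: str
    qry_name: str


def parse_coords_line(line):
    """Parse one tab-delimited coordinates line; the names are taken from the last two columns."""
    fields = line.rstrip("\n").split("\t")
    numbers = [int(value) for value in fields[:6]]
    return Alignment(*numbers, float(fields[6]), fields[-2], fields[-1])


example = "1\t1000\t5\t1004\t1000\t1000\t99.50\tchr1\tcontig_7"
print(parse_coords_line(example))
```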

Extract SNPs

Runs show-snps to identify single nucleotide polymorphisms and small indels from the alignments. Each variant includes:

  • Position in reference and query
  • Reference and query bases (. indicates insertion/deletion)
  • Variant type (SNP or INDEL)
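The SNP versus INDEL distinction follows directly from the "." convention. A minimal sketch, assuming tab-delimited `show-snps -T -H` output in which the first four columns are reference position, reference base, query base, and query position, and the last two are the sequence names (the function name is hypothetical):

```python
def parse_snp_line(line):
    """Parse one tab-delimited variant line and label it as SNP or INDEL."""
    fields = line.rstrip("\n").split("\t")
    ref_base, qry_base = fields[1], fields[2]
    return {
        "ref_pos": int(fields[0]),
        "ref_base": ref_base,
        "qry_base": qry_base,
        "qry_pos": int(fields[3]),
        # A "." on either side means the base is absent there, i.e. an insertion or deletion.
        "type": "INDEL" if "." in (ref_base, qry_base) else "SNP",
        "ref_name": fields[-2],
        "qry_name": fields[-1],
    }


print(parse_snp_line("1500\tA\tG\t1498\t12\t1498\t1\t1\tchr1\tcontig_7"))
print(parse_snp_line("2210\t.\tT\t2209\t3\t2209\t1\t1\tchr1\tcontig_7"))
```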

Minimum % identity filter

Excludes alignments below this identity threshold from the coordinates output. Useful for focusing on high-confidence alignments when comparing divergent genomes.

Minimum alignment length filter

Excludes alignments shorter than this value from coordinates output. Helps remove noise from small, potentially spurious matches.
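Both filters amount to a simple threshold test per alignment. The sketch below assumes each alignment is a `(ref_start, ref_end, identity_percent)` tuple and measures length on the reference; whether this page measures length on the reference or the query is not stated, so that choice is an assumption. On the command line, `show-coords` offers the `-I` and `-L` options for the same purpose.

```python
def filter_alignments(alignments, min_identity=0.0, min_length=0):
    """Keep alignments that meet both the identity and the length threshold."""
    kept = []
    for ref_start, ref_end, identity in alignments:
        length = abs(ref_end - ref_start) + 1  # 1-based, inclusive coordinates
        if identity >= min_identity and length >= min_length:
            kept.append((ref_start, ref_end, identity))
    return kept


alignments = [(1, 5000, 99.2), (6000, 6080, 99.9), (9000, 12000, 88.5)]
print(filter_alignments(alignments, min_identity=95.0, min_length=500))
```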

Understanding the results

The dot plot

The dot plot visualizes alignment positions as line segments on a 2D grid where:

  • X-axis: Reference genome position
  • Y-axis: Query genome position
  • Red lines: Forward alignments (query in same orientation as reference)
  • Blue lines: Reverse complement alignments (inversions)

Interpreting patterns:

  • Diagonal line: Syntenic (conserved order) alignment. A perfect match between identical sequences produces a single diagonal from origin to corner.
  • Parallel diagonals: Duplications or repeats in one or both genomes
  • Horizontal offset: Insertion in reference relative to query (the diagonal skips ahead along the X-axis)
  • Vertical offset: Insertion in query relative to reference (the diagonal skips ahead along the Y-axis)
  • Blue diagonal: Chromosomal inversion
  • Scattered dots: Either highly rearranged genomes or spurious matches from repetitive sequences
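The mapping from alignments to plotted segments can be sketched as follows. It assumes the coordinate convention used by `show-coords`, where a reverse-complement alignment is reported with a query start greater than its query end; the function name and the dictionary layout are illustrative only.

```python
def to_segments(alignments):
    """Convert (ref_start, ref_end, qry_start, qry_end) tuples into dot plot line segments."""
    segments = []
    for ref_start, ref_end, qry_start, qry_end in alignments:
        # Descending query coordinates indicate a reverse-complement alignment.
        reverse = qry_start > qry_end
        segments.append({
            "x": (ref_start, ref_end),   # reference position on the X-axis
            "y": (qry_start, qry_end),   # query position on the Y-axis
            "color": "blue" if reverse else "red",
        })
    return segments


for segment in to_segments([(1, 1000, 1, 1000), (2000, 3000, 3000, 2000)]):
    print(segment)
```

A forward segment rises from left to right, while a reverse segment falls, which is why inversions are easy to spot even without color.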

Summary statistics

  • Total alignments: Number of distinct alignment blocks
  • Total aligned bp: Sum of all alignment lengths
  • Average identity: Mean percent identity across alignments
  • SNPs/Indels: Variant counts from show-snps output
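The first three statistics can be reproduced from the coordinates table as in the sketch below, which assumes `(ref_start, ref_end, identity_percent)` tuples and an unweighted mean, as the description above suggests. The function name is hypothetical.

```python
def summarize(alignments):
    """Compute alignment count, total aligned bases, and mean identity."""
    lengths = [abs(end - start) + 1 for start, end, _ in alignments]
    identities = [identity for _, _, identity in alignments]
    return {
        "total_alignments": len(alignments),
        "total_aligned_bp": sum(lengths),
        # Unweighted mean: a short alignment counts as much as a long one.
        "average_identity": sum(identities) / len(identities) if identities else 0.0,
    }


print(summarize([(1, 100, 99.0), (201, 300, 97.0)]))
```

Because the mean is unweighted, many short low-identity alignments can pull the average down even when most aligned bases are nearly identical.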

Coordinates table

Each row represents one alignment block:

Column           Description
Ref Start/End    Alignment boundaries in reference
Query Start/End  Alignment boundaries in query
Identity         Percent sequence identity
Ref Tag          Reference sequence name
Query Tag        Query sequence name

Common workflows

Genome assembly validation

Compare your assembly to a reference genome to check for:

  • Misassemblies (unexpected rearrangements in dot plot)
  • Missing regions (gaps in coverage)
  • Collapsed repeats (many-to-one alignments)
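Missing regions can be listed by merging the reference intervals from the coordinates table and reporting what remains uncovered. A minimal sketch, assuming 1-based inclusive coordinates on a single reference sequence; `uncovered_regions` is a hypothetical helper, not a MUMmer command.

```python
def uncovered_regions(ref_length, intervals):
    """Return reference regions (start, end) not covered by any alignment interval."""
    gaps = []
    next_uncovered = 1  # first reference position not yet known to be covered
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_length:
        gaps.append((next_uncovered, ref_length))
    return gaps


print(uncovered_regions(1000, [(1, 200), (150, 400), (600, 900)]))
```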

Strain comparison

Align closely related bacterial or viral strains to catalog SNPs and indels. For finished genomes this is faster than read mapping and reports variants across every aligned region.

Synteny analysis

Identify conserved gene order between species. Diagonal segments in the dot plot represent syntenic blocks; breaks indicate rearrangements during evolution.

Draft assembly scaffolding

Align contigs to a related reference to determine their order and orientation. The coordinates output provides the information needed for scaffolding.
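One simple way to turn the coordinates into an order and orientation is to take the longest alignment of each contig, sort contigs by where that alignment lands on the reference, and read the orientation from the direction of the query coordinates. This is a sketch under those assumptions, not a full scaffolder; the function name and tuple layout are illustrative.

```python
def order_contigs(alignments):
    """alignments: (ref_start, ref_end, qry_start, qry_end, contig_name) tuples.

    Returns [(contig_name, orientation)] sorted by reference position.
    """
    best = {}
    for ref_start, ref_end, qry_start, qry_end, name in alignments:
        length = abs(ref_end - ref_start) + 1
        # Keep only the longest alignment seen for each contig.
        if name not in best or length > best[name][0]:
            orientation = "+" if qry_start <= qry_end else "-"
            best[name] = (length, min(ref_start, ref_end), orientation)
    ordered = sorted(best.items(), key=lambda item: item[1][1])
    return [(name, info[2]) for name, info in ordered]


alignments = [
    (5000, 7000, 1, 2001, "contig_2"),
    (100, 1500, 1400, 1, "contig_1"),
    (1600, 1700, 1, 101, "contig_2"),
]
print(order_contigs(alignments))
```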

Input requirements

MUMmer4 requires input sequences in FASTA format with headers. Each file should contain one or more sequences:

>sequence_name optional description
ATGCGATCGATCGATCGATCG...

For comparing two complete genomes, provide each as a single FASTA entry. For draft assemblies, include all contigs in one file with unique headers.
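Before uploading, a file can be checked against these requirements with a small reader like the one below. It is a minimal sketch: `read_fasta` is a hypothetical helper, it treats the first word of each header as the sequence name, and it does not validate the nucleotide alphabet.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}; raise ValueError on malformed input."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            name = header[0]  # the name is the first word; the rest is the description
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {seq_name: "".join(parts) for seq_name, parts in records.items()}


print(read_fasta(">contig_1 draft assembly\nATGC\nGGTT\n>contig_2\nAAAA\n"))
```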

Limitations

MUMmer4 is designed for DNA sequence comparison. For protein-based comparison of divergent genomes, consider using promer (available in the command-line MUMmer distribution) or other tools like MMseqs2.

Very large genomes (mammalian-scale) may require significant memory and compute time. For human-scale comparisons, expect several minutes of processing.

Highly repetitive genomes may produce cluttered dot plots with many parallel lines. Increase the minimum cluster length to reduce noise, or switch to MUM mode to focus on unique regions.