MUMmer4

Rapidly align and compare whole genomes using the MUMmer4 nucmer algorithm

What is MUMmer4?

MUMmer4 is a system for rapidly aligning large DNA sequences to one another. It excels at whole-genome comparisons, identifying structural differences, SNPs, and indels between a reference and query genome. The name "MUMmer" comes from Maximal Unique Matches (MUMs)—the exact sequence matches that anchor alignments.

MUMmer4 can find all maximal exact matches of 20 base pairs or longer between two bacterial genomes (~5 million base pairs each) in about 20 seconds on a typical desktop computer. It handles everything from draft assemblies with hundreds of contigs to complete chromosomes spanning gigabases.

How does MUMmer4 work?

MUMmer4 uses a seed-and-extend approach via the nucmer algorithm. It first finds exact matching subsequences (anchors), then extends these into longer alignments that tolerate mismatches and gaps.

Suffix arrays for anchor finding

The core data structure is a suffix array, which enables rapid identification of all maximal exact matches between two sequences. MUMmer4 replaced the 32-bit suffix tree of earlier versions with a 48-bit suffix array, which lifts the earlier input-size limits: the theoretical limit is now 141 trillion base pairs.
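To make the idea concrete, here is a minimal Python sketch of exact-match lookup with a suffix array. It is an illustration only: the construction below is deliberately naive, MUMmer4's own implementation is far more efficient, and the function names (build_suffix_array, find_occurrences) are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, in sorted order.

    Naive construction (sorts whole suffixes); fine for short examples only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return every start position of `pattern` in `text` via binary search."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])
```

Because all suffixes sharing a prefix sit next to each other in the sorted array, every occurrence of a seed is found with two binary searches rather than a scan of the whole sequence.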

For a match to serve as an anchor, it must be:

Maximal: Cannot be extended in either direction without introducing a mismatch
Unique: Appears exactly once in both reference and query (in MUM mode)
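The two conditions above can be sketched in a few lines of Python. This is a brute-force illustration of the definitions, not how MUMmer4 finds anchors (it uses the suffix array); the function names are hypothetical and only the forward strand is considered.

```python
def count_occurrences(text, pattern):
    """Count occurrences of `pattern` in `text`, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_matches(ref, qry, min_len):
    """Return (ref_pos, qry_pos, length) for every maximal exact match."""
    matches = []
    for i in range(len(ref)):
        for j in range(len(qry)):
            if ref[i] != qry[j]:
                continue
            # Skip positions that could be extended to the left.
            if i > 0 and j > 0 and ref[i - 1] == qry[j - 1]:
                continue
            # Extend to the right until a mismatch or a sequence end.
            length = 0
            while (i + length < len(ref) and j + length < len(qry)
                   and ref[i + length] == qry[j + length]):
                length += 1
            if length >= min_len:
                matches.append((i, j, length))
    return matches


def unique_matches(ref, qry, min_len):
    """Keep only maximal matches that occur once in ref and once in qry."""
    result = []
    for i, j, length in maximal_matches(ref, qry, min_len):
        seed = ref[i:i + length]
        if count_occurrences(ref, seed) == 1 and count_occurrences(qry, seed) == 1:
            result.append((i, j, length))
    return result
```

With ref = "ACGTACGT" and qry = "ACGT", maximal_matches reports two matches, but unique_matches reports none because "ACGT" occurs twice in the reference. This is the difference between keeping all maximal matches and keeping only unique ones.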

Clustering and extension

Once anchors are found, MUMmer4 clusters nearby matches that appear in consistent order. Each cluster represents a potential alignment region. The algorithm then extends these clusters by allowing mismatches and gaps, producing the final alignments.
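The clustering step can be pictured with the simplified sketch below. It is not MUMmer4's actual clustering code: it handles forward-strand matches only, it treats "consistent order" as "query position increases with reference position", and the parameter names max_gap and min_cluster are chosen here to mirror the settings described on this page.

```python
def cluster_matches(matches, max_gap=90, min_cluster=65):
    """Group (ref_pos, qry_pos, length) matches into clusters.

    A new cluster starts when the next match is out of order in the query
    or lies more than `max_gap` bases beyond the previous match in either
    sequence. Clusters with fewer than `min_cluster` matched bases are dropped.
    """
    clusters, current = [], []
    for ref_pos, qry_pos, length in sorted(matches):
        if current:
            prev_ref, prev_qry, prev_len = current[-1]
            ref_gap = ref_pos - (prev_ref + prev_len)
            qry_gap = qry_pos - (prev_qry + prev_len)
            in_order = qry_pos > prev_qry
            if not in_order or max(ref_gap, qry_gap) > max_gap:
                clusters.append(current)
                current = []
        current.append((ref_pos, qry_pos, length))
    if current:
        clusters.append(current)
    return [c for c in clusters if sum(m[2] for m in c) >= min_cluster]
```

Each surviving cluster would then be handed to the extension step, which fills the space between anchors with gapped alignment.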

Match modes

MUM (default): Uses only matches that are unique in both reference and query. Fastest and most specific, but may miss alignments in repetitive regions.
MaxMatch: Uses all maximal matches regardless of uniqueness. Most sensitive for detecting all possible alignments, but slower and may produce spurious matches in repetitive sequences.
MUM Reference: Matches must be unique in the reference only. A middle ground that handles repetitive query sequences (like draft assemblies with duplicated contigs).

Alignment settings

Minimum match length

The minimum length of exact matches used as anchors (default: 20 bp). Lower values find more anchors and can detect alignments in divergent regions, but increase computation time and may produce false positive alignments. For closely related genomes (>95% identity), 20 bp works well. For more divergent comparisons, try 15 bp.

Minimum cluster length

Clusters whose matches total fewer bases than this threshold are discarded before extension (default: 65 bp). This removes spurious short alignments that may result from random sequence similarity. Increase this value when comparing genomes with many repetitive elements.

Break length

How far the extension algorithm will look through a region of differences before stopping (default: 200 bp). Larger values can bridge over transposons or other insertions to merge alignments that would otherwise be separate. This is useful for highly rearranged genomes.

Maximum gap

The maximum gap allowed between adjacent matches within a cluster (default: 90 bp). Matches separated by more than this distance start a new cluster. Increase this for genomes with many small indels.

Strand selection

Both strands: Align query in both orientations (default). Required for detecting inversions.
Forward only: Query is aligned only in the same orientation as reference.
Reverse complement only: Query is aligned only in reverse complement orientation.
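For readers who run MUMmer4 from the command line, the settings above correspond roughly to nucmer options. The sketch below builds an argument list using flags from the nucmer help text (--mum, --maxmatch, -l, -c, -b, -g, -f, -r, -p). Two things are assumptions: the helper name build_nucmer_command is hypothetical, and "mumreference" is mapped to passing no mode flag on the assumption that this is nucmer's behavior when neither --mum nor --maxmatch is given. Check `nucmer --help` for your installed version.

```python
def build_nucmer_command(reference, query, mode="mum", min_match=20,
                         min_cluster=65, break_len=200, max_gap=90,
                         strand="both", prefix="out"):
    """Return a nucmer argument list for the given settings (not executed)."""
    # Assumption: with neither flag, nucmer uses reference-unique anchors.
    mode_flags = {"mum": ["--mum"], "maxmatch": ["--maxmatch"], "mumreference": []}
    strand_flags = {"both": [], "forward": ["-f"], "reverse": ["-r"]}

    if mode not in mode_flags:
        raise ValueError(f"unknown match mode: {mode}")
    if strand not in strand_flags:
        raise ValueError(f"unknown strand option: {strand}")

    return (
        ["nucmer"]
        + mode_flags[mode]
        + ["-l", str(min_match),     # minimum match length
           "-c", str(min_cluster),   # minimum cluster length
           "-b", str(break_len),     # break length
           "-g", str(max_gap)]       # maximum gap
        + strand_flags[strand]
        + ["-p", prefix, reference, query]
    )
```

The returned list can be passed to subprocess.run on a machine where nucmer is installed; for example, build_nucmer_command("ref.fa", "qry.fa", min_match=15) lowers the seed length for a more divergent pair.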

Output options

Show coordinates

Produces a table of all alignment regions with:

Reference and query start/end positions
Alignment lengths
Percent identity
Sequence names

This is the primary output for understanding genome structure and identifying rearrangements.
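If you export the coordinates and want to process them yourself, a small parser is usually enough. The sketch below assumes tab-delimited, header-free text in the column order produced by `show-coords -T -H` (reference start and end, query start and end, the two alignment lengths, percent identity, then the reference and query names). The column order of this tool's own download may differ, and the dictionary keys are names chosen here for illustration.

```python
def parse_coords(text):
    """Parse tab-delimited alignment rows into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        rows.append({
            "ref_start": int(fields[0]),
            "ref_end": int(fields[1]),
            "qry_start": int(fields[2]),
            "qry_end": int(fields[3]),
            "ref_len": int(fields[4]),
            "qry_len": int(fields[5]),
            "identity": float(fields[6]),
            "ref_tag": fields[7],
            "qry_tag": fields[8],
        })
    return rows
```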

Extract SNPs

Runs show-snps to identify single nucleotide polymorphisms and small indels from the alignments. Each variant includes:

Position in reference and query
Reference and query bases (. indicates insertion/deletion)
Variant type (SNP or INDEL)
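The SNP versus INDEL distinction follows directly from the base columns: a "." on either side marks a gap. A minimal sketch, with a hypothetical function name:

```python
def classify_variant(ref_base, qry_base):
    """Return "SNP" or "INDEL" for one variant row.

    A "." in either column marks an inserted or deleted base.
    """
    if ref_base == qry_base:
        raise ValueError("identical entries do not describe a variant")
    if ref_base == "." or qry_base == ".":
        return "INDEL"
    return "SNP"
```

Here classify_variant("A", "G") is a substitution, classify_variant(".", "T") is a base present only in the query, and classify_variant("C", ".") is a base present only in the reference.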

Minimum % identity filter

Excludes alignments below this identity threshold from the coordinates output. Useful for focusing on high-confidence alignments when comparing divergent genomes.

Minimum alignment length filter

Excludes alignments shorter than this value from coordinates output. Helps remove noise from small, potentially spurious matches.
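Both filters amount to a simple threshold test on each row. The sketch below assumes each alignment is a dictionary with "identity" (percent) and "ref_len" (bp) keys, that length is measured on the reference, and that the thresholds are inclusive. These details are assumptions for illustration rather than documented behavior of this tool.

```python
def filter_alignments(alignments, min_identity=0.0, min_length=0):
    """Keep alignments meeting both the identity and the length threshold."""
    return [
        a for a in alignments
        if a["identity"] >= min_identity and a["ref_len"] >= min_length
    ]
```

On the command line, show-coords offers comparable -I (minimum identity) and -L (minimum length) options.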

Understanding the results

The dot plot

The dot plot visualizes alignment positions as line segments on a 2D grid where:

X-axis: Reference genome position
Y-axis: Query genome position
Red lines: Forward alignments (query in same orientation as reference)
Blue lines: Reverse complement alignments (inversions)

Interpreting patterns:

Diagonal line: Syntenic (conserved order) alignment. A perfect match between identical sequences produces a single diagonal from origin to corner.
Parallel diagonals: Duplications or repeats in one or both genomes
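Vertical gap between diagonal segments: Insertion in query relative to reference (query sequence with no reference counterpart)
Horizontal gap between diagonal segments: Insertion in reference relative to query (equivalently, a deletion in the query)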
Blue diagonal: Chromosomal inversion
Scattered dots: Either highly rearranged genomes or spurious matches from repetitive sequences
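The plot itself is just one line segment per alignment. The sketch below turns alignment coordinates into segments and colors. It assumes reverse-complement alignments are reported with a query start greater than the query end, as in show-coords output; the function name and dictionary keys are hypothetical.

```python
def dotplot_segments(alignments):
    """Return ((x1, y1), (x2, y2), color) for each alignment.

    X is the reference position and Y is the query position. Forward
    alignments are red; reverse-complement alignments are blue.
    """
    segments = []
    for a in alignments:
        color = "red" if a["qry_start"] <= a["qry_end"] else "blue"
        segments.append((
            (a["ref_start"], a["qry_start"]),
            (a["ref_end"], a["qry_end"]),
            color,
        ))
    return segments
```

A forward alignment rises from left to right, while a reverse alignment falls, which is why inversions appear as diagonals running the opposite way.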

Summary statistics

Total alignments: Number of distinct alignment blocks
Total aligned bp: Sum of all alignment lengths
Average identity: Mean percent identity across alignments
SNPs/Indels: Variant counts from show-snps output
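The first three statistics can be recomputed from the coordinates table, as in the sketch below. It assumes alignment length is measured on the reference and that average identity is an unweighted mean over alignment blocks, as the description above suggests; the dictionary keys are names chosen for illustration.

```python
def summarize(alignments):
    """Compute block count, total aligned bases, and mean percent identity."""
    count = len(alignments)
    total_bp = sum(a["ref_len"] for a in alignments)
    mean_identity = (
        sum(a["identity"] for a in alignments) / count if count else 0.0
    )
    return {
        "total_alignments": count,
        "total_aligned_bp": total_bp,
        "average_identity": mean_identity,
    }
```

An unweighted mean gives a 100 bp block the same influence as a 1 Mbp block, so a length-weighted mean may be more representative when block sizes vary widely.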

Coordinates table

Each row represents one alignment block:

Column	Description
Ref Start/End	Alignment boundaries in reference
Query Start/End	Alignment boundaries in query
Identity	Percent sequence identity
Ref Tag	Reference sequence name
Query Tag	Query sequence name

Common workflows

Genome assembly validation

Compare your assembly to a reference genome to check for:

Misassemblies (unexpected rearrangements in dot plot)
Missing regions (gaps in coverage)
Collapsed repeats (many-to-one alignments)

Strain comparison

Align closely related bacterial or viral strains to catalog SNPs and indels. For finished genomes this is typically faster than read mapping, and variants are reported across every region where the two genomes align.

Synteny analysis

Identify conserved gene order between species. Diagonal segments in the dot plot represent syntenic blocks; breaks indicate rearrangements during evolution.

Draft assembly scaffolding

Align contigs to a related reference to determine their order and orientation. The coordinates output provides the information needed for scaffolding.
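One simple way to turn the coordinates into an ordering is sketched below. It is a heuristic chosen for illustration, not a method prescribed by MUMmer4: each contig is placed by its single longest alignment, contigs are sorted by reference position, and orientation is read from the direction of the query coordinates. The dictionary keys are hypothetical.

```python
def order_contigs(alignments):
    """Return (contig, reference, orientation) tuples in reference order."""
    best = {}
    for a in alignments:
        current = best.get(a["qry_tag"])
        # Keep only the longest alignment for each contig.
        if current is None or a["ref_len"] > current["ref_len"]:
            best[a["qry_tag"]] = a

    placed = sorted(best.values(), key=lambda a: (a["ref_tag"], a["ref_start"]))
    return [
        (a["qry_tag"], a["ref_tag"], "+" if a["qry_start"] <= a["qry_end"] else "-")
        for a in placed
    ]
```

Contigs that align to several distant reference locations deserve a manual look, since the longest-alignment rule hides possible misassemblies or rearrangements.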

Input requirements

MUMmer4 requires input sequences in FASTA format with headers. Each file should contain one or more sequences:

>sequence_name optional description
ATGCGATCGATCGATCGATCG...

For comparing two complete genomes, provide each as a single FASTA entry. For draft assemblies, include all contigs in one file with unique headers.
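Before uploading, it can help to check that a file meets these requirements. The sketch below reads FASTA text and rejects input with missing, empty, or duplicate headers. The function names are hypothetical, and the sequence name is taken to be the first whitespace-delimited word of the header, which is a common convention rather than something this page specifies.

```python
def read_fasta(text):
    """Return {name: sequence}; raise ValueError on malformed input."""
    records = {}
    name = None
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError(f"line {line_no}: empty header")
            name = parts[0]
            if name in records:
                raise ValueError(f"line {line_no}: duplicate header {name!r}")
            records[name] = []
        elif name is None:
            raise ValueError(f"line {line_no}: sequence data before first header")
        else:
            records[name].append(line)
    if not records:
        raise ValueError("no sequences found")
    return {n: "".join(chunks) for n, chunks in records.items()}


def is_valid_fasta(text):
    """Return True if `text` parses as FASTA with unique headers."""
    try:
        read_fasta(text)
    except ValueError:
        return False
    return True
```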

Limitations

MUMmer4 is designed for DNA sequence comparison. For protein-based comparison of divergent genomes, consider using promer (available in the command-line MUMmer distribution) or other tools like MMseqs2.

Very large genomes (mammalian-scale) may require significant memory and compute time. For human-scale comparisons, expect several minutes of processing.

Highly repetitive genomes may produce cluttered dot plots with many parallel lines. Use the minimum cluster length filter to reduce noise, or switch to MUM mode to focus on unique regions.

Related tools

GC Content: Analyze the GC composition of your input sequences
Clustal Omega: Multiple sequence alignment for shorter sequences or proteins
MAFFT: Alternative multiple sequence alignment tool
FastTree: Build phylogenetic trees from aligned sequences

Based on: Marçais G, et al. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018;14(1):e1005944.
