DNA Shuffle

Shuffle DNA sequences while preserving nucleotide, dinucleotide, or k-mer composition.

Related tools

DNA mutator

Generate batches of mutated DNA variants from one or more FASTA sequences. Create substitution, insertion, deletion, or mixed variant libraries with reproducible settings.

DNAGenIQ - Random DNA sequence generator

Generate random DNA sequences with customizable length, GC content, and restriction sites for molecular cloning and testing purposes.

ProtGenIQ - Random protein sequence generator

Generate random protein sequences with customizable length, composition, and amino acid properties

RNAGenIQ - Random RNA sequence generator

Generate random RNA sequences with customizable types and structural features

Filter DNA

Clean and filter DNA sequences by removing or replacing non-standard nucleotide characters. Supports multiple filter modes including standard 4 bases, IUPAC ambiguity codes, and custom character sets.

GenBank Feature Extractor

Extract sequence features (CDS, mRNA, gene, etc.) from GenBank files in FASTA format with support for spliced features

Reverse complement generator

Generate reverse, complement, or reverse-complement of DNA/RNA sequences

CSV to FASTA

Convert CSV and TSV files containing sequence data to FASTA format with flexible column mapping and automatic delimiter detection

DNA to Protein Converter

Translate DNA sequences to protein sequences using the genetic code.

DNA to RNA converter

Convert DNA sequences to RNA (transcription) by replacing T with U.

What is DNA Shuffle?

DNA Shuffle generates randomized DNA sequences that preserve specific compositional properties of the original sequence. Three shuffling methods are available: mononucleotide shuffling preserves exact nucleotide counts (A, T, C, G), dinucleotide shuffling preserves all 16 dinucleotide frequencies, and k-mer shuffling preserves the frequencies of all subsequences of a chosen length k.

Shuffled sequences serve as statistical controls in bioinformatics analyses. When testing whether a sequence property (such as predicted secondary structure stability or regulatory motif enrichment) is significant, comparing against randomized sequences with matching composition provides a null distribution for hypothesis testing.
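The null-distribution idea above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the function names `count_motif` and `empirical_p_value`, the choice of statistic, and the add-one p-value estimate are all assumptions made for the example, and the shuffle used here is a plain mononucleotide shuffle.

```python
import random

def count_motif(seq: str, motif: str) -> int:
    """Count overlapping occurrences of motif in seq."""
    return sum(
        1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)
    )

def empirical_p_value(seq, statistic, n_shuffles=1000, seed=1):
    """Estimate how often a shuffled control scores at least as high as seq.

    Hypothetical helper: uses a mononucleotide shuffle as the null model and
    the common (hits + 1) / (n + 1) estimate so the p-value is never zero.
    """
    rng = random.Random(seed)
    observed = statistic(seq)
    bases = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # composition-matched random control
        if statistic("".join(bases)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)

# Example: is "TATA" more frequent than expected from composition alone?
p = empirical_p_value(
    "TATATATATATATATAGCGCGCGCGCGC",
    lambda s: count_motif(s, "TATA"),
    n_shuffles=200,
)
```

A small `p` suggests the observed count is unlikely under the chosen null model; a different null (for example a dinucleotide-preserving shuffle) can give a different answer for the same sequence.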

How to use DNA Shuffle online

ProteinIQ runs DNA shuffling directly in the browser with instant results, no installation or account required.

Input

Input | Description
DNA sequences | One or more sequences in FASTA format, or a raw sequence without headers. Only A, T, C, G nucleotides are accepted.
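As a rough illustration of the input rules above, the sketch below parses FASTA or raw text and rejects anything other than A, T, C, G. The function name `parse_sequences`, the default record name `"sequence"` for headerless input, and the decision to upper-case input before checking are assumptions for the example; the tool's actual handling may differ.

```python
def parse_sequences(text):
    """Parse FASTA or raw sequence text into (header, sequence) pairs.

    Hypothetical helper: headerless input is given the name "sequence",
    and input is upper-cased before validation (both are assumptions).
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the record collected so far, if any.
            if header is not None or chunks:
                records.append((header or "sequence", "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None or chunks:
        records.append((header or "sequence", "".join(chunks)))

    for name, seq in records:
        invalid = set(seq) - set("ACGT")
        if invalid:
            raise ValueError(
                f"{name}: unsupported characters {sorted(invalid)}"
            )
    return records
```

For example, `parse_sequences(">a\nACGT\nAC\n>b\nGG\n")` yields two records, while a sequence containing `N` raises `ValueError`.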

Settings

Shuffle options

Setting | Description
Shuffle method | Algorithm for randomization. Mononucleotide (default) preserves single nucleotide counts. Dinucleotide preserves all 16 dinucleotide frequencies. K-mer preserves frequencies of specified k-mer length.
K-mer size | Size of k-mers to preserve (2–6), only used with K-mer method. Larger values constrain the shuffle more heavily.
Number of shuffles | How many randomized sequences to generate per input (1–100). Multiple shuffles provide replicates for statistical analyses.
Random seed | Seed value for reproducibility (0 = random seed). Setting a specific seed ensures identical output across runs.

Output formatting

Setting | Description
Output case | Uppercase (default) or Lowercase for output sequences.
Add suffix to headers | Appends _shuffled or _shuffled_N to FASTA headers. Enabled by default.
Line length | Characters per line in output (0–200, default 80). Set to 0 for no line wrapping.

Output

FASTA-formatted sequences with shuffled nucleotide order. When generating multiple shuffles per input, each receives a numbered suffix.
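The output formatting settings can be pictured with the sketch below. It is an illustration only: `format_fasta` and its parameter names are hypothetical, and the rule that a single shuffle gets `_shuffled` while multiple shuffles get `_shuffled_N` is an assumption inferred from the settings description.

```python
def format_fasta(header, shuffles, line_length=80, add_suffix=True,
                 lowercase=False):
    """Format shuffled sequences as FASTA text (hypothetical helper)."""
    records = []
    for n, seq in enumerate(shuffles, start=1):
        name = header
        if add_suffix:
            # Assumed rule: number the suffix only when there are replicates.
            name += "_shuffled" if len(shuffles) == 1 else f"_shuffled_{n}"
        seq = seq.lower() if lowercase else seq.upper()
        if line_length > 0:
            lines = [seq[i:i + line_length]
                     for i in range(0, len(seq), line_length)]
        else:
            lines = [seq]  # 0 means no line wrapping
        records.append(">" + name + "\n" + "\n".join(lines))
    return "\n".join(records) + "\n"
```

With `line_length=4`, `format_fasta("seq1", ["acgtac"], line_length=4)` wraps the six bases onto two lines under the header `>seq1_shuffled`.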

How DNA Shuffle works

Mononucleotide shuffling

The simplest method uses the Fisher-Yates algorithm to randomly permute all nucleotides. The result has identical nucleotide counts but completely randomized order, destroying any dinucleotide or higher-order patterns.
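A minimal Python sketch of a Fisher-Yates shuffle on a nucleotide string is shown below. The function name `mononucleotide_shuffle` is hypothetical, and the seed handling (`None` for an unseeded generator) is an assumption that differs from the tool's "0 = random seed" convention.

```python
import random
from collections import Counter

def mononucleotide_shuffle(seq, seed=None):
    """Return a uniformly random permutation of seq (Fisher-Yates)."""
    rng = random.Random(seed)
    bases = list(seq)
    # Walk from the end; swap each position with a random earlier-or-equal one.
    for i in range(len(bases) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        bases[i], bases[j] = bases[j], bases[i]
    return "".join(bases)

original = "ATGCGCGATATTTACGCGGC"
shuffled = mononucleotide_shuffle(original, seed=42)
# Nucleotide counts are identical; only the order changes.
assert Counter(shuffled) == Counter(original)
```

In practice `random.Random(seed).shuffle(bases)` does the same job; the explicit loop is written out only to show the algorithm.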

Dinucleotide shuffling

Dinucleotide frequencies are preserved using the Altschul-Erickson algorithm, which models the sequence as a directed graph. Each nucleotide (A, T, C, G) becomes a vertex, and each dinucleotide in the sequence becomes a directed edge. The shuffled sequence is reconstructed by finding a random Eulerian path through this graph: a path that traverses each edge exactly once, starting at the first nucleotide of the original sequence and ending at its last.

Because the graph preserves all dinucleotide transitions from the original sequence, the shuffled output maintains the same dinucleotide composition. This matters for RNA folding analyses where stacking energies depend on adjacent base pairs.
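The graph-based procedure described above can be sketched as follows. This is a simplified illustration in the spirit of the Altschul-Erickson approach, not the tool's code: the names `dinucleotide_shuffle` and `_reaches` are hypothetical, and the sketch uses simple rejection sampling to pick a valid set of "last edges" (one final outgoing edge per vertex that together lead to the final base).

```python
import random
from collections import Counter, defaultdict

def _reaches(v, target, last_edge):
    """Follow last edges from v; True if the walk arrives at target."""
    seen = set()
    while v != target:
        if v in seen:
            return False  # cycle that never reaches the final base
        seen.add(v)
        v = last_edge[v]
    return True

def dinucleotide_shuffle(seq, seed=None):
    """Shuffle seq while keeping every dinucleotide count unchanged."""
    if len(seq) < 3:
        return seq
    rng = random.Random(seed)
    # Edge lists: successors[v] holds each base that follows v in seq.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    last = seq[-1]
    vertices = [v for v in successors if v != last]

    # Choose a last edge per vertex; retry until all of them lead to `last`.
    while True:
        last_edge = {v: rng.choice(successors[v]) for v in vertices}
        if all(_reaches(v, last, last_edge) for v in vertices):
            break

    # Shuffle each edge list, keeping the chosen last edge at the end.
    order = {}
    for v, outs in successors.items():
        outs = outs[:]
        if v in last_edge:
            outs.remove(last_edge[v])
            rng.shuffle(outs)
            outs.append(last_edge[v])
        else:
            rng.shuffle(outs)
        order[v] = outs

    # Walk the graph from the first base, consuming edges in order.
    used = {v: 0 for v in order}
    current = seq[0]
    result = [current]
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        result.append(nxt)
        current = nxt
    return "".join(result)
```

A quick check is to compare `Counter(zip(s, s[1:]))` for the input and output: the two counters should be equal, and the first and last bases should be unchanged.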

K-mer shuffling

The generalized Euler algorithm extends dinucleotide shuffling to arbitrary k-mer sizes. Instead of single nucleotides as vertices, the graph uses (k-1)-mers. Each k-mer in the original sequence creates a directed edge from its prefix (k-1)-mer to its suffix (k-1)-mer. Finding an Eulerian path through this graph produces a sequence preserving all k-mer frequencies.
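To make the graph construction concrete, the sketch below lists the edges for a short sequence and counts its k-mers. The helper names `kmer_counts` and `kmer_graph_edges` are hypothetical and only illustrate the data structure; they do not perform the shuffle itself.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_graph_edges(seq, k):
    """One (prefix, suffix) edge per k-mer occurrence; vertices are (k-1)-mers."""
    return [(seq[i:i + k - 1], seq[i + 1:i + k])
            for i in range(len(seq) - k + 1)]

edges = kmer_graph_edges("ACGTACG", 3)
# The 3-mers ACG, CGT, GTA, TAC, ACG become edges between 2-mers:
# AC->CG, CG->GT, GT->TA, TA->AC, AC->CG
```

With k=2 the vertices are single nucleotides, which is the dinucleotide case. Any Eulerian path that uses this same multiset of edges spells a sequence with the same `kmer_counts` as the original.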

Larger k values impose stronger constraints. With k=6, the shuffled sequence maintains the same hexanucleotide composition as the original, which may be important when codon usage or restriction site patterns need preservation.

Applications

Shuffled sequences commonly serve as negative controls for:

  • Motif discovery: Testing whether identified patterns occur more frequently than expected by chance
  • RNA structure prediction: Determining if predicted folding stability exceeds that of composition-matched random sequences
  • Regulatory element analysis: Validating that putative binding sites show genuine enrichment
  • Alignment scoring: Establishing background distributions for sequence similarity statistics

Dinucleotide shuffling is particularly important for RNA analyses because secondary structure free energies depend heavily on stacking interactions between adjacent bases. Mononucleotide-shuffled controls may have systematically different folding energies simply due to altered dinucleotide composition.