ProteinIQ

RNAalifold

Consensus RNA structure from alignment

What is RNAalifold?

RNAalifold predicts consensus RNA secondary structure from multiple sequence alignments. Unlike single-sequence folding methods that rely solely on thermodynamic stability, RNAalifold incorporates covariation information from evolutionary data. When two alignment columns show compensatory mutations that preserve base pairing across species, the pattern is strong evidence that the two positions are paired in the functional structure.

The algorithm averages energy contributions across all aligned sequences while adding covariance bonuses for positions where mutations maintain Watson-Crick or wobble pairing. This combination of thermodynamics and phylogenetic signal yields more accurate structure predictions than either approach alone, particularly for well-conserved non-coding RNAs like riboswitches, ribozymes, and regulatory elements.

How does RNAalifold work?

RNAalifold extends the standard dynamic programming algorithms for RNA secondary structure prediction by computing an averaged free energy over all sequences in the alignment, then modifying this energy with a covariation score for each potential base pair.

Covariation scoring

The original implementation used a simple scoring scheme: +1 kcal/mol bonus for consistent mutations (e.g., A-U to G-U), +2 kcal/mol for compensatory mutations (e.g., A-U to G-C), and -2 kcal/mol penalty for inconsistent pairs. This has been replaced by RIBOSUM-like scoring matrices derived from structural alignments of ribosomal RNAs.

RIBOSUM matrices capture the empirical substitution rates between base pair types in evolutionarily related sequences. They assign higher scores to mutations that preserve base pairing and lower scores to changes that disrupt it, weighted by how frequently such changes occur in real data.
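
The original +1/+2/-2 scheme described above can be illustrated with a short sketch. This is not RNAalifold's code: the function name, the averaging over sequence pairs, and the handling of gaps as non-pairing are assumptions made for illustration, and the RIBOSUM-based scoring that replaced this scheme is not shown.

```python
from itertools import combinations

# Watson-Crick and wobble pairs, written as 5' base followed by 3' base.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def simple_covariation_score(col_i, col_j, penalty=2.0):
    """Score two alignment columns under a simplified pre-RIBOSUM scheme.

    col_i and col_j hold one character per sequence. Higher scores mean
    more evolutionary support for pairing the two columns.
    Assumption: gaps and other non-canonical combinations count as
    inconsistent pairs.
    """
    if len(col_i) != len(col_j) or len(col_i) < 2:
        raise ValueError("columns must have equal length and at least 2 rows")

    pairs = [(a + b).upper().replace("T", "U") for a, b in zip(col_i, col_j)]

    bonus = 0.0
    for p, q in combinations(pairs, 2):
        if p in CANONICAL and q in CANONICAL:
            # Hamming distance between the two pair types:
            # 0 = identical, 1 = consistent mutation, 2 = compensatory mutation.
            bonus += sum(x != y for x, y in zip(p, q))

    n = len(pairs)
    n_comparisons = n * (n - 1) / 2
    n_inconsistent = sum(p not in CANONICAL for p in pairs)

    # Average bonus per sequence pair, minus a penalty scaled by the
    # fraction of sequences that cannot form the pair.
    return bonus / n_comparisons - penalty * n_inconsistent / n
```

For example, columns holding A-U, G-C and G-U pairs score above zero because every comparison shows a consistent or compensatory change, while a column pair containing a G-A mismatch is pushed below zero.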

Adaptive weighting

The relative contribution of covariation versus thermodynamic energy adapts to alignment characteristics. At high sequence identity, the covariance term is weighted more heavily because fewer mutations provide less statistical power. At lower identity levels, covariation signals become more informative and the balance shifts accordingly.

Gap handling

Alignment columns containing gaps require special treatment. The improved algorithm handles gaps more rationally than earlier versions, preventing artifacts like unrealistically short hairpins that could arise when gap-containing columns were simply ignored.

How to use RNAalifold online

ProteinIQ provides browser-based access to RNAalifold, running computations on cloud infrastructure without local installation.

Input

Field | Description
Aligned RNA Sequences | Multiple sequence alignment in FASTA, Clustal, or plain text format. Sequences must be pre-aligned and of equal length. Minimum 2 sequences required.

RNAalifold requires sequences that are already aligned. To create alignments from unaligned sequences, use Clustal Omega, MAFFT, or MUSCLE5 first, then pass the output to RNAalifold.
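
The input requirements above (at least 2 sequences, all of equal length) can be checked locally before submitting. The following is a minimal sketch for FASTA input only; the function name is hypothetical and it does not reflect how ProteinIQ validates uploads.

```python
def read_aligned_fasta(text):
    """Parse FASTA text into (name, sequence) tuples and check that the
    records form an alignment: at least 2 rows, all of the same length."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))

    if len(records) < 2:
        raise ValueError("an alignment needs at least 2 sequences")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return records
```

If the length check fails, the sequences are most likely unaligned and should go through an alignment tool first.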

Settings

Prediction options

Setting | Description
Temperature | Folding temperature in Celsius (0-100, default 37). Higher temperatures destabilize base pairs.
Disallow lonely pairs | Prevents isolated base pairs (helices of length 1). Often improves prediction accuracy for real structures.

Energy parameters

Setting | Description
Dangling ends | Treatment of unpaired nucleotides adjacent to helices. "Double dangles" (default) includes contributions from both sides, "Ignore dangling ends" omits them entirely, and "Allow coaxial stacking" enables stacking across multi-loops.
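
For readers who also run RNAalifold locally, the settings above correspond roughly to options of the ViennaRNA command-line program. The sketch below only builds an argument list and does not run anything. The helper name and the keys of the mapping are hypothetical, and the mapping from the web form to `--temp`, `--noLP`, and `--dangles` is an assumption; how ProteinIQ invokes the program is not documented here.

```python
# Assumed mapping from the web form's dangling-end choices to --dangles values.
DANGLE_MODES = {
    "ignore": 0,   # Ignore dangling ends
    "double": 2,   # Double dangles (default)
    "coaxial": 3,  # Allow coaxial stacking
}


def build_rnaalifold_args(alignment_path, temperature=37.0,
                          no_lonely_pairs=False, dangles="double"):
    """Return an argument list for a local RNAalifold run."""
    if not 0 <= temperature <= 100:
        raise ValueError("temperature must be between 0 and 100 Celsius")
    if dangles not in DANGLE_MODES:
        raise ValueError(f"unknown dangles mode: {dangles!r}")

    args = [
        "RNAalifold",
        f"--temp={temperature}",
        f"--dangles={DANGLE_MODES[dangles]}",
    ]
    if no_lonely_pairs:
        args.append("--noLP")
    args.append(alignment_path)
    return args
```

The resulting list can be passed to `subprocess.run` on a machine where ViennaRNA is installed.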

Output

Column | Description
Num Sequences | Number of sequences in the input alignment
Alignment Length | Length of the alignment in nucleotides (including gaps)
Consensus Sequence | Computed consensus sequence using IUPAC ambiguity codes
Consensus Structure | Predicted secondary structure in dot-bracket notation
MFE | Minimum free energy of the consensus structure in kcal/mol

Interpreting results

The consensus structure uses standard dot-bracket notation: dots represent unpaired nucleotides, matching parentheses indicate base pairs. More negative MFE values indicate more stable structures.
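
Dot-bracket strings are easy to convert into explicit base pairs for downstream analysis. The sketch below assumes a structure that uses only dots and round parentheses, which is what a nested consensus structure contains; the function name is illustrative.

```python
def dot_bracket_to_pairs(structure):
    """Return the base pairs of a dot-bracket string as sorted
    (opening_index, closing_index) tuples, using 0-based positions."""
    stack = []
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            # The most recent unmatched '(' is the partner of this ')'.
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For the structure `((..))`, the outer pair joins positions 0 and 5 and the inner pair joins positions 1 and 4.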

The consensus sequence reflects the most common nucleotide at each position. When multiple nucleotides appear with similar frequency, IUPAC ambiguity codes are used (e.g., R for purine, Y for pyrimidine).
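
A rough sketch of how such a consensus could be derived from an alignment follows. It is a simplification, not RNAalifold's rule: "similar frequency" is reduced to exact ties for the most common base, gaps are ignored when counting, and the function name is hypothetical.

```python
from collections import Counter

# IUPAC ambiguity codes for RNA, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("U"): "U",
    frozenset("AG"): "R", frozenset("CU"): "Y",
    frozenset("CG"): "S", frozenset("AU"): "W",
    frozenset("GU"): "K", frozenset("AC"): "M",
    frozenset("CGU"): "B", frozenset("AGU"): "D",
    frozenset("ACU"): "H", frozenset("ACG"): "V",
    frozenset("ACGU"): "N",
}


def consensus_sequence(aligned):
    """Build a consensus from equal-length aligned RNA sequences.

    Assumption: the most frequent base wins; exact ties are reported
    with the matching IUPAC code; all-gap columns become '-'.
    """
    consensus = []
    for column in zip(*aligned):
        counts = Counter(
            base for base in (c.upper().replace("T", "U") for c in column)
            if base in ("A", "C", "G", "U")
        )
        if not counts:
            consensus.append("-")
            continue
        top = max(counts.values())
        tied = frozenset(b for b, n in counts.items() if n == top)
        consensus.append(IUPAC[tied])
    return "".join(consensus)
```

With this rule, a column split evenly between A and G is reported as R, and one split evenly between C and U as Y.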

When to use RNAalifold

RNAalifold excels when homologous RNA sequences are available from multiple species. The covariation signal provides independent evidence for base pairing that pure thermodynamic folding cannot capture.

Ideal use cases include:

  • Predicting structures of conserved non-coding RNAs (riboswitches, ribozymes, snoRNAs)
  • Validating predicted structures against evolutionary evidence
  • Identifying functionally constrained regions in RNA alignments
  • Comparative analysis of viral RNA genomes

For single sequences without homologs, RNAfold provides standard MFE prediction. For very similar sequences with minimal covariation, the benefit over single-sequence folding is limited.

Limitations

Alignment quality matters: RNAalifold assumes the input alignment is correct. Misaligned sequences will produce spurious covariation signals and incorrect structures. Curated alignments from databases like Rfam typically yield better predictions than automated alignments.

Sequence diversity: Very similar sequences provide little covariation information; very divergent sequences may be difficult to align accurately. Moderate diversity (60-90% identity) often yields the best predictions.
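
To see where an alignment falls relative to this range, the mean pairwise identity can be estimated with a few lines of code. Definitions of identity vary between tools; this sketch assumes that columns where both sequences have a gap are skipped and that a gap opposite a base counts as a mismatch. The function names are illustrative.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical positions between two aligned sequences."""
    matches = 0
    compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap: carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def mean_pairwise_identity(aligned):
    """Average identity over all pairs of sequences in the alignment."""
    if len(aligned) < 2:
        raise ValueError("need at least 2 aligned sequences")
    scores = [pairwise_identity(a, b) for a, b in combinations(aligned, 2)]
    return sum(scores) / len(scores)
```

A value close to 1.0 suggests the alignment carries little covariation information, while a very low value is a hint to inspect the alignment itself before trusting the predicted structure.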

No sequence weighting: The current implementation treats all sequences equally. If the alignment contains many nearly identical sequences alongside diverse ones, the similar sequences will dominate the averaged energy calculation.

Pseudoknots excluded: Like other ViennaRNA tools, RNAalifold predicts only nested secondary structures. Base pairs whose partners interleave with another pair are not modeled.
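
Whether a set of base pairs from another source (for example a curated reference structure) is nested can be checked directly. The sketch below is an illustration with a hypothetical function name; it assumes each pair is given as a tuple of positions (i, j) with i < j.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every pair of base pairs whose partners interleave.

    Two pairs (i, j) and (k, l) with i < k cross when i < k < j < l.
    An empty result means the structure is nested (pseudoknot-free).
    """
    crossings = []
    # Sorting guarantees that the first pair of each combination opens first.
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings
```

Any pair reported here lies outside the class of structures that RNAalifold can return.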

Related tools

  • RNAfold: Single-sequence MFE structure prediction
  • Clustal Omega: Create multiple sequence alignments for input to RNAalifold
  • MAFFT: Alternative alignment tool with multiple accuracy/speed modes
  • MUSCLE5: High-accuracy alignment using the PPP algorithm
  • ViennaRNA: Unified interface to all ViennaRNA analysis methods