Salmon is an RNA-seq quantifier for estimating transcript abundance from sequencing reads. It is widely used for transcript-level expression analysis because it combines fast mapping with bias-aware statistical inference, producing normalized abundance estimates such as TPM alongside estimated fragment counts.
Salmon quantifies reads against a supplied reference transcriptome; it does not assemble transcripts de novo. In practice, that means the quality of the reference transcript FASTA strongly influences the quality of the quantification.
ProteinIQ runs Salmon in the browser through a cloud workflow, so transcriptome indexing and quantification can be performed without installing command-line software locally. The online form accepts an uploaded transcript FASTA reference together with either single-end or paired-end FASTQ reads, then returns the main quant.sf table and Salmon metadata files.
| Input | Description |
|---|---|
| Transcriptome Reference | Transcript sequences in FASTA format. Supported extensions include .fasta, .fa, .fna, .ffn, .fas, and gzipped FASTA files. |
| Read 1 FASTQ | Required read file for both single-end and paired-end runs. Supported extensions: .fastq, .fq, .fastq.gz, .fq.gz. |
| Read 2 FASTQ | Second read file for paired-end libraries only. |
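As a quick pre-upload check, the extension rules in the table can be sketched in a few lines of Python. The function name `classify_input` is hypothetical and only inspects the filename; it does not verify file contents, and it is not ProteinIQ's own validation logic.

```python
# Extensions taken from the input table; a trailing ".gz" is allowed on both.
FASTA_EXTENSIONS = (".fasta", ".fa", ".fna", ".ffn", ".fas")
FASTQ_EXTENSIONS = (".fastq", ".fq")


def classify_input(filename: str) -> str:
    """Return 'fasta', 'fastq', or 'unknown' based only on the file extension."""
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[:-3]  # strip the gzip suffix before checking the format
    if name.endswith(FASTA_EXTENSIONS):
        return "fasta"
    if name.endswith(FASTQ_EXTENSIONS):
        return "fastq"
    return "unknown"


print(classify_input("sample_R1.fq.gz"))
print(classify_input("transcripts.fna"))
```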
| Setting | Description |
|---|---|
| Read layout | Chooses Single-end reads or Paired-end reads. This also controls which library-type codes are valid. |
| Library type | Salmon library orientation and strandedness code. Auto-detect (recommended) lets Salmon infer the protocol from the reads; explicit settings such as ISR, ISF, SR, or SF are more reliable when the library preparation is known. |
| Index k-mer size | K-mer size used while building the temporary Salmon index. 31 is the default; smaller values such as 23 can help with short reads or very short transcripts, at the cost of specificity. |
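The dependency between read layout and library type can be illustrated with a small lookup. The code sets below follow Salmon's documented library-type strings (`A` for automatic detection, `I`/`M`/`O` orientation prefixes for paired-end reads); the function name is hypothetical, and the exact subset that the ProteinIQ form offers is an assumption.

```python
# Salmon library-type codes grouped by read layout.
# "A" (auto-detect) is accepted for both layouts.
SINGLE_END_TYPES = {"A", "U", "SF", "SR"}
PAIRED_END_TYPES = {
    "A",
    "IU", "ISF", "ISR",   # inward-facing pairs (typical Illumina libraries)
    "MU", "MSF", "MSR",   # pairs in matching orientation
    "OU", "OSF", "OSR",   # outward-facing pairs
}


def is_valid_library_type(code: str, paired_end: bool) -> bool:
    """Check whether a library-type code is valid for the chosen read layout."""
    allowed = PAIRED_END_TYPES if paired_end else SINGLE_END_TYPES
    return code.upper() in allowed


print(is_valid_library_type("ISR", paired_end=True))
print(is_valid_library_type("ISR", paired_end=False))
```

A check like this catches the common mistake of pairing a single-end upload with a paired-end code such as ISR.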
Salmon returns the primary abundance table as well as run metadata that records how the job was executed.
| Output | Description |
|---|---|
| quant.sf | Main transcript abundance table. |
| cmd_info.json | Run configuration and command metadata. |
| meta_info.json | Summary statistics about the quantification run. |
The data table shown in ProteinIQ exposes the main quant.sf columns:
| Column | Description |
|---|---|
| Transcript | Transcript identifier from the reference FASTA. |
| Length | Full transcript length in nucleotides. |
| Effective Length | Bias-adjusted usable length after accounting for fragment length and sequence effects. |
| TPM | Transcripts per million, a length- and library-size-normalized abundance estimate. |
| Estimated Reads | Estimated number of fragments assigned to the transcript. |
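For downstream work with a downloaded quant.sf file, a minimal parsing sketch may help. The raw file is tab-separated with the headers `Name`, `Length`, `EffectiveLength`, `TPM`, and `NumReads`; the mapping of those headers to the display names above is assumed from the column descriptions. The sample rows are invented purely for illustration.

```python
import csv
import io

# Invented example content in quant.sf layout (tab-separated, one header row).
SAMPLE_QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1320.5\t250000.0\t120.0\n"
    "tx2\t800\t620.5\t750000.0\t169.2\n"
)


def read_quant_sf(handle):
    """Parse quant.sf rows into dictionaries keyed by the display column names."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "transcript": rec["Name"],
            "length": int(rec["Length"]),
            "effective_length": float(rec["EffectiveLength"]),
            "tpm": float(rec["TPM"]),
            "estimated_reads": float(rec["NumReads"]),  # may be fractional
        })
    return rows


rows = read_quant_sf(io.StringIO(SAMPLE_QUANT_SF))
for row in rows:
    print(row["transcript"], row["tpm"], row["estimated_reads"])
```

With a real file, `open("quant.sf", newline="")` can be passed in place of the `io.StringIO` object.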
Salmon builds an index over the supplied transcriptome, identifies candidate transcript origins for each read or read pair, and then estimates abundances with a probabilistic inference procedure. The 2017 Salmon paper describes this as a dual-phase approach: an online phase learns experiment-specific parameters while processing fragments, followed by an offline optimization step that refines transcript abundance estimates.
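The idea behind the offline refinement step can be illustrated with a toy expectation-maximization loop. This is a simplified sketch, not Salmon's implementation: it assumes fragments have already been grouped by the set of transcripts they are compatible with, it ignores the bias and alignment-score terms that Salmon includes, and all names and numbers are hypothetical.

```python
def em_abundances(eq_classes, eff_lengths, n_iter=200):
    """Toy EM: distribute ambiguous fragments among compatible transcripts.

    eq_classes:  list of (tuple_of_transcript_ids, fragment_count)
    eff_lengths: dict mapping transcript id -> effective length
    Returns a dict of estimated fragment counts per transcript.
    """
    transcripts = sorted(eff_lengths)
    total = sum(count for _, count in eq_classes)
    # Start from a uniform split of all fragments.
    counts = {t: total / len(transcripts) for t in transcripts}

    for _ in range(n_iter):
        # A transcript's weight is its current count per unit effective length.
        weights = {t: counts[t] / eff_lengths[t] for t in transcripts}
        new_counts = {t: 0.0 for t in transcripts}
        for members, count in eq_classes:
            denom = sum(weights[t] for t in members)
            if denom == 0:
                continue  # no compatible transcript has any support
            for t in members:
                # E-step share of this class, accumulated for the M-step.
                new_counts[t] += count * weights[t] / denom
        counts = new_counts
    return counts


# 30 fragments map only to A, 10 only to B, and 20 to both.
estimates = em_abundances(
    [(("A",), 30), (("B",), 10), (("A", "B"), 20)],
    {"A": 1000.0, "B": 1000.0},
)
print(estimates)
```

In this example the 20 ambiguous fragments end up split in proportion to the unique evidence, which is why the resulting counts are model-derived and can be fractional.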
Selective alignment adds an alignment-scoring stage on top of lightweight mapping. This reduces false assignments that can occur when reads match multiple similar transcript sequences or resemble unannotated genomic regions. In current Salmon workflows, selective alignment is often paired with decoy-aware references for improved specificity; on ProteinIQ, the exposed Validate mappings option enables the selective-alignment validation step for uploaded transcriptomes.
Bias correction is central to Salmon's design. Sequence-specific effects, fragment-level GC bias, and effective transcript length all influence how raw fragment evidence is translated into expression estimates. These corrections are why Salmon output should be interpreted as model-based abundance estimates rather than simple read counts.
TPM is useful for comparing transcript abundance within a sample because it normalizes for both transcript length and sequencing depth. A higher TPM indicates that a larger share of the sequenced RNA is attributed to that transcript, but TPM values are still relative and should not be treated as absolute molecule counts.
Estimated Reads is closer to an assigned fragment count, but it is also model-derived because ambiguously mapping reads are distributed probabilistically. For transcript families with extensive sequence overlap, the distinction between TPM and Estimated Reads is less important than the underlying identifiability of the transcripts in the reference.
Effective Length matters when short transcripts or libraries with different fragment-length distributions are compared. If two transcripts have similar raw support but different effective lengths, the transcript with the shorter effective length receives the higher normalized abundance estimate.
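The relationship between Estimated Reads, Effective Length, and TPM can be made concrete with the standard TPM calculation: divide each transcript's estimated fragment count by its effective length, then rescale so the values sum to one million. The sketch below uses invented numbers and a hypothetical function name.

```python
def compute_tpm(est_reads, eff_lengths):
    """Compute TPM from estimated fragment counts and effective lengths."""
    # Fragments per effective nucleotide for each transcript.
    rates = {t: est_reads[t] / eff_lengths[t] for t in est_reads}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in rates}
    # Rescale so that all transcripts in the sample sum to one million.
    return {t: rate / total * 1e6 for t, rate in rates.items()}


# Two transcripts with identical fragment support but different effective lengths.
tpm = compute_tpm(
    {"long_tx": 100.0, "short_tx": 100.0},
    {"long_tx": 1000.0, "short_tx": 500.0},
)
print(tpm)
```

Here `short_tx` receives twice the TPM of `long_tx` despite equal fragment support, because the same number of fragments is spread over half the effective length.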
| Setting | Description |
|---|---|
| Bootstrap replicates | Number of bootstrap abundance estimates to generate. Bootstraps provide per-transcript uncertainty estimates; higher values make those estimates more stable but increase runtime. |
| CPU threads | Number of CPU threads used for indexing and quantification. |
| Validate mappings | Enables selective-alignment validation, which re-scores candidate mappings to reduce spurious assignments. |
| Sequence-specific bias correction | Corrects sequence-context bias introduced during library preparation and sequencing. |
| GC bias correction | Corrects fragment GC-content bias, one of Salmon's core methodological features. |
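For readers who want to reproduce a run locally, the settings described above correspond roughly to options of the Salmon command-line tool. The sketch below only assembles argument lists; the mapping from ProteinIQ form fields to flags is an assumption, and the function name, default values, and directory names are hypothetical. Depending on the Salmon version, selective alignment may already be on by default, in which case `--validateMappings` is redundant.

```python
def build_salmon_commands(transcripts, reads1, reads2=None, lib_type="A",
                          kmer=31, threads=4, bootstraps=0,
                          validate_mappings=True, seq_bias=False, gc_bias=False,
                          index_dir="salmon_index", out_dir="salmon_out"):
    """Assemble `salmon index` and `salmon quant` argument lists (not executed)."""
    index_cmd = ["salmon", "index", "-t", transcripts, "-i", index_dir,
                 "-k", str(kmer), "-p", str(threads)]

    # The library type is given before the read files.
    quant_cmd = ["salmon", "quant", "-i", index_dir, "-l", lib_type]
    if reads2 is None:
        quant_cmd += ["-r", reads1]                # single-end reads
    else:
        quant_cmd += ["-1", reads1, "-2", reads2]  # paired-end reads
    quant_cmd += ["-o", out_dir, "-p", str(threads)]

    if validate_mappings:
        quant_cmd.append("--validateMappings")
    if seq_bias:
        quant_cmd.append("--seqBias")
    if gc_bias:
        quant_cmd.append("--gcBias")
    if bootstraps > 0:
        quant_cmd += ["--numBootstraps", str(bootstraps)]
    return index_cmd, quant_cmd


index_cmd, quant_cmd = build_salmon_commands(
    "transcripts.fa", "reads_1.fq.gz", "reads_2.fq.gz",
    lib_type="ISR", bootstraps=30, gc_bias=True,
)
print(" ".join(index_cmd))
print(" ".join(quant_cmd))
```

The lists can be passed to `subprocess.run(..., check=True)` on a machine where Salmon is installed.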