ProteinIQ

FASTQ to FASTA converter

Convert FASTQ sequencing files to FASTA format. Upload a FASTQ file or paste the sequencing data below.

What is FASTQ to FASTA conversion?

FASTA and FASTQ are the two most common formats for storing biological sequences. FASTA contains only sequence identifiers and the sequences themselves (nucleotide or protein), while FASTQ additionally stores a quality score for each base: a record of the sequencing confidence at every position.

Converting from FASTQ to FASTA means stripping away the quality information, leaving only the sequence identifiers and sequences themselves. This is necessary when downstream tools don't require or accept quality scores, or when you need to reduce file size for storage or transmission.
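
As a rough illustration of what "stripping away the quality information" involves, here is a minimal Python sketch. It is not this tool's implementation: the function name `fastq_to_fasta` is hypothetical, and it assumes well-formed input with exactly four lines per record and no zero-length reads.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines for each four-line FASTQ record in `lines`."""
    # Strip line endings and skip blank lines so pasted text works too.
    # Assumption: no zero-length reads, so no record line is ever empty.
    cleaned = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(cleaned), 4):
        header, sequence = cleaned[i], cleaned[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"record {i // 4 + 1}: header must start with '@'")
        # Keep identifier and description, swap '@' for '>',
        # and drop the separator and quality lines.
        yield ">" + header[1:]
        yield sequence


example = [
    "@SEQ_ID description",
    "GATTTGGGG",
    "+",
    "!''*((((*",
]
print("\n".join(fastq_to_fasta(example)))
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the list.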

We recommend this tool when preparing sequences for tools that require FASTA format (like BLAST or phylogenetic alignment), sharing data where quality scores are unnecessary, or when archiving sequences long-term and file size matters. If you need the opposite conversion—adding synthetic quality scores to FASTA files—use our FASTA to FASTQ converter instead.

The FASTQ format structure

A FASTQ file contains four lines per sequence record:

    @SEQ_ID description
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

  • Line 1: Header starting with @, containing the sequence identifier and optional description
  • Line 2: The nucleotide sequence (A, T, G, C, N for DNA; A, U, G, C, N for RNA)
  • Line 3: A separator line starting with +, optionally repeating the identifier
  • Line 4: Quality scores, one ASCII character per base (must match sequence length exactly)

The quality line encodes confidence in each base call using the Phred scale.
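
The four-line rules above can be checked mechanically. The sketch below is one possible check, not the validation this tool performs; `check_fastq_record` is a hypothetical helper that only tests the `@` prefix, the `+` prefix, and the length match between sequence and quality.

```python
from typing import List


def check_fastq_record(header: str, sequence: str, separator: str, quality: str) -> List[str]:
    """Return a list of problems found in one four-line FASTQ record."""
    problems = []
    if not header.startswith("@"):
        problems.append("line 1 should start with '@'")
    if not sequence:
        problems.append("line 2 should contain a sequence")
    if not separator.startswith("+"):
        problems.append("line 3 should start with '+'")
    if len(quality) != len(sequence):
        problems.append(
            f"line 4 has {len(quality)} quality characters for {len(sequence)} bases"
        )
    return problems


record = (
    "@SEQ_ID description",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
)
# An empty list means none of the checks failed.
print(check_fastq_record(*record))
```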

Understanding Phred quality scores

Quality scores in FASTQ use the Phred scale, a logarithmic representation of sequencing error probability:

$$Q = -10 \log_{10}(P)$$

Where $Q$ is the Phred quality score and $P$ is the probability that the base call is incorrect. Rearranged to calculate error probability from quality:

$$P = 10^{-Q/10}$$
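
The two formulas translate directly into code. The following Python sketch uses hypothetical function names and simply evaluates the definitions above.

```python
import math


def error_probability(q: float) -> float:
    """Probability that a base call is wrong, given its Phred score Q."""
    return 10 ** (-q / 10)


def phred_score(p: float) -> float:
    """Phred score Q for an error probability P (0 < P <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {error_probability(q):g}")
```

Each 10-point increase in Q corresponds to a tenfold drop in error probability, which is why the scale is described as logarithmic.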

Quality score interpretation

| Phred Score | Error Probability | Base Call Accuracy | Typical context |
|---|---|---|---|
| Q10 | 1 in 10 (10%) | 90% | Low-quality reads or problematic bases |
| Q20 | 1 in 100 (1%) | 99% | Standard quality threshold for most work |
| Q30 | 1 in 1,000 (0.1%) | 99.9% | High-quality Illumina sequencing |
| Q40 | 1 in 10,000 (0.01%) | 99.99% | Excellent sequencing data |

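Modern Illumina sequencers typically produce reads with average quality scores between Q30 and Q40 at the beginning of reads, with quality usually declining toward the 3' end. Oxford Nanopore and PacBio long-read sequencers have historically generated lower average scores (Q10-Q20) because of their different technology, although newer chemistries and consensus read types such as PacBio HiFi report considerably higher accuracy.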

ASCII encoding (Phred+33)

FASTQ files encode quality scores as single ASCII characters for compact storage. The standard encoding (Phred+33, used by Illumina 1.8+, Sanger, and modern sequencers) adds 33 to the Phred score and converts to the corresponding ASCII character:

  • Q0 → ASCII 33 → !
  • Q10 → ASCII 43 → +
  • Q20 → ASCII 53 → 5
  • Q30 → ASCII 63 → ?
  • Q40 → ASCII 73 → I

This tool uses the standard Phred+33 encoding, which is supported by all major bioinformatics software.
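
Decoding a quality string under Phred+33 is a matter of subtracting 33 from each character code. The helper names in this Python sketch are hypothetical; it illustrates the encoding only and is not taken from this tool.

```python
from typing import Iterable, List

PHRED_OFFSET = 33  # Phred+33 encoding


def decode_quality(quality: str) -> List[int]:
    """Convert a FASTQ quality string into a list of Phred scores."""
    scores = []
    for ch in quality:
        q = ord(ch) - PHRED_OFFSET
        if q < 0:
            raise ValueError(f"character {ch!r} is below the Phred+33 range")
        scores.append(q)
    return scores


def encode_quality(scores: Iterable[int]) -> str:
    """Convert Phred scores back into a Phred+33 quality string."""
    return "".join(chr(q + PHRED_OFFSET) for q in scores)


print(decode_quality("!+5?I"))
```

The string `!+5?I` is built from the five characters listed above, so it decodes to the scores 0, 10, 20, 30, and 40.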

Quality filtering options

When converting FASTQ to FASTA, you can optionally filter sequences based on their average quality score:

No filtering

Converts all sequences, preserving them exactly as they appear in the FASTQ file. Use this when you want to keep all data regardless of quality.

Q10+ (90% accuracy)

Keeps only sequences with an average quality score of 10 or higher. This removes the very lowest quality reads but allows marginal data through. Rarely used except when data is extremely limited.

Q20+ (99% accuracy)

Filters sequences to keep only those with average quality of 20 or higher. This is the standard quality threshold for most bioinformatics work and removes obviously problematic data while retaining usable reads.

Q30+ (99.9% accuracy)

Keeps only sequences with average quality of 30 or higher. This is a stringent threshold suitable for applications where accuracy is critical, such as variant calling or quality-sensitive downstream analysis. May remove a significant portion of your data.

Decision guidance

  • Use No filtering if your downstream tool doesn't require quality filtering, or if you're preserving the original data
  • Use Q20+ for most standard workflows and general analysis
  • Use Q30+ for strict quality standards (variant detection, medical diagnostics, mutation discovery)
  • Use Q10+ only when working with challenging samples where data is limited
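
The thresholds above can be sketched as a simple filter. This Python example assumes that "average quality" means the arithmetic mean of the per-base Phred scores under Phred+33; how this tool computes its average is not stated here, and the names `mean_quality` and `split_by_quality` are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def mean_quality(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores in a quality string (assumed method)."""
    if not quality:
        raise ValueError("quality string is empty")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def split_by_quality(
    records: List[Record], min_mean_q: float
) -> Tuple[List[Record], List[Record]]:
    """Separate records into those at or above the threshold and those below it."""
    kept, dropped = [], []
    for record in records:
        target = kept if mean_quality(record[2]) >= min_mean_q else dropped
        target.append(record)
    return kept, dropped


reads = [
    ("@r1", "ACGT", "IIII"),  # every base Q40
    ("@r2", "ACGT", "++++"),  # every base Q10
    ("@r3", "ACGT", "5555"),  # every base Q20
]
kept, dropped = split_by_quality(reads, min_mean_q=20)
print([r[0] for r in kept], [r[0] for r in dropped])
```

Because Phred scores are logarithmic, the mean of the scores is more optimistic than the score corresponding to the mean error probability, so a read with a few very poor bases can still pass an average-based threshold.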

Include quality info in headers

Enabling this optional setting adds the sequence's average quality score to the FASTA header line. This preserves quality metadata even after converting away the quality scores themselves:

    >SEQ_ID description | avg_qual=28.5
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT

This is useful when you need FASTA format but want to track which sequences had high versus low quality, or for documenting the original data quality in your analysis records.
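
A header in this style can be built and read back with a few lines of Python. The sketch below copies the `| avg_qual=` layout from the example above; the function names, the one-decimal rounding, and the use of an arithmetic mean under Phred+33 are assumptions rather than documented behavior of this tool.

```python
from typing import Optional


def annotated_fasta_header(fastq_header: str, quality: str, offset: int = 33) -> str:
    """Build a FASTA header that carries the read's mean quality (assumed format)."""
    mean_q = sum(ord(ch) - offset for ch in quality) / len(quality)
    # Replace the leading '@' with '>' and append the mean, rounded to one decimal.
    return f">{fastq_header[1:]} | avg_qual={mean_q:.1f}"


def read_avg_qual(fasta_header: str) -> Optional[float]:
    """Recover the avg_qual value from an annotated header, or None if absent."""
    _, marker, value = fasta_header.rpartition("avg_qual=")
    return float(value) if marker else None


header = annotated_fasta_header("@SEQ_ID description", "5555")
print(header)
print(read_avg_qual(header))
```

Reading the value back this way makes it possible to sort or filter FASTA records by their original quality after the quality lines are gone.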

When to use FASTQ to FASTA converter

Use this tool for:

  • Format compatibility: Preparing sequences for tools that only accept FASTA input (BLAST, sequence aligners, phylogenetic inference programs)
  • File size reduction: Uncompressed FASTA files are roughly half the size of the corresponding FASTQ files, because the separator and quality lines are dropped; useful for long-term archival or data transfer
  • Pipeline compatibility: Converting between different bioinformatics pipelines where some stages require FASTA and others use FASTQ
  • Publication archival: When documenting sequences in papers, FASTA is the standard format for supplementary data
  • Quality control complete: After you've already performed quality filtering and only need the passing sequences

FAQ

Why would I remove quality scores if I have them?

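Many bioinformatics tools don't use or even accept quality scores. Phylogenetic analysis, BLAST searches, and many sequence alignment methods only look at the sequence itself. Removing quality scores also reduces file size: an uncompressed FASTA file is typically about half the size of the FASTQ it came from, since the quality line is as long as the sequence line. The saving is often larger once the files are compressed, because quality strings tend to compress less well than sequence.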

Which quality threshold should I choose?

For most analyses, use Q20+ as the standard. Because the filter works on each read's average quality, this removes only obviously problematic reads while retaining the bulk of your data. Choose Q30+ only if accuracy is critical (variant calling, medical diagnostics). Use No filtering if your downstream tool doesn't need quality filtering or you have a specific reason to keep low-quality sequences.

Does this tool support both DNA and RNA sequences?

Yes. The tool accepts both DNA (A, T, G, C) and RNA (A, U, G, C) sequences. Sequences containing N bases are converted as-is and pass through all quality filters.

Can I convert multiple sequences at once?

Yes. Upload a multi-sequence FASTQ file or paste multiple FASTQ records (as long as each record is properly formatted with 4 lines), and they will all be converted to FASTA format simultaneously. Each sequence is processed independently.

What happens to sequences that don't pass the quality filter?

Filtered sequences are removed from the output entirely. If you enable the quality info in headers option, you can re-run the conversion with No filtering afterward to see which sequences would have been removed and their quality scores. Alternatively, your FASTQ file still contains all original records if you need to recover filtered sequences later.

Related tools

  • FASTA to FASTQ — Add synthetic quality scores to FASTA sequences, useful for testing pipelines or converting reference sequences
  • FASTA splitter — Split multi-sequence FASTA files into individual sequences for downstream processing
  • PDB to FASTA — Extract protein or DNA sequences directly from 3D protein structures in PDB format
  • Reverse complement — Reverse complement DNA sequences (useful after FASTQ to FASTA conversion if you need strand conversion)
