FASTQ to FASTA converter
Convert your FASTQ sequence files to FASTA format. Upload a file or paste your FASTQ data below.
Click to upload or drag and drop a FASTQ file (.fastq, .fq, .txt)
What is FASTQ?
FASTQ format is a text-based format for storing nucleotide sequences along with their corresponding quality scores. Each sequence entry consists of four lines:
- Line 1: a sequence identifier starting with @, followed by an optional description
- Line 2: the raw sequence letters representing nucleotides
- Line 3: a separator line starting with +, optionally followed by the same sequence identifier and description
- Line 4: quality scores encoded as ASCII characters, one per base in Line 2
FASTQ files are commonly generated by high-throughput sequencing platforms and contain both the biological sequence data and per-base quality information essential for downstream analysis.
Here's an example FASTQ file structure with a single sequence read:
@SRR123456.1 HWI-ST1234:100:C1234ACXX:1:1101:1000:2000 1:N:0:ATCACG
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
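To make the quality line concrete, here is a minimal sketch that decodes it into per-base Phred scores. It assumes the Phred+33 encoding (an assumption; the text above does not name the encoding, and some older files use an offset of 64). The function names `decode_phred` and `error_probability` are illustrative.

```python
def decode_phred(quality_line, offset=33):
    """Return the per-base Phred scores for one FASTQ quality line.

    The default offset of 33 is an assumption (Phred+33 encoding).
    """
    scores = [ord(ch) - offset for ch in quality_line]
    if scores and min(scores) < 0:
        raise ValueError("character below the encoding offset; wrong offset?")
    return scores


def error_probability(q):
    """Convert a Phred score Q to an estimated error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Quality line from the example record above
quality = "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65"
scores = decode_phred(quality)
print(scores[:4])  # [0, 6, 6, 9]
```

A score of 20 corresponds to an estimated error probability of about 1 in 100, which is why thresholds around 20 are a common starting point for filtering.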
What is FASTA?
FASTA format is a simpler text-based format that stores nucleotide or protein sequences without quality information. Each sequence entry consists of a header line starting with >, followed by the sequence identifier and description, and subsequent lines containing the actual sequence data.
FASTA is widely used for reference genomes, gene databases, and applications where quality scores are not required.
The same sequence converted to FASTA format:
>SRR123456.1 HWI-ST1234:100:C1234ACXX:1:1101:1000:2000 1:N:0:ATCACG
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
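Because a FASTA entry may spread its sequence over several lines after the header, a writer usually has to pick a line width. The following is a minimal sketch of formatting one record; the helper name `format_fasta` and the default width of 60 are illustrative choices, not requirements of the format.

```python
def format_fasta(header, sequence, width=60):
    """Return one FASTA record with the sequence wrapped to `width` columns."""
    if width <= 0:
        raise ValueError("width must be positive")
    lines = [">" + header]
    # Slice the sequence into fixed-width chunks, one chunk per output line
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


print(format_fasta("SRR123456.1", "GATTTGGGGTTCAAAGCAGT", width=10), end="")
```

Most tools accept either wrapped or single-line sequences, so the width mainly affects readability.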
Programming examples
Python implementation
Basic conversion using Python:
def fastq_to_fasta(input_fastq, output_fasta):
    """Convert a FASTQ file (four lines per record) to FASTA format."""
    with open(input_fastq, 'r') as infile, open(output_fasta, 'w') as outfile:
        line_count = 0
        for line in infile:
            line = line.strip()
            if not line:
                continue  # Ignore blank lines, e.g. a trailing newline at end of file
            if line_count % 4 == 0:  # Header line
                # Replace @ with > for FASTA format
                fasta_header = '>' + line[1:]
                outfile.write(fasta_header + '\n')
            elif line_count % 4 == 1:  # Sequence line
                outfile.write(line + '\n')
            # Skip quality header (line_count % 4 == 2) and quality scores (line_count % 4 == 3)
            line_count += 1

# Usage
fastq_to_fasta('input.fastq', 'output.fasta')
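The loop above trusts that every record occupies exactly four lines. As a hedged sketch, the variant below checks that assumption while converting text held in memory, such as pasted data. The names `iter_fastq` and `fastq_text_to_fasta` are hypothetical, blank lines are skipped (so zero-length reads are not supported), and multi-line FASTQ records are deliberately not handled.

```python
import itertools


def iter_fastq(lines):
    """Yield (header, sequence, quality) tuples from four-line FASTQ records."""
    rows = (ln.rstrip("\r\n") for ln in lines)
    rows = (ln for ln in rows if ln)  # skip blank lines (assumes no empty reads)
    while True:
        record = list(itertools.islice(rows, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, sep, qual = record
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at start of header: {header!r}")
        if not sep.startswith("+"):
            raise ValueError(f"expected '+' separator line: {sep!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual


def fastq_text_to_fasta(text):
    """Convert FASTQ text to FASTA text, validating each record on the way."""
    return "".join(f">{h}\n{s}\n" for h, s, _ in iter_fastq(text.splitlines()))


print(fastq_text_to_fasta("@read1 demo\nACGT\n+\nIIII\n"), end="")
```

Failing loudly on a malformed record is a design choice here; a lenient converter would silently shift every later record out of frame.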
Using Biopython
For more robust parsing with Biopython:
from Bio import SeqIO


def convert_fastq_to_fasta(input_file, output_file):
    """Convert FASTQ to FASTA using Biopython; return the number of records written."""
    sequences = SeqIO.parse(input_file, "fastq")
    count = SeqIO.write(sequences, output_file, "fasta")
    return count


# Convert with quality filtering
def convert_with_quality_filter(input_file, output_file, min_quality=20):
    """Convert FASTQ to FASTA, keeping reads whose mean Phred score is >= min_quality."""
    with open(output_file, 'w') as output_handle:
        for record in SeqIO.parse(input_file, "fastq"):
            qualities = record.letter_annotations["phred_quality"]
            if not qualities:
                continue  # Skip zero-length reads to avoid dividing by zero
            # Calculate average quality score
            avg_quality = sum(qualities) / len(qualities)
            if avg_quality >= min_quality:
                SeqIO.write(record, output_handle, "fasta")


# Usage
convert_fastq_to_fasta('input.fastq', 'output.fasta')
convert_with_quality_filter('input.fastq', 'filtered_output.fasta', min_quality=25)
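The filter above averages Phred scores directly. Because Phred scores are logarithmic, that arithmetic mean can make a read with a few very poor bases look better than its expected error rate suggests. One alternative, sketched here with the standard library only, averages the per-base error probabilities and converts the result back to a Phred score. The function name `mean_error_phred` is hypothetical, and whether this stricter measure suits a given pipeline is a judgment call.

```python
import math


def mean_error_phred(phred_scores):
    """Return the Phred score equivalent to the mean per-base error probability."""
    if not phred_scores:
        raise ValueError("cannot score an empty read")
    # Convert each Phred score Q to an error probability 10^(-Q/10), then average
    mean_error = sum(10 ** (-q / 10) for q in phred_scores) / len(phred_scores)
    return -10 * math.log10(mean_error)


scores = [40, 40, 40, 2]
print(sum(scores) / len(scores))         # arithmetic mean of the Phred scores: 30.5
print(round(mean_error_phred(scores)))   # error-based mean, far lower than 30.5
```

With Biopython, `record.letter_annotations["phred_quality"]` can be passed in as `phred_scores`.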
Command-line tools
Using seqtk or standard Unix tools (the awk and sed one-liners assume exactly four lines per record):
# Using seqtk (fast and memory efficient)
seqtk seq -a input.fastq > output.fasta
# Using awk (simple one-liner)
awk 'NR%4==1{print ">" substr($0,2)} NR%4==2{print}' input.fastq > output.fasta
# Using sed (stream editor approach; the first~step address form requires GNU sed)
sed -n '1~4s/^@/>/p;2~4p' input.fastq > output.fasta
Common use cases
FASTQ to FASTA conversion is essential for many bioinformatics workflows:
- Preparing reference sequences for alignment tools that require FASTA input
- Creating databases for sequence similarity searches (BLAST, etc.)
- Converting sequencing data for tools that don't handle quality scores
- Simplifying data format for downstream analysis pipelines
- Reducing file size when quality information is not needed (the quality line is as long as the sequence line, so dropping it roughly halves the file)