
GenBank to FASTA Converter
Convert GenBank records to FASTA by extracting primary sequences, CDS, or translations.

GenBank is the richly annotated sequence format maintained by NCBI as part of the International Nucleotide Sequence Database Collaboration (INSDC). A single GenBank record packages the raw nucleotide sequence together with metadata: organism, accession number, literature references, and a feature table that maps genes, coding sequences (CDS), regulatory elements, and other biologically meaningful regions onto the sequence coordinates. This wealth of annotation makes GenBank the standard exchange format for deposited sequences, but it also makes the files difficult to feed into tools that expect plain sequences.
FASTA, by contrast, stores only a header line and the sequence itself. Most alignment, search, and analysis tools accept FASTA as their primary input. The GenBank to FASTA Converter extracts sequences from GenBank records and outputs them in FASTA format, preserving selected metadata in the header line.
Beyond extracting the primary nucleotide sequence, the converter can pull out individual coding sequences or their translated protein products directly from the feature table. This eliminates the need to manually locate CDS coordinates and splice together exonic regions before translation.
A GenBank flat file is divided into an annotation section and a sequence section. The annotation section begins with the LOCUS line and includes the DEFINITION (a brief description of the sequence), ACCESSION (the unique identifier), and FEATURES (the biological annotation table). The sequence section begins after the ORIGIN keyword and contains the nucleotide letters in numbered rows, ending with a // terminator.

    LOCUS       AB000263     5368 bp    mRNA            PRI       05-FEB-1999
    DEFINITION  Homo sapiens mRNA for semaphorin III, complete cds.
    ACCESSION   AB000263
    ...
    FEATURES             Location/Qualifiers
         source          1..5368
                         /organism="Homo sapiens"
         CDS             187..3215
                         /gene="SemaIII"
                         /translation="MWQIVFFTLSCDLVLAAAYNNF..."
    ...
    ORIGIN
            1 agatggcgga gctgacgggg tctcagaatg ...
    //

The converter parses each record in the file and, depending on the selected extraction mode, performs one of three operations:

- Primary sequence: reads the sequence block between ORIGIN and //, strips numbering and whitespace, and outputs the full sequence.
- Coding sequences (CDS): uses the location of each CDS feature in the feature table to extract the corresponding nucleotide subsequence, joining the segments of spliced features.
- Translated proteins: reads the /translation qualifier attached to each CDS feature, which contains the amino acid sequence already translated by the submitter using the correct genetic code and reading frame.

The FASTA header is assembled from metadata fields such as the accession number, locus name, and DEFINITION line, depending on the chosen header format.
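
The primary-sequence operation can be illustrated with a minimal Python sketch. It is not the converter's actual code: the function name `extract_primary_sequence` and the sample record are hypothetical, and the choice to uppercase the result is an assumption based on the sample output shown on this page.

```python
import re

def extract_primary_sequence(record_text: str) -> str:
    """Return the sequence found between ORIGIN and // in one GenBank record.

    Hypothetical helper for illustration; uppercasing is an assumption.
    """
    seq_parts = []
    in_sequence = False
    for line in record_text.splitlines():
        if line.startswith("ORIGIN"):
            in_sequence = True
            continue
        if line.startswith("//"):
            break
        if in_sequence:
            # Drop position numbers and whitespace, keeping letters only.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(seq_parts).upper()

# Made-up record used only to exercise the function.
record = """LOCUS       TEST0001      20 bp    DNA     SYN 01-JAN-2000
DEFINITION  Hypothetical test record.
ACCESSION   TEST0001
ORIGIN
        1 agatggcgga gctgacgggg
//
"""
print(extract_primary_sequence(record))
```

A record with no ORIGIN block yields an empty string, which a caller could use to decide whether to skip the record.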
ProteinIQ provides this converter directly in the browser with no installation or account required. Paste GenBank-formatted text into the input area or upload a file with a .gb, .gbk, or .genbank extension. All processing runs client-side, so sequence data never leaves the browser.
| Input | Accepted formats | Max file size |
|---|---|---|
| Input | .gb, .gbk, .genbank, .txt | 50 MB |
GenBank files containing multiple records (separated by //) are processed in batch. Each record produces one or more FASTA entries depending on the extraction mode.
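
Splitting a multi-record file on the // terminator might look like the sketch below. The function name `split_records` is hypothetical, and keeping a trailing record that lacks a terminator is a design assumption rather than documented converter behavior.

```python
def split_records(text: str) -> list[str]:
    """Split GenBank flat-file text into records, each ending at a // line."""
    records = []
    current = []
    for line in text.splitlines():
        # Ignore blank lines that sit between records.
        if not current and not line.strip():
            continue
        current.append(line)
        if line.strip() == "//":
            records.append("\n".join(current))
            current = []
    # Assumption: keep a final record even if its terminator is missing.
    if current:
        records.append("\n".join(current))
    return records

# Made-up two-record input used only to exercise the function.
two_records = "LOCUS       A\nORIGIN\n        1 acgt\n//\n\nLOCUS       B\nORIGIN\n        1 ttga\n//\n"
for rec in split_records(two_records):
    print(rec.splitlines()[0])
```

Each returned string can then be handed to whichever extraction step matches the selected mode.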
| Setting | Options | Default | Description |
|---|---|---|---|
| Extract sequence type | Primary sequence, Coding sequences (CDS), Translated proteins | Primary sequence | Determines which sequences are extracted from each GenBank record |
| Header information | Accession only, Accession and definition, Locus and definition, Full information | Accession and definition | Controls what metadata appears in the FASTA header line |
| Include locus information | On / Off | Off | Appends the locus name to the FASTA header |
The output is a standard FASTA file with one entry per extracted sequence. Each entry starts with a > header line followed by the sequence. The result can be copied to the clipboard or downloaded as a file.
>AB000263 Homo sapiens mRNA for semaphorin III, complete cds.
AGATGGCGGAGCTGACGGGGTCTCAGAATGATTTTCTGAAGGACCATTTC...

When Coding sequences (CDS) is selected, each CDS in the record becomes a separate FASTA entry. When Translated proteins is selected, the output contains amino acid sequences instead of nucleotides.
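
As a rough illustration of how a FASTA entry could be assembled from record metadata, the Python sketch below builds a header for three of the header options and wraps the sequence. The names `build_header` and `format_fasta`, the mode strings, and the 60-character line width are all assumptions; the exact layout of the "Full information" header is not specified here, so it is left out.

```python
def build_header(meta: dict, mode: str = "accession_definition") -> str:
    """Assemble a FASTA header from parsed metadata (hypothetical mode names)."""
    if mode == "accession":
        return meta["accession"]
    if mode == "accession_definition":
        return f'{meta["accession"]} {meta["definition"]}'
    if mode == "locus_definition":
        return f'{meta["locus"]} {meta["definition"]}'
    raise ValueError(f"unsupported header mode: {mode}")

def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA entry; the line width of 60 is an assumed default."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{header}\n" + "\n".join(lines) + "\n"

meta = {
    "locus": "AB000263",
    "accession": "AB000263",
    "definition": "Homo sapiens mRNA for semaphorin III, complete cds.",
}
print(format_fasta(build_header(meta), "AGATGGCGGAGCTGACGGGGTCTCAGAATG"), end="")
```

Keeping header construction separate from sequence wrapping means the same `format_fasta` call works for nucleotide and protein output alike.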
The converter relies on the annotation present in the input file. If a GenBank record lacks CDS features, the Coding sequences (CDS) and Translated proteins extraction modes will produce no output for that record. Similarly, CDS features without a /translation qualifier will be skipped in protein extraction mode.
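
The skipping behavior described above can be sketched as follows. This is a simplified reader, not the converter's parser: `extract_translations` is a hypothetical name, and the code assumes the standard flat-file layout in which feature keys start in column 6 and qualifiers are indented further.

```python
def extract_translations(record_text: str) -> list[str]:
    """Collect /translation values from CDS features; CDS without one are skipped."""
    proteins = []
    in_features = False
    in_cds = False
    collecting = None  # pieces of a /translation value that spans several lines
    for line in record_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line and not line[0].isspace():
            break  # next top-level keyword (for example ORIGIN) ends the table
        stripped = line.strip()
        if collecting is not None:
            collecting.append(stripped.rstrip('"'))
            if stripped.endswith('"'):
                proteins.append("".join(collecting).replace(" ", ""))
                collecting = None
            continue
        # Feature key lines: five spaces, then the key starting in column 6.
        if len(line) > 5 and line[:5] == "     " and line[5] != " ":
            in_cds = line[5:].split()[0] == "CDS"
            continue
        if in_cds and stripped.startswith('/translation="'):
            value = stripped[len('/translation="'):]
            if value.endswith('"'):
                proteins.append(value[:-1])
            else:
                collecting = [value]
    return proteins

# Made-up feature table: the second CDS has no /translation and is skipped.
features = (
    "FEATURES             Location/Qualifiers\n"
    "     source          1..30\n"
    '                     /organism="Homo sapiens"\n'
    "     CDS             1..12\n"
    '                     /gene="a"\n'
    '                     /translation="MWQI\n'
    '                     VFFT"\n'
    "     CDS             13..30\n"
    '                     /gene="b"\n'
    "ORIGIN\n"
)
print(extract_translations(features))
```

A record whose feature table has no CDS entries returns an empty list, matching the "no output for that record" case.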
GenBank files with non-standard formatting or records from older database versions may not parse correctly. Files should conform to the NCBI GenBank flat file specification.