
GenBank to FASTA Converter
Convert GenBank records to FASTA by extracting primary sequences, CDS, or translations.
What is the GenBank to FASTA converter?
GenBank is the richly annotated sequence format maintained by NCBI as part of the International Nucleotide Sequence Database Collaboration (INSDC). A single GenBank record packages the raw nucleotide sequence together with metadata: organism, accession number, literature references, and a feature table that maps genes, coding sequences (CDS), regulatory elements, and other biologically meaningful regions onto the sequence coordinates. This wealth of annotation makes GenBank the standard exchange format for deposited sequences, but it also makes the files difficult to feed into tools that expect plain sequences.
FASTA, by contrast, stores only a header line and the sequence itself. Most alignment, search, and analysis tools accept FASTA as their primary input. The GenBank to FASTA Converter extracts sequences from GenBank records and outputs them in FASTA format, preserving selected metadata in the header line.
Beyond extracting the primary nucleotide sequence, the converter can pull out individual coding sequences or their translated protein products directly from the feature table. This eliminates the need to manually locate CDS coordinates and splice together exonic regions before translation.
How does GenBank to FASTA conversion work?
A GenBank flat file is divided into an annotation section and a sequence section. The annotation section begins with the LOCUS line and includes the DEFINITION (a brief description of the sequence), ACCESSION (the unique identifier), and FEATURES (the biological annotation table). The sequence section begins after the ORIGIN keyword and contains the nucleotide letters in numbered rows, ending with a // terminator.
```
LOCUS       AB000263     5368 bp    mRNA    PRI 05-FEB-1999
DEFINITION  Homo sapiens mRNA for semaphorin III, complete cds.
ACCESSION   AB000263
...
FEATURES             Location/Qualifiers
     source          1..5368
                     /organism="Homo sapiens"
     CDS             187..3215
                     /gene="SemaIII"
                     /translation="MWQIVFFTLSCDLVLAAAYNNF..."
...
ORIGIN
        1 agatggcgga gctgacgggg tctcagaatg ...
//
```

The converter parses each record in the file and, depending on the selected extraction mode, performs one of three operations:
- Primary sequence: Reads the nucleotide letters between ORIGIN and //, strips numbering and whitespace, and outputs the full sequence.
- Coding sequences (CDS): Scans the FEATURES table for CDS entries, extracts their location coordinates (handling joins and complements), and slices the corresponding subsequences from the primary sequence.
- Translated proteins: Reads the /translation qualifier attached to each CDS feature, which contains the amino acid sequence already translated for the annotation using the genetic code and reading frame recorded with that feature.
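The first two operations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the converter's actual code: the function names `primary_sequence` and `slice_location` are hypothetical, only unambiguous DNA letters are complemented, and only the `N..M`, `join(...)`, and `complement(...)` location forms are handled.

```python
import re

# Complement table for unambiguous DNA letters (assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def primary_sequence(record_text):
    """Return the letters between ORIGIN and //, uppercased."""
    parts, in_origin = [], False
    for line in record_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Drop the coordinate column and the spaces between blocks.
            parts.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(parts).upper()


def _split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
            continue
        depth += (ch == "(") - (ch == ")")
        current += ch
    parts.append(current)
    return parts


def slice_location(seq, location):
    """Slice seq by N..M, join(...), or complement(...) (1-based, inclusive)."""
    location = location.strip()
    if location.startswith("complement(") and location.endswith(")"):
        inner = slice_location(seq, location[len("complement("):-1])
        # Reverse complement of the enclosed region.
        return inner.translate(_COMPLEMENT)[::-1]
    if location.startswith("join(") and location.endswith(")"):
        pieces = _split_top_level(location[len("join("):-1])
        return "".join(slice_location(seq, p) for p in pieces)
    match = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = int(match.group(1)), int(match.group(2))
    return seq[start - 1:end]
```

For the example record above, `slice_location(primary_sequence(text), "187..3215")` would return the nucleotides of the annotated CDS.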
The FASTA header is assembled from metadata fields such as the accession number, locus name, and DEFINITION line, depending on the chosen header format.
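Header assembly might look like the following sketch. The mode names (`"accession"`, `"accession_definition"`, `"locus_definition"`, `"full"`), the field order, and the `include_locus` parameter are assumptions chosen to mirror the settings described on this page; the exact layout the tool produces for each format is not documented here.

```python
def parse_metadata(record_text):
    """Collect the locus name, DEFINITION text, and first accession."""
    meta = {"locus": "", "definition": "", "accession": ""}
    current = None
    for line in record_text.splitlines():
        if line.startswith(("FEATURES", "ORIGIN")):
            break
        if line.startswith(" " * 12):
            # Continuation line: only DEFINITION text is carried over.
            if current == "definition":
                meta["definition"] += " " + line.strip()
            continue
        keyword, _, rest = line.strip().partition(" ")
        rest = rest.strip()
        current = None
        if keyword == "LOCUS":
            meta["locus"] = rest.split()[0] if rest else ""
        elif keyword == "DEFINITION":
            meta["definition"] = rest
            current = "definition"
        elif keyword == "ACCESSION":
            meta["accession"] = rest.split()[0] if rest else ""
    return meta


def build_header(meta, mode="accession_definition", include_locus=False):
    """Build a FASTA header line; mode names are hypothetical."""
    layouts = {
        "accession": ["accession"],
        "accession_definition": ["accession", "definition"],
        "locus_definition": ["locus", "definition"],
        "full": ["accession", "locus", "definition"],  # assumed field order
    }
    keys = list(layouts[mode])
    if include_locus and "locus" not in keys:
        keys.append("locus")  # assumed position: end of the header
    return ">" + " ".join(meta[k] for k in keys if meta[k])
```

With the default mode, the example record yields a header of the form `>AB000263 Homo sapiens mRNA for semaphorin III, complete cds.`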
How to use the GenBank to FASTA converter online
ProteinIQ provides this converter directly in the browser with no installation or account required. Paste GenBank-formatted text into the input area or upload a file with a .gb, .gbk, .genbank, or .txt extension. All processing runs client-side, so sequence data never leaves the browser.
Input
| Input | Accepted formats | Max file size |
|---|---|---|
| Input | .gb, .gbk, .genbank, .txt | 50 MB |
GenBank files containing multiple records (each terminated by a // line) are processed in batch. Each record produces one or more FASTA entries depending on the extraction mode.
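Batch handling comes down to splitting the input at each // terminator before parsing. The sketch below shows one way to do that; `split_records` is a hypothetical helper, and keeping trailing text that lacks a terminator as a final record is an assumption about how lenient a parser should be.

```python
def split_records(text):
    """Split a multi-record GenBank file into one string per record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            if current:
                records.append("\n".join(current + ["//"]))
            current = []
        elif line.strip() or current:
            # Skip blank lines between records, keep everything else.
            current.append(line)
    # Assumption: trailing content without a terminator is still a record.
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records
```

Each returned string can then be passed to the per-record extraction step independently.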
Settings
| Setting | Options | Default | Description |
|---|---|---|---|
| Extract sequence type | Primary sequence, Coding sequences (CDS), Translated proteins | Primary sequence | Determines which sequences are extracted from each GenBank record |
| Header information | Accession only, Accession and definition, Locus and definition, Full information | Accession and definition | Controls what metadata appears in the FASTA header line |
| Include locus information | On / Off | Off | Appends the locus name to the FASTA header |
Output
The output is a standard FASTA file with one entry per extracted sequence. Each entry starts with a > header line followed by the sequence. The result can be copied to the clipboard or downloaded as a file.
```
>AB000263 Homo sapiens mRNA for semaphorin III, complete cds.
AGATGGCGGAGCTGACGGGGTCTCAGAATGATTTTCTGAAGGACCATTTC...
```

When Coding sequences (CDS) is selected, each CDS in the record becomes a separate FASTA entry. When Translated proteins is selected, the output contains amino acid sequences instead of nucleotides.
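Writing the output is the simplest step. The sketch below serializes (header, sequence) pairs as multi-FASTA; `format_fasta` is a hypothetical helper, and the 60-character line width is an assumption, since the wrapping used by the tool is not stated.

```python
def format_fasta(entries, width=60):
    """Serialize (header, sequence) pairs as FASTA text.

    The header is given without the leading '>'; sequences are wrapped
    at `width` characters per line (assumed default: 60).
    """
    lines = []
    for header, seq in entries:
        lines.append(">" + header)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "".join(line + "\n" for line in lines)
```

In CDS mode, `entries` would contain one pair per CDS feature rather than one per record.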
Applications
- Pipeline preparation: Many sequence analysis workflows begin with FASTA input. Converting GenBank records downloaded from NCBI into FASTA makes them compatible with BLAST, multiple sequence alignment tools, and phylogenetic software.
- CDS extraction: Isolating all coding sequences from an annotated genome or plasmid record without manually reading feature coordinates.
- Protein extraction: Obtaining translated protein sequences from GenBank CDS annotations, avoiding potential errors from manual translation or incorrect reading frame selection.
- Batch processing: Converting multi-record GenBank files (such as those from NCBI Batch Entrez downloads) into a single multi-FASTA file ready for downstream analysis.
Limitations
The converter relies on the annotation present in the input file. If a GenBank record lacks CDS features, the Coding sequences (CDS) and Translated proteins extraction modes will produce no output for that record. Similarly, CDS features without a /translation qualifier will be skipped in protein extraction mode.
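The skipping behavior in protein mode can be illustrated with a small sketch that collects /translation qualifiers and counts the CDS features that lack one. It assumes the standard flat file layout (feature keys indented less than 21 spaces, qualifiers at 21 spaces), and `cds_translations` is a hypothetical name rather than part of the tool.

```python
import re


def cds_translations(record_text):
    """Return (translations, skipped) for the CDS features of one record."""
    features, in_features = [], False
    for line in record_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features or not line.strip():
            continue
        if not line.startswith(" "):
            break  # next top-level keyword (e.g. ORIGIN) ends the table
        indent = len(line) - len(line.lstrip(" "))
        if indent < 21:
            # New feature: remember its key and start collecting qualifiers.
            features.append((line.split()[0], []))
        elif features:
            features[-1][1].append(line.strip())

    translations, skipped = [], 0
    for key, qualifier_lines in features:
        if key != "CDS":
            continue
        match = re.search(r'/translation="([^"]*)"', "".join(qualifier_lines))
        if match:
            # Remove whitespace left over from wrapped qualifier lines.
            translations.append(re.sub(r"\s", "", match.group(1)))
        else:
            skipped += 1  # CDS without /translation produces no entry
    return translations, skipped
```

A record with no CDS features returns `([], 0)`, which corresponds to the empty output described above.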
GenBank files with non-standard formatting or records from older database versions may not parse correctly. Files should conform to the NCBI GenBank flat file specification.