
GenBank to FASTA Converter
Convert GenBank files to FASTA format. Upload a GenBank file or paste the GenBank data below.
GenBank is the richly annotated sequence format maintained by NCBI as part of the International Nucleotide Sequence Database Collaboration (INSDC). A single GenBank record packages the raw nucleotide sequence together with metadata: organism, accession number, literature references, and a feature table that maps genes, coding sequences (CDS), regulatory elements, and other biologically meaningful regions onto the sequence coordinates. This wealth of annotation makes GenBank the standard exchange format for deposited sequences, but it also makes the files difficult to feed into tools that expect plain sequences.
FASTA, by contrast, stores only a header line and the sequence itself. Most alignment, search, and analysis tools accept FASTA as their primary input. The GenBank to FASTA Converter extracts sequences from GenBank records and outputs them in FASTA format, preserving selected metadata in the header line.
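The sketch below shows how little a FASTA entry contains: a `>` header line followed by the sequence. The function name `to_fasta` is hypothetical, and the 60-character line width is an assumed convention. The converter's own wrapping behavior is not specified here.

```python
def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA entry.

    The entry is a '>' header line followed by the sequence, wrapped to
    `width` characters per line (60 is a common convention, assumed here).
    """
    lines = [">" + header]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Usage: a 70-nt toy sequence wraps onto two lines (60 + 10 characters).
print(to_fasta("AB000263 example header", "AGATGGCGGA" * 7), end="")
```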
Beyond extracting the primary nucleotide sequence, the converter can pull out individual coding sequences or their translated protein products directly from the feature table. This eliminates the need to manually locate CDS coordinates and splice together exonic regions before translation.
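Splicing a CDS from its feature location amounts to concatenating the listed coordinate ranges and, for minus-strand features, reverse-complementing the result. The helper below (`extract_cds`) is a hypothetical sketch, not the converter's implementation. It assumes simple locations of the form `a..b`, `join(a..b,c..d)`, or an outer `complement(...)`. It does not handle single-base positions, `order(...)`, references to other records, or `complement(...)` nested inside `join(...)`.

```python
import re

# Complement table for the bases handled in this sketch (upper and lower case).
_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(sequence: str, location: str) -> str:
    """Extract the nucleotides described by a simple GenBank location.

    Coordinates are 1-based and inclusive, as in GenBank. Partial-end
    markers such as '<1..>100' are accepted and ignored.
    """
    location = location.replace(" ", "")
    is_complement = location.startswith("complement(") and location.endswith(")")
    if is_complement:
        location = location[len("complement("):-1]
    parts = []
    for start, end in re.findall(r"<?(\d+)\.\.>?(\d+)", location):
        # Convert 1-based inclusive coordinates to a Python slice.
        parts.append(sequence[int(start) - 1:int(end)])
    spliced = "".join(parts)  # exon segments joined in the listed order
    return reverse_complement(spliced) if is_complement else spliced


toy = "ATGCCCGGGTTT"
print(extract_cds(toy, "join(1..3,7..9)"))   # ATGGGG
print(extract_cds(toy, "complement(1..3)"))  # CAT
```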
A GenBank flat file is divided into an annotation section and a sequence section. The annotation section begins with the LOCUS line and includes the DEFINITION (a brief description of the sequence), ACCESSION (the unique identifier), and FEATURES (the biological annotation table). The sequence section begins after the ORIGIN keyword and contains the nucleotide letters in numbered rows, ending with a // terminator.
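Because the annotation section uses a fixed column layout (keyword in columns 1 to 12, value from column 13), its top-level fields can be read line by line. The sketch below uses a hypothetical `parse_record` function that reads only LOCUS, DEFINITION, ACCESSION, and the sequence after ORIGIN. Uppercasing the sequence is an assumed choice, and the sample record is truncated for illustration.

```python
import re


def parse_record(text: str) -> dict:
    """Read LOCUS, DEFINITION, ACCESSION and the ORIGIN sequence from one
    GenBank flat-file record (a minimal sketch; other fields are ignored)."""
    fields = {}
    seq_chunks = []
    current = None
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end of this record
        if in_origin:
            # Sequence rows: drop the leading position number and spaces.
            seq_chunks.append(re.sub(r"[^A-Za-z]", "", line))
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line[:1].isalpha():
            # Top-level keywords start in column 1; values start in column 13.
            current = line[:12].strip()
            fields[current] = line[12:].strip()
        elif current in ("DEFINITION", "ACCESSION") and line.strip():
            # Indented continuation of a multi-line DEFINITION or ACCESSION.
            fields[current] += " " + line.strip()

    def first_token(key: str) -> str:
        tokens = fields.get(key, "").split()
        return tokens[0] if tokens else ""

    return {
        "locus": first_token("LOCUS"),          # locus name
        "accession": first_token("ACCESSION"),  # primary accession only
        "definition": fields.get("DEFINITION", ""),
        "sequence": "".join(seq_chunks).upper(),
    }


SAMPLE = """\
LOCUS       AB000263                5368 bp    mRNA    PRI       05-FEB-1999
DEFINITION  Homo sapiens mRNA for semaphorin III, complete cds.
ACCESSION   AB000263
FEATURES             Location/Qualifiers
     source          1..5368
ORIGIN
        1 agatggcgga gctgacgggg tctcagaatg
//
"""

print(parse_record(SAMPLE))
```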
```
LOCUS       AB000263                5368 bp    mRNA    PRI       05-FEB-1999
DEFINITION  Homo sapiens mRNA for semaphorin III, complete cds.
ACCESSION   AB000263
...
FEATURES             Location/Qualifiers
     source          1..5368
                     /organism="Homo sapiens"
     CDS             187..3215
                     /gene="SemaIII"
                     /translation="MWQIVFFTLSCDLVLAAAYNNF..."
...
ORIGIN
        1 agatggcgga gctgacgggg tctcagaatg ...
//
```

The converter parses each record in the file and, depending on the selected extraction mode, performs one of three operations:

1. Primary sequence: reads the sequence section from ORIGIN to the // terminator, strips numbering and whitespace, and outputs the full sequence.
2. Coding sequences (CDS): extracts the nucleotide region described by each CDS feature's location, joining multi-segment (exon) locations into a single coding sequence.
3. Translated proteins: reads the /translation qualifier attached to each CDS feature, which contains the amino acid sequence already translated in the record using the genetic code (/transl_table) and reading frame (/codon_start) that apply to that CDS.

The FASTA header is assembled from metadata fields such as the accession number, locus name, and DEFINITION line, depending on the chosen header format.
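For the Translated proteins mode, the /translation qualifiers can be collected from the feature table without translating anything. The sketch below (a hypothetical `extract_translations` function) assumes the standard layout where feature keys start in column 6 and qualifiers are indented to column 22, and it joins a /translation value that wraps across several lines. The protein strings in the usage example are toy values, not real data.

```python
def extract_translations(feature_lines):
    """Return the /translation value of every CDS feature in a FEATURES table.

    `feature_lines` are the lines between FEATURES and ORIGIN. CDS features
    without a /translation qualifier contribute nothing.
    """
    proteins = []
    in_cds = False
    buffer = None  # collects a /translation value that spans several lines
    for line in feature_lines:
        if line[5:6].strip():
            # A feature key (e.g. "source", "CDS") starts in column 6.
            in_cds = line.split()[0] == "CDS"
            buffer = None
            continue
        body = line.strip()
        if buffer is not None:
            buffer += body  # continuation line of a wrapped translation
        elif in_cds and body.startswith('/translation="'):
            buffer = body[len('/translation="'):]
        else:
            continue
        if buffer.endswith('"'):
            proteins.append(buffer[:-1])  # drop the closing quote
            buffer = None
    return proteins


toy_features = [
    "     CDS             1..27",
    '                     /gene="demo"',
    '                     /translation="MKVL',
    '                     QRST"',
]
print(extract_translations(toy_features))  # ['MKVLQRST']
```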
ProteinIQ provides this converter directly in the browser with no installation or account required. Paste GenBank-formatted text into the input area or upload a file with a .gb, .gbk, or .genbank extension. All processing runs client-side, so sequence data never leaves the browser.
| Input | Accepted formats | Max file size |
|---|---|---|
| Input | .gb, .gbk, .genbank, .txt | 50 MB |
GenBank files containing multiple records (separated by //) are processed in batch. Each record produces one or more FASTA entries depending on the extraction mode.
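Because every record ends with a `//` line, batch processing can start by splitting the file on that terminator. A minimal sketch follows. The function name `split_records` is hypothetical, and keeping unterminated trailing text as a separate entry is an assumed design choice, not documented converter behavior.

```python
def split_records(text):
    """Split a multi-record GenBank file into one string per record.

    Each record ends with a line containing only '//'.
    """
    records, current = [], []
    for line in text.splitlines():
        if not current and not line.strip():
            continue  # skip blank lines between records
        current.append(line)
        if line.strip() == "//":
            records.append("\n".join(current))
            current = []
    if current:
        # Text after the last '//' has no terminator; kept so the caller can
        # decide whether to report it as a truncated record.
        records.append("\n".join(current))
    return records


two_records = "LOCUS       A\nORIGIN\n        1 acgt\n//\n\nLOCUS       B\nORIGIN\n        1 gg\n//\n"
print(len(split_records(two_records)))  # 2
```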
| Setting | Options | Default | Description |
|---|---|---|---|
| Extract sequence type | Primary sequence, Coding sequences (CDS), Translated proteins | Primary sequence | Determines which sequences are extracted from each GenBank record |
| Header information | Accession only, Accession and definition, Locus and definition, Full information | Accession and definition | Controls what metadata appears in the FASTA header line |
| Include locus information | On / Off | Off | Appends the locus name to the FASTA header |

The output is a standard FASTA file with one entry per extracted sequence. Each entry starts with a > header line followed by the sequence. The result can be copied to the clipboard or downloaded as a file.

```
>AB000263 Homo sapiens mRNA for semaphorin III, complete cds.
AGATGGCGGAGCTGACGGGGTCTCAGAATGATTTTCTGAAGGACCATTTC...
```

When Coding sequences (CDS) is selected, each CDS in the record becomes a separate FASTA entry. When Translated proteins is selected, the output contains amino acid sequences instead of nucleotides.
The converter relies on the annotation present in the input file. If a GenBank record lacks CDS features, the Coding sequences (CDS) and Translated proteins extraction modes will produce no output for that record. Similarly, CDS features without a /translation qualifier will be skipped in protein extraction mode.
GenBank files with non-standard formatting or records from older database versions may not parse correctly. Files should conform to the NCBI GenBank flat file specification.
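The Header information and Include locus information settings can be pictured as a small function over the parsed record fields. This sketch (a hypothetical `build_header` function) covers Accession only, Accession and definition, and Locus and definition. It omits Full information because the exact fields that mode includes are not stated here, and the `[locus=...]` suffix used for Include locus information is an assumed format.

```python
def build_header(record, mode="accession_definition", include_locus=False):
    """Assemble a FASTA header (without the leading '>') from parsed fields.

    Mode names, field order and separators are assumptions for illustration,
    not the converter's documented output.
    """
    accession = record.get("accession", "")
    locus = record.get("locus", "")
    definition = record.get("definition", "")
    if mode == "accession":
        parts = [accession]
    elif mode == "accession_definition":
        parts = [accession, definition]
    elif mode == "locus_definition":
        parts = [locus, definition]
    else:
        raise ValueError(f"unsupported header mode: {mode!r}")
    if include_locus and mode != "locus_definition":
        parts.append(f"[locus={locus}]")  # assumed suffix format
    return " ".join(p for p in parts if p)


example = {
    "accession": "AB000263",
    "locus": "AB000263",
    "definition": "Homo sapiens mRNA for semaphorin III, complete cds.",
}
print(">" + build_header(example))
```

With the default mode, this reproduces the header line shown in the example output above.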