TXT to FASTA converter

Convert plain text sequence files to FASTA format. Upload a text file or paste your sequences, then use the settings to adjust headers, line wrapping, letter case, and character cleanup in the FASTA output.

What is TXT to FASTA converter?

TXT to FASTA converter transforms plain text sequence data into properly formatted FASTA, the standard text format for representing nucleotide or protein sequences in bioinformatics. The tool handles various input formats including raw sequences, numbered sequences from publications, and sequences with existing headers.

FASTA format was invented by David Lipman and William Pearson in 1985 for their FASTP protein sequence similarity search program. The format begins each sequence with a header line starting with ">", followed by the sequence data across one or more lines. FASTA has become a near-universal standard in bioinformatics due to its simplicity and flexibility compared to earlier fixed-field formats.

Converting sequences to FASTA format ensures compatibility with downstream analysis tools, sequence databases, and bioinformatics pipelines. The converter automatically detects multiple sequences in a single text file and applies consistent formatting rules across all entries. ProteinIQ offers several other FASTA converters for different source formats, including CSV to FASTA, GenBank to FASTA, FASTQ to FASTA, and PDB to FASTA.

How to use TXT to FASTA converter online

ProteinIQ provides a web-based interface for converting plain text sequences to FASTA format without any software installation. Paste sequences directly or upload a text file, adjust formatting options, and receive properly formatted FASTA output.

Inputs

Input	Description
`Input`	Plain text containing one or more sequences. Accepts pasted text or file uploads. Supported file extensions: `.txt`, `.fasta`, `.fa`, `.fas`, `.seq`, `.dat`. Maximum file size: 50 MB.

Settings

Sequence detection

Setting	Description
`Multi-sequences`	Method for identifying separate sequences. `Auto-detect sequences` (default) analyzes text structure to find natural boundaries. `Split on empty lines` treats each block separated by blank lines as a distinct sequence. `Custom separator` uses a specified delimiter string.
`Custom separator`	Delimiter string for separating sequences when `Custom separator` mode is selected. Default: `---`.

Header formatting

Setting	Description
`Header format`	Controls how sequence identifiers are generated. `Preserve existing headers` (default) maintains any ">" lines already present. `seq_1, seq_2, ...` or `sequence_1, sequence_2, ...` provide simple incrementing names. `Custom prefix` allows defining a custom naming scheme. `Extract from text (smart)` attempts to identify meaningful names from surrounding text.
`Custom prefix`	Prefix string for sequence headers when `Custom prefix` mode is selected. Default: `seq`.
`Header extraction pattern`	Refines smart extraction behavior when using `Extract from text (smart)` mode. `First word of each sequence block` takes the initial word before each sequence. `Line numbers` searches for patterns like "1.", "2.". `Sequence identifiers` looks for conventions like "seq1" or "protein_a".

Sequence formatting

Setting	Description
`Line wrapping`	Number of characters per line in the output. `80 characters per line (standard)` (default) follows NCBI recommendations. `60 characters per line` is common in many workflows. `No wrapping (single line)` outputs each sequence on a single line.
`Case format`	Letter case for output sequences. `UPPERCASE` (default) matches database expectations. `lowercase` converts all letters to lowercase. `Preserve original` maintains input capitalization.

Character cleanup

Setting	Description
`Character cleanup`	Master switch enabling automatic removal of non-sequence characters. Default: enabled.
`Remove spaces`	Strips whitespace characters from sequences. Default: enabled.
`Remove numbers`	Strips numeric characters (0-9) from sequences, useful for sequences copied from numbered formats. Default: enabled.
`Remove tabs`	Strips tab characters from sequences. Default: enabled.
`Remove punctuation`	Strips punctuation marks from sequences. Default: enabled.
`Remove invalid characters`	Strips any letters that are not valid IUPAC codes, so that only valid nucleotide codes (such as A, C, G, T, U, N) or amino acid codes remain. Default: enabled.

Validation and output options

Setting	Description
`Validate sequences`	Performs a final check that all output characters are valid biological sequence codes. Default: enabled.
`Add line numbers to headers`	Includes original line numbers from the input file in FASTA headers, useful for tracking sequence sources. Default: disabled.
`Show sequence statistics`	Displays statistics including sequence count, total length, average length, and detected sequence type. Default: enabled.

Results

The converter produces FASTA-formatted output that can be copied to clipboard or downloaded as a .fasta file.

Output	Description
FASTA text	Properly formatted sequences with ">" headers and wrapped sequence lines. Each sequence appears on separate lines following its header.
Statistics	When enabled, displays sequence count, total residues, average length, and detected sequence type (protein, DNA, or RNA).
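
As a rough illustration of the statistics listed above, the following Python sketch computes a sequence count, total residues, average length, and a guessed type per sequence. The function names and the type-detection rule (alphabet subset checks) are assumptions made for this example, not the converter's actual logic.

```python
def detect_type(seq):
    """Guess the sequence type from its alphabet (simplified assumption)."""
    letters = set(seq.upper())
    if letters <= set("ACGTN"):
        return "DNA"
    if letters <= set("ACGUN"):
        return "RNA"
    return "protein"


def sequence_stats(seqs):
    """Summarize a list of cleaned sequence strings."""
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    return {
        "count": len(seqs),
        "total_residues": total,
        # Avoid dividing by zero when no sequences were found.
        "average_length": total / len(seqs) if seqs else 0.0,
        "types": [detect_type(s) for s in seqs],
    }


stats = sequence_stats(["ACGT", "ACGUAA", "MKV"])
print(stats["count"], stats["total_residues"], stats["types"])
```

A short peptide made only of A, C, G, and T would be reported as DNA by this rule, so alphabet-based detection is a heuristic at best.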

How does TXT to FASTA converter work?

The converter processes text input through several transformation stages to produce valid FASTA output.

Sequence identification

The first stage identifies individual sequences within the input text using configurable separation methods. Auto-detection analyzes text structure to find natural boundaries such as existing ">" headers, blank lines, or consistent formatting patterns. Custom delimiters accommodate data sources with non-standard separators.
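
The separation methods described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions: `split_sequences` and its `mode` values are hypothetical names, and the auto-detection here only checks for existing ">" headers before falling back to blank lines, which is simpler than whatever the converter actually does.

```python
import re


def split_sequences(text, mode="auto", separator="---"):
    """Split raw text into one block per sequence (hypothetical helper)."""
    text = text.replace("\r\n", "\n").strip()
    if mode == "custom":
        # Custom separator: split on the literal delimiter string.
        blocks = text.split(separator)
    elif mode == "auto" and re.search(r"^>", text, flags=re.MULTILINE):
        # Existing headers found: start a new block at every ">" line.
        blocks = re.split(r"^(?=>)", text, flags=re.MULTILINE)
    else:
        # Fallback and "blank" mode: blocks are separated by empty lines.
        blocks = re.split(r"\n\s*\n", text)
    return [b.strip() for b in blocks if b.strip()]


print(split_sequences("ACGT\n\nGGCC"))
print(split_sequences("ACGT---GGCC", mode="custom"))
```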

Header generation

After sequence identification, the converter generates appropriate header lines for each sequence. FASTA headers follow NCBI guidelines: they must begin with ">", contain a unique sequence identifier limited to 25 characters, and remain on a single line without hard returns.

The smart extraction mode searches for common patterns like "protein_a", numbered entries ("1.", "2."), or sequence identifiers ("seq1") to create meaningful names. When no identifiable pattern exists, sequential numbering provides fallback headers.
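
One way to picture the header logic above is the Python sketch below. The regular expressions and the `make_header` function are assumptions for illustration only: existing ">" lines are preserved, numbered entries and identifier-like first words are extracted, and sequential numbering is the fallback.

```python
import re

# Hypothetical patterns: "1." style numbering, and identifiers such as
# "seq1" or "protein_a" (letters followed by a digit or underscore).
NUMBERED = re.compile(r"^(\d+)\.\s*")
IDENT = re.compile(r"^([A-Za-z]+[_\d]\w*)\s+")


def make_header(block, index, prefix="seq"):
    """Return (header, remaining_text) for one sequence block."""
    block = block.strip()
    if block.startswith(">"):
        # Preserve an existing header line as-is.
        header, _, body = block.partition("\n")
        return header.strip(), body
    match = NUMBERED.match(block)
    if match:
        return f">{prefix}_{match.group(1)}", block[match.end():]
    match = IDENT.match(block)
    if match:
        return f">{match.group(1)}", block[match.end():]
    # No identifiable pattern: fall back to sequential numbering.
    return f">{prefix}_{index}", block


print(make_header("protein_a MKV", 3))
print(make_header("ACGT", 2))
```

Requiring a digit or underscore in the identifier pattern is a deliberate simplification so that a bare sequence such as "ACGT" is not mistaken for a name.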

Character transformation

The next stage applies character-level transformations to ensure valid FASTA output. The tool removes whitespace, numbers, punctuation, and other non-sequence characters, then converts the remaining letters to the specified case format.
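
A minimal Python sketch of this cleanup step is shown below. The function name and keyword arguments are hypothetical and only loosely mirror the settings described earlier. Note that this simplified version treats "*" and "-" as punctuation and removes them.

```python
import string


def clean_sequence(raw, case="upper", remove_spaces=True,
                   remove_numbers=True, remove_punctuation=True):
    """Strip non-sequence characters, then apply the case setting."""
    kept = []
    for ch in raw:
        if remove_spaces and ch.isspace():
            continue  # spaces, tabs, and line breaks
        if remove_numbers and ch.isdigit():
            continue  # position numbers copied from publications
        if remove_punctuation and ch in string.punctuation:
            continue
        kept.append(ch)
    seq = "".join(kept)
    if case == "upper":
        return seq.upper()
    if case == "lower":
        return seq.lower()
    return seq  # any other value preserves the original case


print(clean_sequence("  1 acgt acgt 10\n 11 ttga\t"))
```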

Line wrapping splits long sequences at 80 characters per line by default, following NCBI recommendations; 60-character and single-line output are also available. Separately, NCBI specifications state that sequence identifiers should contain only letters, digits, hyphens, underscores, periods, colons, asterisks, and number signs.
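
The line wrapping described above amounts to slicing the sequence into fixed-width chunks. The sketch below assumes hypothetical helper names, with a width of 0 standing in for the single-line option.

```python
def wrap_sequence(seq, width=80):
    """Break a sequence into lines of at most `width` characters."""
    if not width or width <= 0:
        return seq  # no wrapping: keep the sequence on a single line
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


def format_record(header, seq, width=80):
    """Combine a ">" header and a wrapped sequence into one FASTA record."""
    return f"{header}\n{wrap_sequence(seq, width)}\n"


print(format_record(">seq_1", "ACGTAC", width=4))
```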

Sequence validation

When validation is enabled, the converter checks that all remaining characters are valid IUPAC codes. For nucleotides, valid characters include A, C, G, T, U, and N (for ambiguous bases). For amino acids, all standard single-letter codes are accepted. Ambiguous nucleotides should be written as "N" rather than "?" or "-", because NCBI processing strips those characters from sequences outside alignment contexts.
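
A validation pass of this kind can be sketched as a simple alphabet check. The alphabets below are assumptions for illustration: the nucleotide set contains only the characters named above, and the protein set contains the 20 standard amino acid codes. The full IUPAC alphabets include further ambiguity codes.

```python
# Simplified alphabets (assumption): extend these for full IUPAC support.
NUCLEOTIDE = set("ACGTUN")
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid(seq, alphabet):
    """Return (position, character) pairs that fall outside the alphabet."""
    return [(i, ch) for i, ch in enumerate(seq.upper()) if ch not in alphabet]


print(find_invalid("AC?T-", NUCLEOTIDE))
```

Reporting positions rather than silently dropping characters makes it easier to trace a problem back to the input text.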

Related tools

CSV to FASTA: Convert tabular sequence data from spreadsheets
GenBank to FASTA: Extract sequences from GenBank flat files
FASTQ to FASTA: Convert sequencing reads by removing quality scores
PDB to FASTA: Extract amino acid sequences from protein structures
FASTA Splitter: Divide multi-sequence FASTA files into individual files
FASTA to FASTQ: Add placeholder quality scores for pipeline compatibility
