TXT to FASTA converter

Convert plain text DNA, RNA, or protein sequences to FASTA with cleanup, validation, and generated headers.

What is TXT to FASTA converter?

TXT to FASTA converter transforms plain text sequence data into properly formatted FASTA files, the standard format for representing nucleotide or protein sequences in bioinformatics. It handles raw sequences, copied sequence blocks with residue numbers, line-wrapped records, and text that already contains FASTA headers.

FASTA format was invented by David Lipman and William Pearson in 1985 for their FASTP protein sequence similarity search program. The format begins each sequence with a header line starting with >, followed by the sequence data across one or more lines. FASTA has become a near-universal standard in bioinformatics due to its simplicity and flexibility compared to earlier fixed-field formats.

Converting sequences to FASTA format ensures compatibility with downstream analysis tools, sequence databases, and bioinformatics pipelines. The converter automatically detects multiple sequences in a single text file and applies consistent formatting rules across all entries. ProteinIQ offers several other FASTA converters for different source formats, including CSV to FASTA, GenBank to FASTA, FASTQ to FASTA, and PDB to FASTA.

How to use TXT to FASTA converter online

Paste plain text sequences or upload a text file to convert sequence blocks into FASTA online. The converter cleans copied numbering and spacing, generates or preserves headers, validates DNA, RNA, or protein characters, and returns downloadable FASTA output plus sequence statistics.

Inputs

| Input | Description |
| --- | --- |
| Input | Plain text containing one or more sequences. Accepts pasted text or file uploads. Supported file extensions: .txt, .fasta, .fa, .fas, .seq, .dat. Maximum file size: 50 MB. |

Settings

Sequence detection

| Setting | Description |
| --- | --- |
| Multi-sequences | Method for identifying separate sequences. Auto-detect sequences (default) analyzes text structure, keeps likely wrapped sequence lines together, and avoids treating prose labels as sequence records. Split on empty lines treats each block separated by blank lines as a distinct sequence. Custom separator uses a specified delimiter string. |
| Custom separator | Delimiter string for separating sequences when Custom separator mode is selected. Default: ---. An empty custom separator is rejected instead of falling back to auto-detection. |
| Sequence type | Select Auto-detect (default), DNA, RNA, or Protein. Auto-detect avoids deleting valid protein residues before classification. Explicit DNA mode rejects U, explicit RNA mode rejects T, and Protein mode preserves valid amino acid residues. |
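
The blank-line and custom-separator modes can be illustrated with a short Python sketch. It is not ProteinIQ's implementation: `split_records` and its parameters are hypothetical names, and auto-detection is left out because its heuristics are not specified here.

```python
import re


def split_records(text: str, mode: str = "blank", separator: str = "---") -> list[str]:
    """Split raw text into sequence blocks (hypothetical helper, not the tool's code)."""
    if mode == "blank":
        # One or more empty (or whitespace-only) lines end a block.
        parts = re.split(r"\n\s*\n", text)
    elif mode == "custom":
        if not separator:
            # Mirrors the documented behaviour: an empty separator is an error.
            raise ValueError("custom separator must not be empty")
        parts = text.split(separator)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Drop blocks that contain nothing but whitespace.
    return [p.strip() for p in parts if p.strip()]
```

Raising on an empty separator, rather than silently switching modes, keeps the result predictable when a setting is left blank by accident.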

Header formatting

| Setting | Description |
| --- | --- |
| Header format | Controls how sequence identifiers are generated. Preserve existing headers (default) maintains existing FASTA header lines. seq_1, seq_2, ... or sequence_1, sequence_2, ... provide simple incrementing names. Custom prefix allows defining a custom naming scheme. Extract from text (smart) attempts to identify meaningful names from surrounding text. |
| Custom prefix | Prefix string for sequence headers when Custom prefix mode is selected. Default: seq. Prefix text is sanitized so spaces, punctuation, or pasted line breaks cannot create malformed FASTA headers. |
| Header extraction pattern | Refines smart extraction behavior when using Extract from text (smart) mode. First word of each sequence block takes the initial word before each sequence. Line numbers searches for patterns like "1.", "2.". Sequence identifiers looks for conventions like "seq1" or "protein_a". |
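
As a rough illustration of custom-prefix headers, the Python sketch below builds numbered headers from a sanitized prefix. The exact sanitization rule is not documented, so the character whitelist is an assumption, and `make_headers` is a hypothetical name.

```python
import re


def make_headers(count: int, prefix: str = "seq") -> list[str]:
    """Build headers such as >seq_1, >seq_2 (hypothetical helper)."""
    # Assumed rule: collapse every run of characters other than letters,
    # digits, '.', '_' or '-' into a single underscore.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", prefix).strip("_")
    if not safe:
        safe = "seq"  # fall back to the documented default prefix
    return [f">{safe}_{i}" for i in range(1, count + 1)]
```

With this rule, a pasted prefix such as "my protein" followed by a line break and ">x" becomes `my_protein_x`, so it cannot start a second header line.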

Sequence formatting

| Setting | Description |
| --- | --- |
| Line wrapping | Number of characters per line in the output. 80 characters per line (standard) (default) follows NCBI recommendations. 60 characters per line is common in many workflows. No wrapping (single line) outputs each sequence on a single line. |
| Case format | Letter case for output sequences. UPPERCASE (default) matches database expectations. lowercase converts all residues to lowercase. Preserve original maintains input capitalization. |
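
Line wrapping and case conversion amount to simple string operations. The following Python sketch shows one way to apply them; `format_sequence` and its parameter names are hypothetical, and a width of 0 stands in for the "No wrapping" option.

```python
def format_sequence(seq: str, width: int = 80, case: str = "upper") -> str:
    """Apply case and line wrapping (hypothetical helper; width <= 0 disables wrapping)."""
    if case == "upper":
        seq = seq.upper()
    elif case == "lower":
        seq = seq.lower()
    # Any other value is treated as "preserve original".
    if width <= 0:
        return seq
    # Slice the sequence into fixed-width chunks, one per output line.
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
```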

Character cleanup

| Setting | Description |
| --- | --- |
| Character cleanup | Master switch enabling automatic removal of non-sequence characters. Default: enabled. |
| Remove spaces | Strips whitespace characters from sequences. Default: enabled. |
| Remove numbers | Strips numeric characters (0-9) from sequences, useful for sequences copied from numbered formats. Default: enabled. |
| Remove tabs | Strips tab characters from sequences. Default: enabled. |
| Remove punctuation | Strips punctuation marks from sequences. Default: enabled. |
| Remove invalid characters | Strips letters that are incompatible with the selected sequence type. In auto-detect mode, type-specific residue removal is delayed until after classification so short proteins made mostly of nucleotide-overlap letters are not truncated. Default: enabled. |
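
The ordering described for auto-detect mode (generic cleanup first, classification second, type-specific removal last) can be sketched in Python as follows. The classification heuristic and the reduced alphabets are simplifying assumptions, not the converter's actual logic, and both function names are hypothetical.

```python
import re

DNA = set("ACGTN")  # assumed minimal alphabets, no IUPAC ambiguity codes
RNA = set("ACGUN")


def clean(raw: str) -> str:
    """Strip whitespace, digits, and punctuation, keeping letters only."""
    return re.sub(r"[^A-Za-z]", "", raw)


def guess_type(seq: str) -> str:
    """Classify an already cleaned sequence (simplified heuristic)."""
    letters = set(seq.upper())
    if not letters:
        raise ValueError("no sequence letters found")
    if letters <= DNA:
        return "DNA"
    if letters <= RNA:
        return "RNA"
    return "Protein"
```

Because `clean` removes only non-letters, a short peptide such as MKTAG keeps its M and K and is classified as protein. Filtering to nucleotide letters before classifying would have reduced it to TAG.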

Validation and output options

| Setting | Description |
| --- | --- |
| Validate sequences | Performs a final check that all output characters are valid biological sequence codes for the selected sequence type. Default: enabled. |
| Validation strictness | Lenient (auto-clean) removes invalid characters and reports what changed. Strict (reject invalid) fails any sequence that would require cleanup, including spaces, tabs, numbers, punctuation, non-letter characters, or sequence-type mismatches. |
| Add line numbers to headers | Includes original line numbers from the input file in FASTA headers, useful for tracking sequence sources. Default: disabled. |
| Show sequence statistics | Displays statistics including sequence count, total length, average length, and detected sequence type. Default: enabled. |
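
The difference between lenient and strict validation can be sketched in Python as below. The alphabets are deliberately minimal assumptions (the real tool may accept ambiguity codes), and `validate` is a hypothetical helper rather than the converter's API.

```python
ALPHABETS = {
    "DNA": set("ACGTN"),
    "RNA": set("ACGUN"),
    "Protein": set("ACDEFGHIKLMNPQRSTVWY"),  # 20 standard amino acids only
}


def validate(seq: str, seq_type: str, strict: bool = False) -> tuple[str, list[str]]:
    """Return (sequence, removed characters); raise in strict mode if anything is invalid."""
    allowed = ALPHABETS[seq_type]
    removed = [ch for ch in seq if ch.upper() not in allowed]
    if removed and strict:
        # Strict mode rejects the sequence instead of cleaning it.
        raise ValueError(f"invalid characters for {seq_type}: {sorted(set(removed))}")
    kept = "".join(ch for ch in seq if ch.upper() in allowed)
    return kept, removed
```

In lenient mode the second return value doubles as the change report: it lists every character that was dropped, in input order.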

Results

The converter produces FASTA-formatted output that can be copied to clipboard or downloaded as a .fasta file.

| Output | Description |
| --- | --- |
| FASTA text | Properly formatted sequences with > headers and wrapped sequence lines. Each sequence appears on separate lines following its header. |
| Statistics | When enabled, displays sequence count, total residues, average length, and detected sequence type (protein, DNA, or RNA). |
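
The numeric statistics are straightforward to reproduce. A minimal Python sketch, assuming the sequences are already cleaned; the function name and dictionary keys are hypothetical, and type detection is omitted:

```python
def sequence_stats(sequences: list[str]) -> dict[str, float]:
    """Summarise a list of cleaned sequences (hypothetical helper)."""
    lengths = [len(s) for s in sequences]
    total = sum(lengths)
    return {
        "count": len(lengths),
        "total_residues": total,
        # Guard against division by zero when no sequence was found.
        "average_length": total / len(lengths) if lengths else 0.0,
    }
```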

How to make a FASTA file from a TXT file

A plain text file can be converted to FASTA by adding a header line that starts with > and placing the sequence on the next line. This is the standard way to create a FASTA file from raw sequence text before saving it with a .fasta or .fa extension.

Edit the file in a text editor

For small files, open the .txt file in a plain text editor such as Notepad or TextEdit and format each sequence like this:

```text
>Sequence_1
MTEITAAMVKELRESTGAGMMDCKNALSETQHEWAYK
```

If your file contains multiple sequences, repeat the same pattern for each entry:

```text
>Sequence_1
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
>Sequence_2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF
```

Use the command line for simple batch conversion

If an input file contains one sequence per line and no FASTA headers, awk can add numbered headers:

```bash
awk '{ print ">" NR "\n" $0 }' input.txt > output.fasta
```

This writes each input line as a separate FASTA record with headers such as >1, >2, and >3.
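
Because the awk command numbers records by input line, a blank line in the input would produce a header with an empty sequence. The Python sketch below performs the same one-sequence-per-line conversion while skipping blank lines. The function name and the `seq` prefix are illustrative choices, so its headers read >seq_1, >seq_2 rather than >1, >2.

```python
def lines_to_fasta(text: str, prefix: str = "seq") -> str:
    """Turn one-sequence-per-line text into FASTA, skipping blank lines."""
    records = []
    for line in text.splitlines():
        seq = line.strip()
        if not seq:
            continue  # a blank line would otherwise become an empty record
        records.append(f">{prefix}_{len(records) + 1}\n{seq}")
    # End with a newline only when there is output to terminate.
    return "\n".join(records) + ("\n" if records else "")


# Example usage with placeholder file names:
# with open("input.txt") as src, open("output.fasta", "w") as dst:
#     dst.write(lines_to_fasta(src.read()))
```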

How to save a FASTA file

After formatting the header and sequence lines, use Save As in a text editor and save the file as plain text with a .fasta or .fa extension. If the editor appends .txt, choose plain text output explicitly and rename the file so the final filename ends in .fasta or .fa.

Use a converter when the input is messy

If the text contains numbering, spaces, or inconsistent formatting, use a dedicated converter to clean the sequences and generate headers automatically. ProteinIQ supports pasted text and uploaded files, so it is useful when manual editing would be slow or error-prone.

FASTA format rules

FASTA files are simple, but a few rules matter for downstream tool compatibility.

  • Header line: Each sequence starts with a single header line beginning with >. The identifier should be unique within the file.
  • Sequence line: Put the sequence directly below the header. Many tools accept wrapped lines, but one continuous line per sequence is often easier to inspect.
  • Valid characters: Use standard nucleotide codes such as A, C, G, T, U, and N, or standard one-letter amino acid codes for proteins.
  • No numbering or spaces: Remove residue numbers, tabs, spaces, and other non-sequence characters unless a tool explicitly allows them.
  • Plain text file: Save the file as plain text before renaming it to .fasta or .fa.
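
The rules above can be checked mechanically. The following Python sketch reports violations of the header, uniqueness, and character rules. It is a simplified assumption of what a checker might do: `check_fasta` is a hypothetical name, and the letters-only test would also flag symbols such as * or - that some tools accept.

```python
def check_fasta(text: str) -> list[str]:
    """Return a list of rule violations found in FASTA text (hypothetical helper)."""
    problems = []
    seen = set()
    have_header = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(">"):
            have_header = True
            # The identifier is taken as the first word after '>'.
            fields = line[1:].split()
            ident = fields[0] if fields else ""
            if not ident:
                problems.append(f"line {number}: empty header")
            elif ident in seen:
                problems.append(f"line {number}: duplicate identifier {ident!r}")
            seen.add(ident)
        elif line.strip():
            if not have_header:
                problems.append(f"line {number}: sequence before any header")
            # Only ASCII letters are accepted on sequence lines.
            if not (line.isascii() and line.isalpha()):
                problems.append(f"line {number}: non-letter characters in sequence")
    return problems
```

An empty list means none of these particular checks failed. It does not guarantee that every downstream tool will accept the file.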

For sequencing reads rather than plain text sequences, use FASTQ to FASTA. FASTQ files include quality scores, so they need a different conversion workflow than standard .txt sequence files.