
TXT to FASTA converter
Convert plain text DNA, RNA, or protein sequences to FASTA with cleanup, validation, and generated headers.
What is TXT to FASTA converter?
TXT to FASTA converter transforms plain text sequence data into properly formatted FASTA files, the standard format for representing nucleotide or protein sequences in bioinformatics. It handles raw sequences, copied sequence blocks with residue numbers, line-wrapped records, and text that already contains FASTA headers.
FASTA format was invented by David Lipman and William Pearson in 1985 for their FASTP protein sequence similarity search program. The format begins each sequence with a header line starting with >, followed by the sequence data across one or more lines. FASTA has become a near-universal standard in bioinformatics due to its simplicity and flexibility compared to earlier fixed-field formats.
Converting sequences to FASTA format ensures compatibility with downstream analysis tools, sequence databases, and bioinformatics pipelines. The converter automatically detects multiple sequences in a single text file and applies consistent formatting rules across all entries. ProteinIQ offers several other FASTA converters for different source formats, including CSV to FASTA, GenBank to FASTA, FASTQ to FASTA, and PDB to FASTA.
How to use TXT to FASTA converter online
Paste plain text sequences or upload a text file to convert sequence blocks into FASTA online. The converter cleans copied numbering and spacing, generates or preserves headers, validates DNA, RNA, or protein characters, and returns downloadable FASTA output plus sequence statistics.
Inputs
| Input | Description |
|---|---|
| Input | Plain text containing one or more sequences. Accepts pasted text or file uploads. Supported file extensions: .txt, .fasta, .fa, .fas, .seq, .dat. Maximum file size: 50 MB. |
Settings
Sequence detection
| Setting | Description |
|---|---|
| Multi-sequences | Method for identifying separate sequences. Auto-detect sequences (default) analyzes text structure, keeps likely wrapped sequence lines together, and avoids treating prose labels as sequence records. Split on empty lines treats each block separated by blank lines as a distinct sequence. Custom separator uses a specified delimiter string. |
| Custom separator | Delimiter string for separating sequences when Custom separator mode is selected. Default: ---. An empty custom separator is rejected instead of falling back to auto-detection. |
| Sequence type | Select Auto-detect (default), DNA, RNA, or Protein. Auto-detect avoids deleting valid protein residues before classification. Explicit DNA mode rejects U, explicit RNA mode rejects T, and Protein mode preserves valid amino acid residues. |
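
The splitting modes and type detection described above can be sketched in a few lines of Python. This is only an illustration, not the converter's actual implementation: `split_records` and `guess_type` are hypothetical names, the reduced alphabets are an assumption, and the real auto-detect mode is more elaborate than this.

```python
import re
from typing import List

def split_records(text: str, mode: str = "blank", separator: str = "---") -> List[str]:
    """Split raw text into sequence blocks (hypothetical helper)."""
    if mode == "blank":
        # One or more blank lines separate two blocks.
        blocks = re.split(r"\n\s*\n", text)
    elif mode == "custom":
        if not separator:
            # Mirrors the documented behavior: an empty separator is rejected.
            raise ValueError("custom separator must not be empty")
        blocks = text.split(separator)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [block.strip() for block in blocks if block.strip()]

def guess_type(seq: str) -> str:
    """Very rough classification using assumed, reduced alphabets."""
    letters = set(seq.upper())
    if letters <= set("ACGTN"):
        return "DNA"   # sequences with neither T nor U also land here
    if letters <= set("ACGUN"):
        return "RNA"
    return "Protein"
```

Short peptides built only from A, C, G, and T would be misclassified as DNA by this sketch, which is the kind of edge case the explicit Sequence type setting exists for.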
Header formatting
| Setting | Description |
|---|---|
| Header format | Controls how sequence identifiers are generated. Preserve existing headers (default) maintains existing FASTA header lines. seq_1, seq_2, ... or sequence_1, sequence_2, ... provide simple incrementing names. Custom prefix allows defining a custom naming scheme. Extract from text (smart) attempts to identify meaningful names from surrounding text. |
| Custom prefix | Prefix string for sequence headers when Custom prefix mode is selected. Default: seq. Prefix text is sanitized so spaces, punctuation, or pasted line breaks cannot create malformed FASTA headers. |
| Header extraction pattern | Refines smart extraction behavior when using Extract from text (smart) mode. First word of each sequence block takes the initial word before each sequence. Line numbers searches for patterns like "1.", "2.". Sequence identifiers looks for conventions like "seq1" or "protein_a". |
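
As a rough illustration of prefix sanitizing and incrementing headers, the sketch below replaces anything other than letters, digits, underscores, and hyphens with an underscore. The function names and the exact character policy are assumptions made for this example; the converter's own sanitizing rules are not documented here.

```python
import re
from typing import List

def sanitize_prefix(prefix: str, fallback: str = "seq") -> str:
    """Collapse unsafe characters (spaces, punctuation, line breaks) to '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", prefix).strip("_")
    # Fall back to the default prefix if nothing usable is left.
    return cleaned or fallback

def make_headers(count: int, prefix: str = "seq") -> List[str]:
    """Build headers such as >seq_1, >seq_2 for `count` sequences."""
    safe = sanitize_prefix(prefix)
    return [f">{safe}_{i}" for i in range(1, count + 1)]
```

For example, `make_headers(2, "my protein")` yields `>my_protein_1` and `>my_protein_2`, so a pasted prefix cannot introduce a second `>` or a line break into a header.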
Sequence formatting
| Setting | Description |
|---|---|
| Line wrapping | Number of characters per line in the output. 80 characters per line (standard) (default) follows NCBI recommendations. 60 characters per line is common in many workflows. No wrapping (single line) outputs each sequence on a single line. |
| Case format | Letter case for output sequences. UPPERCASE (default) matches the convention used by most sequence databases. lowercase converts every residue to lower case. Preserve original maintains input capitalization. |
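
Line wrapping and case conversion are simple string operations. A minimal sketch, assuming a hypothetical `format_sequence` helper in which `width=None` stands for the no-wrapping option:

```python
from typing import Optional

def format_sequence(seq: str, width: Optional[int] = 80, case: str = "upper") -> str:
    """Apply case conversion, then wrap to a fixed line width."""
    if case == "upper":
        seq = seq.upper()
    elif case == "lower":
        seq = seq.lower()
    # Any other value (for example "preserve") leaves the case untouched.
    if not width:
        return seq  # single-line output
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
```

Slicing in fixed steps is used rather than a word-wrapping routine because sequences contain no spaces to break on.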
Character cleanup
| Setting | Description |
|---|---|
| Character cleanup | Master switch enabling automatic removal of non-sequence characters. Default: enabled. |
| Remove spaces | Strips whitespace characters from sequences. Default: enabled. |
| Remove numbers | Strips numeric characters (0-9) from sequences, useful for sequences copied from numbered formats. Default: enabled. |
| Remove tabs | Strips tab characters from sequences. Default: enabled. |
| Remove punctuation | Strips punctuation marks from sequences. Default: enabled. |
| Remove invalid characters | Strips letters that are incompatible with the selected sequence type. In auto-detect mode, type-specific residue removal is delayed until after classification so short proteins made mostly of nucleotide-overlap letters are not truncated. Default: enabled. |
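
The cleanup switches above map naturally onto independent string filters. The following is a sketch under assumptions, not the converter's code: `clean_sequence` is a hypothetical name, and treating every character in Python's `string.punctuation` as removable is a simplification (it also strips `*` and `-`, which some tools use for stop codons and gaps).

```python
import re
import string

def clean_sequence(raw: str, *, spaces: bool = True, numbers: bool = True,
                   tabs: bool = True, punctuation: bool = True) -> str:
    """Strip non-sequence characters from one pasted sequence block."""
    # Join wrapped lines first so the block becomes one continuous string.
    seq = raw.replace("\r", "").replace("\n", "")
    if tabs:
        seq = seq.replace("\t", "")
    if spaces:
        seq = seq.replace(" ", "")
    if numbers:
        seq = re.sub(r"[0-9]", "", seq)
    if punctuation:
        seq = seq.translate(str.maketrans("", "", string.punctuation))
    return seq
```

A block copied from a numbered listing, such as `"  1 atggcc attgta\n 13 atgggc"`, comes out as `atggccattgtaatgggc`.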
Validation and output options
| Setting | Description |
|---|---|
| Validate sequences | Performs a final check that all output characters are valid biological sequence codes for the selected sequence type. Default: enabled. |
| Validation strictness | Lenient (auto-clean) removes invalid characters and reports what changed. Strict (reject invalid) fails any sequence that would require cleanup, including spaces, tabs, numbers, punctuation, non-letter characters, or sequence-type mismatches. |
| Add line numbers to headers | Includes original line numbers from the input file in FASTA headers, useful for tracking sequence sources. Default: disabled. |
| Show sequence statistics | Displays statistics including sequence count, total length, average length, and detected sequence type. Default: enabled. |
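
The difference between lenient and strict validation can be illustrated with a small sketch. The alphabets below are the IUPAC nucleotide and amino acid codes and are an assumption for this example; the tables the converter actually accepts may differ, and `validate` is a hypothetical name.

```python
from typing import List, Tuple

# Assumed alphabets (IUPAC codes); the converter's real tables may differ.
ALPHABETS = {
    "DNA": set("ACGTRYSWKMBDHVN"),
    "RNA": set("ACGURYSWKMBDHVN"),
    "Protein": set("ACDEFGHIKLMNPQRSTVWYBXZJUO"),
}

def validate(seq: str, seq_type: str, strict: bool = False) -> Tuple[str, List[str]]:
    """Return (cleaned sequence, removed characters), or raise in strict mode."""
    allowed = ALPHABETS[seq_type]
    invalid = [ch for ch in seq if ch.upper() not in allowed]
    if strict and invalid:
        # Strict mode rejects anything that would need cleanup.
        raise ValueError(f"invalid characters for {seq_type}: {''.join(invalid)}")
    cleaned = "".join(ch for ch in seq if ch.upper() in allowed)
    return cleaned, invalid
```

In lenient mode `validate("ACGU", "DNA")` drops the `U` and reports it, whereas the same call with `strict=True` raises an error instead of modifying the sequence.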
Results
The converter produces FASTA-formatted output that can be copied to clipboard or downloaded as a .fasta file.
| Output | Description |
|---|---|
| FASTA text | Properly formatted sequences with > headers and wrapped sequence lines. Each sequence appears on separate lines following its header. |
| Statistics | When enabled, displays sequence count, total residues, average length, and detected sequence type (protein, DNA, or RNA). |
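
For readers who want to reproduce the reported statistics on a downloaded file, here is a minimal sketch that parses FASTA text and computes the count, total residues, and average length. `parse_fasta` and `sequence_stats` are hypothetical helpers written for this example, not part of ProteinIQ.

```python
from typing import Dict, List, Tuple

def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs, joining wrapped sequence lines."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def sequence_stats(records: List[Tuple[str, str]]) -> Dict[str, float]:
    """Compute sequence count, total residues, and average length."""
    lengths = [len(seq) for _, seq in records]
    total = sum(lengths)
    average = total / len(lengths) if lengths else 0.0
    return {"count": len(lengths), "total_residues": total, "average_length": average}
```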
How to make a FASTA file from a TXT file
A plain text file can be converted to FASTA by adding a header line that starts with > and placing the sequence on the next line. This is the standard way to create a FASTA file from raw sequence text before saving it with a .fasta or .fa extension.
Edit the file in a text editor
For small files, open the .txt file in a plain text editor such as Notepad or TextEdit and format each sequence like this:
```
>Sequence_1
MTEITAAMVKELRESTGAGMMDCKNALSETQHEWAYK
```
If your file contains multiple sequences, repeat the same pattern for each entry:
```
>Sequence_1
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
>Sequence_2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF
```
Use the command line for simple batch conversion
If an input file contains one sequence per line and no FASTA headers, awk can add numbered headers:
```
awk '{ print ">"NR"\n"$0 }' input.txt > output.fasta
```
This writes each input line as a separate FASTA record with headers such as >1, >2, and >3.
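Where awk is not available, the same one-sequence-per-line conversion can be sketched in Python. The helper name `lines_to_fasta` is hypothetical. Unlike the awk command, this version skips blank lines and numbers only the non-empty ones, so the headers can differ when the input contains empty lines.

```python
from typing import Iterable

def lines_to_fasta(lines: Iterable[str], prefix: str = "") -> str:
    """Turn one-sequence-per-line input into FASTA text with numbered headers."""
    records = []
    count = 0
    for line in lines:
        seq = line.strip()
        if not seq:
            continue  # skip blank lines instead of emitting empty records
        count += 1
        records.append(f">{prefix}{count}\n{seq}")
    return "\n".join(records) + "\n" if records else ""

# Example usage (file names are placeholders):
# with open("input.txt") as src, open("output.fasta", "w") as dst:
#     dst.write(lines_to_fasta(src))
```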
How to save a FASTA file
After formatting the header and sequence lines, use Save As in a text editor and save the file as plain text with a .fasta or .fa extension. If the editor appends .txt, choose plain text output explicitly and rename the file so the final filename ends in .fasta or .fa.
Use a converter when the input is messy
If the text contains numbering, spaces, or inconsistent formatting, use a dedicated converter to clean the sequences and generate headers automatically. ProteinIQ supports pasted text and uploaded files, so it is useful when manual editing would be slow or error-prone.
FASTA format rules
FASTA files are simple, but a few rules matter for downstream tool compatibility.
- Header line: Each sequence starts with a single header line beginning with `>`. The identifier should be unique within the file.
- Sequence line: Put the sequence directly below the header. Many tools accept wrapped lines, but one continuous line per sequence is often easier to inspect.
- Valid characters: Use standard nucleotide codes such as `A`, `C`, `G`, `T`, `U`, and `N`, or standard one-letter amino acid codes for proteins.
- No numbering or spaces: Remove residue numbers, tabs, spaces, and other non-sequence characters unless a tool explicitly allows them.
- Plain text file: Save the file as plain text before renaming it to `.fasta` or `.fa`.
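
The structural rules above can be checked mechanically before handing a file to another tool. The sketch below is one possible checker, with a hypothetical `check_fasta` function. Allowing `*` and `-` in sequence lines is an assumption made here because some tools use them for stop codons and gaps.

```python
import re
from typing import List

# Letters plus '*' and '-' (assumed to be acceptable to the downstream tool).
SEQUENCE_LINE = re.compile(r"[A-Za-z*\-]+")

def check_fasta(text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    seen = set()
    have_header = False
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        if line.startswith(">"):
            have_header = True
            parts = line[1:].split()
            ident = parts[0] if parts else ""
            if not ident:
                problems.append(f"line {number}: empty identifier")
            elif ident in seen:
                problems.append(f"line {number}: duplicate identifier {ident}")
            seen.add(ident)
        elif not have_header:
            problems.append(f"line {number}: sequence data before first header")
        elif not SEQUENCE_LINE.fullmatch(line.strip()):
            problems.append(f"line {number}: non-letter characters in sequence")
    return problems
```

This checks only structure and character class. It does not verify that the residues match a particular sequence type.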
Related conversions
For sequencing reads rather than plain text sequences, use FASTQ to FASTA. FASTQ files include quality scores, so they need a different conversion workflow than standard .txt sequence files.