TXT to FASTA converter

Convert TXT, plain text, or copied sequence blocks into downloadable FASTA files for DNA, RNA, and protein workflows.

How to convert TXT to FASTA?

To convert TXT to FASTA format online with the ProteinIQ TXT to FASTA converter, paste a raw sequence or upload a .txt file, choose the sequence type if you know it, and run the converter. ProteinIQ removes copied line numbers and spacing, adds or preserves FASTA headers, wraps sequence lines, validates sequence characters, and returns a .fasta file.

For one sequence, the result contains one FASTA record:

>Sequence_1
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG

For multi-sequence input, each sequence gets its own header line and record:

>Sequence_1
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
>Sequence_2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF

For multiple sequences, separate entries with blank lines, existing FASTA headers, or the selected custom separator. The output keeps each sequence as a separate FASTA record so it can be used in alignment tools, sequence search tools, or downstream bioinformatics pipelines.
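The three ways of separating entries described above can be sketched in Python. This is a minimal illustration, not ProteinIQ's implementation: the function name `split_records`, the `Sequence_N` fallback names, and the priority order (existing headers first, then a custom separator, then blank lines) are assumptions made for the example.

```python
import re

def split_records(text, separator=None):
    """Split raw text into (header, sequence) pairs.

    Hypothetical helper, not ProteinIQ's code. The priority order is an
    assumption: existing '>' headers, then a custom separator, then blank lines.
    """
    if separator == "":
        raise ValueError("custom separator must not be empty")
    lines = [ln.strip() for ln in text.splitlines()]
    records = []
    if any(ln.startswith(">") for ln in lines):
        header, chunks = None, []
        for ln in lines:
            if ln.startswith(">"):
                # A new header closes the record collected so far.
                if header is not None or chunks:
                    records.append((header, "".join(chunks)))
                header, chunks = ln[1:].strip(), []
            elif ln:
                chunks.append("".join(ln.split()))
        records.append((header, "".join(chunks)))
    else:
        if separator is not None:
            blocks = text.split(separator)
        else:
            blocks = re.split(r"\n\s*\n", text)
        # Join wrapped lines inside each block and drop empty blocks.
        records = [(None, "".join(b.split())) for b in blocks if b.strip()]
    # Name any record that has no header of its own.
    return [(h or f"Sequence_{i}", s) for i, (h, s) in enumerate(records, 1)]
```

Calling `split_records(text, separator="---")` splits on the default custom separator mentioned in the settings, while an empty separator raises an error rather than silently switching modes.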

| Input | Description |
| --- | --- |
| Input | Plain text containing one or more sequences. Accepts pasted text or file uploads. Common extensions: .txt, .fasta, .fa, .fas, .seq, .dat. Maximum file size: 50 MB. |
| Setting | Description |
| --- | --- |
| Multi-sequences | Method for identifying separate sequences. Auto-detect sequences (default) analyzes text structure, keeps likely wrapped sequence lines together, and avoids treating prose labels as sequence records. Split on empty lines treats each block separated by blank lines as a distinct sequence. Custom separator uses a specified delimiter string. |
| Custom separator | Delimiter string for separating sequences when Custom separator mode is selected. Default: ---. An empty custom separator is rejected instead of falling back to auto-detection. |
| Sequence type | Select Auto-detect (default), DNA, RNA, or Protein. Auto-detect avoids deleting valid protein residues before classification. Explicit DNA mode rejects U, explicit RNA mode rejects T, and Protein mode preserves valid amino acid residues. |
| Header format | Controls how sequence identifiers are generated. Preserve existing headers (default) maintains existing FASTA header lines. seq_1, seq_2, ... or sequence_1, sequence_2, ... provide simple incrementing names. Custom prefix allows defining a custom naming scheme. Extract from text (smart) attempts to identify meaningful names from surrounding text. |
| Custom prefix | Prefix string for sequence headers when Custom prefix mode is selected. Default: seq. Prefix text is sanitized so spaces, punctuation, or pasted line breaks cannot create malformed FASTA headers. |
| Header extraction pattern | Refines smart extraction behavior when using Extract from text (smart) mode. First word of each sequence block takes the initial word before each sequence. Line numbers searches for patterns like "1.", "2.". Sequence identifiers looks for conventions like "seq1" or "protein_a". |
| Line wrapping | Number of characters per line in the output. 80 characters per line (standard) (default) follows NCBI recommendations. 60 characters per line is common in many workflows. No wrapping (single line) outputs each sequence on a single line. |
| Case format | Letter case for output sequences. UPPERCASE (default) matches database expectations. lowercase for alternative formatting. Preserve original maintains input capitalization. |
| Character cleanup | Master switch enabling automatic removal of non-sequence characters. Default: enabled. |
| Remove spaces | Strips whitespace characters from sequences. Default: enabled. |
| Remove numbers | Strips numeric characters (0-9) from sequences, useful for sequences copied from numbered formats. Default: enabled. |
| Remove tabs | Strips tab characters from sequences. Default: enabled. |
| Remove punctuation | Strips punctuation marks from sequences. Default: enabled. |
| Preserve alignment gaps | Keeps - and . gap characters when converting aligned FASTA or MSA-style input. Default: disabled. |
| Preserve stop codons | Keeps terminal * stop markers in protein sequences. Internal * characters are still rejected during validation. Default: disabled. |
| [Remove invalid characters](/app/filter-dna) | Strips characters that are not valid biological residues. Type-specific residue removal only runs when a Sequence type is selected, so in auto-detect mode short proteins made mostly of nucleotide-overlap letters are not truncated. Default: enabled. |
| Validate sequences | Performs a final check that all output characters are valid biological sequence codes for the selected sequence type. Default: enabled. |
| Validation strictness | Lenient (auto-clean) removes invalid characters and reports what changed. Strict (reject invalid) fails any sequence that would require cleanup, including spaces, tabs, numbers, punctuation, non-letter characters, or sequence-type mismatches. |
| Add line numbers to headers | Includes original line numbers from the input file in FASTA headers, useful for tracking sequence sources. Default: disabled. |
| Show sequence statistics | Displays statistics including sequence count, total length, average length, and detected sequence type. Default: enabled. |
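Several of the cleanup, case, and wrapping settings above can be approximated in a few lines of Python. The sketch below is illustrative only and does not reproduce ProteinIQ's behavior exactly: `clean_sequence` and `format_record` are hypothetical names, and their keyword arguments are stand-ins for the Preserve alignment gaps, Preserve stop codons, Case format, and Line wrapping settings.

```python
import re

def clean_sequence(raw, keep_gaps=False, keep_terminal_stop=False, upper=True):
    """Strip spaces, tabs, digits, and punctuation from a pasted sequence.

    Hypothetical helper: the keyword arguments loosely mirror the settings
    table and are assumptions, not ProteinIQ's API.
    """
    # Remember a trailing '*' before it is removed with other punctuation.
    has_stop = keep_terminal_stop and raw.rstrip().endswith("*")
    allowed = "A-Za-z" + (r"\-." if keep_gaps else "")
    seq = re.sub(f"[^{allowed}]", "", raw)
    if upper:
        seq = seq.upper()
    return seq + ("*" if has_stop else "")

def format_record(header, seq, width=80):
    """Return one FASTA record; width=0 or None disables wrapping."""
    if width:
        body = [seq[i:i + width] for i in range(0, len(seq), width)]
    else:
        body = [seq]
    return "\n".join([f">{header}", *body]) + "\n"
```

For example, `format_record("Sequence_1", clean_sequence(pasted_text), width=60)` would produce a record wrapped at 60 characters per line, where `pasted_text` is whatever numbered or spaced block was copied.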

Results

The converter produces FASTA-formatted output that can be copied to clipboard or downloaded as a .fasta file.

| Output | Description |
| --- | --- |
| FASTA text | Properly formatted sequences with > headers and wrapped sequence lines. Each sequence appears on separate lines following its header. |
| Statistics | When enabled, displays sequence count, total length, average length, shortest and longest length, and detected sequence type (protein, DNA, or RNA). |

How to make a FASTA file from a TXT file

A plain text file can be converted to FASTA by adding a header line that starts with > and placing the sequence on the next line. This is the standard way to create a FASTA file from raw sequence text before saving it with a .fasta or .fa extension.

Edit the file in a text editor

For small files, open the .txt file in a plain text editor such as Notepad or TextEdit and format each sequence like this:

>Sequence_1
MTEITAAMVKELRESTGAGMMDCKNALSETQHEWAYK

If your file contains multiple sequences, repeat the same pattern for each entry:

>Sequence_1
MTEITAAMVKELRESTGAGMMDCKNALSETQHEWAYK
>Sequence_2
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF

How to save a FASTA file

After formatting the header and sequence lines, use Save As in a text editor and save the file as plain text (not rich text) with a .fasta or .fa extension. If the editor still appends .txt, rename the file so the final filename ends in .fasta or .fa.
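The same manual steps can also be scripted. The sketch below assumes the .txt file holds a single raw sequence with no header; `save_as_fasta` is a hypothetical helper name, and the default header `Sequence_1` simply follows the examples above.

```python
from pathlib import Path

def save_as_fasta(txt_path, header="Sequence_1"):
    """Write a sibling .fasta file for a .txt file holding one raw sequence.

    Hypothetical helper: assumes the input has no '>' header of its own.
    """
    src = Path(txt_path)
    # Join wrapped lines and drop spaces or tabs inside the sequence.
    seq = "".join(src.read_text(encoding="utf-8").split())
    dest = src.with_suffix(".fasta")
    dest.write_text(f">{header}\n{seq}\n", encoding="utf-8")
    return dest
```

The function returns the path of the new file, so `save_as_fasta("protein.txt")` would write `protein.fasta` next to the original and leave the .txt file untouched.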

Use a converter when the input is messy

If the text contains numbering, spaces, or inconsistent formatting, use a dedicated converter to clean the sequences and generate headers automatically. ProteinIQ supports pasted text and uploaded files, so it is useful when manual editing would be slow or error-prone.

FASTA format rules

FASTA files are simple, but a few rules matter for downstream tool compatibility.

  • Header line: Each sequence starts with a single header line beginning with >. The identifier should be unique within the file.
  • Sequence line: Put the sequence directly below the header. Many tools accept wrapped lines, but one continuous line per sequence is often easier to inspect.
  • Valid characters: Use standard nucleotide codes such as A, C, G, T, U, and N, or standard one-letter amino acid codes for proteins.
  • Alignment gaps: Gapped alignment FASTA files may contain - or . gap characters. Enable Preserve alignment gaps when converting aligned sequences.
  • Protein stops: Translated protein FASTA records sometimes end with * to mark a stop codon. Enable Preserve stop codons if you need to keep that terminal marker.
  • No numbering or spaces: Remove residue numbers, tabs, spaces, and other non-sequence characters unless a tool explicitly allows them.
  • Plain text file: Save the file as plain text before renaming it to .fasta or .fa.
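The rules above can be checked mechanically before a file is passed to another tool. The following is a minimal sketch rather than a complete validator: `check_fasta` is a hypothetical name, the first whitespace-delimited token after > is treated as the identifier (an assumption based on common convention), and gap and stop characters are always tolerated.

```python
def check_fasta(text):
    """Return a list of rule violations; an empty list means none were found."""
    problems, seen = [], set()
    current, has_seq = None, False
    for n, line in enumerate(text.splitlines(), 1):
        if line.startswith(">"):
            if current is not None and not has_seq:
                problems.append(f"record '{current}' has no sequence")
            tokens = line[1:].split()
            ident = tokens[0] if tokens else ""
            if not ident:
                problems.append(f"line {n}: empty header")
            elif ident in seen:
                problems.append(f"line {n}: duplicate identifier '{ident}'")
            seen.add(ident)
            current, has_seq = ident, False
        elif line.strip():
            if current is None:
                problems.append(f"line {n}: sequence before any header")
            # Only ASCII letters plus gap ('-', '.') and stop ('*') characters.
            if not all((ch.isascii() and ch.isalpha()) or ch in "-.*" for ch in line):
                problems.append(f"line {n}: non-sequence characters")
            has_seq = True
    if current is not None and not has_seq:
        problems.append(f"record '{current}' has no sequence")
    return problems
```

A check like this only covers file structure. Whether each letter is valid for DNA, RNA, or protein is a separate question that depends on the sequence type.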

Which FASTA converter should I use?

Use TXT to FASTA when your input is a plain text sequence, copied sequence block, or .txt file. Use a more specific converter when the source file has a structured format with metadata or quality scores.

| Input format | Best converter | Use when |
| --- | --- | --- |
| TXT or pasted sequence text | TXT to FASTA | You need to create a FASTA file from raw DNA, RNA, or protein text. |
| CSV or spreadsheet-like text | CSV to FASTA | Sequence IDs and sequences are stored in columns. |
| FASTQ sequencing reads | FASTQ to FASTA | You need to remove quality-score lines from sequencing-read data. |
| GenBank records | GenBank to FASTA | You want sequence records from annotated GenBank files. |
| GenBank feature tables | GenBank Feature Extractor | You need coding sequences, genes, or other annotated features before FASTA conversion. |
| PDB structure files | PDB to FASTA | You need the amino acid sequence from a protein structure. |

FAQ

How do I convert a text file to FASTA format?

Upload the .txt file or paste its contents, choose DNA, RNA, Protein, or Auto-detect, and run the converter. The result can be copied or downloaded as a .fasta file with headers and wrapped sequence lines.

How do I save a TXT file as FASTA?

Add a header line that starts with > above each sequence, put the sequence on the next line, and save the file as plain text with a .fasta or .fa extension. If your editor appends .txt, rename the file so it ends in .fasta or .fa. This converter can generate the headers and the downloaded file for you.

What characters are valid in DNA, RNA, and protein FASTA?

DNA uses A, C, G, T, and ambiguity codes such as N, R, and Y. RNA uses U in place of T. Protein uses one-letter amino acid codes such as A, C, D, E, and M, plus ambiguity and extended codes such as X, B, Z, J, O, and U.
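These alphabets can be written down as sets and used for a rough type guess. The sketch below is an assumption-laden illustration, not ProteinIQ's auto-detection: `guess_type` is a hypothetical name, the nucleotide sets use the IUPAC ambiguity codes, and a short protein made only of letters such as A, C, G, and T would be reported as DNA.

```python
# IUPAC nucleotide codes, including ambiguity codes such as N, R, and Y.
DNA = set("ACGTRYSWKMBDHVN")
RNA = set("ACGURYSWKMBDHVN")
# 20 standard amino acids plus ambiguity and extended codes.
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY") | set("XBZJOU")

def guess_type(seq):
    """Return 'DNA', 'RNA', 'Protein', or 'unknown' for a cleaned sequence.

    Hypothetical heuristic: checks the narrowest alphabet first, so
    sequences made only of nucleotide letters are never called protein.
    """
    letters = set(seq.upper())
    if not letters:
        return "unknown"
    if letters <= DNA:
        return "DNA"
    if letters <= RNA:
        return "RNA"
    if letters <= PROTEIN:
        return "Protein"
    return "unknown"
```

Because the nucleotide and protein alphabets overlap, a set-based guess like this is only a starting point. Selecting the sequence type explicitly avoids the ambiguity.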

Can FASTA contain gaps?

Yes. Aligned FASTA files often contain - or . gap characters to keep homologous positions in the same columns, while raw unaligned FASTA usually should not. Enable Preserve alignment gaps when your input is already an alignment.

Should I use TXT to FASTA or FASTQ to FASTA?

Use TXT to FASTA for raw sequences copied from notes, spreadsheets, or plain text files. Use FASTQ to FASTA when your input is sequencing-read data, since FASTQ includes quality-score lines that need a dedicated converter to strip.
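To illustrate why FASTQ needs its own handling, here is a minimal sketch of stripping quality lines. It assumes the common four-line FASTQ layout (header starting with @, sequence, a + line, quality string) with unwrapped sequences; `fastq_to_fasta` is a hypothetical name and this is not the FASTQ to FASTA converter's actual code.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA text.

    Assumes each record is exactly four lines; wrapped FASTQ is not handled.
    """
    lines = [ln for ln in fastq_text.splitlines() if ln.strip()]
    if len(lines) % 4:
        raise ValueError("expected a multiple of four non-empty lines")
    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, _quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        # Keep the read name and sequence; drop the '+' and quality lines.
        out.append(f">{header[1:]}\n{seq}\n")
    return "".join(out)
```

Pasting FASTQ into a plain text converter would instead treat the quality strings as sequence text, which is why the dedicated converter is the better choice for sequencing reads.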