How to split FASTA files online
The easiest way to split a FASTA file online is to upload or paste a multi-sequence FASTA file, choose how records should be grouped, and run the splitter. ProteinIQ returns downloadable FASTA files and, by default, a summary file that lists the number of records and residues or bases in each output file.
FASTA splitting is useful when downstream tools have per-job sequence limits, when you want one sequence per structure-prediction job, or when a large sequence collection needs to be divided across parallel searches, alignments, or annotation runs.
For a small input like this:

```text
>seq1 sample A
MKTAYIAKQRQISFVKSHFSRQ
>seq2 sample B
GATCGATCGATCGATC
>seq3 sample C
MVLSPADKTNVKAAWGKVGA
```

Split by sequence count with Sequences per file set to 2 returns one FASTA file containing seq1 and seq2, and another FASTA file containing seq3.
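The count-based grouping in this example can be sketched in a few lines of Python. This is a minimal illustration, not ProteinIQ's implementation: `read_fasta`, `split_by_count`, and `to_fasta` are hypothetical helper names, and the reader deliberately skips the validation the splitter performs.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(text: str) -> List[Record]:
    """Minimal reader: '>' lines start a record, other non-blank lines extend it.

    No validation: sequence lines before the first header are simply dropped.
    """
    records: List[Record] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], ""))
        elif records:
            header, seq = records[-1]
            records[-1] = (header, seq + line)
    return records


def split_by_count(records: List[Record], per_file: int) -> List[List[Record]]:
    """Group complete records into chunks of at most per_file records, in input order."""
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    return [records[i:i + per_file] for i in range(0, len(records), per_file)]


def to_fasta(chunk: List[Record]) -> str:
    """Serialize records back to FASTA text, one sequence line per record."""
    return "".join(f">{h}\n{s}\n" for h, s in chunk)


example = """>seq1 sample A
MKTAYIAKQRQISFVKSHFSRQ
>seq2 sample B
GATCGATCGATCGATC
>seq3 sample C
MVLSPADKTNVKAAWGKVGA
"""

for i, chunk in enumerate(split_by_count(read_fasta(example), 2), start=1):
    print(f"--- file {i} ---")
    print(to_fasta(chunk), end="")
```

Slicing in steps of `per_file` keeps records whole and in their original order; the last chunk simply holds whatever records remain.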
| Input | Description |
|---|---|
| FASTA input | Multi-sequence FASTA text or an uploaded .txt, .fasta, .fa, or .fas file. Maximum file size: 50 MB. |
| Setting | Default | Purpose |
|---|---|---|
| Split mode | Split by sequence count | Choose whether to group records by count, create a target number of files, group by total residues or bases, or write one file per sequence. |
| Sequences per file | 1 | Number of complete FASTA records per file when using Split by sequence count. |
| Number of output files | 2 | Target number of non-empty files when using Split into number of files. |
| Max residues/bases per file | 100000 | Maximum total sequence length per output file when using Split by residues/bases per file. |
| Naming convention | Numbered | Use numbered names, sequential names, or names based on the first sequence header in each file. |
| File prefix | fasta | Prefix added to all output filenames and the optional summary file. |
| Include summary file | On | Adds a text summary with split statistics and warnings. |
| Preserve original headers | On | Keeps original FASTA headers. Turn off to write Sequence_1, Sequence_2, and similar generated headers. |
How FASTA splitting works
A FASTA file stores each sequence as a record with a header line that begins with >, followed by one or more sequence lines. NCBI GenBank FASTA guidance describes the first line as the FASTA definition line and recommends a unique sequence identifier. NCBI BLAST documentation also describes FASTA as a single-line description followed by sequence data lines.
ProteinIQ reads the FASTA records, validates that every header has sequence content, and splits complete records into output files. It does not cut a single FASTA record into smaller sequence fragments. This matters for downstream tools because record identifiers, annotations, and complete biological sequences stay together.
Blank lines are ignored. Comment lines beginning with # or ; are ignored with warnings, and whitespace inside sequence lines is removed with warnings. Malformed input is rejected instead of being silently converted: the first non-empty line must be a FASTA header, each header must contain text after >, and every record must include sequence characters.
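A validating parser that follows the rules above might look like the sketch below. The function name `parse_fasta`, the `FastaError` exception, and the warning wording are hypothetical. Two behaviors are assumptions rather than documented rules: comment lines are skipped before the "first line must be a header" check, and input with no records at all is rejected.

```python
from typing import List, Tuple


class FastaError(ValueError):
    """Raised when the input is not well-formed FASTA."""


def parse_fasta(text: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Parse FASTA text into (header, sequence) records plus a list of warnings."""
    records: List[Tuple[str, List[str]]] = []
    warnings: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored silently
        if line[0] in "#;":
            warnings.append(f"line {lineno}: ignored comment line")
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                raise FastaError(f"line {lineno}: header has no text after '>'")
            records.append((header, []))
            continue
        if not records:
            raise FastaError(f"line {lineno}: first non-empty line must be a FASTA header")
        cleaned = "".join(line.split())  # drop spaces/tabs inside the sequence line
        if cleaned != line:
            warnings.append(f"line {lineno}: removed whitespace inside sequence")
        records[-1][1].append(cleaned)

    if not records:
        raise FastaError("no FASTA records found")  # assumption: empty input is rejected
    result: List[Tuple[str, str]] = []
    for header, parts in records:
        sequence = "".join(parts)
        if not sequence:
            raise FastaError(f"record '{header}' has no sequence characters")
        result.append((header, sequence))
    return result, warnings
```

Raising instead of guessing keeps malformed input from silently turning into plausible-looking but wrong records.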
All splitting runs in your browser. Sequence data is not sent to a server for this client-side utility.
Which FASTA split mode should I use?
The right split mode depends on the downstream constraint you are trying to satisfy.
| Split mode | Use when | Output behavior |
|---|---|---|
| Split by sequence count | A downstream tool accepts a fixed number of FASTA records per job. | Creates chunks with up to the selected number of complete records per file. |
| Split into number of files | You want balanced batches for parallel processing. | Creates the requested number of non-empty files when enough records are available. If you request more files than sequences, each sequence gets its own file and a warning is shown. |
| Split by residues/bases per file | A workflow has an approximate total-sequence-length limit per file. | Adds complete records until the next record would exceed the target. Records longer than the target are kept intact and reported with a warning. |
| Individual files | Each sequence needs a separate job, such as one protein per prediction or one query per search. | Writes one FASTA file per record. |
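The residues/bases mode behaves like a greedy packing pass over records in input order. In this sketch (hypothetical function `split_by_residues`), an over-limit record is assumed to end up in its own file; the table above only guarantees that such a record stays intact and triggers a warning.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def split_by_residues(
    records: List[Record], max_len: int
) -> Tuple[List[List[Record]], List[str]]:
    """Pack whole records into chunks whose total sequence length stays <= max_len
    where possible. Records longer than max_len are never cut."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    chunks: List[List[Record]] = []
    warnings: List[str] = []
    current: List[Record] = []
    current_len = 0
    for header, seq in records:
        if len(seq) > max_len:
            warnings.append(f"'{header}' ({len(seq)}) exceeds limit {max_len}; kept intact")
        # Start a new chunk when adding this record would overflow the current one.
        if current and current_len + len(seq) > max_len:
            chunks.append(current)
            current, current_len = [], 0
        current.append((header, seq))
        current_len += len(seq)
    if current:
        chunks.append(current)
    return chunks, warnings
```

With a limit of 8, records of length 4, 4, 10, and 2 give three files: the two 4-length records together, the 10-length record alone (with a warning), and the 2-length record.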
ProteinIQ preserves sequence order in all modes. Splitting is deterministic, so the same input and settings produce the same groups and filenames.
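Because grouping is deterministic and order-preserving, Split into number of files can be pictured as cutting the record list into contiguous slices. The sketch below assumes balancing by record count, with earlier files receiving one extra record when the count does not divide evenly; ProteinIQ may balance differently, and `split_into_n` is a hypothetical name.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def split_into_n(
    records: List[Record], n_files: int
) -> Tuple[List[List[Record]], List[str]]:
    """Split records into at most n_files non-empty, contiguous, near-equal groups."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    warnings: List[str] = []
    if n_files > len(records):
        warnings.append(
            f"requested {n_files} files but only {len(records)} sequences; "
            f"creating {len(records)}"
        )
        n_files = len(records)
    if n_files == 0:
        return [], warnings
    base, extra = divmod(len(records), n_files)
    chunks: List[List[Record]] = []
    start = 0
    for i in range(n_files):
        size = base + (1 if i < extra else 0)  # first `extra` files get one more
        chunks.append(records[start:start + size])
        start += size
    return chunks, warnings
```

Seven records into three files gives groups of 3, 2, and 2; eight records into twenty files gives eight single-record files plus a warning.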
FASTA output and filename rules
The result panel returns split FASTA files as a file list. You can download files individually or use the download action to collect all output files together.
| Output | Description |
|---|---|
| Split FASTA files | Record-preserving FASTA files generated from the selected split mode. |
| Summary file | Optional .txt report listing total sequences, total residues or bases, split mode, files created, per-file counts, per-file sequence length totals, file sizes, and warnings. |
| Warnings | Returned in the results panel and included in the summary file when comments, whitespace cleanup, requested file counts, or over-limit records need attention. |
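A summary report with the fields listed above could be assembled as follows. The layout, the `build_summary` name, and measuring file size as the UTF-8 byte length of the FASTA text are assumptions, not the exact format ProteinIQ writes.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def to_fasta(records: List[Record]) -> str:
    """Serialize records to FASTA text, one sequence line per record."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


def build_summary(files: Dict[str, List[Record]], split_mode: str, warnings: List[str]) -> str:
    """Return a plain-text report of totals, per-file counts, lengths, sizes, and warnings."""
    total_seqs = sum(len(recs) for recs in files.values())
    total_len = sum(len(s) for recs in files.values() for _, s in recs)
    lines = [
        f"Total sequences: {total_seqs}",
        f"Total residues/bases: {total_len}",
        f"Split mode: {split_mode}",
        f"Files created: {len(files)}",
        "",
    ]
    for name, recs in files.items():
        size = len(to_fasta(recs).encode("utf-8"))  # bytes of the written FASTA text
        length = sum(len(s) for _, s in recs)
        lines.append(f"{name}: records={len(recs)}, length={length}, bytes={size}")
    if warnings:
        lines.append("")
        lines.append("Warnings:")
        lines.extend(f"- {w}" for w in warnings)
    return "\n".join(lines) + "\n"
```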
Output file extensions match the uploaded FASTA extension when it is .fasta, .fa, .fas, .fna, or .faa. Pasted input and other uploaded extensions use .fasta.
Numbered filenames use the pattern prefix_part_001.fasta. Sequential filenames use prefix_1.fasta, prefix_2.fasta, and so on. Header-based filenames use a sanitized version of the first sequence header in each output file; duplicate names are de-duplicated with numeric suffixes.
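The extension and naming rules above can be expressed as two small functions. The sanitization rule (replacing runs of characters other than letters, digits, `.`, `_`, and `-` with `_`), placing the prefix in front of header-based names, exact-case extension matching, and the `_2`, `_3` de-duplication suffix are all assumptions for illustration; `output_extension`, `sanitize`, and `make_names` are hypothetical names.

```python
import re
from pathlib import PurePath
from typing import List, Optional, Set

# Extensions that are carried over from the uploaded file name.
KNOWN_EXTS = {".fasta", ".fa", ".fas", ".fna", ".faa"}


def output_extension(upload_name: Optional[str]) -> str:
    """Reuse a recognized FASTA extension; pasted input and others fall back to .fasta."""
    if upload_name:
        ext = PurePath(upload_name).suffix
        if ext in KNOWN_EXTS:
            return ext
    return ".fasta"


def sanitize(header: str) -> str:
    """Assumed rule: collapse unsafe character runs to '_' and trim edge dots/underscores."""
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", header).strip("._")
    return name or "sequence"


def make_names(first_headers: List[str], prefix: str = "fasta",
               convention: str = "numbered", ext: str = ".fasta") -> List[str]:
    """Build one filename per output file from the first header of each file."""
    names: List[str] = []
    used: Set[str] = set()
    for i, header in enumerate(first_headers, start=1):
        if convention == "numbered":
            stem = f"{prefix}_part_{i:03d}"
        elif convention == "sequential":
            stem = f"{prefix}_{i}"
        elif convention == "header":
            stem = f"{prefix}_{sanitize(header)}"
        else:
            raise ValueError(f"unknown convention: {convention}")
        candidate, n = stem, 2
        while candidate in used:  # de-duplicate with numeric suffixes
            candidate = f"{stem}_{n}"
            n += 1
        used.add(candidate)
        names.append(candidate + ext)
    return names
```

Checking each candidate against the set of names already issued means a suffixed name can never collide with another file's name.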
Keep Preserve original headers enabled unless a downstream system needs simple generated identifiers. FASTA headers often carry accessions, sample IDs, organism labels, or coordinate information needed to trace results back to the source sequence.
FASTA splitter alternatives
Command-line tools are better for automated pipelines, compressed files, very large datasets, and scripted reproducibility. SeqKit includes split and split2 commands for FASTA and FASTQ files, including splitting by number of parts and records per part. The SeqKit paper describes it as a cross-platform toolkit for FASTA/Q manipulation with high performance on large sequence files.
Use ProteinIQ's FASTA splitter when you need a browser-local, no-code workflow with upload or paste input, record-preserving output, summary files, visible warnings, and convenient downloads. For changing file formats rather than splitting existing FASTA records, use FASTQ to FASTA for sequencing reads, CSV to FASTA for sequence tables, TXT to FASTA for raw sequence text, or GenBank to FASTA for annotated GenBank records.
FAQ
Does FASTA Splitter change my sequences?
It preserves sequence characters except for whitespace inside sequence lines, which is removed because whitespace is not part of the biological sequence. Header lines are preserved by default. If Preserve original headers is turned off, the output uses generated headers such as Sequence_1.
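When Preserve original headers is off, header replacement amounts to renumbering records in order. This sketch assumes numbering runs across the whole input rather than restarting in each output file; `relabel` is a hypothetical name.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def relabel(records: List[Record], preserve_headers: bool) -> List[Record]:
    """Keep headers as-is, or replace them with Sequence_1, Sequence_2, ... in input order."""
    if preserve_headers:
        return list(records)
    return [(f"Sequence_{i}", seq) for i, (_, seq) in enumerate(records, start=1)]
```

Only the header text changes; the sequence strings are passed through untouched.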
Can I split one long sequence into smaller fragments?
No. FASTA Splitter divides a multi-record FASTA file into smaller files, but it does not fragment an individual record. A sequence longer than the Max residues/bases per file setting is kept as one complete record and reported with a warning.
Why did I get fewer files than requested?
Split into number of files creates non-empty files only. If you request 20 files from 8 sequences, the tool creates 8 files and returns a warning.
What FASTA formats are supported?
The splitter accepts standard FASTA records for protein, DNA, and RNA sequences. Uploaded filenames can use .txt, .fasta, .fa, or .fas; output extension preservation also recognizes .fna and .faa when those extensions are present in the uploaded filename.
Are my sequences uploaded to a server?
No. This FASTA splitter runs locally in your browser. The sequence text is processed client-side and the output files are generated from that local result.
Sources
- NCBI GenBank FASTA format
- NCBI BLAST FASTA input documentation
- SeqKit usage documentation
- Shen W, Le S, Li Y, Hu F. "SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation." PLOS ONE 11(10), e0163962 (2016). doi:10.1371/journal.pone.0163962