ProteinIQ

CSV to FASTA

Convert spreadsheet data to FASTA format. Upload a CSV file or paste your data below. Configure column mapping and formatting options on the right.

What is CSV to FASTA?

CSV to FASTA converts tabular sequence data—stored in CSV, TSV, or other delimited formats—into FASTA format. Spreadsheets and databases often store sequences alongside metadata in columns, but most bioinformatics tools expect FASTA input. This converter bridges that gap.

The tool handles common spreadsheet quirks: quoted fields containing commas, inconsistent delimiters, and extraneous characters mixed into sequence columns. It auto-detects column names and delimiters, though manual overrides are available when needed.

How to use CSV to FASTA online

ProteinIQ's converter runs entirely in the browser: files are not uploaded to external servers, and no software needs to be installed. Because processing happens on the client, even large files convert without an upload step.

Input

| Format | Description |
| --- | --- |
| CSV | Comma-separated values |
| TSV | Tab-separated values |
| Other delimited | Semicolon- or pipe-delimited files |

The converter expects at least two columns: one for sequence identifiers (headers) and one for the sequences themselves. Additional metadata columns are ignored.
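As a rough illustration of the two-column requirement, the sketch below converts CSV text with a header row into FASTA using Python's standard `csv` module. It is not ProteinIQ's implementation; the function name `csv_to_fasta` is hypothetical, and only the default column names `id` and `sequence` come from the settings described on this page.

```python
import csv
import io

def csv_to_fasta(text, id_col="id", seq_col="sequence"):
    """Convert CSV text with a header row into unwrapped FASTA text.

    Hypothetical helper: columns other than id_col and seq_col are ignored.
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        seq = (row.get(seq_col) or "").strip()
        if not seq:
            continue  # skip rows with no sequence value
        records.append(f">{row[id_col]}\n{seq}")
    return ("\n".join(records) + "\n") if records else ""

# The quoted "organism" field contains a comma; the csv module handles it,
# and the extra metadata column is simply dropped.
sample = 'id,sequence,organism\nseq1,ATCGATCG,"E. coli, K-12"\nseq2,GCTAGCTA,human\n'
print(csv_to_fasta(sample), end="")
```

The metadata column never reaches the output, which mirrors the behavior described above.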

Settings

CSV parsing

| Setting | Description |
| --- | --- |
| Delimiter | Auto-detect (default), Comma, Semicolon, Tab, or Pipe. Auto-detection examines the first few rows. |
| Has header row | Whether the first row contains column names. Default: on. |
| Quote character | Character used to wrap fields containing delimiters. " (default), ', or None. |
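One plausible way to auto-detect a delimiter from the first few rows is to pick the candidate that splits every sampled row into the same number of fields. The sketch below assumes that heuristic; the page does not describe the converter's actual detection logic, and `detect_delimiter` and `max_rows` are hypothetical names.

```python
import csv

CANDIDATES = [",", ";", "\t", "|"]  # the four delimiters offered in the settings

def detect_delimiter(text, quotechar='"', max_rows=5):
    """Guess the delimiter from the first few non-empty rows.

    Assumed heuristic: a candidate qualifies if it splits every sampled row
    into the same number of fields (more than one); the candidate yielding
    the most fields wins. Falls back to a comma.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()][:max_rows]
    best, best_fields = ",", 1
    for delim in CANDIDATES:
        rows = list(csv.reader(lines, delimiter=delim, quotechar=quotechar))
        counts = {len(r) for r in rows}
        if len(counts) == 1:
            n = counts.pop()
            if n > best_fields:
                best, best_fields = delim, n
    return best
```

Parsing with the quote character matters here: a comma inside a quoted field of a tab-delimited file should not be mistaken for a delimiter. This sketch does not handle quoted fields that span several lines.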

Column mapping

The converter looks for common column names automatically: "id", "name", "header" for identifiers; "sequence", "seq", "protein" for sequences. Override these when your columns use different names.

| Setting | Description |
| --- | --- |
| Use column indices | Map by position (0-indexed) instead of column name. Useful for headerless files. |
| ID column name | Column containing sequence identifiers. Default: id. |
| Sequence column name | Column containing sequences. Default: sequence. |
| ID column index | Zero-based position of the ID column (when using indices). |
| Sequence column index | Zero-based position of the sequence column (when using indices). |
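The mapping rules above could be sketched as follows. The candidate names are the ones listed on this page; case-insensitive matching, the priority order, and the helper names `find_column` and `resolve_columns` are assumptions made for illustration.

```python
ID_NAMES = ("id", "name", "header")
SEQ_NAMES = ("sequence", "seq", "protein")

def find_column(header, candidates):
    """Return the 0-based index of the first header cell matching a candidate.

    Assumption: matching ignores case and surrounding whitespace, and earlier
    candidates take priority over later ones.
    """
    normalized = [cell.strip().lower() for cell in header]
    for name in candidates:
        if name in normalized:
            return normalized.index(name)
    return None

def resolve_columns(header, id_index=None, seq_index=None):
    """Explicit zero-based indices override name-based detection."""
    if id_index is None:
        id_index = find_column(header, ID_NAMES)
    if seq_index is None:
        seq_index = find_column(header, SEQ_NAMES)
    if id_index is None or seq_index is None:
        raise ValueError("could not locate ID and sequence columns")
    return id_index, seq_index
```

For a headerless file, passing both indices explicitly skips name matching entirely, which corresponds to the "Use column indices" option.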

FASTA formatting

| Setting | Description |
| --- | --- |
| Line wrapping | 80 characters (default), 60, or no wrapping. Standard FASTA uses 60–80. |
| Case format | UPPERCASE (default), lowercase, or Preserve original. |
| Header prefix | Optional text prepended to all FASTA headers. |
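To make the three formatting options concrete, here is a minimal sketch of how one record might be rendered. The function `format_record` and its parameter names are hypothetical; only the option values (80 or 60 characters or no wrapping, upper/lower/preserve, optional prefix) come from the table above.

```python
def format_record(identifier, sequence, width=80, case="upper", prefix=""):
    """Render one FASTA record.

    width: line length for wrapping; 0 or None disables wrapping.
    case:  "upper", "lower", or "preserve".
    """
    if case == "upper":
        sequence = sequence.upper()
    elif case == "lower":
        sequence = sequence.lower()
    if width:
        # Slice the sequence into fixed-width chunks; the last may be shorter.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    else:
        lines = [sequence]
    return "\n".join([f">{prefix}{identifier}"] + lines)

print(format_record("seq1", "atcgatcgat", width=4, prefix="sample_"))
```

A small width is used in the example only to make the wrapping visible.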

Sequence cleanup

Spreadsheet data often contains formatting artifacts—position numbers, spaces for readability, or stray punctuation. The cleanup options remove these automatically.

| Setting | Description |
| --- | --- |
| Character cleanup | Master toggle for cleanup options. Default: on. |
| Remove spaces | Strip whitespace. Default: on. |
| Remove numbers | Strip digits 0–9. Default: on. |
| Remove punctuation | Strip punctuation marks. Default: on. |
| Remove invalid characters | Strip characters not valid in biological sequences. Default: on. |
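The cleanup options can be pictured as a series of independent filters applied to each sequence cell. The sketch below is an illustration under stated assumptions, not the converter's code: the name `clean_sequence` is hypothetical, and treating `*` (stop) and `-` (gap) as characters to keep is an assumption about what counts as valid.

```python
import re
import string

KEEP = "*-"  # assumption: stop and gap symbols are kept, not treated as punctuation
PUNCT = "".join(ch for ch in string.punctuation if ch not in KEEP)

def clean_sequence(raw, spaces=True, numbers=True, punctuation=True, invalid=True):
    """Apply each enabled cleanup step in turn; disable all four to skip cleanup."""
    seq = raw
    if spaces:
        seq = re.sub(r"\s", "", seq)          # spaces, tabs, line breaks
    if numbers:
        seq = re.sub(r"[0-9]", "", seq)       # position numbers
    if punctuation:
        seq = seq.translate(str.maketrans("", "", PUNCT))
    if invalid:
        seq = re.sub(r"[^A-Za-z*\-]", "", seq)  # anything that is not a letter, * or -
    return seq

# A cell copied from a formatted listing with position numbers and grouping spaces.
print(clean_sequence("1 MKVLA AGIVG 11\nTLLQ."))
```

Passing `False` for all four flags corresponds to switching the master toggle off.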

Validation

| Setting | Description |
| --- | --- |
| Validate sequences | Check for invalid characters after cleanup. Default: on. |
| Skip empty rows | Ignore rows with missing sequence values. Default: on. |
| Skip invalid sequences | Exclude sequences that fail validation instead of including them with warnings. Default: off. |
| Show sequence statistics | Display count, total length, and detected sequence type. Default: on. |
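Validation and the statistics display could work along the lines of the sketch below, which classifies a sequence by checking its letters against small alphabets. The alphabets, the check order (DNA, then RNA, then protein), and the names `detect_type` and `summarize` are assumptions; the page does not say how the converter detects sequence type.

```python
# Assumed alphabets: unambiguous nucleotides plus N, and the 20 standard
# amino acids plus the codes B, J, O, U, X, Z and the stop symbol *.
DNA = set("ACGTN")
RNA = set("ACGUN")
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY" "BJOUXZ" "*")

def detect_type(sequence):
    """Classify a cleaned sequence; gap characters are ignored."""
    letters = set(sequence.upper()) - {"-"}
    if not letters:
        return "empty"
    if letters <= DNA:
        return "DNA"
    if letters <= RNA:
        return "RNA"
    if letters <= PROTEIN:
        return "protein"
    return "invalid"

def summarize(sequences):
    """Return the count, total length, and the set of detected types."""
    return {
        "count": len(sequences),
        "total_length": sum(len(s) for s in sequences),
        "types": sorted({detect_type(s) for s in sequences}),
    }

print(summarize(["ATCGATCGATCGATCGATCG", "GCTAGCTAGCTAGCTAGCTA"]))
```

Because the nucleotide letters are also valid amino acid codes, a short peptide made only of A, C, G, and T would be reported as DNA under this heuristic; any real detector has to make a similar judgment call.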

Output

The converter produces standard FASTA with headers beginning with > followed by the identifier, then sequence lines wrapped at the specified length.

    >seq1
    ATCGATCGATCGATCGATCG
    >seq2
    GCTAGCTAGCTAGCTAGCTA

Results can be copied to clipboard or downloaded as a .fasta file.