E. coli statistics

Dr. Matic Broz · April 29, 2026

Key takeaways

  • Reference genome: E. coli K-12 MG1655 has a 4,641,652 bp circular chromosome in RefSeq NC_000913.3 (NCBI, updated December 2025)
  • Current NCBI annotation for GCA_000005845.2 lists 4,651 total genes, including 4,290 protein-coding genes and 145 pseudogenes (release date November 2024)
  • UniProt proteome UP000000625 lists 4,403 proteins for E. coli strain K-12 MG1655 (modified September 2025)
  • Typical E. coli cell scale: about 1 um diameter, 2 um length, 1 um^3 volume, and 1 pg mass (Cell Biology by the Numbers)
  • Pathogenic E. coli is not one disease: CDC describes six diarrheagenic groups, namely STEC, ETEC, EPEC, EIEC, EAEC, and DAEC (May 2024)

As of April 2026, the most useful single reference point is Escherichia coli K-12 MG1655: a 4,641,652 bp circular chromosome with about 4,300 to 4,400 protein-coding genes or proteins, depending on whether you read NCBI, EcoCyc, or UniProt. That reference point matters because E. coli is both a standard laboratory organism and a broad bacterial species with harmless gut strains, pathogenic strains, and hundreds of thousands of genome assemblies in public databases.

The main caveat is that there is no single universal E. coli statistic. Genome size, gene count, mass, ribosome count, growth rate, and disease burden all depend on the strain, growth condition, and database or surveillance method.

The reference E. coli genome is 4.64 million base pairs

The RefSeq record NC_000913.3 lists the E. coli K-12 MG1655 chromosome as 4,641,652 bp of circular DNA.

NCBI's ESummary record identifies the molecule as genomic DNA, topology circular, genome chromosome, and organism Escherichia coli str. K-12 substr. MG1655. NCBI Datasets reports the same total sequence length for assembly GCA_000005845.2 / GCF_000005845.2, with 1 chromosome, 1 contig, and 51% GC.

Reference genome statistic | Value | Source context
RefSeq accession | NC_000913.3 | NCBI ESummary, updated 2025-12-09
Assembly accession | GCA_000005845.2 / GCF_000005845.2 | NCBI Datasets
Chromosome length | 4,641,652 bp | NCBI ESummary and Datasets
Chromosome topology | Circular | NCBI ESummary
GC content | 51% | NCBI Datasets
Assembly level | Complete genome | NCBI Datasets

Sources: NCBI NC_000913.3 ESummary and NCBI Datasets GCA_000005845.2 report, accessed April 29, 2026.

Annotation counts differ by database

Current databases put the E. coli K-12 MG1655 coding inventory in the range of about 4,290 to 4,403 protein-coding genes or proteins.

The differences are not necessarily errors. NCBI Datasets reports annotation counts for a genome assembly, EcoCyc reports counts from a curated model-organism database, and UniProt reports the entries in a reference proteome. Those resources count related but non-identical objects.

Source | Strain or record | Reported count | What it counts | Freshness cue
NCBI Datasets | GCA_000005845.2 | 4,651 | Total genes | Annotation release 2024-11-06
NCBI Datasets | GCA_000005845.2 | 4,290 | Protein-coding genes | Annotation release 2024-11-06
NCBI Datasets | GCA_000005845.2 | 215 | Non-coding genes | Annotation release 2024-11-06
NCBI Datasets | GCA_000005845.2 | 145 | Pseudogenes | Annotation release 2024-11-06
EcoCyc | K-12 MG1655 v29.6 | 4,543 | Genes | Page generated 2026-04-29
EcoCyc | K-12 MG1655 v29.6 | 4,313 | Protein genes | Page generated 2026-04-29
EcoCyc | K-12 MG1655 v29.6 | 229 | RNA genes | Page generated 2026-04-29
EcoCyc | K-12 MG1655 v29.6 | 145 | Pseudogenes | Page generated 2026-04-29
UniProt | UP000000625 | 4,403 | Proteins | Modified 2025-09-19

Sources: NCBI Datasets genome report, EcoCyc organism summary, and UniProt UP000000625.

The original 1997 Science genome paper reported a 4,639,221 bp sequence with 4,288 protein-coding genes. Modern records are slightly different because the reference sequence and annotations have been updated since the first complete K-12 genome publication.

The E. coli chromosome is about 1.6 millimeters long when stretched

If the K-12 MG1655 chromosome is converted from base pairs to physical DNA length, it is about 1.58 mm long when fully extended.

This is a derived value. It uses the NCBI chromosome length of 4,641,652 bp and the standard B-DNA rise of about 0.34 nm per base pair:

Calculation | Input | Result
Chromosome length | 4,641,652 bp | Reported by NCBI
DNA rise per base pair | 0.34 nm/bp | Standard B-DNA value used in DNA-length calculations
Fully extended DNA length | 4,641,652 bp x 0.34 nm/bp | ~1,578,162 nm
Fully extended DNA length | 1,578,162 nm | ~1.58 mm

Compared with a typical 2 um cell length, the fully stretched chromosome is roughly 790 times longer than the cell. That ratio is approximate because E. coli cell size changes with growth condition.
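The conversion above can be reproduced in a few lines. This is a minimal sketch using only the inputs quoted in this section; the constant and function names are illustrative and not taken from any library.

```python
# Inputs quoted in the article; names are illustrative.
CHROMOSOME_BP = 4_641_652   # NC_000913.3 chromosome length (NCBI)
RISE_NM_PER_BP = 0.34       # standard B-DNA rise per base pair
CELL_LENGTH_UM = 2.0        # rule-of-thumb E. coli cell length


def extended_length_nm(base_pairs, rise_nm_per_bp=RISE_NM_PER_BP):
    """Contour length of fully extended B-DNA, in nanometers."""
    return base_pairs * rise_nm_per_bp


length_nm = extended_length_nm(CHROMOSOME_BP)
length_mm = length_nm / 1e6                           # 1 mm = 1,000,000 nm
ratio_to_cell = (length_nm / 1e3) / CELL_LENGTH_UM    # nm -> um, then divide

print(f"Extended length: {length_nm:,.2f} nm ({length_mm:.2f} mm)")
print(f"DNA length / cell length: about {ratio_to_cell:.0f}x")
```

With the unrounded length the ratio comes out near 789; dividing the rounded 1.58 mm by 2 um gives the 790 quoted above. Both are approximate, because the cell length itself is a rule of thumb.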

For comparison, a human diploid genome is about 2.06 meters when fully extended, as covered in the DNA length guide. E. coli DNA is far shorter in absolute length, but it is still much longer than the cell that packs it.

A typical E. coli cell is about 2 micrometers long

A common rule of thumb is that an E. coli cell is about 1 um in diameter, 2 um long, 1 um^3 in volume, and 1 pg in mass.

Those values are rounded biological reference numbers, not constants. Cell Biology by the Numbers notes that cell size and mass change strongly with growth rate: faster-growing cells are larger.

Cell-scale statistic | Typical value | Important caveat
Diameter | ~1 um | Rule-of-thumb value
Length | ~2 um | Varies by strain and growth condition
Volume | ~1 um^3, or 1 fL | A spherocylinder estimate gives ~1.3 um^3 for 1 um x 2 um geometry
Mass | ~1 pg | Assumes cell density close to water
Dry mass at slow growth | 148 fg | Reported for B/r cells around 100-min doubling time
Dry mass at fast growth | 865 fg | Reported for B/r cells around 24-min doubling time

Source: Cell Biology by the Numbers, How big is an E. coli cell and what is its mass?

The dry-mass values show why a single cell-size number should be treated as a reference scale. A rapidly dividing E. coli cell can have more than 5 times the dry mass of a slower-growing cell in the classic bacterial physiology data summarized by Cell Biology by the Numbers.
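The spherocylinder estimate and the dry-mass ratio can be checked with a short calculation. This sketch assumes the cell is a cylinder capped by two hemispheres and that the 2 um length includes the caps; the function name is illustrative.

```python
import math


def spherocylinder_volume_um3(diameter_um, total_length_um):
    """Volume of a cylinder with hemispherical end caps.

    total_length_um is measured tip to tip, so the straight cylindrical
    section is total_length_um - diameter_um long.
    """
    radius = diameter_um / 2
    cylinder_length = total_length_um - diameter_um
    if cylinder_length < 0:
        raise ValueError("total length must be at least the diameter")
    cylinder = math.pi * radius**2 * cylinder_length
    caps = (4 / 3) * math.pi * radius**3   # two hemispheres = one sphere
    return cylinder + caps


volume_um3 = spherocylinder_volume_um3(diameter_um=1.0, total_length_um=2.0)
dry_mass_ratio = 865 / 148   # fast-growth vs slow-growth dry mass, in fg

print(f"Spherocylinder volume: {volume_um3:.2f} um^3")
print(f"Fast / slow dry-mass ratio: {dry_mass_ratio:.1f}x")
```

The geometry gives about 1.3 um^3, which is why the rounded 1 um^3 figure is best read as an order-of-magnitude reference.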

E. coli can divide in about 20 minutes under ideal conditions

The minimum division time often quoted for E. coli is about 20 minutes under ideal laboratory conditions.

That fast doubling time creates an apparent timing problem: copying a roughly 4.6 Mbp genome with two replication forks takes on the order of 40 minutes, longer than the fastest cell-division cycle. E. coli resolves this by initiating overlapping rounds of DNA replication before the previous round is complete.

Growth and replication statistic | Value | Source context
Fast division time | ~20 min | Ideal growth-condition estimate
Genome replication time | ~40 min | Two forks copying a ~5 Mbp bacterial genome
In vivo replication-rate estimate | ~600 bp/s | Recent single-molecule microscopy summary in Cell Biology by the Numbers
Origins in fast-growing cells | >6 | Overlapping replication rounds
Replication forks in fast-growing cells | >10 | Overlapping replication rounds

Source: Cell Biology by the Numbers, How long does it take cells to copy their genomes?

These are physiology numbers, not species-wide constants. Doubling time changes with medium composition, temperature, aeration, strain background, stress, and experimental setup.
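The timing problem can be sketched as simple arithmetic: two forks each copy half of the chromosome, so replication time is genome length divided by twice the fork speed. The 1,000 bp/s fork speed below is an assumed round number, chosen only because it reproduces the roughly 40-minute figure; it is not a value reported in the table. The function name is also illustrative.

```python
GENOME_BP = 4_641_652      # K-12 MG1655 chromosome length from the article
FORKS = 2                  # bidirectional replication from one origin
FAST_DOUBLING_MIN = 20     # fast division time quoted in the article


def replication_time_min(genome_bp, fork_rate_bp_per_s, forks=FORKS):
    """Minutes to copy the genome once at a constant fork speed."""
    return genome_bp / (forks * fork_rate_bp_per_s) / 60


# 600 bp/s is the in vivo estimate in the table above.
# 1,000 bp/s is an ASSUMED round value used for illustration only.
time_at_600 = replication_time_min(GENOME_BP, 600)
time_at_1000 = replication_time_min(GENOME_BP, 1000)

# If one round takes longer than one division cycle, a new round has to
# start before the previous one has finished.
overlap_needed = time_at_1000 > FAST_DOUBLING_MIN

print(f"At 600 bp/s per fork:   {time_at_600:.0f} min")
print(f"At 1,000 bp/s per fork: {time_at_1000:.0f} min")
print(f"Overlapping rounds needed at 20-min doubling: {overlap_needed}")
```

Either fork speed gives a replication time well above 20 minutes, so the conclusion about overlapping rounds does not depend on which estimate is used.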

One fast-growing E. coli cell can contain tens of thousands of ribosomes

In fast growth, E. coli can contain about 72,000 ribosomes per cell.

Ribosome abundance is one of the clearest quantitative signatures of bacterial growth rate. Cell Biology by the Numbers summarizes classic data in which a fast 24-minute doubling time corresponds to about 72,000 ribosomes per cell, while slower growth has far fewer ribosomes.

Molecular statistic | Value | Meaning
Ribosomes at fast growth | ~72,000 per cell | Approximate count at 24-min doubling time
RNA polymerase molecules | ~1,000-10,000 per cell | Range summarized for E. coli
Transcription elongation speed | ~40-80 nt/s | Maximum speed range in E. coli
Translation elongation speed | ~20 aa/s | Maximum speed range in E. coli

Sources: Cell Biology by the Numbers, How many ribosomes are in a cell? and What is faster, transcription or translation?

The transcription and translation speeds are nearly matched in coding terms because 3 nucleotides encode 1 amino acid. A transcript growing at 60 nt/s corresponds to coding capacity for roughly 20 aa/s, close to the quoted bacterial translation speed.
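The codon conversion behind that comparison is a single division. A minimal sketch, with illustrative names, applied to the quoted transcription range:

```python
NT_PER_CODON = 3   # three nucleotides encode one amino acid


def coding_rate_aa_per_s(transcription_nt_per_s):
    """Amino acids per second that a growing transcript could encode."""
    return transcription_nt_per_s / NT_PER_CODON


# Low end, midpoint, and high end of the quoted 40-80 nt/s range.
rates = {nt: coding_rate_aa_per_s(nt) for nt in (40, 60, 80)}

for nt_per_s, aa_per_s in rates.items():
    print(f"{nt_per_s} nt/s -> {aa_per_s:.1f} aa/s of coding capacity")
```

Across the quoted range the coding capacity spans roughly 13 to 27 aa/s, which brackets the ~20 aa/s translation speed.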

E. coli has millions of protein molecules but about 4,400 annotated proteins

An E. coli cell has roughly 2 to 4 million protein molecules per 1 um^3 cell, but the K-12 MG1655 reference proteome has 4,403 proteins in UniProt.

Those are different kinds of counts. Protein molecule counts ask how many physical protein copies are in one cell. Proteome counts ask how many distinct protein entries are annotated for a strain.

Protein statistic | Value | What it means
Total protein molecules in a 1 um^3 E. coli cell | ~2-4 million | Estimated physical molecules per cell
Average bacterial protein length used in the estimate | ~300 amino acids | Abundance-weighted rounded estimate
Protein concentration used for E. coli estimate | ~0.19-0.24 g/ml | Protein mass per cell volume range discussed by Cell Biology by the Numbers
UniProt K-12 MG1655 reference proteome | 4,403 proteins | Distinct protein entries in UP000000625

Sources: Cell Biology by the Numbers, How many proteins are in a cell? and UniProt UP000000625.
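The molecule-count estimate can be approximated from the table inputs: protein mass per cell divided by the mass of one average protein. This is a rough sketch, not the source's exact calculation. The average residue mass of about 110 Da is an assumption added here (a commonly used round value), and all names are illustrative.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
CELL_VOLUME_ML = 1e-12          # 1 um^3 = 1 fL = 1e-12 ml
AVG_PROTEIN_LENGTH_AA = 300     # rounded value from the table
AVG_RESIDUE_MASS_DA = 110       # ASSUMPTION: typical average residue mass


def proteins_per_cell(protein_conc_g_per_ml,
                      cell_volume_ml=CELL_VOLUME_ML,
                      length_aa=AVG_PROTEIN_LENGTH_AA,
                      residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Estimated number of protein molecules in one cell."""
    protein_mass_g = protein_conc_g_per_ml * cell_volume_ml
    molar_mass_g_per_mol = length_aa * residue_mass_da
    return protein_mass_g / molar_mass_g_per_mol * AVOGADRO


low_estimate = proteins_per_cell(0.19)
high_estimate = proteins_per_cell(0.24)

print(f"Low estimate:  {low_estimate / 1e6:.1f} million proteins per cell")
print(f"High estimate: {high_estimate / 1e6:.1f} million proteins per cell")
```

Under these assumptions the result is a few million molecules per cell, at or slightly above the upper end of the quoted range. The estimate shifts noticeably with the assumed protein length and residue mass, which is one reason the source gives a range.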

This same distinction appears in broader protein statistics: known sequence databases, gene annotations, translated products, and molecule counts answer different questions. The protein count guide uses the same separation for human and global protein counts.

E. coli has a very low mutation rate per copied base

Long-term evolution experiments put the E. coli mutation rate on the order of 10^-10 mutations per base pair per replication under the measured conditions.

Cell Biology by the Numbers summarizes sequencing from Richard Lenski's long-term evolution experiment, where fixed mutations over tens of thousands of generations supported an estimated mutation rate around 10^-10 mutations/bp/replication. For a roughly 5 x 10^6 bp bacterial genome, that corresponds to about 5 x 10^-4 mutations per genome replication, or roughly 1 mutation per 2,000 generations in a single lineage under that simplified calculation.

This does not mean mutations are rare in a large culture. A culture with more than 10^9 cells/ml can contain many mutant lineages even when the per-base replication error rate is low.

Source: Cell Biology by the Numbers, What is the mutation rate during genome replication?
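The per-genome arithmetic, and the contrast between a single lineage and a dense culture, can be sketched as follows. The inputs are the rounded values from this section. The culture calculation is a simplification that assumes every cell replicates once per generation, and the names are illustrative.

```python
MUTATION_RATE_PER_BP = 1e-10   # mutations per bp per replication (rounded)
GENOME_BP = 5e6                # rounded bacterial genome size
CELLS_PER_ML = 1e9             # dense-culture figure from the text

# Expected new mutations each time one genome is copied.
mutations_per_replication = MUTATION_RATE_PER_BP * GENOME_BP

# Average number of generations between mutations in a single lineage.
generations_per_mutation = 1 / mutations_per_replication

# Expected new mutations per ml if every cell divides once (simplification).
new_mutations_per_ml = mutations_per_replication * CELLS_PER_ML

print(f"Mutations per genome replication: {mutations_per_replication:.0e}")
print(f"Generations per mutation, one lineage: {generations_per_mutation:,.0f}")
print(f"New mutations per ml per generation: {new_mutations_per_ml:,.0f}")
```

A single lineage waits a couple of thousand generations between mutations, while one milliliter of dense culture is expected to pick up hundreds of thousands of new mutations in a single round of division.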

Public databases contain hundreds of thousands of E. coli genome assemblies

NCBI Datasets returned 460,707 genome assembly reports for taxon 562 (Escherichia coli) when queried on April 29, 2026.

That live database count is not a biological count of strains in nature. It is a public-data count shaped by sequencing effort, clinical surveillance, environmental sampling, food-safety testing, outbreak investigation, and database inclusion rules.

Database statistic | Value | Source context
NCBI taxon ID | 562 | Escherichia coli species
Genome assembly reports returned | 460,707 | NCBI Datasets API, accessed 2026-04-29
First returned assembly in query | GCA_000005845.2 | K-12 MG1655 reference genome
CheckM completeness for first returned assembly | 99.48% | NCBI Datasets report for GCA_000005845.2
CheckM contamination for first returned assembly | 0.15% | NCBI Datasets report for GCA_000005845.2

Source: NCBI Datasets taxon 562 genome report API, accessed April 29, 2026.

The large assembly count is one reason E. coli is useful for comparative genomics. It is also why the K-12 reference should not be treated as a complete description of every E. coli strain.

E. coli is one species, but pathogenic E. coli is not one disease

CDC describes six diarrheagenic kinds of E. coli: STEC, ETEC, EPEC, EIEC, EAEC, and DAEC.

Most E. coli are harmless members of the intestinal tract, but some strains cause diarrhea, urinary tract infections, pneumonia, sepsis, and other illnesses. CDC's public pages focus on diarrheagenic E. coli, where symptoms, risk groups, and likely exposure routes differ by pathotype.

CDC group | Full name | Common symptom pattern or context
STEC | Shiga toxin-producing E. coli | Bloody diarrhea, severe stomach cramps, vomiting; major HUS risk
ETEC | Enterotoxigenic E. coli | Watery diarrhea; leading cause of travelers' diarrhea
EPEC | Enteropathogenic E. coli | Watery diarrhea, especially in children younger than 1 year
EIEC | Enteroinvasive E. coli | Watery diarrhea that is sometimes bloody; fever
EAEC | Enteroaggregative E. coli | Watery diarrhea, sometimes lasting more than 2 weeks
DAEC | Diffusely adherent E. coli | Watery diarrhea, especially in children aged 3-5 years

Sources: CDC, About Escherichia coli Infection and Kinds of E. coli, both dated May 14, 2024.

This classification matters for statistics. A count for STEC O157 is not interchangeable with all STEC, all diarrheagenic E. coli, urinary-pathogenic E. coli, or harmless gut E. coli.

U.S. foodborne E. coli burden estimates are model-based

CDC's 2011 Emerging Infectious Diseases burden paper estimated 63,153 annual U.S. domestically acquired foodborne illnesses from STEC O157 and 112,752 from non-O157 STEC, using data and models based on the 2006 U.S. population.

Those figures are not direct case counts. They combine surveillance data with underreporting, underdiagnosis, travel-related, and foodborne-attribution adjustments. They remain useful because the table makes the modeling assumptions visible.

E. coli category in Scallan et al. | Estimated annual U.S. domestically acquired foodborne illnesses | Estimated annual U.S. domestically acquired foodborne hospitalizations | Estimated annual U.S. domestically acquired foodborne deaths
STEC O157 | 63,153 | 2,138 | 20
STEC non-O157 | 112,752 | 271 | 0 (median reported)
ETEC | 17,894 | 12 | 0
Diarrheagenic E. coli other than STEC and ETEC | 11,982 | 8 | 0
Combined rows above | 205,781 | 2,429 | 20

Sources: Scallan et al., 2011, CDC Emerging Infectious Diseases Table 2 and Table 3.
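The combined row is a plain column sum of the four category rows. The sketch below reproduces it from the values in the table; the dictionary layout and names are illustrative. Summing published point estimates is a simplification, and it does not produce uncertainty intervals for the totals.

```python
# (illnesses, hospitalizations, deaths) per category, from the table above.
BURDEN_ROWS = {
    "STEC O157": (63_153, 2_138, 20),
    "STEC non-O157": (112_752, 271, 0),
    "ETEC": (17_894, 12, 0),
    "Diarrheagenic E. coli other than STEC and ETEC": (11_982, 8, 0),
}

# zip(*rows) regroups the tuples by column so each column can be summed.
illnesses, hospitalizations, deaths = (
    sum(column) for column in zip(*BURDEN_ROWS.values())
)

print(f"Illnesses:        {illnesses:,}")
print(f"Hospitalizations: {hospitalizations:,}")
print(f"Deaths:           {deaths:,}")
```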

CDC's current E. coli surveillance page says national STEC surveillance data are collected through passive surveillance of laboratory-confirmed human STEC isolates, with reports collected into the Laboratory-based Enteric Disease Surveillance system. That surveillance framework is different from the modeled annual burden estimates above.

Source: CDC, E. coli Surveillance, dated February 28, 2024.

EcoCyc adds pathway and regulation context for K-12

EcoCyc v29.6 reports 478 pathways, 2,419 enzymatic reactions, 549 transport reactions, and 70,767 GO annotations for E. coli K-12 MG1655.

These are knowledge-base statistics rather than genome-sequence statistics. They describe how much curated functional information EcoCyc attaches to the model organism.

EcoCyc v29.6 database content | Count
Genes | 4,543
Pathways | 478
Enzymatic reactions | 2,419
Transport reactions | 549
Polypeptides | 4,494
Protein complexes | 1,210
Enzymes | 1,458
Transporters | 488
Compounds | 3,086
Transcription units | 3,767
tRNAs | 89
Growth media | 441
Transcriptional regulation entries | 6,106
Protein features | 46,219
GO annotations | 70,767

Source: EcoCyc organism summary for E. coli K-12 MG1655, database version 29.6, page generated April 29, 2026.

EcoCyc also notes that K-12 MG1655 is the strain from which the first complete E. coli genome sequence was obtained. That makes it the right reference strain for many basic statistics, but not a substitute for pathogenic-strain-specific measurements.

Methodology

This article separates directly reported database values from derived calculations.

  1. The 4,641,652 bp chromosome length, circular topology, and NC_000913.3 accession are directly reported by NCBI ESummary for the RefSeq nucleotide record.
  2. The NCBI gene-count table uses the first NCBI Datasets genome report returned for taxon 562, assembly GCA_000005845.2 / GCF_000005845.2. Its annotation release date is 2024-11-06.
  3. EcoCyc counts come from the EcoCyc v29.6 organism summary, which was generated by Pathway Tools on 2026-04-29 during research for this article.
  4. UniProt protein counts come from the UniProt proteome API for UP000000625, modified 2025-09-19.
  5. The 1.58 mm stretched chromosome length is derived as 4,641,652 bp x 0.34 nm/bp = 1,578,161.68 nm, which rounds to 1.58 mm.
  6. The DNA-to-cell-length ratio is derived as 1.58 mm / 2 um = 790, using the rounded Cell Biology by the Numbers cell-length rule of thumb.
  7. The combined U.S. foodborne E. coli illness, hospitalization, and death values sum the four E. coli rows shown in Scallan et al. 2011: STEC O157, STEC non-O157, ETEC, and diarrheagenic E. coli other than STEC and ETEC. This combined row is a ProteinIQ calculation from the published table, not a separately reported CDC headline.
  8. Public-health burden estimates come from Scallan et al. 2011. Those estimates are model-based and tied to the paper's methods and 2006 U.S. population denominator.

Matic Broz

Founder & CEO, ProteinIQ

Matic founded ProteinIQ to make computational biology accessible to every researcher. He builds code-free bioinformatics tools used by thousands of scientists worldwide for protein analysis, molecular docking, and drug discovery.