Key takeaways
- Reference genome: E. coli K-12 MG1655 has a 4,641,652 bp circular chromosome in RefSeq NC_000913.3 (NCBI, updated December 2025)
- Current NCBI annotation for GCA_000005845.2 lists 4,651 total genes, including 4,290 protein-coding genes and 145 pseudogenes (release date November 2024)
- UniProt proteome UP000000625 lists 4,403 proteins for E. coli strain K-12 MG1655 (modified September 2025)
- Typical E. coli cell scale: about 1 um diameter, 2 um length, 1 um^3 volume, and 1 pg mass (Cell Biology by the Numbers)
- Pathogenic E. coli is not one disease: CDC describes six diarrheagenic groups (STEC, ETEC, EPEC, EIEC, EAEC, and DAEC) (May 2024)
As of April 2026, the most useful single reference point is Escherichia coli K-12 MG1655: a 4,641,652 bp circular chromosome with about 4,300 to 4,400 protein-coding genes or proteins, depending on whether you read NCBI, EcoCyc, or UniProt. That number matters because E. coli is both a standard laboratory organism and a broad bacterial species with harmless gut strains, pathogenic strains, and hundreds of thousands of genome assemblies in public databases.
The main caveat is that there is no single universal E. coli statistic. Genome size, gene count, mass, ribosome count, growth rate, and disease burden all depend on the strain, growth condition, and database or surveillance method.
The reference E. coli genome is 4.64 million base pairs
The RefSeq record NC_000913.3 lists the E. coli K-12 MG1655 chromosome as 4,641,652 bp of circular DNA.
NCBI's ESummary record identifies the molecule as genomic DNA, topology circular, genome chromosome, and organism Escherichia coli str. K-12 substr. MG1655. NCBI Datasets reports the same total sequence length for assembly GCA_000005845.2 / GCF_000005845.2, with 1 chromosome, 1 contig, and 51% GC.
| Reference genome statistic | Value | Source context |
|---|---|---|
| RefSeq accession | NC_000913.3 | NCBI ESummary, updated 2025-12-09 |
| Assembly accession | GCA_000005845.2 / GCF_000005845.2 | NCBI Datasets |
| Chromosome length | 4,641,652 bp | NCBI ESummary and Datasets |
| Chromosome topology | Circular | NCBI ESummary |
| GC content | 51% | NCBI Datasets |
| Assembly level | Complete genome | NCBI Datasets |
Sources: NCBI NC_000913.3 ESummary and NCBI Datasets GCA_000005845.2 report, accessed April 29, 2026.
Annotation counts differ by database
Current databases put the E. coli K-12 MG1655 coding inventory in the range of about 4,290 to 4,403 protein-coding genes or proteins.
The differences are not necessarily errors. NCBI Datasets reports annotation counts for a genome assembly, EcoCyc reports counts from a curated model-organism database, and UniProt reports entries in a reference proteome. Those resources count related but non-identical objects.
| Source | Strain or record | Reported count | What it counts | Freshness cue |
|---|---|---|---|---|
| NCBI Datasets | GCA_000005845.2 | 4,651 | Total genes | Annotation release 2024-11-06 |
| NCBI Datasets | GCA_000005845.2 | 4,290 | Protein-coding genes | Annotation release 2024-11-06 |
| NCBI Datasets | GCA_000005845.2 | 215 | Non-coding genes | Annotation release 2024-11-06 |
| NCBI Datasets | GCA_000005845.2 | 145 | Pseudogenes | Annotation release 2024-11-06 |
| EcoCyc | K-12 MG1655 v29.6 | 4,543 | Genes | Page generated 2026-04-29 |
| EcoCyc | K-12 MG1655 v29.6 | 4,313 | Protein genes | Page generated 2026-04-29 |
| EcoCyc | K-12 MG1655 v29.6 | 229 | RNA genes | Page generated 2026-04-29 |
| EcoCyc | K-12 MG1655 v29.6 | 145 | Pseudogenes | Page generated 2026-04-29 |
| UniProt | UP000000625 | 4,403 | Proteins | Modified 2025-09-19 |
Sources: NCBI Datasets genome report, EcoCyc organism summary, and UniProt UP000000625.
The original 1997 Science genome paper reported a 4,639,221 bp sequence with 4,288 protein-coding genes. Modern records are slightly different because the reference sequence and annotations have been updated since the first complete K-12 genome publication.
The E. coli chromosome is about 1.6 millimeters long when stretched
If the K-12 MG1655 chromosome is converted from base pairs to physical DNA length, it is about 1.58 mm long when fully extended.
This is a derived value. It uses the NCBI chromosome length of 4,641,652 bp and the standard B-DNA rise of about 0.34 nm per base pair:
| Calculation | Input | Result |
|---|---|---|
| Chromosome length | 4,641,652 bp | Reported by NCBI |
| DNA rise per base pair | 0.34 nm/bp | Standard B-DNA value used in DNA-length calculations |
| Fully extended DNA length | 4,641,652 bp x 0.34 nm/bp | ~1,578,162 nm |
| Converted to millimeters | 1,578,162 nm / 1,000,000 nm per mm | ~1.58 mm |
Compared with a typical 2 um cell length, the fully stretched chromosome is roughly 790 times longer than the cell. That ratio is approximate because E. coli cell size changes with growth condition.
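The conversion above is simple enough to check by hand, but a minimal Python sketch makes the units explicit. The constant and function names are illustrative, and the 2 um cell length is the rounded rule-of-thumb value, not a measurement.

```python
# Inputs are the values quoted in this article; names are illustrative only.
CHROMOSOME_BP = 4_641_652   # NC_000913.3 length reported by NCBI
RISE_NM_PER_BP = 0.34       # standard B-DNA rise per base pair
CELL_LENGTH_UM = 2.0        # ASSUMPTION: rounded rule-of-thumb cell length


def extended_length_nm(base_pairs: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Contour length of fully extended double-stranded DNA, in nanometers."""
    return base_pairs * rise_nm


length_nm = extended_length_nm(CHROMOSOME_BP)
length_mm = length_nm / 1e6                   # 1 mm = 1,000,000 nm
ratio = (length_nm / 1e3) / CELL_LENGTH_UM    # nm -> um, then divide by cell length

print(f"{length_nm:,.2f} nm = {length_mm:.2f} mm")
print(f"DNA-to-cell length ratio: about {ratio:.0f}")
```

With the unrounded length the ratio comes out near 789; the figure of roughly 790 follows from dividing the rounded 1.58 mm by 2 um.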
For comparison, a human diploid genome is about 2.06 meters when fully extended, as covered in the DNA length guide. E. coli DNA is far shorter in absolute length, but it is still much longer than the cell that packs it.
A typical E. coli cell is about 2 micrometers long
A common rule of thumb is that an E. coli cell is about 1 um in diameter, 2 um long, 1 um^3 in volume, and 1 pg in mass.
Those values are rounded biological reference numbers, not constants. Cell Biology by the Numbers notes that cell size and mass change strongly with growth rate: faster-growing cells are larger.
| Cell-scale statistic | Typical value | Important caveat |
|---|---|---|
| Diameter | ~1 um | Rule-of-thumb value |
| Length | ~2 um | Varies by strain and growth condition |
| Volume | ~1 um^3, or 1 fL | A spherocylinder estimate gives ~1.3 um^3 for 1 um x 2 um geometry |
| Mass | ~1 pg | Assumes cell density close to water |
| Dry mass at slow growth | 148 fg | Reported for B/r cells around 100-min doubling time |
| Dry mass at fast growth | 865 fg | Reported for B/r cells around 24-min doubling time |
Source: Cell Biology by the Numbers, How big is an E. coli cell and what is its mass?
The dry-mass values show why a single cell-size number should be treated as a reference scale. A rapidly dividing E. coli cell can have more than 5 times the dry mass of a slower-growing cell in the classic bacterial physiology data summarized by Cell Biology by the Numbers.
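As a rough check on two numbers in the table, the sketch below computes the spherocylinder volume for a 1 um x 2 um cell and the fast-to-slow dry-mass ratio. It assumes the 2 um length includes both hemispherical end caps; the function name is illustrative.

```python
import math


def spherocylinder_volume_um3(diameter_um: float, total_length_um: float) -> float:
    """Volume of a cylinder capped by two hemispheres; total length includes the caps."""
    radius = diameter_um / 2
    cylinder_length = total_length_um - diameter_um    # straight section between caps
    cylinder = math.pi * radius**2 * cylinder_length
    caps = (4 / 3) * math.pi * radius**3               # two hemispheres make one sphere
    return cylinder + caps


volume = spherocylinder_volume_um3(diameter_um=1.0, total_length_um=2.0)
dry_mass_ratio = 865 / 148   # fast-growth vs slow-growth dry mass in fg, from the table

print(f"Spherocylinder volume: {volume:.2f} um^3")
print(f"Fast/slow dry-mass ratio: {dry_mass_ratio:.1f}")
```

The volume lands near 1.3 um^3 and the dry-mass ratio near 5.8, consistent with the rounded values quoted above.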
E. coli can divide in about 20 minutes under ideal conditions
The minimum division time often quoted for E. coli is about 20 minutes under ideal laboratory conditions.
That fast doubling time creates an apparent timing problem: copying a roughly 4.6 Mbp genome with two replication forks takes on the order of 40 minutes, longer than the fastest cell-division cycle. E. coli resolves this by initiating overlapping rounds of DNA replication before the previous round is complete.
| Growth and replication statistic | Value | Source context |
|---|---|---|
| Fast division time | ~20 min | Ideal growth-condition estimate |
| Genome replication time | ~40 min | Two forks copying a ~5 Mbp bacterial genome |
| In vivo replication-rate estimate | ~600 bp/s | Recent single-molecule microscopy summary in Cell Biology by the Numbers |
| Origins in fast-growing cells | >6 | Overlapping replication rounds |
| Replication forks in fast-growing cells | >10 | Overlapping replication rounds |
Source: Cell Biology by the Numbers, How long does it take cells to copy their genomes?
These are physiology numbers, not species-wide constants. Doubling time changes with medium composition, temperature, aeration, strain background, stress, and experimental setup.
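The timing problem can be sketched with a simple model: two forks moving at a constant speed from a single origin. This is an idealization, and the 1,000 bp/s comparison speed is a hypothetical input, not a value from the sources. The table's ~600 bp/s and ~40 min figures come from different estimates, so they need not reduce to each other under this model.

```python
GENOME_BP = 4_641_652   # K-12 MG1655 chromosome length from NCBI
FORKS = 2               # bidirectional replication from one origin


def replication_time_min(fork_speed_bp_per_s: float, genome_bp: int = GENOME_BP) -> float:
    """Minutes for two constant-speed forks to copy the whole chromosome."""
    return genome_bp / (FORKS * fork_speed_bp_per_s) / 60


def implied_fork_speed(replication_min: float, genome_bp: int = GENOME_BP) -> float:
    """Per-fork speed in bp/s needed to finish replication in the given time."""
    return genome_bp / (FORKS * replication_min * 60)


# 600 bp/s is the in vivo estimate in the table; 1000 bp/s is a hypothetical comparison.
for speed in (600, 1000):
    print(f"{speed} bp/s per fork -> {replication_time_min(speed):.0f} min")
print(f"40 min implies about {implied_fork_speed(40):.0f} bp/s per fork")
```

Under either speed the copy time exceeds a 20-minute division cycle, which is the point of the overlapping-rounds explanation: a new round has to start before the previous one finishes.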
One fast-growing E. coli cell can contain tens of thousands of ribosomes
In fast growth, E. coli can contain about 72,000 ribosomes per cell.
Ribosome abundance is one of the clearest quantitative signatures of bacterial growth rate. Cell Biology by the Numbers summarizes classic data in which a fast 24-minute doubling time corresponds to about 72,000 ribosomes per cell, while slower growth has far fewer ribosomes.
| Molecular statistic | Value | Meaning |
|---|---|---|
| Ribosomes at fast growth | ~72,000 per cell | Approximate count at 24-min doubling time |
| RNA polymerase molecules | ~1,000-10,000 per cell | Range summarized for E. coli |
| Transcription elongation speed | ~40-80 nt/s | Maximum speed range in E. coli |
| Translation elongation speed | ~20 aa/s | Approximate maximum speed in E. coli |
Sources: Cell Biology by the Numbers, How many ribosomes are in a cell? and What is faster, transcription or translation?
The transcription and translation speeds are nearly matched in coding terms because 3 nucleotides encode 1 amino acid. A transcript growing at 60 nt/s corresponds to coding capacity for roughly 20 aa/s, close to the quoted bacterial translation speed.
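The codon arithmetic can be written out as a short sketch. The function name is illustrative, and the conversion ignores untranslated regions and pauses.

```python
NT_PER_CODON = 3   # three nucleotides encode one amino acid


def coding_rate_aa_per_s(transcription_nt_per_s: float) -> float:
    """Amino acids' worth of coding sequence produced per second of transcription."""
    return transcription_nt_per_s / NT_PER_CODON


# 40 and 80 nt/s are the ends of the quoted range; 60 nt/s is the midpoint example.
for nt_rate in (40, 60, 80):
    print(f"{nt_rate} nt/s -> {coding_rate_aa_per_s(nt_rate):.1f} aa/s of coding capacity")
```

Across the quoted 40 to 80 nt/s range, the equivalent coding rate runs from about 13 to about 27 aa/s, bracketing the ~20 aa/s translation speed.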
E. coli has millions of protein molecules but about 4,400 annotated proteins
An E. coli cell has roughly 2 to 4 million protein molecules per 1 um^3 cell, but the K-12 MG1655 reference proteome has 4,403 proteins in UniProt.
Those are different kinds of counts. Protein molecule counts ask how many physical protein copies are in one cell. Proteome counts ask how many distinct protein entries are annotated for a strain.
| Protein statistic | Value | What it means |
|---|---|---|
| Total protein molecules in a 1 um^3 E. coli cell | ~2-4 million | Estimated physical molecules per cell |
| Average bacterial protein length used in the estimate | ~300 amino acids | Abundance-weighted rounded estimate |
| Protein concentration used for E. coli estimate | ~0.19-0.24 g/ml | Protein mass per cell volume range discussed by Cell Biology by the Numbers |
| UniProt K-12 MG1655 reference proteome | 4,403 proteins | Distinct protein entries in UP000000625 |
Sources: Cell Biology by the Numbers, How many proteins are in a cell? and UniProt UP000000625.
This same distinction appears in broader protein statistics: known sequence databases, gene annotations, translated products, and molecule counts answer different questions. The protein count guide uses the same separation for human and global protein counts.
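For the molecule count, a back-of-the-envelope sketch shows how the table inputs combine. The average residue mass of 110 Da is an assumption that is not stated in this article, and the result is sensitive to it and to the assumed 1 um^3 cell volume.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
AVG_PROTEIN_LENGTH_AA = 300     # rounded average length from the table
AVG_RESIDUE_MASS_DA = 110       # ASSUMPTION: typical average residue mass, not from the article
CELL_VOLUME_UM3 = 1.0           # rule-of-thumb cell volume
ML_PER_UM3 = 1e-12              # 1 um^3 = 1 fL = 1e-12 mL


def proteins_per_cell(concentration_g_per_ml: float) -> float:
    """Estimated protein molecules per cell from protein mass per unit volume."""
    protein_mass_g = concentration_g_per_ml * CELL_VOLUME_UM3 * ML_PER_UM3
    molar_mass_g_per_mol = AVG_PROTEIN_LENGTH_AA * AVG_RESIDUE_MASS_DA
    return protein_mass_g / molar_mass_g_per_mol * AVOGADRO


for conc in (0.19, 0.24):
    print(f"{conc} g/ml -> about {proteins_per_cell(conc) / 1e6:.1f} million proteins")
```

With these inputs the estimate comes out at a few million molecules per cell, at or slightly above the top of the quoted 2 to 4 million range, while the number of distinct annotated proteins stays at a few thousand.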
E. coli has a very low mutation rate per copied base
Long-term evolution experiments put the E. coli mutation rate on the order of 10^-10 mutations per base pair per replication under the measured conditions.
Cell Biology by the Numbers summarizes sequencing from Richard Lenski's long-term evolution experiment, where fixed mutations over tens of thousands of generations supported an estimated mutation rate around 10^-10 mutations/bp/replication. For a roughly 5 x 10^6 bp bacterial genome, that corresponds to about 5 x 10^-4 mutations per genome replication, or roughly 1 mutation per 2,000 generations in a single lineage under that simplified calculation.
This does not mean mutations are rare in a large culture. A culture with more than 10^9 cells/ml can contain many mutant lineages even when the per-base replication error rate is low.
Source: Cell Biology by the Numbers, What is the mutation rate during genome replication?
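The per-lineage and per-culture views of the mutation rate can be compared in a short sketch. It uses the rounded inputs from this section and assumes, for simplicity, that every cell in 1 ml of culture replicates its genome once; the variable names are illustrative.

```python
MUTATION_RATE = 1e-10   # mutations per bp per replication (order-of-magnitude value)
GENOME_BP = 5e6         # rounded bacterial genome size used in the text
CELLS_PER_ML = 1e9      # dense culture, as in the text

mutations_per_replication = MUTATION_RATE * GENOME_BP
generations_per_mutation = 1 / mutations_per_replication

# ASSUMPTION: each cell in 1 ml replicates its genome exactly once.
new_mutations_per_ml = mutations_per_replication * CELLS_PER_ML

print(f"Mutations per genome replication: {mutations_per_replication:.0e}")
print(f"Generations per mutation in one lineage: {generations_per_mutation:,.0f}")
print(f"New mutations per ml per round of replication: {new_mutations_per_ml:,.0f}")
```

A single lineage waits on the order of a thousand or more generations per mutation, while one round of replication in a dense millilitre of culture produces hundreds of thousands of new mutations across the population.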
Public databases contain hundreds of thousands of E. coli genome assemblies
NCBI Datasets returned 460,707 genome assembly reports for taxon 562 (Escherichia coli) when queried on April 29, 2026.
That live database count is not a biological count of strains in nature. It is a public-data count shaped by sequencing effort, clinical surveillance, environmental sampling, food-safety testing, outbreak investigation, and database inclusion rules.
| Database statistic | Value | Source context |
|---|---|---|
| NCBI taxon ID | 562 | Escherichia coli species |
| Genome assembly reports returned | 460,707 | NCBI Datasets API, accessed 2026-04-29 |
| First returned assembly in query | GCA_000005845.2 | K-12 MG1655 reference genome |
| CheckM completeness for first returned assembly | 99.48% | NCBI Datasets report for GCA_000005845.2 |
| CheckM contamination for first returned assembly | 0.15% | NCBI Datasets report for GCA_000005845.2 |
Source: NCBI Datasets taxon 562 genome report API, accessed April 29, 2026.
The large assembly count is one reason E. coli is useful for comparative genomics. It is also why the K-12 reference should not be treated as a complete description of every E. coli strain.
E. coli is one species, but pathogenic E. coli is not one disease
CDC describes six diarrheagenic kinds of E. coli: STEC, ETEC, EPEC, EIEC, EAEC, and DAEC.
Most E. coli are harmless members of the intestinal tract, but some strains cause diarrhea, urinary tract infections, pneumonia, sepsis, and other illnesses. CDC's public pages focus on diarrheagenic E. coli, where symptoms, risk groups, and likely exposure routes differ by pathotype.
| CDC group | Full name | Common symptom pattern or context |
|---|---|---|
| STEC | Shiga toxin-producing E. coli | Bloody diarrhea, severe stomach cramps, vomiting; major HUS risk |
| ETEC | Enterotoxigenic E. coli | Watery diarrhea; leading cause of travelers' diarrhea |
| EPEC | Enteropathogenic E. coli | Watery diarrhea, especially in children younger than 1 year |
| EIEC | Enteroinvasive E. coli | Watery diarrhea that is sometimes bloody; fever |
| EAEC | Enteroaggregative E. coli | Watery diarrhea, sometimes lasting more than 2 weeks |
| DAEC | Diffusely adherent E. coli | Watery diarrhea, especially in children aged 3-5 years |
Sources: CDC, About Escherichia coli Infection and Kinds of E. coli, both dated May 14, 2024.
This classification matters for statistics. A count for STEC O157 is not interchangeable with all STEC, all diarrheagenic E. coli, urinary-pathogenic E. coli, or harmless gut E. coli.
U.S. foodborne E. coli burden estimates are model-based
CDC's 2011 Emerging Infectious Diseases burden paper estimated 63,153 annual U.S. domestically acquired foodborne illnesses from STEC O157 and 112,752 from non-O157 STEC, using data and models based on the 2006 U.S. population.
Those figures are not direct case counts. They combine surveillance data with underreporting, underdiagnosis, travel-related, and foodborne-attribution adjustments. They remain useful because the table makes the modeling assumptions visible.
| E. coli category in Scallan et al. | Estimated annual U.S. domestically acquired foodborne illnesses | Estimated annual U.S. domestically acquired foodborne hospitalizations | Estimated annual U.S. domestically acquired foodborne deaths |
|---|---|---|---|
| STEC O157 | 63,153 | 2,138 | 20 |
| STEC non-O157 | 112,752 | 271 | 0 (reported median) |
| ETEC | 17,894 | 12 | 0 |
| Diarrheagenic E. coli other than STEC and ETEC | 11,982 | 8 | 0 |
| Combined rows above | 205,781 | 2,429 | 20 |
Sources: Scallan et al., 2011, CDC Emerging Infectious Diseases Table 2 and Table 3.
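The combined row is a plain column sum of the four rows above it, which a few lines of Python can reproduce. The dictionary layout is illustrative; the values are copied from the table, with the non-O157 STEC death estimate entered as 0.

```python
# (illnesses, hospitalizations, deaths) per category, copied from the table above
burden = {
    "STEC O157": (63_153, 2_138, 20),
    "STEC non-O157": (112_752, 271, 0),
    "ETEC": (17_894, 12, 0),
    "Other diarrheagenic E. coli": (11_982, 8, 0),
}

# zip(*...) regroups the per-row tuples into per-column tuples before summing
illnesses, hospitalizations, deaths = (sum(column) for column in zip(*burden.values()))

print(f"Illnesses: {illnesses:,}")
print(f"Hospitalizations: {hospitalizations:,}")
print(f"Deaths: {deaths:,}")
```

Summing point estimates this way does not carry over the credible intervals from the underlying model, so the combined row is best read as a convenience total.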
CDC's current E. coli surveillance page says national STEC surveillance data are collected through passive surveillance of laboratory-confirmed human STEC isolates, with reports collected into the Laboratory-based Enteric Disease Surveillance system. That surveillance framework is different from the modeled annual burden estimates above.
Source: CDC, E. coli Surveillance, dated February 28, 2024.
EcoCyc adds pathway and regulation context for K-12
EcoCyc v29.6 reports 478 pathways, 2,419 enzymatic reactions, 549 transport reactions, and 70,767 GO annotations for E. coli K-12 MG1655.
These are knowledge-base statistics rather than genome-sequence statistics. They describe how much curated functional information EcoCyc attaches to the model organism.
| EcoCyc v29.6 database content | Count |
|---|---|
| Genes | 4,543 |
| Pathways | 478 |
| Enzymatic reactions | 2,419 |
| Transport reactions | 549 |
| Polypeptides | 4,494 |
| Protein complexes | 1,210 |
| Enzymes | 1,458 |
| Transporters | 488 |
| Compounds | 3,086 |
| Transcription units | 3,767 |
| tRNAs | 89 |
| Growth media | 441 |
| Transcriptional regulation entries | 6,106 |
| Protein features | 46,219 |
| GO annotations | 70,767 |
Source: EcoCyc organism summary for E. coli K-12 MG1655, database version 29.6, page generated April 29, 2026.
EcoCyc also notes that K-12 MG1655 is the strain from which the first complete E. coli genome sequence was obtained. That makes it the right reference strain for many basic statistics, but not a substitute for pathogenic-strain-specific measurements.
Methodology
This article separates directly reported database values from derived calculations.
- The 4,641,652 bp chromosome length, circular topology, and NC_000913.3 accession are directly reported by NCBI ESummary for the RefSeq nucleotide record.
- The NCBI gene-count table uses the first NCBI Datasets genome report returned for taxon 562, assembly GCA_000005845.2 / GCF_000005845.2. Its annotation release date is 2024-11-06.
- EcoCyc counts come from the EcoCyc v29.6 organism summary, which was generated by Pathway Tools on 2026-04-29 during research for this article.
- UniProt protein counts come from the UniProt proteome API for UP000000625, modified 2025-09-19.
- The 1.58 mm stretched chromosome length is derived as 4,641,652 bp x 0.34 nm/bp = 1,578,161.68 nm, or about 1.58 mm.
- The DNA-to-cell-length ratio is derived as 1.58 mm / 2 um = 790, using the rounded Cell Biology by the Numbers cell-length rule of thumb.
- The combined U.S. foodborne E. coli illness, hospitalization, and death values sum the four E. coli rows shown in Scallan et al. 2011: STEC O157, STEC non-O157, ETEC, and diarrheagenic E. coli other than STEC and ETEC. This combined row is a ProteinIQ calculation from the published table, not a separately reported CDC headline.
- Public-health burden estimates come from Scallan et al. 2011. Those estimates are model-based and tied to the paper's methods and 2006 U.S. population denominator.
Sources
- NCBI ESummary. NC_000913.3: Escherichia coli str. K-12 substr. MG1655, complete genome
- NCBI Datasets. Genome reports for taxon 562
- UniProt. Proteome UP000000625: Escherichia coli strain K-12 MG1655
- EcoCyc. Summary of Escherichia coli K-12 substr. MG1655
- Blattner FR, Plunkett G 3rd, Bloch CA, et al. The complete genome sequence of Escherichia coli K-12. Science, 1997
- Cell Biology by the Numbers. How big is an E. coli cell and what is its mass?
- Cell Biology by the Numbers. How long does it take cells to copy their genomes?
- Cell Biology by the Numbers. How many ribosomes are in a cell?
- Cell Biology by the Numbers. What is faster, transcription or translation?
- Cell Biology by the Numbers. How many proteins are in a cell?
- Cell Biology by the Numbers. What is the mutation rate during genome replication?
- CDC. About Escherichia coli Infection
- CDC. Kinds of E. coli
- CDC. E. coli Surveillance
- Scallan E, Hoekstra RM, Angulo FJ, et al. Foodborne illness acquired in the United States: major pathogens. Emerging Infectious Diseases, 2011
- Scallan et al. Table 2: Estimated annual number of foodborne illnesses
- Scallan et al. Table 3: Estimated annual number of hospitalizations and deaths





