mBio. 2018 Feb 20;9(1):e02096-17.
doi: 10.1128/mBio.02096-17.

The Essential Genome of Escherichia coli K-12

Emily C A Goodall et al. mBio.

Abstract

Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provide new insights into bacterial physiology and biochemistry.

IMPORTANCE Incentives to define lists of genes that are essential for bacterial survival include the identification of potential targets for antibacterial drug development, genes required for rapid growth for exploitation in biotechnology, and discovery of new biochemical pathways. To identify essential genes in Escherichia coli, we constructed a transposon mutant library of unprecedented density. Initial automated analysis of the resulting data revealed many discrepancies compared to the literature. We now report more extensive statistical analysis supported by both literature searches and detailed inspection of high-density TraDIS sequencing data for each putative essential gene for the E. coli model laboratory organism. This paper is important because it provides a better understanding of the essential genes of E. coli, reveals the limitations of relying on automated analysis alone, and provides a new standard for the analysis of TraDIS data.
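The abstract refers to corrections for gene length and genome length, and the figure captions refer to gene insertion index scores. The exact statistical procedure is not given in this text, so the following is only a minimal sketch under stated assumptions: the insertion index of a gene is taken to be the number of unique insertion sites within the gene divided by the gene length, and a relative score divides that value by the genome-wide insertion density. The function name `insertion_indices`, the gene names, and all coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Iterable, Tuple


def insertion_indices(
    sites: Iterable[int],
    genes: Dict[str, Tuple[int, int]],
    genome_length: int,
) -> Dict[str, Tuple[float, float]]:
    """Return {gene: (insertion_index, relative_index)}.

    Assumptions (not taken from the paper):
    - `sites` are 1-based genome positions of transposon insertions.
    - `genes` maps a gene name to inclusive (start, end) coordinates.
    - insertion_index = unique insertion sites in gene / gene length.
    - relative_index = insertion_index / genome-wide insertion density.
    """
    unique_sites = sorted(set(sites))  # count each position once
    genome_density = len(unique_sites) / genome_length

    scores = {}
    for name, (start, end) in genes.items():
        # Number of unique sites with start <= position <= end.
        n_sites = bisect_right(unique_sites, end) - bisect_left(unique_sites, start)
        gene_length = end - start + 1
        index = n_sites / gene_length
        relative = index / genome_density if genome_density > 0 else 0.0
        scores[name] = (index, relative)
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: two 100-bp genes on a 1,000-bp genome.
    toy_sites = [5, 5, 10, 15, 120, 130, 140, 150, 160, 170]
    toy_genes = {"geneA": (1, 100), "geneB": (101, 200)}
    for gene, (idx, rel) in insertion_indices(toy_sites, toy_genes, 1000).items():
        print(gene, round(idx, 3), round(rel, 2))
```

Genes with a low score relative to the rest of the genome would be candidates for essentiality. Where to place the cutoff is a separate statistical question that this sketch does not address.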

Keywords: Escherichia coli; TraDIS; genomics; tn-seq.

Figures

FIG 1
Genome-wide transposon insertion sites mapped to E. coli strain BW25113. (A) Frequency and location of transposon junction sequences from a mini-Tn5 transposon library in strain BW25113, mapped to the BW25113 genome (CP009273.1). The outermost track marks the BW25113 genome in base pairs starting at the annotation origin. The next two inner tracks correspond to sense and antisense CDS, respectively (gray), followed by two inner tracks depicting the essential genes identified by TraDIS on the sense and antisense strands, respectively (green). The innermost circle (blue) corresponds to the frequency and location of transposon insertion sequences mapped successfully to the BW25113 genome after identification of a transposon sequence. This figure was created using DNAPlotter. (B and C) Correlation coefficients of gene insertion index scores for two sequenced technical replicates of the input transposon library (TL1 and TL2) (B) and following growth in LB (LB1 and LB2) (C). (D) Representation of transposon insertion points across a portion of the E. coli K-12 BW25113 genome (blue), showing essential genes (green) and nonessential genes (gray). Blue bars correspond with transposon insertion sites along the genome and have been capped at a frequency of 1.
FIG 2
Comparison of essential gene data from various sources and examples of insertion profiles overlooked by automated statistical analysis of insertion index scores. (A) Putative essential genes identified using TraDIS were compared to existing essential gene data. A three-way comparison between the Keio collection of single gene knockouts, the online Profiling of the E. coli Chromosome (PEC) database, and our transposon insertion sequencing data identified 248 essential genes that were common to all three data sets. (B) The outlying genes of the Venn diagram, excluding those unique to our TraDIS data set, were inspected to understand the source of discrepancy between data sets. Genes were grouped into the overarching categories of “genes containing a transposon-free region,” “antitoxin,” “polar insertions,” “conditionally essential genes,” and “errors in library construction.” Genes not included in our analysis or that remain unclear are shown in red or gray, respectively.
FIG 3
Insertion profiles of discrepant genes between data sets. (A) The ftsK gene codes for an essential protein in which only part of the protein is required for its essential function. Such genes have a high insertion index score and consequently would not have been identified by automated statistical analysis. (B) secM contains a window (indicated by an asterisk) of 66 bp in which there were no transposon insertions. This feature is discussed in the text. (C to F) Genes with transposon insertions in only one orientation. The α- and β-orientation of the transposon is depicted above and below the schematic representation of the CDS, respectively, and native promoters are shown in black. (G and H) Many transposon insertions were found along the full length of folK and degS (shown in blue above the schematic map of the figure). However, most of these insertion mutants were lost during outgrowth (below the schematic representation and shown in dark blue).
FIG 4
Transcription and translation initiation from within the transposon. The full-length mini-Tn5 transposon was cloned into expression vectors pRW224 and pRW225 upstream of the lacZ gene, in each orientation, for all three open reading frames (ORFs). Vector pRW224 retains a ribosome binding site (RBS) for lacZ but no promoter, while vector pRW225 has no promoter or RBS upstream of lacZ. Vectors pRW224 and pRW225 can be used to detect transcriptional and translational activity, respectively. β-Galactosidase activity was measured in triplicate for three technical replicates. Values are mean values plus standard deviations between replicates (error bars). (A) Transcriptional read-through was confirmed for one orientation of the transposon, consistent with the orientation of the chloramphenicol gene. Translational read-through from the mini-Tn5 transposon was confirmed for two out of three open reading frames, consistent with GUG (ORF 1) and AUG (ORF 2) start codons in the transposon inverted repeat. (B) No transcriptional or translational read-through was detected for the opposite orientation of the transposon. (C) Schematic representing the orientation of transposon insertions. The α-orientation of the transposon (top expanded view) corresponds with the chloramphenicol cassette oriented left to right. The β-orientation (bottom expanded view) corresponds with transposon insertions in the opposite direction. An arbitrary gene is represented by the green arrow. The chloramphenicol cassette is denoted by the letters Tn.
FIG 5
Essential genes unique to our data. (A to E) There are very few or no insertions within these genes in our input library (blue). (A and B) Low insertion frequency and literature support classification of these genes as essential. (C to E) Recently annotated genes with few or no insertions. Our data suggest that these genes are potentially essential or important for growth. (F) The guaA gene has a sufficiently low insertion index score to be classified as essential after initial statistical analysis (shown in blue above the schematic representation). Following outgrowth, there are few guaA mutants (shown in dark blue below the schematic representation), consistent with literature reports that guaA mutants have a growth defect.
FIG 6
Additional features identified through detailed analysis of high-resolution insertion data. (A) Insertions within the yejM CDS, but not along the full length, correspond with a nonessential periplasmic domain. The 5′ end of the CDS has no insertions and corresponds with five essential transmembrane (TM) domains of YejM. (B) Insertions within yrfF suggest a dispensable 5′ domain. (C) The grpE gene tolerates transposon insertions in the 5′ end of the CDS (blue), but only in the orientation that maintains expression of the downstream protein (lower track, β-orientation). The GrpE protein forms a dimer (green) which interacts with DnaK (yellow). Transposon insertions in specific regions of the protein do not disrupt GrpE interaction with DnaK (blue). An additional, single, insertion point in the center of the grpE CDS (indicated by an asterisk) maps back to a turn between two helices of the GrpE protein. The data reveal dispensable sections of the GrpE protein and boundaries in secondary structure. (D) Insertions immediately upstream of lptC have an insertion orientation bias. Only insertions that maintain expression of lptC are tolerated within kdsC (α-orientation). The lptC gene has three promoters (indicated by the numbers 1, 2, and 3 within parentheses above the black arrows); the insertion boundary indicates that promoter 2 is the essential promoter. (E) Pseudogene ybbD contains many more insertions after the first stop codon (red), suggesting that the truncated CDS may still be functional and essential. (F) The pseudogene ykiB is not annotated in the BW25113 genome (CP009273.1) and has a single insertion within the CDS.
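Several captions above describe genes that tolerate insertions overall yet contain a stretch with none, such as the 66-bp window in secM (FIG 3B) or the insertion-free 5′ end of yejM (FIG 6A). A gene-wide insertion index would miss such regions. As an illustration only, and not the authors' procedure, the sketch below reports the longest insertion-free window inside a gene. The function name `longest_insertion_free_window` and the coordinates are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
from typing import Iterable, Optional, Tuple


def longest_insertion_free_window(
    sites: Iterable[int], start: int, end: int
) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (length, window_start, window_end) of the longest run of
    positions in [start, end] that carries no insertion.

    Returns (0, None, None) if every position in the gene has an insertion.
    """
    inside = sorted({s for s in sites if start <= s <= end})
    # Sentinels just outside the gene so edge windows are measured too.
    bounds = [start - 1] + inside + [end + 1]

    best: Tuple[int, Optional[int], Optional[int]] = (0, None, None)
    for left, right in zip(bounds, bounds[1:]):
        length = right - left - 1  # positions strictly between two sites
        if length > best[0]:
            best = (length, left + 1, right - 1)
    return best


if __name__ == "__main__":
    # Hypothetical gene spanning positions 100..199 with four insertions.
    print(longest_insertion_free_window([105, 110, 180, 190], 100, 199))
```

Whether a window of a given length is meaningful depends on the local insertion density, so any cutoff would need to be chosen against the library in question.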

References

    1. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H. 2006. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2:2006.0008. doi:10.1038/msb4100050.
    2. Yamazaki Y, Niki H, Kato J. 2008. Profiling of Escherichia coli Chromosome database. Methods Mol Biol 416:385–389. doi:10.1007/978-1-59745-321-9_26.
    3. Nguyen BD, Valdivia RH. 2012. Virulence determinants in the obligate intracellular pathogen Chlamydia trachomatis revealed by forward genetic approaches. Proc Natl Acad Sci U S A 109:1263–1268. doi:10.1073/pnas.1117884109.
    4. Langridge GC, Phan MD, Turner DJ, Perkins TT, Parts L, Haase J, Charles I, Maskell DJ, Peters SE, Dougan G, Wain J, Parkhill J, Turner AK. 2009. Simultaneous assay of every Salmonella Typhi gene using one million transposon mutants. Genome Res 19:2308–2316. doi:10.1101/gr.097097.109.
    5. van Opijnen T, Bodi KL, Camilli A. 2009. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nat Methods 6:767–772. doi:10.1038/nmeth.1377.
