mBio. 2017 Jul-Aug; 8(4): e01069-17.
Published online 2017 Aug 15. doi: 10.1128/mBio.01069-17
PMCID: PMC5559632
PMID: 28811342

Bacteriophages of Gordonia spp. Display a Spectrum of Diversity and Genetic Relationships

Welkin H. Pope, Travis N. Mavrich, Rebecca A. Garlena, Carlos A. Guerrero-Bustamante, Deborah Jacobs-Sera, Matthew T. Montgomery, Daniel A. Russell, Marcie H. Warner, Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES), and Graham F. Hatfull (corresponding author)
Richard Losick, Editor (Harvard University)

ABSTRACT

The global bacteriophage population is large, dynamic, old, and highly diverse genetically. Many phages are tailed and contain double-stranded DNA, but these remain poorly characterized genomically. A collection of over 1,000 phages infecting Mycobacterium smegmatis reveals the diversity of phages of a common bacterial host, but their relationships to phages of phylogenetically proximal hosts are not known. Comparative sequence analysis of 79 phages isolated on Gordonia shows these also to be diverse and that the phages can be grouped into 14 clusters of related genomes, with an additional 14 phages that are “singletons” with no closely related genomes. One group of six phages is closely related to Cluster A mycobacteriophages, but the other Gordonia phages are distant relatives and share only 10% of their genes with the mycobacteriophages. The Gordonia phage genomes vary in genome length (17.1 to 103.4 kb), percentage of GC content (47 to 68.8%), and genome architecture and contain a variety of features not seen in other phage genomes. Like the mycobacteriophages, the highly mosaic Gordonia phages demonstrate a spectrum of genetic relationships. We show this is a general property of bacteriophages and suggest that any barriers to genetic exchange are soft and readily violable.

KEYWORDS: Gordonia, bacteriophage genetics, bacteriophages

IMPORTANCE

Despite the numerical dominance of bacteriophages in the biosphere, there is a dearth of complete genomic sequences. Current genomic information reveals that phages are highly diverse genomically and have mosaic architectures formed by extensive horizontal genetic exchange. Comparative analysis of 79 phages of Gordonia shows them to not only be highly diverse, but to present a spectrum of relatedness. Most are distantly related to phages of the phylogenetically proximal host Mycobacterium smegmatis, although one group of Gordonia phages is more closely related to mycobacteriophages than to the other Gordonia phages. Phage genome sequence space remains largely unexplored, but further isolation and genomic comparison of phages targeted at related groups of hosts promise to reveal pathways of bacteriophage evolution.

INTRODUCTION

Bacteriophages are the most abundant biological entities in the biosphere, with an estimated 10^31 total particles (1). Investigations of viral sequence space and capsid structures suggest that the viral population is vast, dynamic, and old, containing large unexplored reservoirs of genetic information (2). The genetic texture of the phage population is ill defined, but the diversity is enormous, and the genomes are characteristically mosaic, with horizontal genetic transfer (HGT) mediated by illegitimate recombination playing an important role in their evolution (2, 3).

Tailed phages with double-stranded DNA (dsDNA) genomes predominate in the environment, and there are currently approximately 2,700 sequenced genomes in the GenBank nonredundant (nr) database. Phages of phylogenetically distal hosts typically share little or no DNA sequence similarity, and few if any gene products share amino acid sequence similarity (3). Patterns of diversity and evolutionary mechanisms can be explored by comparing phage genomes of closely related bacteria, and moderate-to-substantial (>20) collections of phage genome sequences have been determined for hosts such as mycobacteria (4), enterobacteria (5), Staphylococcus (6), Pseudomonas (7), Bacillus, Arthrobacter (61), and cyanobacteria (8). A large collection of phages known to infect a single common host, Mycobacterium smegmatis, shows high diversity, and they can be grouped in clusters based on overall nucleotide sequence similarity and shared gene content (9). Typically, two phages sharing nucleotide sequence similarity and gene content are placed in the same cluster, and phages in different clusters do not share extensive nucleotide similarity. Many of the clusters can be divided into subclusters based on comparisons of average nucleotide identity (ANI). Currently, there are over 1,360 sequenced mycobacteriophages grouped into 26 clusters (Clusters A to Z) and six singletons (those with no close relatives); 12 of the clusters are divided into subclusters. The number of phages in each cluster varies enormously (493 in Cluster A and 2 each in Clusters U, V, Y, and Z), and the large Cluster A is divided into 16 subclusters, illustrating the high diversity of this group of related phages. A detailed comparison of 627 of these genomes showed that phages in different clusters/singletons sometimes share substantial numbers of genes, conferring a spectrum of diversity, as expected from the mosaic nature of the genomes. This continuum of diversity may be a common feature of phage genomes, although there are reported to be distinct lineages of cyanobacterial phages (8, 10) that are relatively isolated from genetic exchange with other phages. Mycobacteriophage subclusters, rather than clusters, and their counterparts in the enterobacteriophages (5) could be considered similar lineages, typically sharing over 90% of their genes.

The development of different genomic types is proposed to involve changes in host range that permit phages to migrate across the host landscape, such that different routes provide access to different parts of a large common gene pool (11). The mycobacteriophage Patience represents an example of a phage that appears to have entered the mycobacterial host neighborhood relatively recently, as it has a much lower percentage of GC content (50.3%) than M. smegmatis (67.3%); its overall codon usage profiles are also substantially different from those of its host (12). However, the codon usage profiles of highly expressed genes are more similar to those of M. smegmatis genes, suggesting that Patience is in the process of adapting to growth in this host (12). Deconstruction of the evolutionary histories of Patience and other mycobacteriophages is confounded by the lack of genomic characterization of phages of other bacterial hosts phylogenetically proximal to M. smegmatis (11).

To further explore these questions in phage diversity and evolution, we have comparatively analyzed 79 phages that infect Gordonia spp. (13–30). Gordonia and Mycobacterium are in the same phylum (Actinobacteria), order (Actinomycetales), and suborder (Corynebacterineae), and both contain mycolic acid-rich cell walls. Gordonia terrae 3612 and M. smegmatis mc²155 are closely related, with a 42% span-length match between complete genome sequences when using BLASTn. Members of Gordonia are aerobic heterotrophs, have been isolated from terrestrial and aquatic sources, including hydrocarbon-contaminated industrial sites, and have been implicated in opportunistic infections in immunocompromised individuals and foaming in wastewater treatment plants (31). The Gordonia phages are highly diverse and encompass a spectrum of genetic relatedness and mosaicism. However, only one subset of phages shows a close relationship to the mycobacteriophages, despite the phylogenetic proximity of their hosts and the similarities of the environments from which the phages were isolated (4, 32). This collection of Gordonia phages provides a resource for genetic analyses of Gordonia strains that evince considerable metabolic diversity (31) and include opportunistic pathogens (33).

RESULTS

Genometrics.

Sixty phages isolated on Gordonia spp. through enrichment culture or direct plating as part of the Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) and the Phage Hunters Integrating Research and Education (PHIRE) programs (15, 18–28) were used for comparative analysis (Table 1), together with 19 Gordonia phages described previously (13, 16, 17). All but two of the SEA-PHAGES/PHIRE phages were isolated on G. terrae 3612; the other two were isolated on Gordonia neofelifaecis (34); the 19 previously described phages were isolated on Gordonia spp., including G. terrae, G. rubripertincta, G. malaquae, G. sputi, and G. alkanivorans (Table 1). These 79 phages span a wide range of genome lengths (17,118 to 103,424 bp) and percentages of GC content (47 to 68.8%), with many GC content values distinct from those of their Gordonia hosts (G. terrae 3612, 67.8%). Most of the genomes have either defined ends with short (10 to 12 bases) 3′ single-stranded extensions or are circularly permuted and terminally redundant (Table 1). However, several have direct terminal repeats at their ends, varying in length from 191 bp to 1,182 bp (Table 1), an organization not seen in any of the sequenced mycobacteriophages. All of the 60 SEA-PHAGES and PHIRE phages were examined by electron microscopy, and all have icosahedral heads with flexible noncontractile tails and thus are morphologically grouped in the Siphoviridae; all of the 19 previously described Gordonia phages are also reported to be siphoviruses.
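The two genometric quantities reported here, genome length and percentage of GC content, can be computed directly from a genome sequence. The following is a minimal sketch, not the pipeline used in this study; the function name `gc_percent` and the toy sequence are hypothetical, and ambiguous bases (anything other than A, C, G, or T) are assumed to be excluded from the denominator.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


# Hypothetical toy sequence (12 bases, 8 of them G or C); not a real phage genome.
toy_genome = "GGCCGATCGGTA"
toy_length = len(toy_genome)
toy_gc = gc_percent(toy_genome)
```

Applied to a complete genome, the same two values correspond to the "Length (bp)" and "GC %" columns of Table 1.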

TABLE 1 

Gordonia phage profiles

| Phage name | Host species | Strain | Cluster | Length (bp) | GC % | Genome end (a) | Accession no. | Reference |
|---|---|---|---|---|---|---|---|---|
| JSwag | G. terrae | 3612 | A15 | 52,726 | 61.9 | 3′ 10-base ext. | KX557280 | 28 |
| KatherineG | G. terrae | 3612 | A15 | 52,689 | 61.9 | 3′ 10-base ext. | KU998251 | 28 |
| Remus | G. terrae | 3612 | A15 | 52,738 | 62.0 | 3′ 10-base ext. | KX557283 | 28 |
| Rosalind | G. terrae | 3612 | A15 | 52,684 | 61.9 | 3′ 10-base ext. | KU998250 | 28 |
| Soups | G. terrae | 3612 | A15 | 52,924 | 61.9 | 3′ 10-base ext. | KU998249 | 28 |
| Strosahl | G. terrae | 3612 | A15 | 52,738 | 62.0 | 3′ 10-base ext. | KX557284 | 28 |
| Bachita | G. terrae | 3612 | CQ1 | 93,843 | 61.9 | 3′ 10-base ext. | KU998247 | 28 |
| ClubL | G. terrae | 3612 | CQ1 | 92,618 | 61.9 | 3′ 10-base ext. | KU998246 | 28 |
| Cucurbita | G. terrae | 3612 | CQ1 | 93,686 | 62.0 | 3′ 10-base ext. | KU557276 | 28 |
| Smoothie | G. terrae | 3612 | CQ1 | 93,139 | 61.9 | 3′ 10-base ext. | KU998244 | 28 |
| OneUp | G. terrae | 3612 | CQ2 | 93,577 | 61.5 | 3′ 10-base ext. | KU998245 | 28 |
| GRU1 | G. rubripertincta | Grub38 | CR1 | 65,766 | 65.5 | Cir. Perm | JF923797 | 17 |
| GTE5 | G. terrae | Gter34 | CR1 | 65,839 | 65.1 | Cir. Perm | JF923796 | 17 |
| GTE8 | G. terrae | Gter34 | CR2 | 67,617 | 66.0 | ? | KR053201 | 13 |
| GMA7 | G. malaquae | BEN700 | CS1 | 73,419 | 56.6 | ? | KR063278 | 13 |
| GTE7 | G. terrae | Ben601 | CS1 | 74,431 | 56.8 | ? | JN035618 | 30 |
| Hotorobo | G. terrae | 3612 | CS2 | 76,972 | 58.9 | 191-bp DTR | KU963245 | 26 |
| Monty | G. terrae | 3612 | CS2 | 75,680 | 58.9 | 191-bp DTR | KU998241 | 26 |
| Woes | G. terrae | 3612 | CS3 | 73,752 | 59.1 | 184-bp DTR | KU998240 | 26 |
| Benczkowski14 | G. terrae | 3612 | CS4 | 75,380 | 59.5 | 1,172-bp DTR | KU963262 | 22 |
| Demosthenes | G. terrae | 3612 | CS4 | 74,073 | 59.3 | 1,182-bp DTR | KU998242 | 28 |
| Katyusha | G. terrae | 3612 | CS4 | 75,380 | 59.5 | 1,172-bp DTR | KU963258 | 22 |
| Kvothe | G. terrae | 3612 | CS4 | 75,462 | 59.5 | 1,172-bp DTR | KU998243 | 28 |
| Cozz | G. terrae | 3612 | CT | 46,600 | 60.0 | 3′ 10-base ext. | KU998239 | 28 |
| Emalyn | G. terrae | 3612 | CT | 43,982 | 61.2 | 3′ 10-base ext. | KU963260 | 27 |
| GTE2 | G. terrae | Gter34 | CT | 45,540 | 60.3 | ? | HQ403646 | 16 |
| Splinter | G. terrae | 3612 | CU1 | 45,858 | 66.1 | 3′ 12-base ext. | KU998238 | 28 |
| Vendetta | G. terrae | 3612 | CU1 | 45,858 | 66.1 | 3′ 12-base ext. | KU998237 | 28 |
| Gsput1 | G. sputi | G11 | CU2 | 43,505 | 62.8 | 5′ 16-base ext. | KP790011 | 14 |
| Blueberry | G. terrae | 3612 | CV | 54,990 | 67.0 | 3′ 10-base ext. | KU998236 | 28 |
| CaptainKirk2 | G. terrae | 3612 | CV | 47,898 | 67.4 | 3′ 10-base ext. | KX557274 | 28 |
| CarolAnn | G. terrae | 3612 | CV | 54,167 | 66.9 | 3′ 10-base ext. | KX557275 | 28 |
| Guacamole | G. terrae | 3612 | CV | 49,894 | 67.2 | 3′ 10-base ext. | KU963259 | 18 |
| Obliviate | G. terrae | 3612 | CV | 49,286 | 67.5 | 3′ 10-base ext. | KU963254 | 18 |
| UmaThurman | G. terrae | 3612 | CV | 50,127 | 67.0 | 3′ 10-base ext. | KU963251 | 18 |
| Utz | G. terrae | 3612 | CV | 49,768 | 67.7 | 3′ 10-base ext. | KU998248 | 28 |
| Jeanie | G. neofelifaecis | NRRL 59395 | CW1 | 17,118 | 68.6 | 3′ 10-base ext. | KU998256 | 28 |
| McGonagall | G. neofelifaecis | NRRL 59395 | CW1 | 17,119 | 68.6 | 3′ 10-base ext. | KU998255 | 28 |
| GMA5 | G. malaquae | BEN700 | CW2 | 17,562 | 66.4 | ? | KR053198 | 13 |
| GRU3 | G. rubripertincta | Grub38 | CW2 | 17,727 | 66.5 | ? | KR053197 | 13 |
| Kampe | G. terrae | 3612 | CX | 80,649 | 47.0 | 249-bp DTR | KU998254 | 28 |
| Orchid | G. terrae | 3612 | CX | 80,650 | 47.0 | 249-bp DTR | KU998253 | 28 |
| PatrickStar | G. terrae | 3612 | CX | 80,729 | 47.0 | 249-bp DTR | KU998252 | 28 |
| BatStarr | G. terrae | 3612 | CZ1 | 53,432 | 66.6 | 3′ 10-base ext. | KX557273 | 28 |
| Kita | G. terrae | 3612 | CZ1 | 50,346 | 66.7 | 3′ 10-base ext. | KU963257 | 20 |
| Nymphadora | G. terrae | 3612 | CZ1 | 53,431 | 66.6 | 3′ 10-base ext. | KU963255 | 20 |
| Zirinka | G. terrae | 3612 | CZ1 | 52,077 | 66.7 | 3′ 11-base ext. | KX557287 | 28 |
| Attis | G. terrae | 3612 | CZ2 | 47,881 | 66.8 | 3′ 11-base ext. | KU963247 | 24 |
| SoilAssassin | G. terrae | 3612 | CZ2 | 47,880 | 66.8 | 3′ 11-base ext. | KU963246 | 24 |
| BaxterFox | G. terrae | 3612 | CZ3 | 53,717 | 66.5 | 3′ 10-base ext. | KU963263 | 20 |
| Yeezy | G. terrae | 3612 | CZ3 | 51,884 | 66.7 | 3′ 10-base ext. | KU963249 | 20 |
| Howe | G. terrae | 3612 | CZ4 | 53,182 | 65.6 | 3′ 11-base ext. | KU252585 | 28 |
| Bowser | G. terrae | 3612 | DB | 46,570 | 67.1 | 3′ 10-base ext. | KU998235 | 15 |
| Schwabeltier | G. terrae | 3612 | DB | 46,895 | 67.0 | 3′ 10-base ext. | KU963252 | 15 |
| Hedwig | G. terrae | 3612 | DB | 44,536 | 67.2 | 3′ 10-base ext. | KX557279 | 28 |
| Twister6 | G. terrae | 3612 | DC | 57,804 | 67.7 | Cir. Perm | KX557286 | 28 |
| Wizard | G. terrae | 3612 | DC | 58,308 | 67.9 | Cir. Perm | KU998234 | 28 |
| GTE6 | G. terrae | Gter34 | DE | 56,982 | 67.8 | ? | KR053200 | 13 |
| Phinally | G. terrae | 3612 | DE | 59,265 | 68.4 | Cir. Perm | KU963253 | 19 |
| Vivi2 | G. terrae | 3612 | DE | 59,337 | 67.1 | Cir. Perm | KU963250 | 19 |
| Gmala1 | Gordonia sp. | G7 | DF1 | 75,167 | 50.8 | Cir. Perm | KP790009 | 14 |
| GordDuk1 | Gordonia sp. | G7 | DF1 | 76,276 | 50.7 | Cir. Perm | KP790010 | 14 |
| GordTnk2 | Gordonia sp. | G7 | DF1 | 75,987 | 50.7 | Cir. Perm | KP790008 | 14 |
| GMA3 | G. malaquae | BEN700 | DF2 | 77,779 | 51.3 | Cir. Perm | KR063279 | 13 |
| Jumbo | G. terrae | 3612 | DF3 | 78,302 | 54.5 | 370-bp DTR | KX557281 | 28 |
| Terapin | G. terrae | 3612 | Singleton | 66,611 | 59.6 | Cir. Perm | KX557285 | 28 |
| Bantam | G. terrae | 3612 | Singleton | 92,580 | 64.7 | 3′ 10-base ext. | KX557272 | 28 |
| BetterKatz | G. terrae | 3612 | Singleton | 50,636 | 67.1 | 3′ 10-base ext. | KU963261 | 23 |
| BritBrat | G. terrae | 3612 | Singleton | 55,524 | 65.0 | 3′ 10-base ext. | KU998233 | 28 |
| Eyre | G. terrae | 3612 | Singleton | 44,929 | 67.5 | 3′ 11-base ext. | KX557277 | 28 |
| Ghobes | G. terrae | 3612 | Singleton | 45,285 | 65.2 | 3′ 11-base ext. | KX557278 | 28 |
| GMA1 | G. malaquae | A448 | Singleton | 41,207 | 65.7 | ? | KR053195 | 13, 29 |
| GMA2 | G. malaquae | A448 | Singleton | 103,424 | 53.4 | ? | KR063281 | 13 |
| GMA4 | G. malaquae | BEN700 | Singleton | 45,537 | 66.4 | ? | KR053199 | 13 |
| GMA6 | G. malaquae | BEN700 | Singleton | 83,324 | 58.2 | ? | KR063280 | 13 |
| GAL1 | G. alkanivorans | DSMZ44369 | Singleton | 49,979 | 63.5 | ? | KR053194 | 13, 29 |
| Lucky10 | G. terrae | 3612 | Singleton | 42,979 | 65.4 | 3′ 10-base ext. | KU963256 | 25 |
| Nyceirae | G. terrae | 3612 | Singleton | 41,857 | 67.5 | 3′ 9-base ext. | KX557282 | 28 |
| Yvonnetastic | G. terrae | 3612 | Singleton | 98,136 | 59.7 | 3′ 10-base ext. | KU963248 | 21 |

(a) ext., extension (single stranded); Cir. Perm, circularly permuted; DTR, direct terminal repeat.

All 79 Gordonia phages together with all sequenced phages known to infect hosts in the phylum Actinobacteria and deposited in GenBank as of October 2016 were used to construct a database (Actinobacteriophage_789) in the program Phamerator (35). Phamerator uses the alignment-free clustering algorithm kclust (4) to group genes into “phamilies” (phams) with related amino acid sequences. The 77,955 genes of the 789 phages form 11,035 phams with an average of 7 genes per pham; 4,630 of the phams are “orphams”: that is, they contain only a single gene.
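The pham statistics quoted above (number of phams, mean genes per pham, and number of orphams) follow from a simple tally of pham assignments. The sketch below assumes a hypothetical mapping from gene identifiers to pham identifiers; it does not call Phamerator or kclust, and the names `pham_summary` and `toy_assignments` are illustrative only.

```python
from collections import Counter


def pham_summary(gene_to_pham: dict) -> dict:
    """Summarize pham sizes from a gene -> pham assignment."""
    if not gene_to_pham:
        raise ValueError("no genes supplied")
    sizes = Counter(gene_to_pham.values())  # pham id -> number of member genes
    n_genes = len(gene_to_pham)
    n_phams = len(sizes)
    orphams = sum(1 for n in sizes.values() if n == 1)  # phams with a single gene
    return {
        "genes": n_genes,
        "phams": n_phams,
        "orphams": orphams,
        "mean_genes_per_pham": n_genes / n_phams,
    }


# Hypothetical toy assignment: five genes in three phams, two of them orphams.
toy_assignments = {"g1": "p1", "g2": "p1", "g3": "p2", "g4": "p3", "g5": "p1"}
toy_summary = pham_summary(toy_assignments)

# Arithmetic check on the figures reported in the text: 77,955 genes in 11,035 phams.
reported_mean = 77955 / 11035
```

The reported totals give a mean of just over 7 genes per pham, in agreement with the rounded value stated in the text.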

Grouping of Gordonia phages into clusters and subclusters.

The Gordonia phages span considerable genetic diversity, and comparisons with average nucleotide identities (ANIs) and dot plots reveal small groups of phages that are closely related to each other at the nucleotide level, but which are not closely related to other groups (Fig. 1A and B). We initially assigned the Gordonia phages to clusters and subclusters using the same parameters as for the mycobacteriophages, forming 17 clusters and 19 singletons. However, this generates several instances in which phages of different clusters share large proportions of their genes and are likely to have common biological features. This is illustrated by the abundance of variable branch lengths in a SplitsTree network-based gene content phylogeny, creating examples where phages share a substantial portion of their gene content, without necessarily having nucleotide span-length matches exceeding 50% of the genomes (e.g., SoilAssassin and Yeezy [Fig. 1C]). We therefore chose to further refine the clustering parameters such that phages would be grouped into the same cluster if they share at least 35% of their genes (sorted into phamilies) with at least one other member of that cluster (Fig. 1 and Table 1). Thus, phages such as Attis and SoilAssassin that share 37 to 40% of their genes with Kita and BaxterFox are grouped together in Cluster CZ, whereas Orchid, Kampe, and PatrickStar (Cluster CX) share 28% of their genes with GMA3, Jumbo, GordTnk2, GordDuk1, and Gmala1 (Cluster DF) and are clustered separately. These parameters place each of the phages in this analysis into one and only one cluster; however, it is conceivable that expansion of the collection with future isolates could yield a phage sufficiently similar to phages of two distinct clusters such as to warrant membership in both groups. Such an event would necessitate further refinement of our clustering parameters.
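The revised rule (membership requires at least 35% shared genes with at least one other member of the cluster) behaves like single-linkage clustering on pairwise gene content. The sketch below illustrates that behavior under stated assumptions: the pairwise measure is taken to be the average of the two proportions of shared phams, which may differ from the exact calculation used in this study, and the phage names and pham identifiers are hypothetical.

```python
def shared_fraction(a: set, b: set) -> float:
    """Assumed pairwise measure: mean of the proportions of each phage's phams shared."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return 0.5 * (shared / len(a) + shared / len(b))


def cluster_phages(phams_by_phage: dict, threshold: float = 0.35) -> list:
    """Single-linkage grouping: link any two phages at or above the threshold."""
    names = sorted(phams_by_phage)
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if shared_fraction(phams_by_phage[a], phams_by_phage[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical toy data: P1 and P3 share no phams but are chained through P2.
toy_phams = {
    "P1": {1, 2, 3, 4},
    "P2": {3, 4, 5, 6},
    "P3": {5, 6, 7, 8},
    "P4": {20, 21, 22},
}
toy_clusters = cluster_phages(toy_phams)
```

In the toy data, P1 and P3 land in the same group only through their shared neighbor P2, while P4 remains a singleton. The same chaining is what could, with future isolates, bridge two currently distinct clusters.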

FIG 1

Analyses of complete genome sequences of phages of Gordonia spp. (A) Heat map of average nucleotide identities (ANIs) of Gordonia phages. Pairwise ANIs were calculated using DNA Master, and the heat map was generated using R. (B) Dot plot of complete Gordonia phage sequences. (C) SplitsTree network of shared gene content between Gordonia phages. All genes were grouped into phams using Phamerator. Each phage was scored by the presence/absence of phams, and the distance between each phage was calculated using SplitsTree. The scale bar indicates 0.001 substitution. Phages are colored according to cluster membership.

Using these revised parameters, 65 of the phages can be grouped into a total of 14 clusters, and the other 14 have no close relatives and are designated “singletons” (Table 1). Thirteen of the clusters contain only Gordonia phages, but a group of six Gordonia phages form a subcluster (A15) within the large group of Cluster A phages that previously exclusively contained mycobacteriophages (Table 1). Seven of the Gordonia phage clusters (CQ, CR, CS, CU, CW, CZ, and DF) can be subdivided into subclusters (Table 1). By comparison, when 60 mycobacteriophages had been sequenced, they were grouped into nine clusters (five with subcluster divisions) and five singletons (9), and when 80 were sequenced, they were grouped into 10 clusters (seven with subcluster divisions) and five singletons (36). Thus, at comparable sample sizes, the Gordonia phages are at least as diverse as the mycobacteriophages (28 clusters plus singletons for Gordonia phages versus 15 clusters plus singletons for mycobacteriophages). Genome maps of these phages are included in Fig. S1 to S27 at figshare (https://doi.org/10.6084/m9.figshare.5149663).

A diverse array of systems for immunity and lysogenic maintenance in Gordonia phages.

The Gordonia phages form a variety of plaque types, ranging from clear to turbid, and the majority (53 of the 79) have genomic features associated with temperate phages, including tyrosine-integrase (Int-Y) or serine-integrase (Int-S), parABS partitioning systems, or repressors; only phages in Clusters CR, CS, CT, and DE and singletons Terapin, Ghobes, GMA2, and GMA6 lack these (Table 1). For 27 of the phages encoding a tyrosine-integrase (Int-Y), a putative attP common core sequence (20 to 40 bp) could be identified by similarity to a putative attB common core in the G. terrae 3612 genome (Table 2; see Table S1 at figshare). A total of 13 different attB sites were identified, 10 of which overlap host tRNA genes (Table 2). Interestingly, about half of these phages have the features of integration-dependent immunity systems described previously for mycobacteriophages (37), in which the attP site is located within the repressor gene (located immediately upstream), and the extreme C terminus of the repressor contains an ssrA-like degradation tag (Table S1). In some phages (e.g., Splinter and Vendetta) there are two potential attP sites corresponding to two different attB sites (Table 2; see Table S1 at figshare [https://doi.org/10.6084/m9.figshare.5149663]), but it is unclear if only one or both are used for integration. In GMA1, there is a single putative attP site, but it corresponds to two potential attB sites, suggesting that GMA1 can form double lysogens in which two prophages can be established in the same cell. We could not identify the attP site for about half of the Int-Y phages, but repressors of some phages (e.g., Nyceirae) have putative degradation tags and thus may also use integration-dependent immunity regulation. The attP sites for the serine-integrase (Int-S) phages could not be bioinformatically predicted as their attP and attB sites only have very short segments of sequence identity (38).
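Identifying a putative attP common core amounts to finding a short stretch of sequence identity between the phage genome and the host genome. The sketch below is one way to do this with exact k-mer seeds that are extended to maximal matches; it is an illustration, not the procedure used in this study. It assumes exact identity over the whole core, searches the forward strand only, and uses hypothetical toy sequences.

```python
def shared_cores(phage: str, host: str, min_len: int = 20) -> list:
    """Return maximal exact matches as (phage_start, host_start, length) tuples."""
    # Index every host k-mer of length min_len by its start position.
    index = {}
    for j in range(len(host) - min_len + 1):
        index.setdefault(host[j:j + min_len], []).append(j)

    hits = []
    for i in range(len(phage) - min_len + 1):
        for j in index.get(phage[i:i + min_len], []):
            # Skip seeds that only continue a match beginning one base earlier.
            if i > 0 and j > 0 and phage[i - 1] == host[j - 1]:
                continue
            length = min_len
            # Extend the seed to the right while the bases keep agreeing.
            while (i + length < len(phage) and j + length < len(host)
                   and phage[i + length] == host[j + length]):
                length += 1
            hits.append((i, j, length))
    return hits


# Hypothetical 25-bp core embedded in unrelated flanking sequence.
toy_core = "ACGTTGCAAGCTTGACCTGAAGGCT"
toy_phage = "TTTTT" + toy_core + "AAAAA"
toy_host = "GGGGGGG" + toy_core + "CCC"
toy_hits = shared_cores(toy_phage, toy_host)
```

A fuller search would also scan the reverse complement of the phage sequence and keep only matches in the 20- to 40-bp range that lie near an integrase gene and overlap a candidate attB, such as a tRNA gene.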

TABLE 2 

Predicted attB sites used by Gordonia phages

| attB site | tRNA | Locus tag (a) | Coordinates | Phage(s) |
|---|---|---|---|---|
| attB-1 | tRNA-Ala | BCM27_00045 | 11498–11475 | GMA4 |
| attB-2 | tRNA-Arg | BCM27_01925 | 422935–422970 | UmaThurman, Twister6, Wizard |
| attB-3 | tRNA-Ser | BCM27_02280 | 499312–499338 | Attis, SoilAssassin, Nymphadora, BatStarr, Kita |
| attB-4 | tRNA-Thr | BCM27_03985 | 854236–854279 | CaptainKirk2, Guacamole, Obliviate, Hedwig, Eyre |
| attB-5 | tRNA-Lys | BCM27_06460 | 1424071–1424110 | Splinter, Vendetta |
| attB-6 | tRNA-Ala | BCM27_07630 | 1682291–1682387 | BaxterFox, Yeezy |
| attB-7 | Intergenic | 07635–07640 | 1683467–1683547 | BaxterFox |
| attB-8 | tRNA-Lys | BCM27_10365 | 2294155–2294191 | McGonagall, Jeanie, GMA5, GRU3 |
| attB-9 | tRNA-Arg | BCM27_10435 | 2309527–2309565 | Splinter, Vendetta, Bantam |
| attB-10 | Intergenic | 17750–17755 | 3953264–3953239 | Howe, Lucky10 |
| attB-11 | tRNA-Ala | BCM27_17805 | 3962309–3962284 | Howe, Lucky10 |
| attB-12 | tRNA-Gly | BCM27_22500 | 5080774–5080728 | GMA1, GAL1 |
| attB-13 | Intergenic | 22545–22550 | 5093356–5093312 | GMA1, GAL1 |

(a) Locus tags are for Gordonia terrae 3612 (GenBank accession no. CP016594). Flanking locus tags are shown for intergenic sites.

Several phages are unusual in having two integrase-like genes located near the centers of their genomes, although some are partial genes (e.g., Twister6 gene 47 and Wizard gene 44) and are likely to be defective (Fig. 2). For example, Wizard and Twister6 have seemingly intact integration-dependent immunity systems (37) but also have an Int-S-like gene that lacks the catalytic N-terminal domain (38) (Fig. 2). However, Utz, Howe, and Bowser all appear to have two intact integrase genes, all of which are Int-Y genes, except for Utz gene 33; Utz gene 31, Howe gene 36, and Bowser gene 37 are closely related and share nucleotide sequence similarity (Fig. 2). Curiously, the attP site identified in Howe is absent from Bowser and Utz, such that even though the Int-Y proteins are related (71% amino acid identity), the integration system may be nonfunctional for lack of an attP site. Bowser’s second Int-Y has features of an integration-dependent immunity component and contains a putative ssrA-like tag at its C terminus, as does the repressor gene (gene 40) (Fig. 2). However, we have not been able to identify a putative attP site either in the canonical locations within the repressor gene or elsewhere, and it is plausible that this integration system evolved to operate in a host other than G. terrae. We also note that the attP site associated with the Howe gene 36 Int-Y can potentially use two different attB sites (Fig. 2 and Table 2; see Table S1 at figshare). Finally, in Clusters CX and DF, the putative integrase genes are located in the middle of the right arm genes and lack the N-terminal domain associated with lambda-like Int-Y’s, and attP sites could not be identified. Although these systems are complex, it is not yet clear which of these recombinases mediate integration and which may be involved in recombination functions other than in phage integration (14). 
Taken together, the Gordonia phages show an unusual level of complexity of these site-specific recombinases that is not seen in phages of other actinobacterial hosts.

FIG 2

Gordonia phage genomes exhibit multiple integrases. Phamerator map of the integration cassettes of the six phages with double integrases. Integrases are labeled with catalytic residue (Y or S), and attP sites are indicated when known. Wizard gene 44 and Twister6 gene 47 are predicted to be nonfunctional.

Potential for prophage-mediated defense systems.

We have previously described a series of prophage-mediated defense systems encoded by Cluster N mycobacteriophages that defend against heterotypic viral attack (39). The genes conferring defense are located adjacent to the integration/immunity cassette in a region that is highly variable within a set of otherwise closely related genomes (39). Several groups of Gordonia phages share this genomic architecture, notably in Clusters CV, CZ, and DB (see Fig. S7, S10, and S11 at figshare [https://doi.org/10.6084/m9.figshare.5149663]). The phages within Cluster CV have 4 to 10 genes in the region between the minor tail protein genes and the integration/immunity systems, and although the functions of most of these are unknown, there are several with putative functions similar to those in Cluster N phages. These include Blueberry gene 33, which possibly codes for a restriction protein, CaptainKirk2 genes 34 and 35, and Utz genes 34 and 35 coding for toxin-antitoxin (TA) pairs, and genes (e.g., CaptainKirk2 genes 32 and 37) coding for predicted membrane proteins. Likewise, in Cluster CZ phages, genes in this region code for candidate TA systems (e.g., Nymphadora genes 42 and 43 and Zirinka genes 33 and 34) and membrane proteins (e.g., Kita gene 33), and BaxterFox genes 31 and 32 are analogous to the MichelleMyBell viral defense genes 29 and 30. Similarly, in the Cluster DB phages, there are 6 to 8 genes in this location, including genes for putative antitoxins (e.g., Bowser gene 35 and Schwabeltier gene 33) and membrane proteins (e.g., Schwabeltier gene 35); Bowser gene 31 codes for a putative lipase. Several other genomes also code for TA pairs, including Bantam and Eyre (see Table S2 at figshare). Taken together, we predict that the temperate Gordonia phages are replete with novel viral defense systems that warrant further investigation.

Lysis systems in Gordonia phages.

Most mycobacteriophages code for three closely linked gene products involved in lysis—a holin, an endolysin (lysin A), and a mycolylarabinogalactan esterase (lysin B)—although some (e.g., Che12) lack the lysin B gene (40). Because the lysin B enzyme is responsible for cleavage of the mycobacterial outer membrane during lysis and because Gordonia cells contain mycolic acids, it is not surprising that most (all except for Clusters CR, CW, and DE and the singleton BetterKatz) of the Gordonia phages carry lysin B genes. However, there is great variation in where and how the lysis cassettes are organized. In many genomes (e.g., Cluster CR), the endolysin and holin genes are located downstream of the tail genes, a position where they are commonly located in the mycobacteriophages. However, in several phages (e.g., Clusters CV and CZ), they are found atypically within the minor tail protein genes. In a few genomes (e.g., Clusters CV, CZ, DB, DC, and DF), the lysin B genes are near the other lysis functions, but in most, they are displaced from it and are commonly positioned with the right arm genes. Finally, in some genomes (e.g., Cluster DF) the multiple domains found in mycobacteriophage endolysins (41) appear to be split into different genes.

Atypically small Gordonia phage genomes.

The Cluster CW phages have genomes that are much smaller (17.1 to 17.7 kbp) than the smallest of the mycobacteriophage genomes (41 kbp), although phages with similarly sized genomes have been found for other actinobacterial hosts, including Rhodococcus (e.g., RRH1, 14.3 kbp) (42) and Arthrobacter (e.g., Maggie, 15.5 kbp) (61). All of these have siphoviral morphologies, and more than 80% of the genome coding capacity is devoted to the virion structure and assembly genes (Fig. 3). Although the small-genome phages of Arthrobacter, Rhodococcus, and Gordonia are not closely related, some of the gene products have amino acid sequence similarity, including all of the large terminase subunits (31 to 68% amino acid identity [Fig. 3]). Although RRH1 and Maggie are seemingly lytic, the Gordonia Cluster CW phages are temperate and form turbid plaques, and these carry integration-dependent immunity systems as noted above. Although the repressors of GMA5 and GRU3 have diverged from those of Jeanie and McGonagall, all share an identical 37-bp attP site corresponding to an attB site overlapping a tRNA-Lys gene (BCM27_10365) in G. terrae 3612 (see Fig. S8 at figshare [https://doi.org/10.6084/m9.figshare.5149663]). GMA5 deviates somewhat from the canonical integration-dependent immunity system organizations in that an additional open reading frame (gene 18) of no known function is inserted between the integrase and repressor genes (Fig. 3); a homologue is present in Maggie (gene 20), although it lacks an immunity system (Fig. 3).

FIG 3

Small-genome actinobacteriophages. Shown are genomic maps of the small-genome phages included in the Actinobacteriophage_789 database. The central rule indicates nucleotide position (with large tick marks every 1 kbp), and the colored boxes indicate predicted genes. Genes are colored according to pham membership, with the number on the top of each box representing the pham number, followed by the number of members of the pham in the database in parentheses. Pairwise DNA sequence similarity calculated by BLASTn is shown between adjacent genomes and is spectrum colored, with violet being the most similar and red the least above a threshold E value of 10^-5. Phages infect the following hosts: RRH1, Rhodococcus sp.; Jeanie, G. neofelifaecis; Maggie, Arthrobacter; McGonagall, G. neofelifaecis; GMA5, G. malaquae; and GRU3, G. rubripertincta.

Virion morphology variation in Cluster CZ.

Gordonia phage Cluster CZ is unusual in that it contains phages with distinctly different virion morphologies. All nine have long flexible noncontractile tails, but five (Howe, BaxterFox, Yeezy, Attis, and SoilAssassin) have isometric heads, whereas the other four (BatStarr, Kita, Nymphadora, and Zirinka) have prolate heads (Fig. 4). Genome comparisons show that the two sets of genomes differ in the leftmost eight genes that code for the terminase, portal, capsid maturation protease, and capsid proteins. Interestingly, the portal, protease, and capsid proteins of the prolate CZ phages have sequence similarity (65 to 67% amino acid identity) to homologues in the Cluster I mycobacteriophages, such as Che9c, which also have prolate capsids. This example illustrates a genetic link between the Gordonia and Mycobacterium phages and how the exchange of genes can result in morphological variation among phages with otherwise similar overall genomic architectures.

FIG 4

Gordonia phages in Cluster CZ exhibit multiple virion morphologies. (Top) Genome maps of Gordonia phages Kita and Yeezy and mycobacteriophage Che9c (see Fig. 3 for details) displayed in two tiers. Yeezy and Kita are both Gordonia phages but have different capsid morphologies: Yeezy is isometric, and Kita is prolate. Che9c is a mycobacteriophage that shares capsid structure and assembly genes with Kita and is also prolate. (Bottom) Electron micrographs of Kita, Yeezy, and Che9c. Scale bar = 100 nm.

Relationships of Gordonia phages to phages of other actinobacterial hosts.

It is striking that six of the Gordonia phages (Table 1) fall into Cluster A, which previously contained solely mycobacteriophages, and form a distinct subcluster (A15) (Fig. 5; see Fig. S1 and Table S3 at figshare [https://doi.org/10.6084/m9.figshare.5149663]). The Subcluster A15 Gordonia phages are closely related to each other, and phage KatherineG shares the greatest sequence similarity with Mycobacterium phage Che12, Subcluster A2 (BLASTn score of 77% nucleotide identity spanning 47% of their genome lengths; they share 64% of their genes) (Fig. 5A and B). However, the Gordonia Subcluster A15 phages are distant relatives of all other Gordonia phages, and only about 15% of their genes are present in other Gordonia phages. The Gordonia Cluster A phage homologues are distributed broadly among the other Gordonia phages, including representatives of Clusters CQ, CS, CT, CU, CW, CZ, DB, DC, DE, and DF and singletons BritBrat, GMA4, GMA6, and Yvonnetastic.

[Fig. 5]

Relationships between Gordonia and Mycobacterium phages. (A) Genome maps of Gordonia phage KatherineG and mycobacteriophages Phlei and Che12 (see Fig. 3 for details). Stoperators are indicated with a + or − above or below the map to indicate sequence orientation. (B) SplitsTree (60) network representation of shared gene content among the Cluster A phages. Phage groups are colored according to subcluster. The positions of Che12 (A2), Phlei (A13), and KatherineG (A15) are shown. (C) Shared gene phamilies of phages of Mycobacterium, Gordonia, and Arthrobacter. Phamily membership was determined using Phamerator, and the proportions of shared phams were calculated: 24 phams out of 8,294 total are shared between phages of all three hosts, 395 phams are shared between Gordonia and Mycobacterium, 57 phams are shared between Gordonia and Arthrobacter, and 56 phams are shared between Arthrobacter and Mycobacterium. The percentages shown are determined relative to the total number of phams present in each host (the number in boldface in parentheses in each circle).

The relationship of Gordonia Subcluster A15 phages to the mycobacteriophages is distinct and different from those of the other Gordonia phages (Fig. 5C). Of the 2,848 gene phamilies in the Gordonia phages, only 13.9% are shared with the Mycobacterium phages; the proportion of shared phamilies is only 10% if the A15 phages are excluded. Although few genes are shared, this is still greater than the number of phamilies shared with phages of the more distantly related host Arthrobacter (1.9% [Fig. 5C]), although this likely also reflects the differences in the total numbers of phages and phamilies that have been identified. As additional phages of different hosts within this phylogenetic space are characterized, further genetic connections between the phages are anticipated.

Cluster A is the largest group of related actinobacteriophages (192 in the database used here) and contains 16 subclusters (see phagesdb.org for a complete list of all current actinobacteriophage clusters and cluster membership). They are temperate (or recent lytic derivatives of temperate parents) and share overall genome architectures. They also have a common but unusual immunity regulation scheme in which the repressor binds to multiple binding sites (operators and “stoperators” [43]) located intergenically throughout the genomes. The Cluster A genomes all contain either an integration system using a tyrosine-integrase (44) or serine-integrase (45), although some from Subcluster A6 encode parABS partitioning systems (46, 47). The Subcluster A15 Gordonia phages are closely related to each other, all contain a parABS partitioning system, and the repressors are >99% identical to each other, indicating these likely form a homoimmune group. The A15 phages each contain 21 or 22 stoperator sites corresponding to the 13-bp asymmetric consensus sequence 5′-GGGGATTGTCAAG. The sequence conservation among the sites is similar to that reported previously (36, 43, 48), with positions 1, 9, 10, 11, 12, and 13 being invariant and positions 2 to 8 correlating with differences in immune specificity.

Gordonia phages display a spectrum of relatedness.

We noted above that the parameters used for cluster assignments were revised for grouping of the Gordonia phages, reflecting the lack of distinct boundaries between genome types and a spectrum of diversity as described for the mycobacteriophages (4). To explore the relationships between the Gordonia phages and other actinobacteriophages further, we calculated gene content dissimilarity (GCD) for pairwise genome comparisons between all 79 phages, where GCD is the proportion of shared phams relative to each genome’s total number of phams, averaging the two proportions and subtracting from 1 (see Materials and Methods) (49). Values range from 1, where no genes are shared (i.e., the two phages are 100% dissimilar) to 0, where the gene content is identical. The diversity of the Gordonia phages is illustrated by the range and distribution of GCD values (Fig. 6). A high proportion (88.6%) of pairwise comparisons have fewer than 10% shared genes (GCD, >0.9), with all 79 phages involved in at least one of the comparisons (Fig. 6). However, about 3% of the pairwise comparisons have values between 0.3 and 0.7 (i.e., reflecting 70% and 30% shared genes, respectively), and 38 of the 79 phages (47%) participate in at least one of these comparisons, representing 10 of the 28 groups (clusters plus singletons; 35.7%). Thus, although a value of 35% pairwise shared genes (GCD, 0.65) was used to group genomes into clusters, this is an arbitrarily chosen cutoff and not one that reflects a fundamental separation point among genome comparisons. It is helpful to note the cluster assignment parameters used here require 35% shared gene content with only one other phage, but because clusters (and subclusters) themselves can be quite diverse, the GCD values vary enormously within both clusters and subclusters.

[Fig. 6]

Gene content dissimilarity in phage populations. For all plots, the x axis shows the individual pairwise comparisons ordered by magnitude, and the y axis shows the gene content dissimilarity (GCD [where 1 and 0 correspond to no shared genes and 100% shared genes, respectively]). (A) GCD in mycobacteriophages. (B) GCD in Gordonia phages. (C) GCD in Arthrobacter phages. (D) GCD in cyanophages. Both Mycobacterium and Gordonia phages exhibit a smooth curve of pairwise GCD values when ordered by magnitude. Cyanobacteriophages also exhibit a continuum of diversity with respect to GCD values; however, the slope of the line exhibits a number of plateaus, likely reflecting the “discrete” lineages described previously. Arthrobacter phages exhibit a large discontinuity, reflective of the few shared genes between clusters. Brackets indicate the interval between 30% and 70% gene content dissimilarity.

Bacteriophages of Arthrobacter provide an informative comparison (Fig. 6). Pairwise comparison of 47 phages of Arthrobacter (61) shows greater delineation between groups that are closely related and those that are unrelated than is seen for the Gordonia phages. For example, only 2 of the 1,129 (0.12%) pairwise genome comparisons have GCD values in the 30 to 70% range, contributed by just three (0.5%) of the phages, representing 2 of the 13 groups (16.7%) (11 clusters, two singletons). For the 550 mycobacteriophages in the data set used here, 7% of the pairwise genome comparisons have a GCD in the 30 to 70% range, and 80% of the mycobacteriophages, representing 13 of the 30 groups (43%), participate in at least one comparison; similar proportions were identified when we repeated the analysis using three random samplings of 80 mycobacteriophages. The parsimonious explanation is that the mycobacteriophages and Gordonia phages are exchanging genes at a higher rate with other phages of the same host than the Arthrobacter phages are. We similarly examined a group of cyanophages, as these were reported elsewhere to form discrete lineages (8, 10). These have a more complex GCD pattern, with 27% of the comparisons within the 30 to 70% bracket (Fig. 6), and 188 of the 209 cyanophages participate in these comparisons.
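The bracket statistics used in these comparisons (the share of pairwise comparisons with a GCD between 0.3 and 0.7, and the phages that take part in them) can be illustrated with a minimal Python sketch. This is not the authors' script: the function name `bracket_summary`, the dictionary layout, the inclusive bounds, and the toy values are all assumptions made for illustration.

```python
def bracket_summary(pairwise_gcd, low=0.3, high=0.7):
    """Summarize comparisons whose GCD falls inside [low, high].

    pairwise_gcd maps an unordered pair (phage_a, phage_b) to its GCD.
    Returns the fraction of comparisons inside the bracket and the set
    of phages involved in at least one such comparison.
    """
    if not pairwise_gcd:
        raise ValueError("no pairwise comparisons supplied")
    in_bracket = [pair for pair, gcd in pairwise_gcd.items() if low <= gcd <= high]
    participants = {name for pair in in_bracket for name in pair}
    return len(in_bracket) / len(pairwise_gcd), participants


# Hypothetical GCD values for three toy phages (not data from the study).
toy = {("X", "Y"): 0.5, ("X", "Z"): 0.95, ("Y", "Z"): 1.0}
fraction, participants = bracket_summary(toy)
print(fraction, sorted(participants))
```

Whether the bracket limits are treated as inclusive is not stated in the text, so the inclusive comparison here is a design choice of the sketch.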

Phage genome relationships measured by MaxGCDGap.

To explore further the complex relationships between the Gordonia phages—and bacteriophage genomes in general—we developed an additional metric, the maximum gap in GCD values (MaxGCDGap). This is calculated by computing the pairwise GCD values for each phage against all other phages within a data set, ranking them by magnitude from largest to smallest, and calculating the difference between adjacent values (GCD gap) (see Materials and Methods) (Fig. 7 and Fig. 8A); the MaxGCDGap is the largest of these values. In contrast to other metrics of phage diversity used previously (4), the MaxGCDGap can be calculated for each phage within a data set without prior assignment to clusters, lineages, or other taxonomic groups and reflects a discontinuity in the relationships between one phage and all other phages (Fig. 7). It is important to note that the MaxGCDGap values are not absolute and depend on the size and composition of the data set analyzed. Moreover, the MaxGCDGap value is the difference between two GCD measures and is dependent on multiple parameters, including the variation within groups of related phages, the genetic relatedness of different groups of phages, and the extent to which the sampling reflects the larger populations. Nonetheless, groups of phages that are well separated in genetic sequence space are expected on average to have higher MaxGCDGap values than those in which there is little or only weak distinction between the groups (Fig. 7). Two examples are shown in Fig. 8A.

[Fig. 7]

The various types of relationships of phage genomes are illustrated by three panels, varying (from left to right) from well-separated groups of related phages to a near continuum of relationships with weakly distinct groups. Phages sharing portions of their genomes can be grouped into clusters (shown in similar colors and surrounded by thick dashed circles), some of which can be subdivided into subclusters (surrounded by thin dashed circles). Typically, phages within a cluster have low pairwise gene content dissimilarity (GCD) values (i.e., they share a high proportion of their genes) and phages in different clusters have high pairwise GCD values. The relationships can be represented by MaxGCDGap values that correspond to discontinuities in the range of relationships of one phage relative to all others. Arrows indicate pairs of genomes in relationships at the high value of the MaxGCDGap parameter, which may occur between different clusters (and singletons) or between subclusters within a cluster. The MaxGCDGap values vary with the overall diversity within and between clusters but are generally higher within phage populations with genetically well-separated phages (e.g., left panel) than where there is a near continuum of genetic diversity (e.g., right panel).

[Fig. 8]

Phage genetic relationships as measured by MaxGCDGap. (A) Representative pairwise GCDs between Gordonia phage Monty (left) or GMA2 (right) and all other phages in the Actinobacteriophage_789 database, ordered by magnitude, similar to Fig. 6 (see Fig. S28 at figshare [https://doi.org/10.6084/m9.figshare.5149663]). Phages involved in comparisons with a GCD of <0.8 are highlighted. The maximum gap in GCD values (MaxGCDGap) was identified for each phage. For example, Cluster CS Monty’s MaxGCDGap occurs between Gordonia phage Kvothe (Cluster CS) and the singleton (sin) Rhodococcus phage ReqiDocB7. It should be noted that the second largest GCD gap is between phages in other CS subclusters (Hotorobo and Woes), illustrating that for some phages, the MaxGCDGap may reflect the distance between subclusters rather than clusters. Singleton phage GMA2 has no close relatives, and the MaxGCDGap is large and approaches 1.0. (B) All phage-specific MaxGCDGap values ordered by magnitude, with the mean and median indicated. Each data point represents a single phage genome. (C) All Gordonia phage-specific MaxGCDGaps grouped by cluster and ordered by median. Each data point represents a single phage genome. (D and E) Box plot distribution of actinobacteriophage-specific MaxGCDGaps from panel B grouped by subcluster (D) or cluster (E) and ordered by median. Boxes reflect the central 50% of the data, with the median as a black bar, and the individual MaxGCDGap values are superimposed. Only the most abundant groups are plotted. (F) Box plot distribution of phage-specific MaxGCDGaps as in panels D and E but grouped by host genus and with mean MaxGCDGaps displayed above. Only the most abundant genera are plotted. (G) Box plot distribution of Synechococcus phage-specific MaxGCDGaps as in panels D to F, grouped by the six previously identified lineages (Lin.) (8). The topmost bar chart in each panel indicates the number of phages per group.

When all phages are rank ordered according to their MaxGCDGap values, an uninterrupted spectrum is observed around a MaxGCDGap mean of 0.45, with values ranging from 0.076 to 1.0 (Fig. 8B). There is thus heterogeneity in the relationships, which are not constrained to any one proposed scenario in Fig. 7. When the Gordonia phages are examined and sorted by cluster, a broad range of values is observed, with the singletons having the highest values (most diverse from all other phages) and other phages ranging from MaxGCDGap values of 0.2 to 0.7 (Fig. 8C). A similar broad range of values is also observed with actinobacteriophages of other clusters and subclusters (Fig. 8D and E) and is reflected further when grouped by bacterial hosts (Fig. 8F). We note that phages of Propionibacterium (50) and Arthrobacter (61) have relatively high MaxGCDGap values and are examples of well-separated phage groups (Fig. 7), but in general, broad ranges of both mean values and distributions of values are observed for all other groups examined, reflecting the spectrum of values in the entire phage genome set (Fig. 8B). We note that phages of Synechococcus have both a relatively low mean MaxGCDGap value and a broad distribution, not substantially different from those of the mycobacteriophages. Overall, the observed patterns are consistent with a view in which the mosaic architecture of phage genomes results from HGT from a continuum of phage genetic space, albeit with substantially unequal sampling and undersampling of the individual phages from the population at large.

DISCUSSION

Here we have described a collection of phages of Gordonia that expand our view of the overall diversity of the phage population and the relationships between phages of different hosts. The 79 phages encompass an amazingly large number of different genome types, but the relationships between them are complex. The genome architectures reveal many unexpected features, including atypical organizations of lysis genes, integration cassettes, and the possibility of many additional systems implicated in prophage-mediated viral defense.

We note that the grouping of Gordonia phages into clusters and subclusters is primarily a taxonomy of convenience (4, 5), recognizing the heterogeneity of the relationships and simplifying the discussion of phages that share many of their biological features. However, multiple lines of evidence (e.g., Fig. 1, 4, 6, and 8) show that many of the intracluster or intrasubcluster boundaries are ill-defined, because one or more phages within a cluster can share a substantial portion (up to ∼35%) of their genes with phages of other clusters. The cutoff values for grouping are thus largely arbitrary, but they are nevertheless intended to identify genomic relationships within a certain evolutionary scope to accommodate mosaicism. Thus, the groupings do not reflect definitive taxonomic or evolutionary boundaries between individual phages, and other cutoff values would expand or constrict group membership to highlight more distant or close genomic relationships, respectively. We predict strongly that isolation of additional Gordonia phages will further smooth their genetic landscape. This not only reflects the relationships illustrated by a large group of mycobacteriophages (4) but appears to be a general property of bacteriophage diversity and evolution (Fig. 8).

The complexities among the Gordonia phages warranted a revision of the clustering parameters used for mycobacteriophages (4, 9), and clustering of additional phages, especially those of other host species, would benefit from a two-step process. First, genomes can be grouped using previously established parameters based on overall nucleotide similarity, which will suffice for some phage groups (e.g., Arthrobacter phages). However, if the boundaries between genome types are less defined, the grouping can be further refined based on shared gene content, with any two phages sharing more than 35% of their genes being placed in the same cluster. We note that such parameters would impose few changes from the current clustering of mycobacteriophages, with the exception of Clusters I, P, and N (sharing 40 to 50% of their genes [4]), which would be grouped together. To avoid confusion, we do not propose a revision of the current mycobacteriophage clusters. We note again that the parameters for clustering are basically arbitrary and thus subject to further change with deeper sampling across the spectrum of diversity.
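The second step of this process, in which any two phages sharing more than 35% of their genes are placed in the same cluster, amounts to single-linkage grouping. The Python sketch below illustrates that idea only; it is not the procedure actually used for the assignments. The function names, the use of the averaged shared-pham proportion as the measure of shared genes, and the toy genomes are assumptions.

```python
from itertools import combinations


def shared_gene_fraction(phams_a, phams_b):
    """Average of the two proportions of shared phams (equal to 1 - GCD)."""
    a, b = set(phams_a), set(phams_b)
    shared = len(a & b)
    return (shared / len(a) + shared / len(b)) / 2


def cluster_by_shared_genes(genomes, threshold=0.35):
    """Single-linkage grouping: one qualifying pair is enough to join clusters.

    genomes maps a phage name to a non-empty collection of pham identifiers.
    """
    parent = {name: name for name in genomes}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        if shared_gene_fraction(genomes[a], genomes[b]) > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in genomes:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy genomes: A and C share nothing but are linked through B.
toy = {"A": {1, 2, 3, 4}, "B": {3, 4, 5, 6}, "C": {5, 6, 7, 8}, "D": {9, 10}}
print(cluster_by_shared_genes(toy))  # [['A', 'B', 'C'], ['D']]
```

In the toy data, A and C end up together despite sharing no phams, which mirrors the point that a phage needs to meet the threshold with only one other member of a cluster.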

It is curious that one of the phage groups (Subcluster A15) is closely related to the mycobacteriophages, whereas the other Gordonia phages are distantly related to them and share few of their genes. This indicates both the value of characterizing phages of phylogenetically proximal hosts to identify genetic connections and the necessity to greatly expand the size of the phage collections on currently used hosts as well as additional hosts. The collection of >1,300 sequenced mycobacteriophages still reflects a considerable undersampling of the populations of the viruses at large, and we predict that it will be informative to continue isolating and sequencing Gordonia phage genomes until at least 1,000 have been characterized. Similar studies with phages of other hosts in the suborder Corynebacterineae targeting similar-sized collections will likely contribute greater numbers of genetic connections between the phages.

It was reported previously that Synechococcus phages form discrete lineages but participate in widespread HGT (8, 10). At least six lineages were defined by phylogenetic reconstruction of over 50 widely shared core genes common to the T4-like myoviruses (8). However, the MaxGCDGap values for five of these lineages are low (∼0.2 [Fig. 8G]), reflecting a low level of discontinuity across the spectrum of possible genetic relationships and consistent with active participation in HGT (Fig. 7G). The degree of the discreteness of populations thus depends on the evolutionary time scale being considered. Over relatively short time frames, there clearly are distinct groups of phages (lineages of Synechococcus phages, equivalent to subclusters for the actinobacteriophages) that either indicate constraints on HGT or reflect the relative rates of HGT compared to the mutational clocks. However, for longer time frames, there is clearly extensive HGT among most, if not all, phage genomes, albeit to various degrees depending on the hosts and the genome types.

The Gordonia phages present numerous opportunities for the development of genetic tools for Gordonia genetics. In particular, the broad range and types of integration systems will facilitate the development of integration-dependent vectors (44) for use in constructing recombinant Gordonia strains; the 13 predicted attB sites (Table 2) suggest the possibility of using multiple compatible vectors. The Subcluster A15 parABS partitioning systems may be used for stabilizing extrachromosomal plasmid vectors as they have been shown to do for the mycobacteria (46). There are also multiple examples of RecET-like systems in the Gordonia phages, and these have the potential to be used for Gordonia recombineering systems. Many of the phages are within a genome length and style that suggest they would be suitable for the construction of shuttle phasmids (51), which in turn could be exploited for the delivery of reporter genes and transposons (52). As noted previously (13), the lytic Gordonia phages or lytic derivatives of the temperate phages could be useful for controlling Gordonia growth in environmental or biomedical applications.

Finally, many of the Gordonia phages are temperate and exhibit the genomic hallmarks associated with prophage-mediated viral defense systems (39). These could be explored by measuring plating efficiencies of Gordonia phages on different lysogenic strains, and we predict that the viral defense genes will be expressed in lysogeny. Given the genomic diversity, we anticipate that this will reveal many new systems that prophages use to defend against heterotypic phage attack.

MATERIALS AND METHODS

Sixty phages were isolated on Gordonia spp. through enrichment culture or direct plating as part of the SEA-PHAGES and PHIRE programs, and dsDNA was extracted (15, 18,28). Double-stranded DNA genomes were sequenced using Illumina MiSeq and assembled using Newbler and Consed. The genomes were annotated using DNA Master (cobamide2.bio.pitt.edu), GeneMark (53), Glimmer (54), BLAST (55), Aragorn (56), tRNAscan-SE (57), and HHPred (58). All 79 Gordonia phages were analyzed with the program Phamerator (35) in conjunction with many other phages of the phylum Actinobacteria found in GenBank as of October 2016, for a total of 789 phage genomes (Phamerator database Actinobacteriophage_789). The 77,955 genes in the database were grouped into 11,035 phamilies (phams) using Phamerator, which utilizes the alignment-free clustering algorithm kclust; this resulted in an average size of 7 genes per pham and 4,630 phams with single genes (orphams). A separate Phamerator database, Cyanobacteriophage_209, was constructed for phages infecting hosts of the phylum Cyanobacteria, involving a total of 209 published whole cyanophage genomes retrieved from the RefSeq and GenBank nr databases. Average nucleotide identity was calculated using DNA Master.
The ANI heat map was generated by using “heatmap2” in R, and distance between genomes was calculated using the dist function with the “maximum” argument and the clustering method “single.” Phage order in the ANI figure is as follows: Nymphadora, BatStarr, Kita, Zirinka, BaxterFox, Yeezy, Attis, SoilAssassin, Howe, Eyre, Guacamole, CaptainKirk2, Obliviate, Utz, UmaThurman, Blueberry, CarolAnn, Wizard, Twister6, Bowser, Schwabeltier, Hedwig, Lucky10, BritBrat, GAL1, Nyceirae, Phinally, GTE6, Vivi2, BetterKatz, Jeanie, McGonagall, GRU3, GMA5, Ghobes, Smoothie, Bachita, ClubL, Cucurbita, OneUp, Yvonnetastic, Bantam, KatherineG, Soups, Rosalind, Strosahl, Remus, JSwag, GRU1, GTE5, GTE8, GMA4, GMA1, Emalyn, GTE2, Cozz, Splinter, Vendetta, Gsput1, Terapin, Hotorobo, Monty, Woes, GTE7, GMA5, Benczkowski14, Katyusha, Kvothe, Demosthenes, GordTnk2, Gmala1, GordDuk1, Jumbo, GMA3, Kampe, PatrickStar, Orchid, GMA2, GMA6.

Gene content networks were generated using SplitsTree (59, 60), based on pham membership of genes in each genome as calculated by Phamerator. Stoperators in the Subcluster A13 and A15 genomes were identified using the DNA Master scan function, searching for a 13-bp consensus sequence similar to the published A1 and A2 consensus sequences, with instances and orientations similar to those of the published sequences. For Che12 (A2), the consensus sequence is 5′-GGTGGTTGTCAAG, for Phlei (A13) the sequence is 5′-GCTTGGGTGTCAAG, and for KatherineG (A15), the sequence is 5′-GGGGATTGTCAAG. A site was identified as a stoperator only if it had no more than 2 substitutions relative to the consensus sequence.
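The consensus search with a limit of 2 substitutions can be illustrated with a naive Python scan of both strands. This is a stand-in for, not a reproduction of, the DNA Master scan function: the function names, the 0-based coordinates, and the toy sequence are assumptions, and the sketch does not model the additional comparison of site instances and orientations. Only the KatherineG consensus sequence comes from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_substitutions(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_stoperators(genome, consensus, max_substitutions=2):
    """Scan both strands for sites within max_substitutions of the consensus.

    Returns (start, strand, site, substitutions) tuples, where start is the
    0-based position on the forward strand and site is the forward-strand
    sequence at that position.
    """
    genome = genome.upper()
    k = len(consensus)
    targets = (("+", consensus), ("-", reverse_complement(consensus)))
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        for strand, target in targets:
            subs = count_substitutions(window, target)
            if subs <= max_substitutions:
                hits.append((start, strand, window, subs))
    return hits


# Toy sequence (hypothetical): one exact forward site and one reverse-strand
# site carrying a single substitution.
consensus = "GGGGATTGTCAAG"  # KatherineG (A15) consensus from the text
toy_genome = "TTTT" + consensus + "ACGTACGT" + reverse_complement("GGCGATTGTCAAG") + "CCCC"
for hit in find_stoperators(toy_genome, consensus):
    print(hit)
```

Because the consensus is asymmetric, the forward and reverse-complement targets differ, which is what allows each site to be assigned an orientation.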

All pairwise gene content dissimilarities (GCDs) were calculated for each phage database using custom-written Python scripts (49). GCD is calculated by determining the proportion of shared phams relative to each genome’s total number of phams, averaging the two proportions, and subtracting the average from 1:

\[
\mathrm{GCD} = 1 - \frac{\dfrac{\text{Shared phams}}{\text{Total phams in genome A}} + \dfrac{\text{Shared phams}}{\text{Total phams in genome B}}}{2}
\]
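As an illustration of this definition, the following minimal Python sketch computes GCD for one pair of genomes. It is not the authors' script: the function name, the representation of each genome as a set of distinct pham identifiers, and the toy identifiers are assumptions.

```python
def gene_content_dissimilarity(phams_a, phams_b):
    """GCD for two genomes, each given as a collection of pham identifiers.

    Assumption: each genome is reduced to its set of distinct phams before
    the shared and total counts are taken.
    """
    a, b = set(phams_a), set(phams_b)
    if not a or not b:
        raise ValueError("each genome needs at least one pham")
    shared = len(a & b)
    # Average the two proportions of shared phams, then subtract from 1.
    return 1 - (shared / len(a) + shared / len(b)) / 2


# Hypothetical toy genomes: 2 shared phams out of 4 and out of 8.
genome_a = {"p1", "p2", "p3", "p4"}
genome_b = {"p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10"}
print(gene_content_dissimilarity(genome_a, genome_b))  # 0.625
```

Here the two proportions are 2/4 and 2/8, their average is 0.375, and the GCD is therefore 0.625.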

GCD plots were generated in R or Excel. Phage-specific MaxGCDGaps were calculated using custom-written Python scripts, as follows. For each phage, all pairwise GCD values were ranked by magnitude, and the difference between each pair of consecutive GCD values was calculated (GCD gap). GCD gap is calculated as follows:

\[
\mathrm{GCD}_{n} - \mathrm{GCD}_{n+1} = \mathrm{GCD\ gap}(n,\, n+1)
\]

The maximum GCD gap of the data subset was identified, which can range from near 0 (indicating small gene content discontinuities) to 1 (indicating large gene content discontinuities) (see Fig. S28 at figshare [https://doi.org/10.6084/m9.figshare.5149663]). Box plot distributions of MaxGCDGaps were plotted in R. All scripts are available upon request.
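The ranking and gap calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' script; the function name, the returned index, and the toy values are assumptions. The text does not state whether a phage's comparison with itself (GCD of 0) is part of the ranked list, so the sketch simply ranks whatever values it is given.

```python
def max_gcd_gap(gcd_values):
    """Return (MaxGCDGap, n) for one phage.

    gcd_values holds that phage's pairwise GCDs. They are ranked from
    largest to smallest, and n is the 0-based rank at which the largest
    gap GCD_n - GCD_(n+1) occurs.
    """
    if len(gcd_values) < 2:
        raise ValueError("need at least two pairwise GCD values")
    ranked = sorted(gcd_values, reverse=True)
    gaps = [ranked[n] - ranked[n + 1] for n in range(len(ranked) - 1)]
    n_best = max(range(len(gaps)), key=gaps.__getitem__)
    return gaps[n_best], n_best


# Hypothetical pairwise GCDs for one phage (not data from the study).
values = [0.125, 1.0, 0.375, 0.875, 1.0, 0.25]
print(max_gcd_gap(values))  # (0.5, 2)
```

In the toy data, the ranked values are 1.0, 1.0, 0.875, 0.375, 0.25, and 0.125, so the largest discontinuity (0.5) separates the distant comparisons from the three closest relatives.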

For electron microscopy, phage lysates were applied to carbon-coated copper grids, stained with 1% uranyl acetate, and imaged using a Morgani Technai transmission electron microscope.

ACKNOWLEDGMENTS

We thank Beckie Bortz, Emily Furbee, Sarah Grubb, and the SEA-PHAGES and PHIRE students at the University of Pittsburgh for phage isolation and annotation, Karen Klyczek, Fred Bonilla, and the SEA-PHAGES students at the University of Wisconsin—River Falls, for the isolation, annotation, and preliminary comparative analysis of Jumbo, Bantam, Remus, JSwag, and Strosahl, and Randy DeJong, John Wertz, and the SEA-PHAGES students at Calvin College for the isolation, annotation, and preliminary comparative analysis of Cucurbita, and members of the Hatfull lab for helpful discussions.

This work was supported by funding from National Institutes of Health grant GM116884 and Howard Hughes Medical Institute grant 54308198 to G.F.H. and by National Science Foundation Graduate Research Fellowship grant 1247842 to T.N.M. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Footnotes

Citation Pope WH, Mavrich TN, Garlena RA, Guerrero-Bustamante CA, Jacobs-Sera D, Montgomery MT, Russell DA, Warner MH, Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES), Hatfull GF. 2017. Bacteriophages of Gordonia spp. display a spectrum of diversity and genetic relationships. mBio 8:e01069-17. https://doi.org/10.1128/mBio.01069-17.

REFERENCES

1. Hatfull GF, Hendrix RW. 2011. Bacteriophages and their genomes. Curr Opin Virol 1:298–303. doi: 10.1016/j.coviro.2011.06.009.
2. Pedulla ML, Ford ME, Houtz JM, Karthikeyan T, Wadsworth C, Lewis JA, Jacobs-Sera D, Falbo J, Gross J, Pannunzio NR, Brucker W, Kumar V, Kandasamy J, Keenan L, Bardarov S, Kriakov J, Lawrence JG, Jacobs WR Jr, Hendrix RW, Hatfull GF. 2003. Origins of highly mosaic mycobacteriophage genomes. Cell 113:171–182. doi: 10.1016/S0092-8674(03)00233-2.
3. Hendrix RW, Smith MC, Burns RN, Ford ME, Hatfull GF. 1999. Evolutionary relationships among diverse bacteriophages and prophages: all the world’s a phage. Proc Natl Acad Sci U S A 96:2192–2197. doi: 10.1073/pnas.96.5.2192.
4. Pope WH, Bowman CA, Russell DA, Jacobs-Sera D, Asai DJ, Cresawn SG, Jacobs WR, Hendrix RW, Lawrence JG, Hatfull GF, Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science, Phage Hunters Integrating Research and Education, Mycobacterial Genetics Course. 2015. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity. eLife 4:e06416. doi: 10.7554/eLife.06416.
5. Grose JH, Casjens SR. 2014. Understanding the enormous diversity of bacteriophages: the tailed phages that infect the bacterial family Enterobacteriaceae. Virology 468–470:421–443. doi: 10.1016/j.virol.2014.08.024.
6. Kwan T, Liu J, DuBow M, Gros P, Pelletier J. 2005. The complete genomes and proteomes of 27 Staphylococcus aureus bacteriophages. Proc Natl Acad Sci U S A 102:5174–5179. doi: 10.1073/pnas.0501140102.
7. Kwan T, Liu J, Dubow M, Gros P, Pelletier J. 2006. Comparative genomic analysis of 18 Pseudomonas aeruginosa bacteriophages. J Bacteriol 188:1184–1187. doi: 10.1128/JB.188.3.1184-1187.2006.
8. Gregory AC, Solonenko SA, Ignacio-Espinoza JC, LaButti K, Copeland A, Sudek S, Maitland A, Chittick L, Dos Santos F, Weitz JS, Worden AZ, Woyke T, Sullivan MB. 2016. Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer. BMC Genomics 17:930. doi: 10.1186/s12864-016-3286-x.
9. Hatfull GF, Jacobs-Sera D, Lawrence JG, Pope WH, Russell DA, Ko CC, Weber RJ, Patel MC, Germane KL, Edgar RH, Hoyte NN, Bowman CA, Tantoco AT, Paladin EC, Myers MS, Smith AL, Grace MS, Pham TT, O’Brien MB, Vogelsberger AM, Hryckowian AJ, Wynalek JL, Donis-Keller H, Bogel MW, Peebles CL, Cresawn SG, Hendrix RW. 2010. Comparative genomic analysis of 60 mycobacteriophage genomes: genome clustering, gene acquisition, and gene size. J Mol Biol 397:119–143. doi: 10.1016/j.jmb.2010.01.011.
10. Deng L, Ignacio-Espinoza JC, Gregory AC, Poulos BT, Weitz JS, Hugenholtz P, Sullivan MB. 2014. Viral tagging reveals discrete populations in Synechococcus viral genome sequence space. Nature 513:242–245. doi: 10.1038/nature13459.
11. Jacobs-Sera D, Marinelli LJ, Bowman C, Broussard GW, Guerrero Bustamante C, Boyle MM, Petrova ZO, Dedrick RM, Pope WH, Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science Sea-Phages Program, Modlin RL, Hendrix RW, Hatfull GF. 2012. On the nature of mycobacteriophage diversity and host preference. Virology 434:187–201. doi: 10.1016/j.virol.2012.09.026.
12. Pope WH, Jacobs-Sera D, Russell DA, Rubin DH, Kajee A, Msibi ZN, Larsen MH, Jacobs WR Jr, Lawrence JG, Hendrix RW, Hatfull GF. 2014. Genomics and proteomics of mycobacteriophage patience, an accidental tourist in the Mycobacterium neighborhood. mBio 5:e02145. doi: 10.1128/mBio.02145-14.
13. Dyson ZA, Tucci J, Seviour RJ, Petrovski S. 2015. Lysis to kill: evaluation of the lytic abilities, and genomics of nine bacteriophages infective for Gordonia spp. and their potential use in activated sludge foam biocontrol. PLoS One 10:e0134512. doi: 10.1371/journal.pone.0134512.
14. Liu M, Gill JJ, Young R, Summer EJ. 2015. Bacteriophages of wastewater foaming-associated filamentous Gordonia reduce host levels in raw activated sludge. Sci Rep 5:13754. doi: 10.1038/srep13754.
15. Montgomery MT, Pope WH, Arnold ZM, Basina A, Iyer AM, Stoner TH, Kasturiarachi NS, Pressimone CA, Schiebel JG, Furbee EC, Grubb SR, Warner MH, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequences of Gordonia phages Bowser and Schwabeltier. Genome Announc 4:e00596-16. doi: 10.1128/genomeA.00596-16.
16. Petrovski S, Seviour RJ, Tillett D. 2011. Characterization of the genome of the polyvalent lytic bacteriophage GTE2, which has potential for biocontrol of Gordonia-, Rhodococcus-, and Nocardia-stabilized foams in activated sludge plants. Appl Environ Microbiol 77:3923–3929. doi: 10.1128/AEM.00025-11.
17. Petrovski S, Tillett D, Seviour RJ. 2012. Genome sequences and characterization of the related Gordonia phages GTE5 and GRU1 and their use as potential biocontrol agents. Appl Environ Microbiol 78:42–47. doi: 10.1128/AEM.05584-11.
18. Pope WH, Akbar AF, Ayers TN, Belohoubek SG, Chung CF, Hartman AC, Kayiti T, Kessler CM, Koman PI, Kotovskiy GA, Morgan TM, Rohac RM, Silva GM, Willis CE III, Milliken KA, Shedlock KA, Stanton AC, Toner CL, Furbee EC, Grubb SR, Warner MH, Montgomery MT, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequences of Gordonia bacteriophages Obliviate, UmaThurman, and Guacamole. Genome Announc 4:e00595-16. doi: 10.1128/genomeA.00595-16.
19. Pope WH, Anderson KC, Arora C, Bortz ME, Burnet G, Conover DH, D’Incau GM, Ghobrial JA, Jonas AL, Migdal EJ, Rote NL, German BA, McDonnell JE, Mezghani N, Schafer CE, Thompson PK, Ulbrich MC, Yu VJ, Furbee EC, Grubb SR, Warner MH, Montgomery MT, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequences of Gordonia terrae bacteriophages Phinally and Vivi2. Genome Announc 4:e00599-16. doi: 10.1128/genomeA.00599-16.
20. Pope WH, Bandla S, Colbert AK, Eichinger FG, Gamburg MB, Horiates SG, Jamison JM, Julian DR, Moore WA, Murthy P, Powell MC, Smith SV, Mezghani N, Milliken KA, Thompson PK, Toner CL, Ulbrich MC, Furbee EC, Grubb SR, Warner MH, Montgomery MT, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequences of Gordonia phages BaxterFox, Kita, Nymphadora, and Yeezy. Genome Announc 4:e00600-16. doi: 10.1128/genomeA.00600-16.
21. Pope WH, Bandyopadhyay A, Carlton ML, Kane MT, Panchal NJ, Pham YC, Reynolds ZJ, Sapienza MS, German BA, McDonnell JE, Schafer CE, Yu VJ, Furbee EC, Grubb SR, Warner MH, Montgomery MT, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequence of Gordonia phage Yvonnetastic. Genome Announc 4:e00594-16. doi: 10.1128/genomeA.00594-16.
22. Pope WH, Benczkowski MS, Green DE, Hwang M, Kennedy B, Kocak B, Kruczek E, Lin L, Moretti ML, Onelangsy FL, Mezghani N, Milliken KA, Toner CL, Thompson PK, Ulbrich MC, Furbee EC, Grubb SR, Warner MH, Montgomery MT, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequences of Gordonia terrae phages Benczkowski14 and Katyusha. Genome Announc 4:e00578-16. doi: 10.1128/genomeA.00578-16.
23. Pope WH, Berryman EN, Forrest KM, McHale L, Wertz AT, Zhuang Z, Kasturiarachi NS, Pressimone CA, Schiebel JG, Furbee EC, Grubb SR, Warner MH, Montgomery MT, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequence of Gordonia phage BetterKatz. Genome Announc 4:e00590-16. doi: 10.1128/genomeA.00590-16.
24. Pope WH, Biery DN, Huff ZT, Huynh AB, McFadden WM, Mouat JS, Schneiderman SE, Song H, Szpak LE, Umbaugh MS, German BA, McDonnell JE, Mezghani N, Schafer CE, Thompson PK, Ulbrich MC, Yu VJ, Furbee EC, Grubb SR, Warner MH, Montgomery MT, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequences of Gordonia terrae phages Attis and SoilAssassin. Genome Announc 4:e00591-16. doi: 10.1128/genomeA.00591-16. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
25. Pope WH, Brown AK, Fisher DJ, Okwiya NH, Savage KA, German BA, McDonnell JE, Schafer CE, Yu VJ, Furbee EC, Grubb SR, Warner MH, Montgomery MT, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequence of Gordonia bacteriophage Lucky10. Genome Announc 4:e00580-16. doi: 10.1128/genomeA.00580-16. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
26. Pope WH, Davis JP, O’Shea S, Pfeiffer AC, Rich AN, Xue JC, Shedlock KA, Stanton AC, Furbee EC, Grubb SR, Warner MH, Montgomery MT, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequences of Gordonia phages Hotorobo, Woes, and Monty. Genome Announc 4:e00598-16. doi: 10.1128/genomeA.00598-16. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
27. Pope WH, Guido MJ, Iyengar P, Nigra JT, Serbin MB, Kasturiarachi NS, Pressimone CA, Schiebel JG, Furbee EC, Grubb SR, Warner MH, Montgomery MT, Garlena RA, Russell DA, Jacobs-Sera D, Hatfull GF. 2016. Genome sequence of Gordonia phage Emalyn. Genome Announc 4:e00597-16. doi: 10.1128/genomeA.00597-16. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
28. Pope WH, Montgomery MT, Bonilla JA, Dejong R, Garlena RA, Guerrero Bustamante C, Klyczek KK, Russell DA, Wertz JT, Jacobs-Sera D, Hatfull GF. 2017. Complete genome sequences of 38 Gordonia sp. bacteriophages. Genome Announc 5:e01143-16. doi: 10.1128/genomeA.01143-16. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
29. Dyson ZA, Brown TL, Farrar B, Doyle SR, Tucci J, Seviour RJ, Petrovski S. 2016. Locating and activating molecular “time bombs”: induction of Mycolata prophages. PLoS One 11:e0159957. doi: 10.1371/journal.pone.0159957. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
30. Petrovski S, Seviour RJ, Tillett D. 2011. Prevention of Gordonia and Nocardia stabilized foam formation by using bacteriophage GTE7. Appl Environ Microbiol 77:7864–7867. doi: 10.1128/AEM.05692-11. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
31. Arenskötter M, Bröker D, Steinbüchel A. 2004. Biology of the metabolically diverse genus Gordonia. Appl Environ Microbiol 70:3195–3204. doi: 10.1128/AEM.70.6.3195-3204.2004. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
32. Jordan TC, Burnett SH, Carson S, Caruso SM, Clase K, DeJong RJ, Dennehy JJ, Denver DR, Dunbar D, Elgin SC, Findley AM, Gissendanner CR, Golebiewska UP, Guild N, Hartzog GA, Grillo WH, Hollowell GP, Hughes LE, Johnson A, King RA, Lewis LO, Li W, Rosenzweig F, Rubin MR, Saha MS, Sandoz J, Shaffer CD, Taylor B, Temple L, Vazquez E, Ware VC, Barker LP, Bradley KW, Jacobs-Sera D, Pope WH, Russell DA, Cresawn SG, Lopatto D, Bailey CP, Hatfull GF. 2014. A broadly implementable research course in phage discovery and genomics for first-year undergraduate students. mBio 5:e01051-13. doi: 10.1128/mBio.01051-13. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
33. Tsukamura M. 1971. Proposal of a new genus, Gordona, for slightly acid-fast organisms occurring in sputa of patients with pulmonary disease and in soil. J Gen Microbiol 68:15–26. doi: 10.1099/00221287-68-1-15. [PubMed] [CrossRef] [Google Scholar]
34. Russell DA, Guerrero Bustamante CA, Garlena RA, Hatfull GF. 2016. Complete genome sequence of Gordonia terrae 3612. Genome Announc 4:e01058-16. doi: 10.1128/genomeA.01058-16. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
35. Cresawn SG, Bogel M, Day N, Jacobs-Sera D, Hendrix RW, Hatfull GF. 2011. Phamerator: a bioinformatic tool for comparative bacteriophage genomics. BMC Bioinformatics 12:395. doi: 10.1186/1471-2105-12-395. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
36. Pope WH, Jacobs-Sera D, Russell DA, Peebles CL, Al-Atrache Z, Alcoser TA, Alexander LM, Alfano MB, Alford ST, Amy NE, Anderson MD, Anderson AG, Ang AA, Ares M, Barber AJ, Barker LP, Barrett JM, Barshop WD, Bauerle CM, Bayles IM, Belfield KL, Best AA, Borjon A, Bowman CA, Boyer CA, Bradley KW, Bradley VA, Broadway LN, Budwal K, Busby KN, Campbell IW, Campbell AM, Carey A, Caruso SM, Chew RD, Cockburn CL, Cohen LB, Corajod JM, Cresawn SG, Davis KR. 2011. Expanding the diversity of mycobacteriophages: insights into genome architecture and evolution. PLoS One 6:e16329. doi: 10.1371/journal.pone.0016329. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
37. Broussard GW, Oldfield LM, Villanueva VM, Lunt BL, Shine EE, Hatfull GF. 2013. Integration-dependent bacteriophage immunity provides insights into the evolution of genetic switches. Mol Cell 49:237–248. doi: 10.1016/j.molcel.2012.11.012. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
38. Smith MC, Thorpe HM. 2002. Diversity in the serine recombinases. Mol Microbiol 44:299–307. doi: 10.1046/j.1365-2958.2002.02891.x. [PubMed] [CrossRef] [Google Scholar]
39. Payne K, Sun Q, Sacchettini J, Hatfull GF. 2009. Mycobacteriophage lysin B is a novel mycolylarabinogalactan esterase. Mol Microbiol 73:367–381. doi: 10.1111/j.1365-2958.2009.06775.x. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
40. Dedrick RM, Jacobs-Sera D, Bustamante CA, Garlena RA, Mavrich TN, Pope WH, Reyes JC, Russell DA, Adair T, Alvey R, Bonilla JA, Bricker JS, Brown BR, Byrnes D, Cresawn SG, Davis WB, Dickson LA, Edgington NP, Findley AM, Golebiewska U, Grose JH, Hayes CF, Hughes LE, Hutchison KW, Isern S, Johnson AA, Kenna MA, Klyczek KK, Mageeney CM, Michael SF, Molloy SD, Montgomery MT, Neitzel J, Page ST, Pizzorno MC, Poxleitner MK, Rinehart CA, Robinson CJ, Rubin MR, Teyim JN, Vazquez E, Ware VC, Washington J, Hatfull GF. 2017. Prophage-mediated defence against viral attack and viral counter-defence. Nat Microbiol 2:16251. doi: 10.1038/nmicrobiol.2016.251. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
41. Payne KM, Hatfull GF. 2012. Mycobacteriophage endolysins: diverse and modular enzymes with multiple catalytic activities. PLoS One 7:e34052. doi: 10.1371/journal.pone.0034052. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
42. Petrovski S, Dyson ZA, Seviour RJ, Tillett D. 2012. Small but sufficient: the Rhodococcus phage RRH1 has the smallest known Siphoviridae genome at 14.2 kilobases. J Virol 86:358–363. doi: 10.1128/JVI.05460-11. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
43. Brown KL, Sarkis GJ, Wadsworth C, Hatfull GF. 1997. Transcriptional silencing by the mycobacteriophage L5 repressor. EMBO J 16:5914–5921. [PMC free article] [PubMed] [Google Scholar]
44. Lee MH, Pascopella L, Jacobs WR Jr, Hatfull GF. 1991. Site-specific integration of mycobacteriophage L5: integration-proficient vectors for Mycobacterium smegmatis, Mycobacterium tuberculosis, and Bacille Calmette-Guerin. Proc Natl Acad Sci U S A 88:3111–3115. doi: 10.1073/pnas.88.8.3111. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
45. Kim AI, Ghosh P, Aaron MA, Bibb LA, Jain S, Hatfull GF. 2003. Mycobacteriophage Bxb1 integrates into the Mycobacterium smegmatis groEL1 gene. Mol Microbiol 50:463–473. doi: 10.1046/j.1365-2958.2003.03723.x. [PubMed] [CrossRef] [Google Scholar]
46. Dedrick RM, Mavrich TN, Ng WL, Cervantes Reyes JC, Olm MR, Rush RE, Jacobs-Sera D, Russell DA, Hatfull GF. 2016. Function, expression, specificity, diversity and incompatibility of actinobacteriophage parABS systems. Mol Microbiol 101:625–644. doi: 10.1111/mmi.13414. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
47. Stella EJ, Franceschelli JJ, Tasselli SE, Morbidoni HR. 2013. Analysis of novel mycobacteriophages indicates the existence of different strategies for phage inheritance in mycobacteria. PLoS One 8:e56384. doi: 10.1371/journal.pone.0056384. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
48. Ford ME, Sarkis GJ, Belanger AE, Hendrix RW, Hatfull GF. 1998. Genome structure of mycobacteriophage D29: implications for phage evolution. J Mol Biol 279:143–164. doi: 10.1006/jmbi.1997.1610. [PubMed] [CrossRef] [Google Scholar]
49. Mavrich TN, Hatfull GF. 2017. Bacteriophage evolution differs by host, lifestyle, and genome. Nat Microbiol 2:17112. doi: 10.1038/nmicrobiol.2017.112. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
50. Marinelli LJ, Fitz-Gibbon S, Hayes C, Bowman C, Inkeles M, Loncaric A, Russell DA, Jacobs-Sera D, Cokus S, Pellegrini M, Kim J, Miller JF, Hatfull GF, Modlin RL. 2012. Propionibacterium acnes bacteriophages display limited genetic diversity and broad killing activity against bacterial skin isolates. mBio 3:e00279-12. doi: 10.1128/mBio.00279-12. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
51. Jacobs WR Jr, Snapper SB, Tuckman M, Bloom BR. 1989. Mycobacteriophage vector systems. Rev Infect Dis 11(Suppl 2):S404–S410. doi: 10.1093/clinids/11.Supplement_2.S404. [PubMed] [CrossRef] [Google Scholar]
52. Bardarov S, Kriakov J, Carriere C, Yu S, Vaamonde C, McAdam RA, Bloom BR, Hatfull GF, Jacobs WR Jr. 1997. Conditionally replicating mycobacteriophages: a system for transposon delivery to Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 94:10961–10966. doi: 10.1073/pnas.94.20.10961. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
53. Besemer J, Borodovsky M. 2005. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 33:W451–W454. doi: 10.1093/nar/gki487. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
54. Delcher AL, Bratke KA, Powers EC, Salzberg SL. 2007. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23:673–679. doi: 10.1093/bioinformatics/btm009. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
55. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410. doi: 10.1016/S0022-2836(05)80360-2. [PubMed] [CrossRef] [Google Scholar]
56. Laslett D, Canback B. 2004. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res 32:11–16. doi: 10.1093/nar/gkh152. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
57. Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25:955–964. doi: 10.1093/nar/25.5.0955. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
58. Söding J, Biegert A, Lupas AN. 2005. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 33:W244–W248. doi: 10.1093/nar/gki408. [PMC free article] [PubMed] [CrossRef] [Google Scholar]
59. Huson DH, Bryant D. 2006. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23:254–267. doi: 10.1093/molbev/msj030. [PubMed] [CrossRef] [Google Scholar]
60. Huson DH. 1998. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14:68–73. doi: 10.1093/bioinformatics/14.1.68. [PubMed] [CrossRef] [Google Scholar]
61. Klyczek KK, Bonilla JA, Jacobs-Sera D, Adair TL, Afram P, Allen KG, et al. 2017. Tales of diversity: genomic and morphological characteristics of forty-six Arthrobacter phages. PLoS One 12:e0180517. doi: 10.1371/journal.pone.0180517. [PMC free article] [PubMed] [CrossRef] [Google Scholar]

Articles from mBio are provided here courtesy of American Society for Microbiology (ASM)
