BMC Genomics. 2016 Apr 26;17:307.
doi: 10.1186/s12864-016-2629-y.

Supporting community annotation and user collaboration in the integrated microbial genomes (IMG) system


I-Min A Chen et al. BMC Genomics.

Abstract

Background: The exponential growth of genomic data from next-generation sequencing technologies renders traditional manual expert curation unsustainable. Many genomic systems have added community annotation tools to address this problem. Most of these systems adopted a "Wiki-based" approach to take advantage of existing wiki technologies, but encountered obstacles such as usability, authorship recognition, information reliability and incentives for community participation.

Results: Here, we present a different approach, relying on a tightly integrated method rather than a "Wiki-based" one, to support community annotation and user collaboration in the Integrated Microbial Genomes (IMG) system. The IMG approach allows users to use the existing IMG data warehouse and analysis tools to add gene, pathway and biosynthetic cluster annotations, to analyze and reorganize contigs, genes and functions using workspace datasets, and to share private user annotations and workspace datasets with collaborators. We show that annotation in IMG can be part of the research process itself, which overcomes the user incentive and authorship recognition problems and thus fosters collaboration among domain experts. The usability and reliability issues are addressed by integrating curated information and analysis tools in IMG, together with expert review by the DOE Joint Genome Institute (JGI).

Conclusion: By incorporating annotation operations into IMG, we provide an integrated environment in which users can perform deeper, extended data analysis and annotation in a single system, work that can lead to publications and community knowledge sharing, as shown in the case studies.

Keywords: Functional curation; Gene annotation; IMG; Manual curation; Metagenomics; Microbial genomics.


Figures

Fig. 1
Using Phylogenetic Profiler to Find Gene Annotations and Missing Genes. From the Find Genes menu item, a user can select the Phylogenetic Profilers: Single Genes submenu (Fig. 1 (i)) to start investigating genes in a selected candidate genome that have, or lack, homologs in other closely related genomes (Fig. 1 (ii)). While the “With Homologs” option is useful for adding further MyIMG gene annotations, the “Without Homologs” option provides a list of potential missing genes for further investigation (Fig. 1 (iii)). To investigate a potential missing gene, the user first selects the gene and then clicks the “Missing Gene?” button (Fig. 1 (iii)). Potential missing genes identified by a TBlastn search will be displayed.
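The with/without-homologs split described in this caption can be sketched as simple set filtering over precomputed homolog hits. This is a minimal illustration, not IMG's implementation: the `homologs` mapping (gene ID to the set of genomes in which a homolog was found), the function name `phylo_profile` and all gene and genome IDs are hypothetical, and the similarity cutoffs IMG applies when calling homologs are not modeled.

```python
from typing import Dict, List, Set, Tuple


def phylo_profile(
    genes: List[str],
    homologs: Dict[str, Set[str]],
    compare_genomes: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split the candidate genome's genes into two lists.

    with_homologs:    genes with a homolog in every selected comparison genome
    without_homologs: genes with no homolog in any selected comparison genome
    Genes with homologs in only some of the selected genomes go in neither list.
    """
    with_homologs: List[str] = []
    without_homologs: List[str] = []
    for gene in genes:
        hit_genomes = homologs.get(gene, set())
        if compare_genomes <= hit_genomes:
            # Conserved gene: annotations of its homologs may inform a MyIMG annotation.
            with_homologs.append(gene)
        elif not (compare_genomes & hit_genomes):
            # No homolog anywhere in the selection: the counterpart may be a
            # missing (unannotated) gene in the comparison genomes.
            without_homologs.append(gene)
    return with_homologs, without_homologs


# Toy data: gene IDs and genome names are placeholders.
hits = {"g1": {"genomeA", "genomeB"}, "g2": set(), "g3": {"genomeA"}}
print(phylo_profile(["g1", "g2", "g3"], hits, {"genomeA", "genomeB"}))
# prints (['g1'], ['g2'])
```

Gene `g3` has a homolog in only one of the two selected genomes, so under this sketch's assumed rule it appears in neither list.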
Fig. 2
Finding Missing IMG Terms Using Function Profile. A user first selects the IMG Part List “Nodulation factor biosynthesis, export and regulation” to load all of its component IMG terms into the Function Cart (Fig. 2 (i)). All Bradyrhizobium genomes are expected to have genes associated with these terms. However, some terms are missing in certain genomes (Fig. 2 (ii)). Clicking on a zero count launches a BLAST search for potential genes, with the result shown in Fig. 2 (iii). Since microbial genes with related functions tend to be close together on the scaffold, an alternative approach is to examine the intergenic regions around genes with these functions for potential missing genes (Fig. 2 (iv)).
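The intergenic-region strategy in this caption can be sketched as a scan over genes sorted by scaffold position, reporting sufficiently long gaps that border a gene carrying one of the terms of interest. This is a hypothetical helper, not an IMG function; the `Gene` record, the term labels and the 300 bp `min_gap` default are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Gene:
    gene_id: str
    start: int  # 1-based, inclusive scaffold coordinate
    end: int    # 1-based, inclusive scaffold coordinate
    functions: Set[str] = field(default_factory=set)


def candidate_gaps(
    genes: List[Gene],
    target_terms: Set[str],
    min_gap: int = 300,  # illustrative threshold, not an IMG setting
) -> List[Tuple[int, int]]:
    """Return (start, end) intergenic intervals of at least `min_gap` bp
    that sit next to a gene carrying one of `target_terms`."""
    ordered = sorted(genes, key=lambda g: g.start)
    gaps: List[Tuple[int, int]] = []
    for left, right in zip(ordered, ordered[1:]):
        gap_start, gap_end = left.end + 1, right.start - 1
        # Overlapping or tightly packed genes give a short (or negative) gap.
        if gap_end - gap_start + 1 < min_gap:
            continue
        if (left.functions | right.functions) & target_terms:
            gaps.append((gap_start, gap_end))
    return gaps
```

Each returned interval would then be a place to look for an unannotated gene, for example with a translated BLAST search.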
Fig. 3
List of participating genomes and potential genomes with Missing Enzymes. Two new functions are provided to help users narrow down genome searches (Fig. 3 (i)). Participating Genomes in KEGG Pathway gives users a list of all genomes participating in the selected pathway, together with their enzymes (Fig. 3 (ii)). The Potential Genomes With Missing Enzymes function gives users a list of genomes that may be missing enzymes, for further investigation (Fig. 3 (iii)).
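The caption does not state how IMG decides that a genome "may be missing" enzymes. As a hedged sketch, one plausible rule flags genomes that have most, but not all, of a pathway's enzymes. The function name `classify_genomes`, the enzyme labels and the 0.5 `min_fraction` threshold are assumptions chosen for illustration, not IMG's criterion.

```python
from typing import Dict, List, Set, Tuple


def classify_genomes(
    pathway_enzymes: Set[str],
    genome_enzymes: Dict[str, Set[str]],
    min_fraction: float = 0.5,  # assumed cutoff, not IMG's rule
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (participating genomes, {genome: missing enzymes}).

    A genome "participates" if it has at least one of the pathway's enzymes.
    It is flagged as potentially missing enzymes if it has at least
    `min_fraction` of them, but not all.
    """
    participating: List[str] = []
    potential_missing: Dict[str, List[str]] = {}
    if not pathway_enzymes:
        return participating, potential_missing
    for genome, enzymes in sorted(genome_enzymes.items()):
        present = pathway_enzymes & enzymes
        if not present:
            continue
        participating.append(genome)
        fraction = len(present) / len(pathway_enzymes)
        if min_fraction <= fraction < 1.0:
            potential_missing[genome] = sorted(pathway_enzymes - enzymes)
    return participating, potential_missing
```

Genomes with only a small fraction of the enzymes still count as participating, but are not flagged, on the assumption that a nearly complete pathway is the more promising target for a missing-enzyme search.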
Fig. 4
Finding Genes with Missing KO Terms. Many Salmonella enterica genomes have the complete KEGG module M00302, 2-Aminoethylphosphonate transport system, whereas Salmonella enterica enterica sv. Typhi E01-6750 is shown to be missing KO term K11084 (Fig. 4 (i)). When a user displays the KEGG Module Map of M00302, they can clearly see that the genome has genes associated with 3 other KO terms but not with K11084 (Fig. 4 (ii)). By clicking on the “green” KO term on the map, the user can use a new IMG tool to identify 3 genes that could potentially be associated with this KO term.
Fig. 5
Using the Function-Based Product Name Method to aid MyIMG annotation. A gene may be assigned the product name “hypothetical protein” due to lack of information even though it is associated with some functional assignment. Using the function-based method for finding a candidate product name on the Gene Detail page (Fig. 5 (i)), users can see the function distribution of other public genes with the same functional assignment (Fig. 5 (ii)). The List Genes function shows all public genes with the selected functional assignment (Fig. 5 (iii)), which can provide hints for the MyIMG annotation of the candidate gene.
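A minimal sketch of the idea behind this method tallies the product names of public genes that share the candidate gene's functional assignment, skipping uninformative names. The `PublicGene` record, the function IDs such as `FUNC_A` and the list of uninformative names are hypothetical, and this is one plausible reading of the caption rather than IMG's actual procedure.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple

# Product names treated as uninformative (compared case-insensitively).
UNINFORMATIVE = {"hypothetical protein", "conserved hypothetical protein"}


class PublicGene(NamedTuple):
    gene_id: str
    function_id: str  # a functional assignment such as a COG, Pfam or KO ID
    product: str


def candidate_product_names(
    function_id: str,
    public_genes: List[PublicGene],
    top_n: int = 3,
) -> List[Tuple[str, int]]:
    """Tally informative product names among public genes that share
    `function_id`, most frequent first."""
    counts = Counter(
        g.product
        for g in public_genes
        if g.function_id == function_id
        and g.product.strip().lower() not in UNINFORMATIVE
    )
    return counts.most_common(top_n)
```

The top-ranked names are only hints. The curator still decides whether any of them, possibly qualified as "putative", fits the candidate gene.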
Fig. 6
Using Gene Neighborhood to aid MyIMG annotation. A gene may be assigned the product name “conserved hypothetical protein” due to lack of information (Fig. 6 (i)). However, the gene neighborhoods of genes with the same top COG hit (Fig. 6 (ii)) show that other similar genes have more meaningful product names (Fig. 6 (iii)). In this case, a user can add a MyIMG annotation with a product name such as “putative RNA-associated protein”.
Fig. 7
IMG Pathway Curation. Users with curation privileges can see an additional Curation submenu item in the Analysis Cart (Fig. 7 (i)). An IMG Pathway consists of one or more sequential, alternative and/or optional reactions (Fig. 7 (ii)), and each reaction consists of a definition, an equation, compounds acting as reactants, products or catalysts, and related IMG terms (Fig. 7 (iii)).
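The pathway structure described in this caption can be mirrored in a small data model. This is a hypothetical sketch, not IMG's schema: the class and field names are invented. It also assumes that alternative reactions share the same step number, which the caption does not specify.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class StepType(Enum):
    SEQUENTIAL = "sequential"
    ALTERNATIVE = "alternative"
    OPTIONAL = "optional"


class CompoundRole(Enum):
    REACTANT = "reactant"
    PRODUCT = "product"
    CATALYST = "catalyst"


@dataclass
class ReactionCompound:
    compound: str
    role: CompoundRole


@dataclass
class Reaction:
    definition: str
    equation: str
    compounds: List[ReactionCompound] = field(default_factory=list)
    img_terms: List[str] = field(default_factory=list)


@dataclass
class PathwayStep:
    # Assumption: alternative reactions share the same step number.
    step: int
    step_type: StepType
    reaction: Reaction


@dataclass
class Pathway:
    name: str
    steps: List[PathwayStep] = field(default_factory=list)

    def img_terms(self) -> Set[str]:
        """All IMG terms linked to any reaction in the pathway."""
        return {t for s in self.steps for t in s.reaction.img_terms}

    def compounds(self, role: CompoundRole) -> Set[str]:
        """Compounds playing `role` in any reaction of the pathway."""
        return {
            c.compound
            for s in self.steps
            for c in s.reaction.compounds
            if c.role is role
        }
```

Keeping the step type on the step rather than on the reaction lets the same reaction be reused as required in one pathway and optional in another. This is a design choice of the sketch, not something the caption states.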
Fig. 8
Biosynthetic Cluster and Secondary Metabolite Annotation. Experimentally verified biosynthetic clusters are associated with secondary metabolites, whereas predicted biosynthetic clusters lack this information. Biosynthetic cluster 160962703 of Streptomyces sp. WT1 is an experimentally verified cluster (GenBank accession JN207130) associated with the natural product mevalonate. Genes of this cluster participate in 7 KEGG modules (Fig. 8 (i)). The KEGG Module Map of M00095 (C5 isoprenoid biosynthesis, mevalonate pathway) for this cluster shows that its genes are linked to 6 of the module's KO terms (Fig. 8 (ii)). Predicted biosynthetic cluster 161507570 of Streptomyces fradiae ATCC 19609 has no secondary metabolite information. However, it contains 6 genes associated with the same 6 KO terms of M00095, which is a good indication that the cluster can produce the same secondary metabolite (Fig. 8 (iii)). Users can use the “Add SM Annotation” function to annotate this association (Fig. 8 (iv)).
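The comparison in this caption can be sketched as follows: restrict each cluster's KO terms to the module, then check whether the predicted cluster hits every module KO term that the verified cluster hits. The function `module_ko_overlap` is hypothetical, and the KO labels in the test (`KO_1`, `KO_X`, and so on) are placeholders rather than the real M00095 terms. A positive result is supporting evidence for a curator, not proof that the same metabolite is produced.

```python
from typing import Dict, Set


def module_ko_overlap(
    module_kos: Set[str],
    reference_cluster: Dict[str, Set[str]],
    predicted_cluster: Dict[str, Set[str]],
) -> Dict[str, object]:
    """Compare the module KO terms hit by a verified (reference) cluster
    with those hit by a predicted cluster. Each cluster maps gene ID -> KO terms."""
    ref_kos = module_kos & set().union(*reference_cluster.values())
    pred_kos = module_kos & set().union(*predicted_cluster.values())
    return {
        "reference_kos": ref_kos,
        "shared_kos": ref_kos & pred_kos,
        # True when the predicted cluster hits every module KO term the reference hits.
        "covers_reference": bool(ref_kos) and ref_kos <= pred_kos,
    }
```

KO terms outside the module, such as those from the other 6 modules the verified cluster participates in, are ignored by construction.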
Fig. 9
KEGG Map Display of Biosynthetic Cluster Genes. An experimentally verified biosynthetic cluster from NCBI with GenBank ID X58833 has 6 genes (Fig. 9 (i)). The KEGG map shows that the genes in this cluster only partially cover the actinorhodin pathway. Boxes colored magenta in the pathway map are linked to genes in this cluster, while boxes colored purple are linked to genes in the same genome but outside the cluster (Fig. 9 (ii)). By adding 5 additional upstream and downstream genes, the resulting new cluster covers the entire pathway (Fig. 9 (iii)).
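The cluster extension in this caption can be sketched as growing the gene window on both sides until it covers every pathway KO term present on the scaffold. The helper `extend_cluster` is hypothetical. Symmetric growth (k genes on each side) is an assumption, since the caption does not say how the added upstream and downstream genes were distributed. The 10-gene `max_extra` default is likewise illustrative.

```python
from typing import List, Optional, Set, Tuple


def extend_cluster(
    scaffold_kos: List[Set[str]],
    start: int,
    end: int,
    pathway_kos: Set[str],
    max_extra: int = 10,  # illustrative limit on genes added per side
) -> Optional[Tuple[int, int, int]]:
    """Find the smallest symmetric extension of a cluster that covers the pathway.

    scaffold_kos[i] holds the KO terms of the i-th gene in scaffold order;
    the original cluster spans indices start..end (inclusive). Only pathway
    KO terms that occur somewhere on the scaffold can be covered, so those
    are the target. Returns (new_start, new_end, k), where k genes were
    added on each side, or None if max_extra genes per side are not enough.
    """
    target = pathway_kos & set().union(*scaffold_kos)
    last = len(scaffold_kos) - 1
    for k in range(max_extra + 1):
        lo, hi = max(0, start - k), min(last, end + k)
        window = set().union(*scaffold_kos[lo:hi + 1])
        if target <= window:
            return lo, hi, k
    return None
```

Because the loop tries k = 0, 1, 2, ... in order, the first window that satisfies the coverage test is the smallest symmetric extension under this sketch's assumptions.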
Fig. 10
Make your own pathway and check assertion. A user can create a new “pathway” by adding functions to a workspace function set. For example, a user can create a “3-hydroxypropionate” pathway that includes the 3 KO terms K09709, K14471 and K14472 (Fig. 10 (i)). The Function-Genome Profile then shows which genomes are “asserted” for this new pathway (Fig. 10 (ii)).

Similar articles

  • IMG/M v.5.0: an integrated data management and comparative analysis system for microbial genomes and microbiomes.
    Chen IA, Chu K, Palaniappan K, Pillay M, Ratner A, Huang J, Huntemann M, Varghese N, White JR, Seshadri R, Smirnova T, Kirton E, Jungbluth SP, Woyke T, Eloe-Fadrosh EA, Ivanova NN, Kyrpides NC. Nucleic Acids Res. 2019 Jan 8;47(D1):D666-D677. doi: 10.1093/nar/gky901. PMID: 30289528.
  • IMG 4 version of the integrated microbial genomes comparative analysis system.
    Markowitz VM, Chen IM, Palaniappan K, Chu K, Szeto E, Pillay M, Ratner A, Huang J, Woyke T, Huntemann M, Anderson I, Billis K, Varghese N, Mavromatis K, Pati A, Ivanova NN, Kyrpides NC. Nucleic Acids Res. 2014 Jan;42(Database issue):D560-7. doi: 10.1093/nar/gkt963. Epub 2013 Oct 27. PMID: 24165883.
  • IMG/M: integrated genome and metagenome comparative data analysis system.
    Chen IA, Markowitz VM, Chu K, Palaniappan K, Szeto E, Pillay M, Ratner A, Huang J, Andersen E, Huntemann M, Varghese N, Hadjithomas M, Tennessen K, Nielsen T, Ivanova NN, Kyrpides NC. Nucleic Acids Res. 2017 Jan 4;45(D1):D507-D516. doi: 10.1093/nar/gkw929. Epub 2016 Oct 13. PMID: 27738135.
  • Mitochondrial Disease Sequence Data Resource (MSeqDR): a global grass-roots consortium to facilitate deposition, curation, annotation, and integrated analysis of genomic data for the mitochondrial disease clinical and research communities.
    Falk MJ, Shen L, Gonzalez M, Leipzig J, Lott MT, Stassen AP, Diroma MA, Navarro-Gomez D, Yeske P, Bai R, Boles RG, Brilhante V, Ralph D, DaRe JT, Shelton R, Terry SF, Zhang Z, Copeland WC, van Oven M, Prokisch H, Wallace DC, Attimonelli M, Krotoski D, Zuchner S, Gai X; MSeqDR Consortium Participants. Mol Genet Metab. 2015 Mar;114(3):388-96. doi: 10.1016/j.ymgme.2014.11.016. Epub 2014 Dec 4. PMID: 25542617. Review.
  • Ten years of maintaining and expanding a microbial genome and metagenome analysis system.
    Markowitz VM, Chen IA, Chu K, Pati A, Ivanova NN, Kyrpides NC. Trends Microbiol. 2015 Nov;23(11):730-741. doi: 10.1016/j.tim.2015.07.012. Epub 2015 Oct 14. PMID: 26439299. Review.
