BMC Bioinformatics. 2006 Jul 16;7:347.
doi: 10.1186/1471-2105-7-347.

BBP: Brucella genome annotation with literature mining and curation

Zuoshuang Xiang et al. BMC Bioinformatics. 2006;7:347.

Abstract

Background: Brucella species are Gram-negative, facultative intracellular bacteria that cause brucellosis in humans and animals. The sequences of four Brucella genomes have been published, and various Brucella gene and genome data and analysis resources exist. A web gateway that integrates these resources would greatly facilitate Brucella research. Brucella genome data in current databases are largely derived from computational analysis and lack the experimental validation typically found in peer-reviewed publications. This is partly due to the lack of a literature mining and curation system able to efficiently incorporate the large body of published data into genome annotation. It is further hypothesized that literature-based Brucella gene annotation would improve understanding of the complex mechanisms of Brucella pathogenesis.

Results: The Brucella Bioinformatics Portal (BBP) was developed to integrate existing Brucella genome data and analysis tools with literature mining and curation. The BBP InterBru database and Brucella Genome Browser allow users to search and analyze genes of the four currently available Brucella genomes and link to more than 20 existing databases and analysis programs. Brucella publications in PubMed are extracted and can be searched with a TextPresso-powered natural language processing method, a MeSH browser, a keyword search, and an automatic literature update service. To efficiently annotate Brucella genes using the large body of literature, a literature mining and curation system called Limix was developed; it integrates computational literature mining methods with a PubSearch-powered manual curation and management system. Limix was used to quickly find and confirm 107 Brucella gene mutations, including 75 genes shown to be essential for Brucella virulence. These 75 genes were further clustered using the COG (Clusters of Orthologous Groups) classification. In addition, 62 Brucella genetic interactions were extracted from the literature. These results make a more comprehensive investigation of Brucella pathogenesis possible. Other BBP features include a publication email alert service, a contact database of Brucella researchers, and a discussion forum.
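The abstract does not say how the COG clustering was performed. As a minimal sketch, assuming each gene has already been assigned a one-letter COG functional category (for example M, cell wall/membrane/envelope biogenesis), grouping genes by category could look like this. The function name `cluster_by_cog` and the gene identifiers are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def cluster_by_cog(gene_to_cog):
    """Group genes by one-letter COG functional category.

    Genes with no category go under 'unassigned'. Groups are ordered by
    size (largest first), then alphabetically by category code.
    """
    groups = defaultdict(list)
    for gene, category in gene_to_cog.items():
        groups[category or "unassigned"].append(gene)
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    return {category: sorted(genes) for category, genes in ordered}


# Placeholder gene IDs and category assignments, for illustration only.
example = {"gene1": "M", "gene2": "M", "gene3": "E", "gene4": "", "gene5": "U"}
clusters = cluster_by_cog(example)
for category, genes in clusters.items():
    print(category, len(genes), genes)
```

Ordering groups by size puts the most populated functional categories first, which is usually what a reader of a virulence-gene summary looks at first.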

Conclusion: BBP is a gateway for Brucella researchers to search, analyze, and curate Brucella genome data originating from public databases and the literature. Brucella gene mutations and genetic interactions were annotated using Limix, leading to a better understanding of Brucella pathogenesis.

Figures

Figure 1
The BBP system architecture for Brucella genome analysis and literature mining and curation. A PubMed literature extraction and parsing program loads all Brucella-related papers from PubMed into the Brucella Limix database and the TextPresso-powered text processing pipeline. An automatic literature update program also extracts Brucella papers published in the current and previous months. The Limix system provides an efficient way to search the literature and to extract, edit, and submit data by integrating computational text mining programs with manual literature curation and management features. InterBru integrates Brucella genome data from different sources, including in-house curated data from the Brucella Limix database. The Brucella Genome Browser (BGBrowser) provides graphical visualization of Brucella genome data and offers many analysis tools. InterBru and BGBrowser share the same output page, which displays comprehensive Brucella gene and protein information.
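The automatic literature update program is not described in detail. One plausible approach is a date-restricted query to NCBI's E-utilities ESearch service covering the current and previous calendar months; the sketch below only builds the request URL and does not contact NCBI. The function `monthly_update_url`, the search term, and the `retmax` value are assumptions, not the BBP implementation.

```python
from datetime import date
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def monthly_update_url(today, term="Brucella", retmax=500):
    """Build an ESearch URL for PubMed records published since the start of last month."""
    # First day of the previous month (January wraps back to December of the prior year).
    if today.month == 1:
        start = date(today.year - 1, 12, 1)
    else:
        start = date(today.year, today.month - 1, 1)
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # restrict by publication date
        "mindate": start.strftime("%Y/%m/%d"),
        "maxdate": today.strftime("%Y/%m/%d"),
        "retmax": retmax,  # hypothetical cap on returned PMIDs
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = monthly_update_url(date(2006, 1, 15))
print(url)
```

The returned PMIDs could then be compared against the local database so that only new papers are loaded.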
Figure 2
A scenario of Brucella genome query and analysis. (A) The InterBru database allows users to search public databases (e.g., RefSeq, Swiss-Prot) for Brucella genes and proteins by various characteristics or identifiers. Here a user searches for the Brucella sodC gene. (B) BGBrowser locates the sodC gene and its neighboring genes in Brucella genomes and provides many add-on gene analysis tools. (C) The detailed gene information table shared by InterBru and BGBrowser provides sequences and functional annotation of the Brucella sodC gene and its encoded protein, Cu/Zn superoxide dismutase. Links to various databases and detailed curated data from Limix are summarized. Local BLAST programs are also available from this page for similarity analysis.
Figure 3
MeSH Browser. All Brucella publications can be browsed through the interactive MeSH tree browser. The two clickable numbers in each line link to all publications indexed with the term as a MeSH term or as a major MeSH term, respectively. This figure shows the hierarchical MeSH tree structure leading to Mutagenesis and Gene Deletion.
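The two numbers shown for each MeSH term can be read as counts of distinct publications indexed with that term, with and without the major-topic flag. A minimal sketch, assuming each record is a PMID plus a list of (descriptor, is_major) pairs; the records below are invented for illustration, and counts are not propagated up the MeSH tree, which the BBP browser may or may not do.

```python
from collections import defaultdict


def mesh_counts(records):
    """Return {term: (n_as_mesh_term, n_as_major_term)}.

    Each record is assumed to be (pmid, [(descriptor, is_major), ...]).
    Sets of PMIDs ensure each publication is counted at most once per term.
    """
    any_hits = defaultdict(set)
    major_hits = defaultdict(set)
    for pmid, headings in records:
        for descriptor, is_major in headings:
            any_hits[descriptor].add(pmid)
            if is_major:
                major_hits[descriptor].add(pmid)
    return {term: (len(any_hits[term]), len(major_hits[term])) for term in any_hits}


# Invented records with hypothetical PMIDs, for illustration only.
records = [
    ("1", [("Mutagenesis", True), ("Brucella", False)]),
    ("2", [("Mutagenesis", False), ("Gene Deletion", True)]),
    ("3", [("Gene Deletion", True), ("Mutagenesis", False)]),
]
counts = mesh_counts(records)
print(counts["Mutagenesis"], counts["Gene Deletion"])  # (3, 1) (2, 2)
```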
Figure 4
Integrated computational text mining and manual curation in Limix. The computational text mining frame shows a typical TextPresso-type result after a query for the keyword sodC and the "mutant" category. All occurrences of sodC and of words in the mutant category are highlighted in color. A sentence containing both sodC and a mutant-category word is shown in bold and counted as one match. A curator can easily highlight and copy text from this frame into an editable text field below it on the same page. The data can then be edited further and submitted to a backend database by clicking an 'update' button. Other literature retrieval approaches (e.g., keyword search) are also available in the computational text mining frame.
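The sentence-level matching rule in the caption (a sentence counts as one match when it contains both the keyword and a word from the category) can be sketched as follows. This is not TextPresso's actual code: the `MUTANT_TERMS` lexicon, the naive sentence splitter, and the example text are hypothetical stand-ins.

```python
import re

# Hypothetical stand-in for a TextPresso-style "mutant" category lexicon.
MUTANT_TERMS = {"mutant", "mutants", "mutation", "mutations", "knockout", "deletion", "disrupted"}


def match_sentences(text, keyword, category):
    """Return sentences that contain the keyword AND at least one category word."""
    # Naive split after '.', '!' or '?' followed by whitespace; real pipelines use a tokenizer.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        words = {w.lower() for w in re.findall(r"[A-Za-z0-9]+", sentence)}
        if keyword.lower() in words and words & category:
            hits.append(sentence)
    return hits


# Invented example text, not quoted from any publication.
text = "We constructed a sodC mutant. The sodC gene was cloned. A second deletion targeted geneX."
print(match_sentences(text, "sodC", MUTANT_TERMS))  # ['We constructed a sodC mutant.']
```

Requiring co-occurrence within one sentence, rather than within a whole abstract, is what keeps the second and third sentences from counting as matches here.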
Figure 5
Brucella genetic interaction map and description. Limix was used to find and confirm 62 Brucella genetic interactions. In the Brucella genetic interaction map, rendered as SVG, any node can be clicked for detailed gene information, and any edge can be clicked to show a description of the specific interaction.
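A clickable interaction map of this kind can be sketched by rendering (source, target, description) triples as SVG and wrapping each node in a link. This is a minimal sketch under several assumptions: a simple circular layout, a hypothetical `gene.cgi?name=` URL pattern for gene pages, and edge descriptions exposed as hover tooltips via `<title>` instead of a click handler. The triples are placeholders, not the 62 interactions reported.

```python
import math
from xml.sax.saxutils import escape, quoteattr


def interaction_svg(interactions, gene_url, size=400):
    """Render (source, target, description) triples as an SVG network with linked nodes."""
    genes = sorted({g for s, t, _ in interactions for g in (s, t)})
    radius, center = size * 0.4, size / 2
    # Place nodes evenly on a circle (a stand-in for a real graph layout algorithm).
    pos = {
        g: (center + radius * math.cos(2 * math.pi * i / len(genes)),
            center + radius * math.sin(2 * math.pi * i / len(genes)))
        for i, g in enumerate(genes)
    }
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" '
             'xmlns:xlink="http://www.w3.org/1999/xlink" '
             f'width="{size}" height="{size}">']
    # Draw edges first so nodes sit on top; <title> shows the description on hover.
    for source, target, desc in interactions:
        (x1, y1), (x2, y2) = pos[source], pos[target]
        parts.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" y2="{y2:.1f}" '
                     f'stroke="gray" stroke-width="3"><title>{escape(desc)}</title></line>')
    # Wrap each node in a link to a (hypothetical) gene detail page.
    for g in genes:
        x, y = pos[g]
        parts.append(f'<a xlink:href={quoteattr(gene_url(g))}>'
                     f'<circle cx="{x:.1f}" cy="{y:.1f}" r="12" fill="steelblue"/>'
                     f'<text x="{x:.1f}" y="{y - 16:.1f}" text-anchor="middle">{escape(g)}</text>'
                     '</a>')
    parts.append("</svg>")
    return "\n".join(parts)


# Placeholder interactions, for illustration only.
triples = [("geneA", "geneB", "geneA regulates geneB (placeholder)"),
           ("geneB", "geneC", "geneB interacts with geneC (placeholder)")]
svg = interaction_svg(triples, lambda g: f"gene.cgi?name={g}")
print(svg.count("<circle"), svg.count("<line"))  # 3 2
```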
