How to learn about gene function: text-mining or ontologies?

doi:10.1016/j.ymeth.2014.07.004

Review

Methods. 2015 Mar;74:3-15. Epub 2014 Aug 1.

Theodoros G Soldatos¹, Nelson Perdigão², Nigel P Brown³, Kenneth S Sabir⁴, Seán I O'Donoghue⁵

Affiliations

¹ MolecularHealth GmbH, Heidelberg, Germany.
² Instituto Superior Técnico, Universidade de Lisboa, Portugal.
³ CEITEC, Masaryk University, Brno, Czech Republic.
⁴ Garvan Institute of Medical Research, Sydney, Australia.
⁵ Garvan Institute of Medical Research, Sydney, Australia; CSIRO Computational Informatics, Sydney, Australia.

PMID: 25088781

Theodoros G Soldatos et al. Methods. 2015 Mar.

Abstract

As the amount of genome information increases rapidly, there is a correspondingly greater need for methods that provide accurate and automated annotation of gene function. For example, many high-throughput technologies (e.g., next-generation sequencing) are being used today to generate lists of genes associated with specific conditions. However, their functional interpretation remains a challenge, and many tools have been developed to characterize the function of gene lists. Such systems typically rely on enrichment analysis and aim to give quick insight into the underlying biology by presenting it in the form of a summary report. While such computational approaches may lighten the annotation workload, the main challenge in modern annotation remains the development of a systems-level form of analysis, in which a pipeline can analyze gene lists quickly and effectively and identify aggregated annotations through computerized resources. In this article we survey some of the many tools and methods that have been developed to automatically interpret the biological functions underlying gene lists. We review current functional annotation approaches from the perspective of their epistemology (i.e., the underlying theories used to organize information about gene function into a body of verified and documented knowledge) and find that most currently used functional annotation methods fall broadly into one of two categories: they are based either on 'known', formally structured ontology annotations created by 'experts' (e.g., the GO terms used to describe the function of Entrez Gene entries), or, perhaps more adventurously, on annotations inferred from the literature (e.g., many text-mining methods use computer-aided reasoning to acquire knowledge represented in natural language). Overall, however, deriving detailed and accurate insight from such gene lists remains a challenging task, and improved methods are called for.
In particular, future methods need to (1) provide more holistic insight into the underlying molecular systems; (2) point to better options for follow-up experimental testing and treatment; and (3) better manage gene lists derived from organisms that are not well studied. We discuss some promising approaches that may help achieve these advances, especially the use of extended dictionaries of biomedical concepts and molecular mechanisms, as well as greater use of annotation benchmarks.
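The abstract notes that most gene-list tools rely on enrichment analysis against expert-curated ontology annotations such as GO terms. As a minimal sketch of that idea (not the specific method of any tool surveyed in the review), the Python below scores each term with a one-sided hypergeometric test of over-representation and applies a Bonferroni correction. The function names, the toy gene identifiers and the made-up term IDs `GO:A`/`GO:B` are hypothetical; a real analysis would load annotations from a curated GO source and often uses a less conservative correction such as Benjamini-Hochberg.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: genes in the background, n: background genes carrying the term,
    N: genes in the query list, k: query genes carrying the term.
    """
    upper = min(n, N)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def go_enrichment(gene_list, annotations, background):
    """Rank ontology terms by over-representation in gene_list.

    annotations: hypothetical dict mapping term ID -> set of annotated genes.
    background: set of all genes that could have appeared in the list.
    Returns (term, overlap, raw p-value, Bonferroni-adjusted p-value) tuples,
    most significant first; terms with no overlap are skipped.
    """
    query = set(gene_list) & background
    M, N = len(background), len(query)
    m = len(annotations)  # number of terms tested, for the Bonferroni correction
    results = []
    for term, genes in annotations.items():
        annotated = genes & background
        k = len(query & annotated)
        if k == 0:
            continue
        p = hypergeom_sf(k, M, len(annotated), N)
        results.append((term, k, p, min(1.0, p * m)))
    results.sort(key=lambda r: r[2])
    return results


if __name__ == "__main__":
    # Toy data (hypothetical): 20 background genes and two made-up terms.
    background = {f"G{i}" for i in range(1, 21)}
    annotations = {
        "GO:A": {"G1", "G2", "G3", "G4"},
        "GO:B": {f"G{i}" for i in range(5, 15)},
    }
    for term, k, p, p_adj in go_enrichment(["G1", "G2", "G3", "G5"], annotations, background):
        print(f"{term}\toverlap={k}\tp={p:.3g}\tp_adj={p_adj:.3g}")
```

In the toy run, three of the four query genes carry `GO:A`, which only four of the 20 background genes carry, so `GO:A` ranks first (raw p = 65/4845, about 0.013), while the single `GO:B` hit is unremarkable. The choice of background set strongly affects such p-values, which is one reason the summary reports produced by different tools can disagree.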

Keywords: Benchmarks; Functional annotation; GO term enrichment; Keyword enhancement; Systems biology; Text mining.
