TEES 2.2: Biomedical Event Extraction for Diverse Corpora

Jari Björne, Tapio Salakoski

BMC Bioinformatics. 2015;16 Suppl 16(Suppl 16):S4. Epub 2015 Oct 30.

PMID: 26551925
PMCID: PMC4642046
DOI: 10.1186/1471-2105-16-S16-S4

Abstract

Background: The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks.

Results: The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of each event type, allowing immediate adaptation to the various subtasks and leading to first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, making a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importance of the features in the TEES feature sets.
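
The idea of learning annotation rules from training data can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical representation in which each gold-standard event argument is an (event type, role, argument type) triple; the function names and the sample triples are illustrative only and are not the TEES API.

```python
from collections import defaultdict

def learn_argument_rules(gold_arguments):
    """Collect, per event type, the (role, argument type) pairs seen in training data."""
    rules = defaultdict(set)
    for event_type, role, arg_type in gold_arguments:
        rules[event_type].add((role, arg_type))
    return dict(rules)

def is_valid_argument(rules, event_type, role, arg_type):
    """Accept a candidate argument only if the same combination was observed in training."""
    return (role, arg_type) in rules.get(event_type, set())

# Hypothetical training annotations: (event type, role, argument type)
training = [
    ("Phosphorylation", "Theme", "Protein"),
    ("Negative_regulation", "Theme", "Phosphorylation"),
    ("Negative_regulation", "Cause", "Protein"),
]
rules = learn_argument_rules(training)
seen_ok = is_valid_argument(rules, "Phosphorylation", "Theme", "Protein")
unseen_ok = is_valid_argument(rules, "Phosphorylation", "Cause", "Protein")
print(seen_ok, unseen_ok)
```

Because the rules are derived from the corpus rather than written by hand, the same code could in principle be pointed at a different task's training set without modification, which is the property the abstract emphasizes.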

Conclusions: The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented.

Figures

**Figure 1**
**The TEES event extraction process**. Preprocessing steps A-C can be omitted in the BioNLP Shared Tasks as corresponding data is provided by the organizers. The event extraction steps D-F are all independent SVM classification steps, with the trigger and edge detection steps being linked together by the recall adjustment parameter. (Figure adapted from Björne et al. [5].)
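
One way a recall adjustment parameter could link trigger and edge detection is by scaling the negative-class score of each trigger candidate before the predicted class is chosen, so that more candidates survive to the edge detection step. The Python sketch below assumes this mechanism; the caption does not specify how the parameter is applied, and the function names, class labels, and scores are hypothetical.

```python
def adjust_negative_score(score, factor):
    """Lower the negative-class score for factor < 1, whatever the score's sign."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return score * factor if score >= 0 else score / factor

def predict_with_recall_adjustment(class_scores, factor, negative_class="neg"):
    """Return the highest-scoring class after adjusting the negative-class score."""
    adjusted = dict(class_scores)
    adjusted[negative_class] = adjust_negative_score(adjusted[negative_class], factor)
    return max(adjusted, key=adjusted.get)

# Hypothetical classifier scores for one trigger candidate
scores = {"neg": 1.0, "Phosphorylation": 0.8}
unadjusted = predict_with_recall_adjustment(scores, factor=1.0)
boosted = predict_with_recall_adjustment(scores, factor=0.7)
print(unadjusted, boosted)
```

With a factor of 1.0 the candidate is rejected, while a factor below 1.0 lets the borderline trigger through, trading precision in trigger detection for recall that the later steps can still filter.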

**Figure 2**
**Multiple approaches (A and B) were used in the BioNLP'11 Shared Task for representing site-arguments in the TEES graph format**. In TEES 2.0 these representations have been merged into the unified representation (C), allowing site-arguments to be processed like any other event arguments.

**Figure 3**
**The visualizer provided with TEES 2.2 can be used to display both the event annotation and the parse of a sentence**. This figure shows sentence GE13.d216.s0, taken from the BioNLP 2013 GENIA development corpus document PMC-3333881-20-Caption-Figure 3, demonstrating a nested event structure consisting of two *Negative regulation* events.

**Figure 4**
**The performance of systems that took part in the BioNLP'13 Shared Task**. The TEES results are shown with black crosses. Please note that in tasks GRN and BBT1 the metric is SER*100 where a smaller score is better.
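
For readers unfamiliar with the metric, the following Python sketch computes a slot error rate scaled by 100. It assumes the common definition of SER as substitutions, deletions, and insertions divided by the number of reference items; the caption does not spell out the formula, and the counts used here are hypothetical.

```python
def slot_error_rate_x100(substitutions, deletions, insertions, reference_count):
    """Return 100 * (S + D + I) / N, where N is the number of reference items."""
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    errors = substitutions + deletions + insertions
    return 100.0 * errors / reference_count

# Hypothetical counts for one system output
score = slot_error_rate_x100(substitutions=2, deletions=3, insertions=1, reference_count=10)
print(score)
```

Unlike an F-score, this value is an error measure, so lower is better, and it can exceed 100 when a system makes more errors than there are reference items.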

**Figure 5**
**Examples for the feature groups in Figure 6 and Table 6**. The numbered dependencies and tokens indicate the *linear* and *dependency* context for the token "phosphorylation". The dotted Theme edge and its corresponding dependency indicate the shortest path of an event argument edge. The example features correspond to the "phosphorylation" entity and the dotted edge. The token features TOK(x) are incorporated into the more complex features. (Figure adapted from [13].)
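
The shortest path between two tokens in a dependency parse can be found with a breadth-first search, as in the minimal Python sketch below. It treats dependencies as undirected edges, which is an assumption of this sketch, and the tokens and dependency labels are hypothetical rather than taken from the figure.

```python
from collections import deque

def shortest_dependency_path(dependencies, source, target):
    """Breadth-first search over dependencies treated as undirected edges."""
    neighbours = {}
    for head, dependent, _label in dependencies:
        neighbours.setdefault(head, []).append(dependent)
        neighbours.setdefault(dependent, []).append(head)
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # the two tokens are not connected in the parse

# Hypothetical parse: (head token, dependent token, dependency type)
parse = [
    ("requires", "phosphorylation", "nsubj"),
    ("phosphorylation", "STAT3", "prep_of"),
    ("requires", "JAK2", "dobj"),
]
direct = shortest_dependency_path(parse, "phosphorylation", "STAT3")
indirect = shortest_dependency_path(parse, "phosphorylation", "JAK2")
print(direct, indirect)
```

A real implementation would identify tokens by position rather than by surface string, since the same word can occur more than once in a sentence.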

**Figure 6**
**The distribution of feature importances for feature groups, for each of the four classification steps (*trigger*, *edge*, *unmerging* and *modifier*)**. The *deps* group refers to *dependencies*. In the box plots the boxes contain the features from the lower to upper quartiles, with a red line at the median. The dotted-line whiskers extend to 1.5 times the interquartile range and the outlier points are shown as individual markers. See Figure 5 for feature group details.
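
The box plot elements described in this caption can be computed as in the following Python sketch. It assumes the usual plotting convention in which each whisker ends at the most extreme data point within 1.5 times the interquartile range of the box; the importance values are hypothetical and are not results from the paper.

```python
import statistics

def box_plot_summary(values, whisker=1.5):
    """Compute quartiles, whisker ends, and outliers for one box in a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - whisker * iqr
    high_limit = q3 + whisker * iqr
    inside = [v for v in values if low_limit <= v <= high_limit]
    outliers = sorted(v for v in values if v < low_limit or v > high_limit)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest point still within the limit
        "whisker_high": max(inside),   # highest point still within the limit
        "outliers": outliers,          # drawn as individual markers
    }

# Hypothetical feature importances for one feature group
importances = [0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.30]
summary = box_plot_summary(importances)
print(summary)
```

In this example a single feature with an unusually high importance falls outside the upper whisker, which is the kind of point the figure shows as an individual marker.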

Cited by

A biomedical event extraction method based on fine-grained and attention mechanism.
He X, Tai P, Lu H, Huang X, Ren Y. BMC Bioinformatics. 2022 Jul 29;23(1):308. doi: 10.1186/s12859-022-04854-0. PMID: 35906547.
Biomedical event extraction with a novel combination strategy based on hybrid deep neural networks.
Zhu L, Zheng H. BMC Bioinformatics. 2020 Feb 6;21(1):47. doi: 10.1186/s12859-020-3376-2. PMID: 32028883.
Extraction of chemical-protein interactions from the literature using neural networks and narrow instance representation.
Antunes R, Matos S. Database (Oxford). 2019 Jan 1;2019:baz095. doi: 10.1093/database/baz095. PMID: 31622463.
COPIOUS: A gold standard corpus of named entities towards extracting species occurrence from biodiversity literature.
Nguyen NTH, Gabud RS, Ananiadou S. Biodivers Data J. 2019 Jan 22;(7):e29626. doi: 10.3897/BDJ.7.e29626. eCollection 2019. PMID: 30700967.
Annotation and detection of drug effects in text for pharmacovigilance.
Thompson P, Daikou S, Ueno K, Batista-Navarro R, Tsujii J, Ananiadou S. J Cheminform. 2018 Aug 13;10(1):37. doi: 10.1186/s13321-018-0290-y. PMID: 30105604.

References

1. Kim JD, Ohta T, Pyysalo S, Kano Y, Tsujii J. Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task. ACL, Boulder, Colorado; 2009. Overview of BioNLP'09 Shared Task on Event Extraction; pp. 1–9.
2. Kim JD, Pyysalo S, Ohta T, Bossy R, Tsujii J. Proceedings of the BioNLP 2011 Workshop Companion Volume for Shared Task. Association for Computational Linguistics, Portland, Oregon; 2011. Overview of BioNLP Shared Task 2011.
3. Nédellec C, Bossy R, Kim JD, Kim JJ, Ohta T, Pyysalo S, Zweigenbaum P. Proceedings of the BioNLP Shared Task 2013 Workshop. Association for Computational Linguistics, Sofia, Bulgaria; 2013. Overview of BioNLP Shared Task 2013; pp. 1–7.
4. Björne J, Heimonen J, Ginter F, Airola A, Pahikkala T, Salakoski T. Extracting Contextualized Complex Biological Events with Rich Graph-Based Feature Sets. Computational Intelligence, Special issue on Extracting Bio-molecular Events from Literature. 2011. Accepted in 2009.
5. Björne J, Ginter F, Salakoski T. University of Turku in the BioNLP'11 Shared Task. BMC Bioinformatics. 2012;13(Suppl 11):4.


