BMC Bioinformatics. 2015;16 Suppl 16(Suppl 16):S4.
doi: 10.1186/1471-2105-16-S16-S4. Epub 2015 Oct 30.

TEES 2.2: Biomedical Event Extraction for Diverse Corpora

Jari Björne et al. BMC Bioinformatics. 2015.

Abstract

Background: The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks.
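In a graph-based event representation, named entities and event triggers can be modelled as nodes and event arguments as directed, typed edges, so a nested event is simply a trigger whose argument points at another trigger. The following is a minimal sketch under that assumption. The class and method names (`Node`, `EventGraph`, `add_argument`, `describe`) and the example sentence are hypothetical and are not the TEES API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: either a given named entity or a detected event trigger."""
    node_id: str
    node_type: str  # e.g. "Protein", "Phosphorylation", "Negative_regulation"
    text: str
    is_trigger: bool = False


@dataclass
class EventGraph:
    """Hypothetical sentence-level graph: nodes plus typed argument edges."""
    nodes: dict = field(default_factory=dict)
    # Each edge is (source trigger id, argument role, target node id).
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_argument(self, trigger_id: str, role: str, target_id: str) -> None:
        # In this sketch, argument edges may only start at a trigger node.
        if not self.nodes[trigger_id].is_trigger:
            raise ValueError(f"{trigger_id} is not a trigger")
        self.edges.append((trigger_id, role, target_id))

    def describe(self, node_id: str) -> str:
        """Render a node, recursing into nested events (assumes an acyclic graph)."""
        node = self.nodes[node_id]
        if not node.is_trigger:
            return node.text
        args = [f"{role}={self.describe(target)}"
                for source, role, target in self.edges if source == node_id]
        return f"{node.node_type}({', '.join(args)})"


# Hypothetical sentence: "... inhibits phosphorylation of STAT3"
g = EventGraph()
g.add_node(Node("T1", "Protein", "STAT3"))
g.add_node(Node("E1", "Phosphorylation", "phosphorylation", is_trigger=True))
g.add_node(Node("E2", "Negative_regulation", "inhibits", is_trigger=True))
g.add_argument("E1", "Theme", "T1")
g.add_argument("E2", "Theme", "E1")
print(g.describe("E2"))  # Negative_regulation(Theme=Phosphorylation(Theme=STAT3))
```

Because a nested event is only an edge between two trigger nodes, arbitrarily deep event structures need no special representation in this sketch.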

Results: The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of the event types, allowing immediate adaptation to the various subtasks and leading to first place in four out of eight tasks. In this paper, the system for automatically learning annotation rules is further enhanced to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, making a wide variety of machine learning methods available in TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets.
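One way to picture learning annotation rules automatically is to record, from the training annotation, which (event type, argument role, argument type) combinations actually occur, and then discard predicted arguments whose combination was never observed. The sketch below assumes this simple counting scheme; it illustrates the idea only and is not the TEES implementation. The function names and example annotations are hypothetical.

```python
from collections import defaultdict


def learn_rules(training_args):
    """Map each event type to the set of (role, argument type) pairs seen in training.

    training_args: iterable of (event type, argument role, argument node type).
    """
    rules = defaultdict(set)
    for event_type, role, arg_type in training_args:
        rules[event_type].add((role, arg_type))
    return rules


def filter_predictions(predicted_args, rules):
    """Keep only predicted arguments whose combination was observed in training."""
    return [(event_type, role, arg_type)
            for event_type, role, arg_type in predicted_args
            if (role, arg_type) in rules.get(event_type, set())]


training = [
    ("Phosphorylation", "Theme", "Protein"),
    ("Phosphorylation", "Site", "Entity"),
    ("Negative_regulation", "Theme", "Phosphorylation"),
    ("Negative_regulation", "Cause", "Protein"),
]
predicted = [
    ("Phosphorylation", "Theme", "Protein"),              # seen in training: kept
    ("Phosphorylation", "Cause", "Protein"),              # never seen: dropped
    ("Negative_regulation", "Theme", "Phosphorylation"),  # seen in training: kept
]
rules = learn_rules(training)
kept = filter_predictions(predicted, rules)
print(kept)
```

Deriving the constraints from the training data, rather than hand-coding them per task, is what would let such a filter carry over to a new corpus without manual adaptation.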

Conclusions: The TEES system was introduced for the BioNLP'09 Shared Task and has since demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented.


Figures

Figure 1
The TEES event extraction process. Preprocessing steps A-C can be omitted in the BioNLP Shared Tasks, as the corresponding data is provided by the organizers. The event extraction steps D-F are all independent SVM classification steps, with the trigger and edge detection steps linked together by the recall adjustment parameter. (Figure adapted from Björne et al. [5].)
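The recall adjustment parameter can be pictured as a bias against the negative ("no trigger") class during trigger detection: weakening the negative class makes more tokens become triggers, which gives the edge detector more candidate nodes to connect. The sketch below assumes the adjustment simply rescales the negative class's score before the top class is chosen, and assumes nonnegative scores such as probabilities; the exact formula used in TEES may differ, and the function name and scores are hypothetical.

```python
def adjust_and_classify(scores, recall_adjust, negative_class="neg"):
    """Pick the top class after scaling the negative class's score.

    Assumes nonnegative confidence scores (e.g. probabilities). A recall_adjust
    below 1.0 weakens the negative class, so more candidate tokens are predicted
    as triggers (higher recall, usually lower precision); above 1.0 does the opposite.
    """
    adjusted = dict(scores)
    adjusted[negative_class] *= recall_adjust
    return max(adjusted, key=adjusted.get)


# Hypothetical per-class scores for one candidate token.
token_scores = {"neg": 0.55, "Phosphorylation": 0.40, "Binding": 0.05}
print(adjust_and_classify(token_scores, 1.0))  # neg
print(adjust_and_classify(token_scores, 0.7))  # Phosphorylation
```

Because edges can only attach to predicted triggers, a value like this would naturally be tuned jointly with the edge detector, for example by trying several values on development data and keeping the one with the best end-to-end score.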
Figure 2
Multiple approaches (A and B) were used in the BioNLP'11 Shared Task for representing site-arguments in the TEES graph format. In TEES 2.0 these representations have been merged into the unified representation (C), allowing site-arguments to be processed like any other event arguments.
Figure 3
The visualizer provided with TEES 2.2 can be used to display both the event annotation as well as the parse of a sentence. This figure shows sentence GE13.d216.s0, taken from the BioNLP 2013 GENIA development corpus document PMC-3333881-20-Caption-Figure 3, demonstrating a nested event structure consisting of two Negative regulation events.
Figure 4
The performance of systems that took part in the BioNLP'13 Shared Task. The TEES results are shown with black crosses. Note that in tasks GRN and BBT1 the metric is the slot error rate multiplied by 100 (SER*100), where a smaller score is better.
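Assuming the standard definition of the slot error rate, SER = (S + D + I) / N, where S, D and I count substituted, deleted (missed) and inserted (spurious) slots and N is the number of slots in the reference, the score shown in the figure is that ratio multiplied by 100. Because insertions are unbounded, SER can exceed 1. The counts below are hypothetical and are not results from the paper.

```python
def slot_error_rate(substitutions, deletions, insertions, n_reference):
    """SER = (S + D + I) / N, where N is the number of slots in the reference."""
    if n_reference <= 0:
        raise ValueError("reference must contain at least one slot")
    return (substitutions + deletions + insertions) / n_reference


# Hypothetical counts for illustration only.
ser = slot_error_rate(substitutions=3, deletions=10, insertions=5, n_reference=40)
print(round(ser * 100, 1))  # 45.0
```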
Figure 5
Examples of the feature groups in Figure 6 and Table 6. The numbered dependencies and tokens indicate the linear and dependency context for the token "phosphorylation". The dotted Theme edge and its corresponding dependency indicate the shortest path of an event argument edge. The example features correspond to the "phosphorylation" entity and the dotted edge. The token features TOK(x) are incorporated into the more complex features. (Figure adapted from [13].)
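A shortest-path feature for a candidate argument edge can be sketched as a breadth-first search over the dependency parse, treated as an undirected graph, followed by joining the dependency types along the path into a feature string. The token indices, dependency labels, direction markers and the `dep_path=` feature format below are invented for illustration; the actual TEES feature names may differ.

```python
from collections import deque


def shortest_dependency_path(dependencies, start, end):
    """Breadth-first search over the dependency parse as an undirected graph.

    dependencies: list of (governor index, dependent index, dependency type).
    Returns dependency labels from start to end, marked "type>" when moving
    governor -> dependent and "<type" when moving dependent -> governor,
    or None if the two tokens are not connected.
    """
    neighbours = {}
    for gov, dep, dep_type in dependencies:
        neighbours.setdefault(gov, []).append((dep, dep_type + ">"))
        neighbours.setdefault(dep, []).append((gov, "<" + dep_type))
    previous = {start: None}  # token -> (predecessor token, label of the step)
    queue = deque([start])
    while queue:
        token = queue.popleft()
        if token == end:
            break
        for nxt, label in neighbours.get(token, []):
            if nxt not in previous:
                previous[nxt] = (token, label)
                queue.append(nxt)
    if end not in previous:
        return None
    labels = []
    node = end
    while previous[node] is not None:  # walk back from end to start
        node, label = previous[node]
        labels.append(label)
    return list(reversed(labels))


# Hypothetical parse of "Phosphorylation(0) of(1) STAT3(2) was(3) inhibited(4)".
deps = [(4, 0, "nsubjpass"), (0, 2, "prep_of"), (4, 3, "auxpass")]
path = shortest_dependency_path(deps, 4, 2)
feature = "dep_path=" + "|".join(path)
print(feature)  # dep_path=nsubjpass>|prep_of>
```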
Figure 6
The distribution of feature importances for feature groups, for each of the four classification steps (trigger, edge, unmerging and modifier). The deps group refers to dependencies. In the box plots, each box spans the features from the lower to the upper quartile, with a red line at the median. The dotted-line whiskers extend to 1.5 times the interquartile range, and the outlier points are shown as individual markers. See Figure 5 for feature group details.
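The box-plot summary described in this caption can be computed directly from a list of importance values. The sketch below assumes the common convention (used, for example, by matplotlib's `boxplot`) that each whisker ends at the most extreme data point lying within 1.5 times the interquartile range of the box, and it uses linear-interpolation quartiles. The importance values are hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, median, whisker ends at 1.5 * IQR, and outliers for one box."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers end at the most extreme data points still within the fences.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical importances for the features of one feature group.
importances = [0.001, 0.002, 0.002, 0.003, 0.004, 0.005, 0.006, 0.030]
stats = box_plot_stats(importances)
print(stats["outliers"])  # [0.03]
```

Summarizing each group this way shows whether a group's influence comes from many moderately useful features (a high box) or from a few dominant ones (outliers above a low box).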

References

    1. Kim JD, Ohta T, Pyysalo S, Kano Y, Tsujii J. Overview of BioNLP'09 Shared Task on Event Extraction. In: Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task. Association for Computational Linguistics, Boulder, Colorado; 2009. pp. 1–9.
    2. Kim JD, Pyysalo S, Ohta T, Bossy R, Tsujii J. Overview of BioNLP Shared Task 2011. In: Proceedings of the BioNLP 2011 Workshop Companion Volume for Shared Task. Association for Computational Linguistics, Portland, Oregon; 2011.
    3. Nédellec C, Bossy R, Kim JD, Kim JJ, Ohta T, Pyysalo S, Zweigenbaum P. Overview of BioNLP Shared Task 2013. In: Proceedings of the BioNLP Shared Task 2013 Workshop. Association for Computational Linguistics, Sofia, Bulgaria; 2013. pp. 1–7.
    4. Björne J, Heimonen J, Ginter F, Airola A, Pahikkala T, Salakoski T. Extracting Contextualized Complex Biological Events with Rich Graph-Based Feature Sets. Computational Intelligence, Special issue on Extracting Bio-molecular Events from Literature. 2011. Accepted in 2009.
    5. Björne J, Ginter F, Salakoski T. University of Turku in the BioNLP'11 Shared Task. BMC Bioinformatics. 2012;13(Suppl 11):4.