BMC Med Inform Decis Mak. 2021 Dec 18;21(1):352.
doi: 10.1186/s12911-021-01706-4.

MedTAG: a portable and customizable annotation tool for biomedical documents

Fabio Giachelle et al. BMC Med Inform Decis Mak.

Abstract

Background: Semantic annotators and Natural Language Processing (NLP) methods for Named Entity Recognition and Linking (NER+L) require plenty of training and test data, especially in the biomedical domain. Despite the abundance of unstructured biomedical data, the lack of richly annotated biomedical datasets hinders the further development of NER+L algorithms for any effective secondary use. In addition, manual annotation of biomedical documents by physicians and experts is costly and time-consuming. To support, organize and speed up the annotation process, we introduce MedTAG, a collaborative biomedical annotation tool that is open-source, platform-independent, and free to use and distribute.

Results: We present the main features of MedTAG and how it has been employed in the histopathology domain by physicians and experts to manually annotate more than seven thousand clinical reports. We compare MedTAG with a set of well-established biomedical annotation tools, including BioQRator, ezTag, MyMiner, and tagtog, weighing their pros and cons against those of MedTAG. We highlight that MedTAG is one of the very few open-source tools provided with an open license and a straightforward installation procedure supporting cross-platform use.

Conclusions: MedTAG has been designed according to five requirements (i.e., available, distributable, installable, workable and schematic) defined in a recent extensive review of manual annotation tools. Moreover, MedTAG satisfies 20 of the 22 criteria specified in the same study.

Keywords: Biomedical annotation tools; Digital health; Entity extraction; eHealth.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of annotation tools and their functionalities. The annotation tools considered come from a recent extensive review of tools for manual annotation of documents [34]. In addition, we also consider TeamTat [35] and INCEpTION [36] and report our own judgements. The annotation tools are assessed against 22 criteria, defined in that review study and grouped into three categories: Data (D), Functional (F) and Technical (T). The fulfillment of each criterion is indicated with a color on a three-level scale: white (feature absent or not met), light blue (feature partially satisfied), blue (feature satisfied)
Fig. 2
MedTAG Architecture. The data layer comprises two relational databases, MedTAG data and Log data, which store the information concerning the annotation process (e.g., concepts, labels, reports, users and their annotations) and logging data such as notifications of malformed clinical reports. The business layer comprises two business units, Business logic and REST API, which jointly control the whole information flow from the front-end to the database and vice versa. The presentation layer provides the MedTAG front-end, a web interface allowing users to annotate medical reports and download their ground truths
Fig. 3
MedTAG sidebar provides the Configure option, indicated by the orange arrow, to set up a new custom configuration
Fig. 4
MedTAG new configuration interface allows the user to save the current data before creating a new configuration. To guide the user in providing the new configuration files needed (i.e., reports/documents, labels and concepts), MedTAG provides both example and template files. Users can use the example files to test MedTAG without providing their own data, or use the template files as a reference to structure their own configuration files
Fig. 5
MedTAG main interface for data configuration. Users can provide their own CSV files for the reports/documents to annotate and for the concepts and labels to use in the annotation process. Moreover, MedTAG automatically detects the document fields and allows users to specify which of them to annotate and/or display in the interface, as shown in the orange box (1)
Fig. 6
MedTAG main interface in test mode with default configuration: clinical case set to “Colon cancer”, reports’ language set to English, reports’ institute/hospital set to “default_hospital” (the real name has been anonymized) and the annotation mode set to manual. The active annotation type is Labels. Three labels have been checked: (i) Cancer; (ii) Adenomatous polyp - low grade dysplasia and (iii) Hyperplastic polyp
Fig. 7
MedTAG main interface in test mode with default configuration: clinical case set to “Colon cancer”, reports’ language set to English, reports’ institute/hospital set to “default_hospital” (the real name has been anonymized) and the annotation mode set to manual. The active annotation type is Linking. Three mentions have been identified and linked to the corresponding concepts: (i) hyperplastic adenomatous polyp is linked to Colon Hyperplastic Polyp; (ii) mild dysplasia is linked to Mild Colon Dysplasia; and (iii) tubular adenoma is linked to Colon Tubular Adenoma
Fig. 8
MedTAG tutorial interface. To reach the tutorial section, users can click on the Tutorial link in the sidebar, indicated by the orange arrow
Fig. 9
MedTAG control panel concerning the reports’ statistics. The reports are organized in an interactive table enabling the admin user to: (i) access report data; (ii) delete one or more reports; (iii) download report data including manual and automatic annotations and (iv) access the information concerning IAA and manage the majority vote procedure
Fig. 10
MedTAG control panel concerning the team members’ statistics. The ring charts report the annotation work carried out by each team member, so that the admin can keep track of the advancements regarding the whole annotation process
Fig. 11
MedTAG My Statistics panel, providing information about the user annotation work in terms of documents annotated for each use-case
Fig. 12
MedTAG majority vote interface. The admin can review the selected report and choose the options of interest for the majority vote procedure, including: (i) the annotation mode; (ii) the annotation type and (iii) the team members (annotators) to consider
Fig. 13
MedTAG majority vote output for the Labels annotation type. The admin can view the annotations resulting from the majority vote procedure, together with the annotators who made them. In addition, the admin can download the annotations or change the current majority vote configuration
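
The majority vote procedure described for Figs. 12 and 13 can be illustrated with a minimal Python sketch for the Labels annotation type. This is not MedTAG's implementation: the function name majority_vote, the annotator names, and the strict-majority rule (a label is kept only if more than half of the selected annotators chose it) are assumptions made for illustration, since the captions do not state the exact voting rule or how ties are handled.

```python
from collections import Counter
from typing import Dict, List, Set


def majority_vote(annotations: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Keep the labels chosen by a strict majority of annotators (assumed rule).

    `annotations` maps a (hypothetical) annotator name to the set of labels
    that annotator assigned to a single report. The result maps each winning
    label to the sorted list of annotators who chose it.
    """
    if not annotations:
        return {}
    # A label wins only if MORE than half of the annotators selected it.
    threshold = len(annotations) / 2
    counts = Counter(label for labels in annotations.values() for label in labels)
    return {
        label: sorted(name for name, labels in annotations.items() if label in labels)
        for label, n in sorted(counts.items())
        if n > threshold
    }


# Hypothetical votes on one report, using the label names shown in Fig. 6.
votes = {
    "annotator_a": {"Cancer", "Hyperplastic polyp"},
    "annotator_b": {"Cancer", "Hyperplastic polyp"},
    "annotator_c": {"Cancer", "Adenomatous polyp - low grade dysplasia"},
}
result = majority_vote(votes)
print(result)
```

With these example votes, "Cancer" (three of three annotators) and "Hyperplastic polyp" (two of three) are kept, while "Adenomatous polyp - low grade dysplasia" (one of three) is dropped. Restricting the input dictionary to a subset of annotators mirrors the option of choosing which team members to consider.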

References

    1. Murdoch TB, Detsky AS. The inevitable application of big data to health care. JAMA. 2013;309(13):1351–1352. doi: 10.1001/jama.2013.393.
    2. Aronson AR, Lang FM. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010;17(3):229–236. doi: 10.1136/jamia.2009.002733.
    3. Gorrell G, Song X, Roberts A. Bio-yodie: A named entity linking system for biomedical text. arXiv preprint arXiv:1811.04860. 2018.
    4. Wu H, Toti G, Morley KI, Ibrahim ZM, Folarin A, Jackson R, et al. SemEHR: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research. J Am Med Inform Assoc. 2018;25(5):530–537. doi: 10.1093/jamia/ocx160.
    5. Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, et al. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17(5):507–513. doi: 10.1136/jamia.2009.001560.