AUTHOR=Meyers Adam L., He Yifan, Glass Zachary, Ortega John, Liao Shasha, Grieve-Smith Angus, Grishman Ralph, Babko-Malaya Olga
TITLE=The Termolator: Terminology Recognition Based on Chunking, Statistical and Search-Based Scores
JOURNAL=Frontiers in Research Metrics and Analytics
VOLUME=3
YEAR=2018
URL=https://www.frontiersin.org/journals/research-metrics-and-analytics/articles/10.3389/frma.2018.00019
DOI=10.3389/frma.2018.00019
ISSN=2504-0537
ABSTRACT=

The Termolator is an open-source, high-performing terminology extraction system, available on GitHub. It combines several approaches to achieve superior coverage and precision. The in-line term component identifies potential instances of terminology using a chunking procedure that is similar to noun group chunking but favors chunks containing out-of-vocabulary words, nominalizations, technical adjectives, and other specialized word classes. The distributional component ranks these term chunks according to several metrics, including: (a) a set of metrics that favors term chunks that are relatively more frequent in a “foreground” corpus about a single topic than in a “background” or multi-topic corpus; (b) a well-formedness score based on linguistic features; and (c) a relevance score that measures how often terms appear in articles and patents in a Yahoo web search. We analyze the contribution of each component and show that all modules contribute to the system's performance, in terms of both the number and the quality of terms identified. This paper expands upon previous publications about this research and describes some of the improvements made since the system's initial release. It also includes a comparison with another terminology extraction system available online, Termostat (Drouin, 2003). The two systems get comparable results when applied to small amounts of data: about 50% precision for a single foreground file (Einstein's Theory of Relativity). However, with 500 patent files as foreground, Termolator performed significantly better than Termostat: 70% vs. 52% precision for 500 refrigeration patents, and 79% vs. 51% precision for 500 semiconductor patents.
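
The foreground/background idea behind metric (a) can be illustrated with a minimal sketch. This is not the Termolator's actual scoring: the abstract does not name its metrics, so the smoothed log-ratio of relative frequencies used here is an assumption chosen for illustration, and the function names (`relative_frequency_scores`, `rank_terms`) and the toy term lists are hypothetical.

```python
import math
from collections import Counter


def relative_frequency_scores(foreground_docs, background_docs, smoothing=1.0):
    """Score each foreground term by the log ratio of its smoothed relative
    frequency in the foreground corpus to that in the background corpus.

    Each document is a list of candidate term strings (already chunked).
    NOTE: the log-ratio with add-one smoothing is an illustrative assumption,
    not the metric set described in the paper.
    """
    fg = Counter(term for doc in foreground_docs for term in doc)
    bg = Counter(term for doc in background_docs for term in doc)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab_size = len(set(fg) | set(bg))

    scores = {}
    for term, count in fg.items():
        # Additive smoothing keeps terms absent from the background finite.
        p_fg = (count + smoothing) / (fg_total + smoothing * vocab_size)
        p_bg = (bg[term] + smoothing) / (bg_total + smoothing * vocab_size)
        scores[term] = math.log(p_fg / p_bg)
    return scores


def rank_terms(foreground_docs, background_docs):
    """Return foreground terms sorted from most to least topic-specific."""
    scores = relative_frequency_scores(foreground_docs, background_docs)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical toy corpora: one topic-specific, one multi-topic.
foreground = [["heat exchanger", "compressor", "system"], ["compressor", "system"]]
background = [["system", "method"], ["system", "device", "method"]]
ranked = rank_terms(foreground, background)
```

In this toy example, "system" is equally frequent in both corpora and so scores zero, while "compressor" and "heat exchanger" occur only in the foreground and rank above it. A full system would combine such a score with the well-formedness and relevance scores described in (b) and (c).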