About this Research Topic
The processing of scientific writing, which includes the analysis of citation contexts as well as information extraction from scientific papers for various applications, has been the object of intensive research over the last decade. Two factors have made this possible. The first is the growing availability of scientific papers in full text and in machine-readable formats, together with the rise of Open Access publishing on online platforms such as arXiv, CiteSeer or PLOS. The second is the relative maturity of open source tools and libraries for natural language processing that facilitate text processing (e.g. NLTK, MALLET, OpenNLP, CoreNLP, GATE, CiteSpace). As a result, a large number of studies have been dedicated to citation context analysis, as well as to the summarization and recommendation of scientific papers.
This Research Topic aims to discuss novel approaches that focus on the processing and exploitation of data extracted from scientific literature. In particular, the possibility of enriching metadata through the full-text processing of papers opens new fields of investigation related to the representation of data and to the production of knowledge by aggregating data from multiple documents. Given the wide range of available techniques, several questions arise in this field: What volume of scientific data should be considered exploitable, allowing new knowledge to be produced through aggregation? How can knowledge generated from data in scientific articles be represented? What types of data and knowledge can be automatically extracted from scientific articles, and how can they be exploited efficiently?
We also invite papers (e.g. Brief Research Reports, Data Reports, Methods, Opinions, Original Research) produced by participants of the recently launched COVID-19 Open Research Dataset Challenge (CORD-19). Teams who work on the CORD-19 dataset and address tasks related to this Research Topic are welcome to contact the organizers and submit their work.
The objective of this Research Topic is to share interdisciplinary techniques around the theme of text mining applied to scientific articles. The spectrum of the topic includes computational linguistics as well as machine learning approaches that enrich the content of scientific articles and facilitate the exploitation of their data. It also covers rule-based techniques, the implementation of grammars and artificial intelligence, and methods for improving how large-scale text analysis, text mining, and sense mining of scientific articles can benefit from these techniques.
Topics under the research theme include, but are not limited to:
• Information extraction, text mining and parsing of scholarly literature
• Datasets for mining scientific papers
• Natural Language Processing (NLP) applied to citation analysis, recommendation and classification
• Discourse modeling and argument mining
• Methodology and models for content citation analysis
• Semantic and network-based indexing, search and navigation in structured text
• Knowledge discovery and visualization
• Information seeking behavior and human-computer interaction in academic search
• Scientific document engineering
• Data exploitation and information extraction from scientific articles
• The emergence of research questions in text processing for bibliometric purposes
Keywords: Text Mining, Information Retrieval, Natural Language Processing, Academic Search, Citation Contexts
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.