AUTHOR=Zaslavsky Leonid, Cheng Tiejun, Gindulyte Asta, He Siqian, Kim Sunghwan, Li Qingliang, Thiessen Paul, Yu Bo, Bolton Evan E. TITLE=Discovering and Summarizing Relationships Between Chemicals, Genes, Proteins, and Diseases in PubChem JOURNAL=Frontiers in Research Metrics and Analytics VOLUME=6 YEAR=2021 URL=https://www.frontiersin.org/journals/research-metrics-and-analytics/articles/10.3389/frma.2021.689059 DOI=10.3389/frma.2021.689059 ISSN=2504-0537 ABSTRACT=

This paper describes the literature knowledge panels developed and implemented in PubChem. The panels help uncover and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing co-occurrences of terms in biomedical literature abstracts. Named entities in PubMed records are matched with chemical names in PubChem, disease names in Medical Subject Headings (MeSH), and gene/protein names in popular gene/protein information resources; the most closely related entities are then identified using statistical analysis and relevance-based sampling. Knowledge panels for the co-occurrence of chemical, disease, and gene/protein entities are included in PubChem Compound, Protein, and Gene pages, where they summarize these relationships in a compact form. Statistical methods for removing redundancy and estimating relevance scores are discussed, along with the benefits and pitfalls of relying on automated (i.e., not human-curated) methods that operate on data from multiple heterogeneous sources.
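
The co-occurrence idea summarized in the abstract can be illustrated with a minimal sketch. It is not the method used in PubChem: it assumes each abstract has already been reduced to a list of matched entity names, and it ranks neighbors by raw co-occurrence count as a stand-in for the statistical relevance scores the paper discusses. The function names `cooccurrence_counts` and `top_neighbors`, and the toy records, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how many records mention each unordered pair of entities."""
    counts = Counter()
    for entities in records:
        # Deduplicate within a record so a pair counts once per abstract.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


def top_neighbors(counts, query, k=3):
    """Rank entities co-mentioned with `query` by raw co-occurrence count.

    Assumption: raw counts stand in for a real relevance score.
    """
    neighbors = Counter()
    for (a, b), n in counts.items():
        if a == query:
            neighbors[b] += n
        elif b == query:
            neighbors[a] += n
    # Sort by descending count, then by name, for deterministic output.
    return sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy data: one list of matched entity names per abstract.
records = [
    ["aspirin", "PTGS2", "inflammation"],
    ["aspirin", "PTGS2"],
    ["aspirin", "inflammation", "aspirin"],
    ["ibuprofen", "PTGS2"],
]

counts = cooccurrence_counts(records)
neighbors = top_neighbors(counts, "aspirin")
```

Raw counts favor entities that are simply frequent in the literature, which is one reason a real system would need the kind of statistical scoring and redundancy removal the abstract mentions.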