Knowledge Discovery from Unstructured Data in Finance

Editors

Xiaomo Liu

J.P. Morgan AI Research

Mohammad Mahdi Ghassemi

Michigan State University

Yuheng Hu

University of Illinois Chicago

Sameena Shah

JPMorgan Chase & Co (United States)


(A) Selecting the optimal number of topics k by evaluating held-out likelihood as the number of topics varies from 2 to 60. (B) Correlations among the 26 identified topics, showing four connected clusters with relatively high marginal correlation, together with a few isolated topics.

Original Research

06 October 2022

Understanding heterogeneity of investor sentiment on social media: A structural topic modeling approach

Rongjiao Ji and Qiwei Han

Investors post heterogeneous sentiments about financial assets on social media, depending on their trading preferences. However, existing work typically analyzes sentiment by its content only and does not account for investor profiles and trading preferences across different types of assets. This paper explicitly considers how investor sentiment about financial market events is shaped by the relative discussions of different types of investors. We leverage a large-scale financial social media dataset and employ a structural topic modeling approach to extract the topical content of investor sentiment across multiple finance-specific factors. The identified topics reveal important events related to the financial market and show strong heterogeneity in the social media content in terms of the composition of investor profiles, asset categories, and bullish/bearish sentiment. Results show that investors with different profiles and trading preferences tend to discuss financial markets with heterogeneous beliefs, leading to divergent opinions about those events in terms of topic prevalence and proportion. Moreover, our findings may shed light on the mechanism underlying efficient extraction and aggregation of investor sentiment while accounting for its heterogeneity across different dimensions.
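As a rough illustration of what "heterogeneity in topic prevalence" means here, the sketch below averages per-post topic proportions within each investor group. It is a simple descriptive stand-in, not the structural topic model used in the article: the function name `mean_prevalence_by_group`, the group labels, and the toy proportions are all hypothetical.

```python
from collections import defaultdict


def mean_prevalence_by_group(doc_topics, groups):
    """Average per-document topic proportions within each group.

    doc_topics: list of rows, one per post, each a list of topic proportions.
    groups: list of group labels (e.g., investor profile), one per post.
    """
    sums = {}
    counts = defaultdict(int)
    for theta, group in zip(doc_topics, groups):
        if group not in sums:
            sums[group] = [0.0] * len(theta)
        for topic_index, proportion in enumerate(theta):
            sums[group][topic_index] += proportion
        counts[group] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


# Hypothetical toy data: 4 posts, 3 topics; each row sums to 1.
doc_topics = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.3, 0.6],
    [0.3, 0.1, 0.6],
]
# Hypothetical investor-profile labels, one per post.
groups = ["day_trader", "day_trader", "long_term", "long_term"]

prevalence = mean_prevalence_by_group(doc_topics, groups)
for group, proportions in prevalence.items():
    print(group, [round(p, 3) for p in proportions])
```

A structural topic model goes further by estimating these covariate effects jointly with the topics themselves, so group averages like these should be read only as an intuition aid.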

4,664 views

7 citations

Original Research

07 June 2022

Forecasting Stock Price Trends by Analyzing Economic Reports With Analyst Profiles

Masahiro Suzuki, 2 more and Yasushi Ishikawa

This article proposes a methodology to forecast the movements of analysts' estimated net income and stock prices using analyst profiles. Our methodology applies natural language processing and neural networks to analyst reports. First, we apply the proposed method to extract opinion sentences from each analyst report, classifying the remaining parts as non-opinion sentences. Then, we forecast the movements of analysts' estimated net income and stock price by inputting the opinion and non-opinion sentences into separate neural networks. In addition to analyst reports, we input analyst profiles to the networks. As analyst profiles, we use the name of the analyst, the securities company to which the analyst belongs, the sector the analyst covers, and the analyst ranking. The results indicate that the analyst profile improves the model forecasts. However, classifying analyst reports into opinion and non-opinion sentences has no significant effect on the forecasts.

4,506 views

3 citations

High-level system architecture for dichotomic pattern mining, embedded with sequence-to-pattern generation, as an integration technology between raw sequential data, e.g., clickstream, pattern analysis, pattern-to-feature generation, and machine learning models for downstream prediction tasks.

Original Research

12 July 2022

Dichotomic Pattern Mining Integrated With Constraint Reasoning for Digital Behavior Analysis

Sohom Ghosh, 3 more and Serdar Kadıoğlu

3,620 views

2 citations

Original Research

03 May 2022

Credit Risk Modeling Using Transfer Learning and Domain Adaptation

Hendra Suryanto, 3 more and Ada Guan

In the domain of credit risk assessment, lenders may have limited or no data on the historical lending outcomes of credit applicants. Typically, this disproportionately affects Micro, Small, and Medium Enterprises (MSMEs), for which credit may be restricted or too costly due to the difficulty of predicting the Probability of Default (PD). However, if data from other related credit risk domains is available, Transfer Learning may be applied to successfully train models, e.g., from the credit card lending and debt consolidation (CD) domains to predict in the small business lending domain. In this article, we report successful results from an approach using transfer learning to predict the probability of default based on the novel concept of Progressive Shift Contribution (PSC) from source to target domain. Toward real-world application of this approach by lenders, we further address two key questions. The first is how to explain transfer learning models, and the second is how to adjust features when the source and target domains differ. To address the first question, we apply Shapley values to investigate how and why transfer learning improves model accuracy; to address the second, we propose and test a domain adaptation approach. These results show that adaptation improves model accuracy in addition to the improvement from transfer learning. We extend this by proposing and testing a combined strategy of feature selection and adaptation to convert values of source domain features to better approximate values of target domain features. Our approach includes a strategy to choose features for adaptation and an algorithm to adapt the values of these features. In this setting, transfer learning appears to improve model accuracy by increasing the contribution of less predictive features. Although the percentage improvements are small, such improvements in real-world lending could be of significant economic importance.
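To make the role of Shapley values in the abstract above more concrete, here is a minimal sketch that computes exact Shapley values for a tiny scoring function by enumerating all feature subsets. It is not the article's model or code: the `pd_score` function, its coefficients, and the baseline input are hypothetical, and exact enumeration is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(subset):
        # Evaluate the model with only the features in `subset` taken from x.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def pd_score(z):
    """Hypothetical toy default-probability score with one interaction term."""
    return 0.05 + 0.10 * z[0] + 0.02 * z[1] + 0.03 * z[0] * z[2]


x = [1.0, 1.0, 1.0]          # hypothetical applicant features
baseline = [0.0, 0.0, 0.0]   # hypothetical reference applicant

phi = shapley_values(pd_score, x, baseline)
print([round(p, 4) for p in phi])
```

The attributions sum to the difference between the score at `x` and the score at the baseline, which is the property that makes them useful for comparing feature contributions between two models, such as one trained with and one trained without transfer learning.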

5,464 views

4 citations

Original Research

18 May 2022

Constructing Equity Investment Strategies Using Analyst Reports and Regime Switching Models

Rei Taguchi, 3 more and Kenji Hiramatsu

2,275 views

2 citations