ORIGINAL RESEARCH article

Front. Psychiatry
Sec. Schizophrenia
Volume 16 - 2025 | doi: 10.3389/fpsyt.2025.1518762
This article is part of the Research Topic: Machine Learning Algorithms and Software Tools for Early Detection and Prognosis of Schizophrenia

Deep Multimodal Representations and Classification of First-Episode Psychosis via Live Face Processing

Provisionally accepted
  • 1 Yale University, New Haven, United States
  • 2 Montreal Institute for Learning Algorithms (MILA), Montreal, Quebec, Canada
  • 3 Brain Function Laboratory, Department of Psychiatry, Yale University, New Haven, United States

The final, formatted version of the article will be published soon.

    Schizophrenia is a severe psychiatric disorder associated with a wide range of cognitive and neurophysiological dysfunctions and long-term social difficulties. Early detection is expected to reduce the burden of disease by enabling early treatment. In this paper, we test the hypothesis that integrating multiple simultaneous acquisitions of neuroimaging, behavioral, and clinical information predicts early psychosis better than unimodal recordings. We propose a novel framework to investigate the neural underpinnings of early psychosis symptoms (which can develop into schizophrenia with age) using multimodal acquisitions of neural and behavioral recordings, including functional near-infrared spectroscopy (fNIRS), electroencephalography (EEG), and facial features. Our data acquisition paradigm is based on live face-to-face interaction in order to study the neural correlates of social cognition in first-episode psychosis (FEP). We propose a novel deep representation learning framework, Neural-PRISM, for learning joint multimodal compressed representations that combine neural and behavioral recordings. These learned representations are subsequently used to describe, classify, and predict the severity of early psychosis in patients, as measured by the Positive and Negative Syndrome Scale (PANSS) and Global Assessment of Functioning (GAF) scores, to evaluate the impact of symptomatology. We found that incorporating joint multimodal representations from fNIRS and EEG along with behavioral recordings enhances classification between typical controls and FEP individuals (significant improvements of 10-20%). Additionally, our results suggest that geometric and topological features, such as curvatures and path signatures of the embedded trajectories of brain activity, enable detection of discriminatory neural characteristics in early psychosis.
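    To make the geometric features named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' Neural-PRISM implementation. It assumes an embedded trajectory is given as an array of shape (T, d) and treats it as a piecewise-linear path. The function names `path_signature_level2` and `turning_angles` are hypothetical, the signature is truncated at level 2, and the turning angle between successive steps is used as one simple discrete stand-in for curvature; the article may define both quantities differently.

```python
import numpy as np

def path_signature_level2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    path: array of shape (T, d), one embedded point per time step.
    Returns (level1, level2) with shapes (d,) and (d, d).
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)      # step vectors, shape (T-1, d)
    level1 = increments.sum(axis=0)         # total displacement
    offsets = path[:-1] - path[0]           # position relative to start before each step
    # Iterated integral of (X^i - X_0^i) dX^j, exact for linear segments.
    level2 = offsets.T @ increments + 0.5 * (increments.T @ increments)
    return level1, level2

def turning_angles(path):
    """Angle (radians) between successive steps; assumes no zero-length steps."""
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)
    a, b = increments[:-1], increments[1:]
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cosines = np.sum(a * b, axis=1) / norms
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Toy L-shaped trajectory in 2D (made-up data for illustration only).
toy_path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
level1, level2 = path_signature_level2(toy_path)
levy_area = 0.5 * (level2[0, 1] - level2[1, 0])
angles = turning_angles(toy_path)
```

    The antisymmetric part of the level-2 term is the Lévy area, which depends on the order in which the coordinates move. This is one reason signature features can separate trajectories that share the same start and end points.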

    Keywords: recurrent neural network (RNN), face processing, multimodal representation, path signature feature, representation learning, first-episode psychosis (FEP)

    Received: 28 Oct 2024; Accepted: 21 Jan 2025.

    Copyright: © 2025 Singh, Zhang, Bhaskar, Srihari, Tek, Zhang, Noah, Krishnaswamy and Hirsch. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Dhananjay Bhaskar, Yale University, New Haven, United States
    Xian Zhang, Brain Function Laboratory, Department of Psychiatry, Yale University, New Haven, United States
    Smita Krishnaswamy, Yale University, New Haven, United States
    Joy Hirsch, Yale University, New Haven, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.