AUTHOR=Poudyal Anubhuti, van Heerden Alastair, Hagaman Ashley, Islam Celia, Thapa Ada, Maharjan Sujen Man, Byanjankar Prabin, Kohrt Brandon A. TITLE=What Does Social Support Sound Like? Challenges and Opportunities for Using Passive Episodic Audio Collection to Assess the Social Environment JOURNAL=Frontiers in Public Health VOLUME=9 YEAR=2021 URL=https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2021.633606 DOI=10.3389/fpubh.2021.633606 ISSN=2296-2565 ABSTRACT=

Background: The social environment, comprising social support, social burden, and quality of interactions, influences a range of health outcomes, including mental health. Passive audio data collection on mobile phones (i.e., episodic recording of the auditory environment without requiring any active input from the phone user) enables new opportunities to understand the social environment. We evaluated the use of passive audio collection on mobile phones as a window into the social environment while conducting a study of mental health among adolescent and young mothers in Nepal.

Methods: We enrolled 23 adolescent and young mothers who first participated in qualitative interviews describing their social support and identifying sounds potentially associated with that support. Episodic recordings were then collected from the mothers for 2 weeks using an app that recorded 30 s of audio every 15 min from 4 A.M. to 9 P.M. Audio data were processed and classified using a pre-trained model, with each classification category accompanied by an estimated accuracy score. The machine-predicted speech and non-speech categories were manually validated for accuracy.
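The sampling schedule above implies a fixed daily audio yield per participant; a minimal sketch of that arithmetic (values derived only from the stated schedule, not reported in the paper):

```python
# Illustrative calculation of the recording schedule described above:
# one 30 s clip every 15 min, from 4 A.M. to 9 P.M., for 2 weeks.
RECORDING_WINDOW_HOURS = 21 - 4           # 4 A.M. to 9 P.M. = 17 h/day
CLIPS_PER_HOUR = 60 // 15                 # one clip per 15-min interval

clips_per_day = RECORDING_WINDOW_HOURS * CLIPS_PER_HOUR       # 68 clips/day
clips_per_participant = clips_per_day * 14                    # 952 clips over 2 weeks
audio_minutes_per_day = clips_per_day * 30 / 60               # 34 min of audio/day

print(clips_per_day, clips_per_participant, audio_minutes_per_day)
```

This highlights the episodic design: roughly half an hour of audio is captured per day while the phone records for only 30 s out of every 15 min.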

Results: In qualitative interviews, mothers described a range of positive and negative social interactions and the sounds that accompanied them. Potential positive sounds included adult speech and laughter, infant babbling and laughter, and sounds from baby toys. Sounds characterizing negative stimuli included yelling, crying, and screaming by adults and crying by infants. Sounds associated with social isolation included silence and TV or radio noise. Speech comprised 43% of all passively recorded audio clips (n = 7,725). Manual validation showed a 23% false-positive rate and a 62% false-negative rate for speech, indicating potential underestimation of speech exposure. Other common sounds were music and vehicular noise.
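The reported error rates follow from comparing model predictions against the manual labels; a minimal sketch with hypothetical validation counts chosen only to reproduce the reported 23% and 62% rates (the paper does not publish the underlying tallies):

```python
# Hypothetical confusion-matrix tallies for the "speech" class
# (predicted vs. manually validated label); numbers are illustrative.
tp = 380   # predicted speech, validated as speech
fp = 230   # predicted speech, validated as non-speech
fn = 620   # predicted non-speech, validated as speech
tn = 770   # predicted non-speech, validated as non-speech

false_positive_rate = fp / (fp + tn)   # share of true non-speech flagged as speech
false_negative_rate = fn / (fn + tp)   # share of true speech missed by the model

print(round(false_positive_rate, 2))   # 0.23
print(round(false_negative_rate, 2))   # 0.62
```

A high false-negative rate of this kind is what drives the underestimation of speech exposure noted above: most genuine speech clips are classified as something else.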

Conclusions: Passively capturing audio has the potential to improve understanding of the social environment. However, the pre-trained model had limited accuracy in identifying speech and lacked categories to distinguish positive from negative social interactions. To improve the contribution of passive audio collection to understanding the social environment, future work should improve the accuracy of audio categorization, code for constellations of sounds, and combine audio with other smartphone data streams such as location and activity.