- 1Department of Otolaryngology—Head and Neck Surgery, Columbia University Irving Medical Center/NewYork-Presbyterian Hospital, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, United States
- 2Department of Mechanical Engineering, The Fu Foundation School of Engineering and Applied Science, Columbia University, New York, NY, United States
- 3Department of Otolaryngology—Head and Neck Surgery, Weill Cornell Medical College and NewYork-Presbyterian Hospital, New York, NY, United States
Objective: Although speech recognition among cochlear implant (CI) users has improved over the past few decades, many still report poor speech quality. Currently, there is no validated tool to measure speech quality. The objective was to examine whether a previously validated speech quality tool is applicable in the CI population using psychometric analysis.
Design: Cross-sectional psychometric analysis of the Columbia Speech Quality Instrument (CSQI; previously validated in normal-hearing individuals; consists of 2 original and 7 manipulated speech clips designed to accentuate selected speech characteristics) was performed in adult English-speaking CI recipients (N = 36). Subjects rated each clip using a visual analog scale (VAS) on 14 characteristics: cartoonish/not-cartoonish, clear/unclear, like/dislike, breathy/not-breathy, smooth/rough, echo-y/not-echo-y, tinny/bassy, soothing/not-soothing, natural/unnatural, mechanical/not-mechanical, hoarse/smooth, pleasant/unpleasant, male/female, and speech-like/not-speech-like. Main outcome measures included validity, reliability, and factor structure.
Results: Content validity was previously confirmed during instrument design. Construct validity by item-item correlation analysis demonstrated correlation of 12/14 items with ≥1 other retained item (r ≥ 0.35, Spearman). Reliability was confirmed by internal consistency; factor analysis using two subsets selected by Scree plot and factor loading ≥0.4 demonstrated Cronbach alpha coefficients of 0.89 and 0.74 for factors 1 and 2, respectively. Tinny/bassy and male/female were the only characteristics that did not pass construct validity or internal consistency.
Conclusions: The CSQI has strong psychometric properties in the CI population; however, our findings support removal of the tinny/bassy and male/female characteristics from the final instrument prior to implementation in the CI population. The CSQI can be utilized in cochlear implantees to investigate effects of changes in speech processing strategies and postoperative outcomes with different devices.
Introduction
Speech recognition among cochlear implant (CI) users has improved drastically in the past few decades due to improvements in hardware, software, and surgical techniques (Zwolan, 2008). Despite advances in speech recognition, many patients anecdotally report that speech quality heard through CIs remains odd and unpleasant. Although individual experiences widely vary, sounds are often described as mechanical or cartoon-like by patients. Beyond linguistic content, human speech encodes information about the speaker's age, gender, identity, accent, and emotional state, which are critical for social interactions and may be lost when speech quality is inadequate. CI users are known to struggle with gender identification (Fu et al., 2005), speaker identification (Vongphoe and Zeng, 2005), and emotion recognition (Luo et al., 2007) compared to their normal hearing peers.
To improve speech quality heard by CI recipients, a standardized method of defining and scoring speech quality is necessary to track changes among different techniques. Previous validated tools that incorporate perceived sound quality as a metric include the Hearing Implant Sound Quality Index (HISQUI19) (Amann and Anderson, 2014) and the Speech, Spatial, and Qualities of Hearing Scale (SSQ) (Gatehouse and Noble, 2004). While the HISQUI19 and SSQ are excellent at measuring the impacts of hearing loss and cochlear implantation on everyday activities and QOL, they do not investigate which specific characteristics of speech sound unnatural or assess how CI users describe the quality of speech they are hearing.
Our group has developed the first tool to assess speech quality and its pleasantness. The Columbia Speech Quality Instrument (CSQI) is a concise, interactive, computerized test that consists of nine speech clips manipulated to clearly portray speech qualities of interest as defined by normal hearing individuals. Participants quantify the quality of perceived speech across 14 characteristics. The CSQI was generated by a focus group of otolaryngologists, audiologists, and speech pathologists with extensive experience with patients with hearing loss and CI users. It was previously administered to normal hearing participants for development of the initial item bank and the subsequent finalized speech quality instrument, which underwent validity and reliability analyses (Chen et al., 2018). In this study, we aimed to determine whether this validated speech quality tool is applicable in the CI population, using psychometric analysis to examine the validity, reliability, and factor structure of the CSQI among CI listeners. The CSQI will be useful in optimizing speech quality in cochlear implantees by quantifiably measuring changes in speech quality scores across speech processing strategies and CIs.
Materials/methods
Recruitment and study design
We partnered with an experienced sound/audio engineer and a full stack web developer to develop a novel web-based application based upon specifications of our prior data (Peter Karl Studios, New York, NY; WYC Technologies, New York, NY). Subjects were recruited from the Columbia University Medical Center CI program and from web-hosted prominent CI support groups. Eligibility criteria included age > 18 years, bilateral or unilateral cochlear implantation status, a minimum of 6 months since implant activation, and English literacy. Subjects had the option to complete the study in our clinic or to complete the study online at home. Subjects completing the study online at home had the option of sending their audiogram in a de-identified fashion.
All subjects were e-consented prior to participation in the study under a protocol approved by the Columbia University Irving Medical Center Institutional Review Board. Subjects tested in person were consented in person. All systems were in compliance with the institutional information security charter. After completing consent, patients were asked to complete a brief demographic survey covering their otologic history, relevant medical history, and primary language. Subjects were instructed to complete the study using direct stream to their CI, or if this was unavailable, using external speakers in a quiet room.
Sound/audio engineering for the Columbia Speech Quality Instrument
Subjects were presented the CSQI, which consists of a series of nine audio clips previously developed and validated among normal hearing listeners (Chen et al., 2018). Each audio clip consists of a male or female speaker reading the Rainbow Passage (Fairbanks, 1960). Two audio clips contain original audio, while the remaining have been manipulated by a sound engineer using Apple Logic 9 Pro recording software (Apple Inc., Cupertino, CA) to accentuate one of the following goal qualities: bassy, cartoonish, far, garbled, mechanical, not speech, or rough. The final audio clips were as follows: original male, original female, not-speech female, bassy male, cartoonish female, far male, garbled male, mechanical female, rough male.
Following each clip, subjects rated the speech on 14 characteristics using a visual analog scale (VAS):
1. Cartoonish (10) vs. not cartoonish (0)
2. Clear (10) vs. garbled (0)
3. Like (10) vs. did not like (0)
4. Breathy (10) vs. not breathy (0)
5. Smooth (10) vs. rough (0)
6. Echo-y (10) vs. not echo-y (0)
7. Tinny (10) vs. bassy (0)
8. Soothing (10) vs. not soothing (0)
9. Natural (10) vs. unnatural (0)
10. Mechanical (10) vs. not mechanical (0)
11. Hoarse (10) vs. not hoarse (0)
12. Pleasant (10) vs. unpleasant (0)
13. Male (10) vs. female (0)
14. Sounds like speech (10) vs. does not sound like speech (0)
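The rating scheme above can be written down as a small data structure. The following is a minimal sketch only, not the study's actual application code: the names `VasItem`, `CSQI_ITEMS`, and `validate_ratings` are hypothetical, and the anchor labels are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VasItem:
    """One visual analog scale item (hypothetical representation)."""
    high_anchor: str  # label shown at the 10 end of the scale
    low_anchor: str   # label shown at the 0 end of the scale


# The 14 characteristics, in the order listed in the text.
CSQI_ITEMS = (
    VasItem("cartoonish", "not cartoonish"),
    VasItem("clear", "garbled"),
    VasItem("like", "did not like"),
    VasItem("breathy", "not breathy"),
    VasItem("smooth", "rough"),
    VasItem("echo-y", "not echo-y"),
    VasItem("tinny", "bassy"),
    VasItem("soothing", "not soothing"),
    VasItem("natural", "unnatural"),
    VasItem("mechanical", "not mechanical"),
    VasItem("hoarse", "not hoarse"),
    VasItem("pleasant", "unpleasant"),
    VasItem("male", "female"),
    VasItem("sounds like speech", "does not sound like speech"),
)


def validate_ratings(ratings):
    """Check one clip's ratings and key them by the high-anchor label.

    Expects exactly one value per item, each within the 0..10 VAS range.
    """
    if len(ratings) != len(CSQI_ITEMS):
        raise ValueError(f"expected {len(CSQI_ITEMS)} ratings, got {len(ratings)}")
    for item, value in zip(CSQI_ITEMS, ratings):
        if not 0 <= value <= 10:
            raise ValueError(f"{item.high_anchor}: rating {value} is outside 0..10")
    return {item.high_anchor: value for item, value in zip(CSQI_ITEMS, ratings)}
```

With nine clips per administration, a complete response under this sketch would be nine such dictionaries, one per clip.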
Technical specifications: application structure
The main web application was developed by an experienced full stack web developer (WYC Technologies, New York, NY). The program was written in the Python programming language and runs on version 1.11 of the Django web application framework, which is supported with security patches and upgrades until at least April 2020. It includes several open-source libraries as is typical in modern web development, but also as few as necessary to reduce complexity. The latest version of PostgreSQL is used for the application database. Network HTTPS requests are reverse-proxied by nginx, which is also used to terminate TLS connectivity.
The responsive browser frontend of the web application was written in JavaScript using the ReactJS framework, which is free, open-source, and maintained by Facebook, Inc. Several common packages were used from the NodeJS ecosystem to provide user interface functionality. All communication to the backend occurs through HTTPS connections at API endpoints that authenticate and authorize requests based on unique survey codes.
The server runs Debian 9 with GNU/Linux on Amazon Web Services EC2. The application and database both run on the server. A virtual firewall restricts all access aside from HTTP, HTTPS, SSH, and ICMP Ping requests. HTTP is only used to redirect to HTTPS. The proper TLS certificates have been generated with LetsEncrypt.
Statistical analysis
All statistical analysis was performed using Stata 13.0. Inter-item correlation was calculated using Spearman's rank-order correlation, with moderate correlation defined as r ≥ 0.35. Factor analysis was used to determine factor loading, and scree plot analysis was used to determine the number of factors to retain. Final factor loadings were determined by VARIMAX rotation, and items with factor loading ≥0.4 were retained. Cronbach's alpha was calculated using the built-in alpha function in Stata. Test-retest reliability was calculated via intraclass correlation using a random-effects model with a maximum likelihood estimator among participants who completed the CSQI twice within 1 week.
Results
Demographics
Thirty-six participants completed the CSQI, with a mean age of 64.2 ± 14.7 years (mean ± SD) at time of survey completion (Table 1). Participants were on average 2.98 ± 2.62 years post-CI implantation, and 66.7% of participants were female. Of the 14 participants who reported number of deaf years pre-implantation, the average number of deaf years was 24.1 ± 17.8 years. Eleven participants completed test-retest of the CSQI within 1 week.
Construct validity
Construct validity was determined by inter-item correlation (Table 2). All speech quality items except Bassy were at least moderately correlated (r ≥ 0.35) with another item in this survey. The highest correlation was found with Pleasant and Natural with a correlation coefficient of 0.81. Pleasant, Smooth, and Natural all had high correlation (r ≥ 0.7) with each other. The lowest correlation was found with Sex ID and Bassy with a correlation coefficient of 0.04.
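The inter-item check described here can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the Stata analysis used in the study: the scores below are invented toy data, the function names are hypothetical, and the sketch compares the signed coefficient against 0.35 because the text does not say whether the sign was ignored.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def items_without_partner(scores, threshold=0.35):
    """Items whose Spearman r with every other item falls below the threshold."""
    names = list(scores)
    return [
        a for a in names
        if not any(b != a and spearman(scores[a], scores[b]) >= threshold for b in names)
    ]


# Toy ratings (invented for illustration; not study data).
toy_scores = {
    "pleasant": [1, 2, 3, 4, 5, 6],
    "natural": [2, 1, 4, 3, 6, 5],
    "tinny": [3, 4, 6, 1, 2, 5],
}
unpaired = items_without_partner(toy_scores)  # only "tinny" lacks a partner here
```

In the toy data, "pleasant" and "natural" correlate at about 0.83, while "tinny" correlates near zero with both, so it is the only item flagged.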
Instrument reliability
Scree test identified two subsets of speech quality items for factor analysis (Table 3, Figure 1). Items with factor loading ≥0.4 were retained. Items that loaded onto factor 1 include clear/garbled, like/dislike, smooth/not smooth, echo-y/not echo-y, soothing/not soothing, natural/not natural, mechanical/not mechanical, pleasant/not pleasant, and speech-like/not speech-like. Items that loaded onto factor 2 include cartoonish/not cartoonish, breathy/not breathy, mechanical/not mechanical, and hoarse/not hoarse. Bassy/tinny and male/female (i.e., sex ID) did not load onto either factor, and mechanical/not mechanical loaded onto both factors.
Figure 1. Scree plot identified two subsets of speech quality items for factor analysis. Items with factor loading ≥0.4 were retained.
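The retention rule (factor loading ≥0.4) can be expressed as a short sketch. The loadings below are hypothetical placeholders, except the 0.321 value for sex ID on factor 2, which is reported in the Discussion; the function name `assign_items` is an assumption, and taking the absolute value of each loading is also an assumption, since the text does not state how negative loadings were handled.

```python
def assign_items(loadings, cutoff=0.4):
    """Group items by the factors they load onto.

    loadings maps item name -> tuple of rotated loadings, one per factor.
    Returns (retained, dropped): retained maps factor index -> item names
    whose absolute loading meets the cutoff; dropped lists items that
    meet the cutoff on no factor.
    """
    n_factors = len(next(iter(loadings.values())))
    retained = {f: [] for f in range(n_factors)}
    dropped = []
    for item, row in loadings.items():
        hits = [f for f, value in enumerate(row) if abs(value) >= cutoff]
        for f in hits:
            retained[f].append(item)
        if not hits:
            dropped.append(item)
    return retained, dropped


# Hypothetical loadings (factor 1, factor 2); only 0.321 comes from the text.
example_loadings = {
    "pleasant": (0.85, 0.10),
    "hoarse": (0.05, 0.62),
    "mechanical": (0.55, 0.48),  # cross-loads, as the text reports for this item
    "male": (0.02, 0.321),
}
retained, dropped = assign_items(example_loadings)
```

Under these placeholder values, "mechanical" appears under both factors and "male" is dropped, mirroring the pattern described above.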
Internal consistency was determined by Cronbach's alpha, which was calculated as 0.93 for factor 1 and 0.69 for factor 2. Among the 11 participants who completed the CSQI twice within a period of 1 week, test-retest reliability was determined by intraclass correlation, which was calculated as 0.78 (95% conf. interval: 0.49–0.95, P < 0.001).
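For readers unfamiliar with the statistic, Cronbach's alpha for k items is k/(k − 1) × (1 − Σ item variances / variance of the summed score). The sketch below implements that textbook formula in standard-library Python; it is not the Stata routine used in the study, the data are invented, and it assumes all items are already scored in the same direction.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of per-item score lists.

    Each inner list holds one item's scores across the same respondents,
    in the same respondent order. Uses sample variances throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(per_respondent) for per_respondent in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy data (invented): two items rated by four respondents.
toy_alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])  # approximately 0.75
```

Items that move together inflate the variance of the summed score relative to the individual item variances, which pushes alpha toward 1; perfectly identical items give exactly 1.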
Discussion
The CSQI was previously validated in normal hearing participants (Chen et al., 2018); our findings suggest that this instrument is suitable for use in the CI population. This tool fulfills the critical need for a validated instrument to assess the frequently reported complaints of speech quality in cochlear implantees. The test is short, easily completed, and self-administered on a computer, making it clinically feasible and well-suited for implementation in a broader clinical setting. Moreover, it is the first validated instrument employed to examine speech quality and its pleasantness in CI users.
Our limited re-validation of the CSQI in the CI population examined validity, reliability, and factor structure. Content validity was achieved during the design of the instrument as described in our previous report (Chen et al., 2018). Briefly, a focus group of otolaryngologists, audiologists, and speech pathologists identified 18 items to define speech quality. Speech stimuli were recorded by 2 male and 2 female voices, then modified by sound engineers to accentuate 10 goal qualities for a total of 44 speech clips. The speech clips were then presented to normal-hearing listeners and each speech quality item of each clip was rated on a 10-point visual analog scale. Based on these preliminary results, items and clips were pruned to a finalized set for the CSQI.
Construct validity was confirmed by inter-item correlation, which demonstrated 13/14 speech quality items had at least moderate correlation with another item. Among CI users, bassy/tinny was the only item that did not correlate with another item. In comparison, our previous study showed that all items demonstrated at least moderate correlation with another item among normal hearing individuals (Chen et al., 2018). This difference among the CI and normal hearing groups may be a result of abnormal pitch perception through CIs, different demographic distribution, or other confounding factors (Zeng et al., 2014). Alternatively, bassy/tinny may truly not be associated with any of the other speech quality items, and exists as a unique trait to be measured.
Reliability was determined by internal consistency and test-retest reliability. Based on a cutoff of factor loading ≥0.4, 12/14 speech quality items loaded onto either factor 1 or factor 2; Bassy/tinny and sex ID were the only items that did not load onto either factor, which may also be a result of altered pitch perception or changes in temporal cues and spectral cues through CIs (Fu et al., 2005; Zeng et al., 2014). Factor 1, consisting of 9/14 items, had excellent internal consistency based on Cronbach's alpha of 0.93, while factor 2, consisting of 4/14 items, had acceptable consistency based on Cronbach's alpha of 0.69. Thus, while the items within factor 1 are highly correlated, the items in factor 2 are not as closely correlated and may individually be important measures.
Although the bassy/tinny item did not demonstrate at least moderate correlation with another item or load onto either of the two factors, many cochlear implantees anecdotally report the speech they hear as bassy or tinny. Sex ID also demonstrated a loading of 0.321 for Factor 2, close to the 0.4 cutoff; cochlear implantees are known to struggle with gender identification with smaller differences in mean fundamental frequency of the speaking voice (Fu et al., 2005). In addition, the mechanical/not mechanical item loaded onto both factors, indicating redundancy of the item. However, this is a common complaint by CI users, and the item was retained for the final set of CSQI items. Of note, results of exploratory factor analysis are solely based on data and not on any theoretical basis; thus, it is important to consider inclusion of clinically relevant characteristics such as bassy/tinny and sex ID. That said, our examination of the psychometric properties of the CSQI supports elimination of the bassy/tinny and sex ID items for the cochlear implantee population. This also helps facilitate a shorter assessment with better prospects for incorporation into clinical use.
The primary limitations to this study include the sample size, variability in the demographics of our participant population, and inability to control for listening environments (e.g., by testing in a standardized audiology suite or soundproof booth). Due to the nature of the CI user population available for participation, the average age and sex distribution are skewed toward older and more female participants than the population used to validate the CSQI in normal hearing individuals. The heterogeneity of CI usage (i.e., total time spent using CI) and years of deafness prior to implantation were also not accounted for during validation of the CSQI. For example, at the time of taking the CSQI, 16.7% of participants had their CI(s) for <1 year, 52.8% for 1–3 years, and 30.6% for >3 years (Table 1). This did not account for frequency of usage of the CI; indeed, duration of daily processor use is significantly correlated with speech recognition abilities in adult cochlear implantees (Holder et al., 2020). Thus, compared to novice CI users, experienced users may be able to more easily identify the speech characteristics presented in the CSQI. Similarly, adults with prelingual deafness are known to demonstrate poorer speech outcomes and pre/post-CI improvement compared to those with postlingual deafness (Boisvert et al., 2020). In our study population, several participants were noted to have had deafness since an early age (e.g., ~3–4 years of age). Our study had limited participant data regarding the etiology and status of the contralateral ear, given many were recruited online. Participants were also tested in a mix of conditions (direct stream and external speakers) based on convenience and technology limitations of participants who were doing the study at home. In instances where speakers were used, the contralateral ear was not plugged to isolate the non-CI ear.
As such, there are also insufficient data to address unilateral, bilateral CI, or bimodal strategies, which are the focus of ongoing studies. These differences in demographics may affect the interpretation of the speech quality items, and may contribute to the observed differences in inter-item correlation and internal consistency. Nonetheless, our study was still able to demonstrate excellent construct validity (13/14 items were at least moderately correlated with each other) and reliability (12/14 speech items loaded on either factor 1 or 2) in CI users. Finally, cochlear implantees may experience improvement in speech quality over time as patients acclimate to their device and undergo central cortical adaptation, similar to the way they experience improvement in speech perception. As such, this assessment should be employed throughout the rehabilitation process.
The novel use of an online survey method provides many advantages including accessibility and allowing users to listen in their normal hearing environment, but also introduces variability in audio device quality and ambient noise levels among participants. Although this heterogeneity of listening environments may have contributed to observed differences in speech quality, the CSQI still demonstrated construct validity and reliability despite this variability. The online nature of the instrument also provides the advantage of reaching a larger user base in their natural listening environment, thus increasing the clinical utility of the CSQI. Having participants take the CSQI on a computer with speakers in a quiet room offers greater ecological validity (i.e., more similar to a real-life environment) and clinical feasibility than having them visit their audiologist and take the test in a sound booth. This is particularly important in the setting of the current COVID-19 environment, where it is necessary to minimize risk of exposure. As such, a study design in which future participants complete the CSQI at home is a practical solution.
The CSQI adds to the current options of validated tools available for improving the experience of CI users by developing a shared vocabulary to define attributes of speech, providing a library of standardized speech clips with accentuated speech characteristics, and establishing a standardized method of measuring speech quality. It is critical to ensure that vocabulary used by normal hearing individuals and cochlear implantees is consistent, as it allows providers and CI users to communicate effectively about the CI listening experience. With the CSQI, specific terms can be linked to specific qualities of speech across both normal hearing and CI participants. Similarly, the 9 speech clips within the CSQI can serve as universal standards for the speech characteristic each clip is engineered to portray, and can be used in future studies or tools.
Future efforts will be directed at using participant-reported scores per speech clip to generate an overall score to represent how pleasant and/or accurate speech quality sounds to the participant. Compiling metrics for overall performance of speech quality production will allow for numerous applications of the CSQI in research and clinical use. As a research tool, the CSQI can be used to compare new developments in CI technology, to quantifiably demonstrate if newer speech processing strategies, electrodes, devices, or other advancements can improve speech quality heard through CIs. In addition to speech recognition, improvements in speech quality as measured by the CSQI can become standard outcomes for measuring success of cochlear implantation.
We also envision the CSQI becoming implemented as a diagnostic tool in the clinic for assessing the effects of changes in CI hardware or software on perceived speech quality. Once included in the standard battery of tests that CI recipients undergo during each check-up visit, the CSQI can be trended over time to monitor the progress of either the CI user's acclimatization to the device or the modifications to speech processor settings or hardware. For example, Figure 2 demonstrates that there is improved speech quality as measured by the CSQI over time in our group of CI users. Providers can use the CSQI during in-person or virtual telemedicine visits to tailor CI recipients' program settings to maximize enjoyment of listening to speech.
Figure 2. Scatterplot illustrating Columbia Speech Quality Instrument (CSQI) scores of all the study participants for a sample stimulus (original female) plotted as a function of time since cochlear implantation. Each circle denotes a unique study participant. There is a trend of increasing CSQI scores with increased time since implantation, as participants get acclimated to their implant.
Conclusion
The Columbia Speech Quality Instrument (CSQI), a concise and portable computerized test previously validated in normal hearing users, has strong psychometric properties in the CI population. Our findings suggest the tinny/bassy and male/female characteristics should be removed prior to implementation of the CSQI in the CI population. This instrument may be utilized in cochlear implantees so quantitative measurements of speech quality can be used to track changes across various electrodes, devices, and speech processing strategies to optimize listener enjoyment. The online format of the CSQI allows it to be widely distributed and accessible to a larger, more diverse user base. Future studies can examine modifiable aspects of speech to enhance CI speech enjoyment and explore differences between CI and normal hearing speech quality perception.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics statement
The studies involving humans were approved by Columbia University Irving Medical Center Institutional Review Board. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.
Author contributions
AL: Conceptualization, Methodology, Supervision, Writing – review & editing, Writing – original draft. MC: Data curation, Investigation, Formal analysis, Project administration, Validation, Visualization, Writing – review & editing, Writing – original draft. TH: Data curation, Investigation, Project administration, Writing – review & editing, Writing – original draft. AC: Formal analysis, Visualization, Validation, Writing – review & editing, Writing – original draft. LT: Data curation, Investigation, Project administration, Writing – review & editing, Writing – original draft. SC: Writing – review & editing, Writing – original draft. MS: Validation, Writing – review & editing, Writing – original draft. DM: Conceptualization, Investigation, Writing – review & editing, Writing – original draft. IC: Conceptualization, Investigation, Supervision, Writing – review & editing, Writing – original draft.
Funding
The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.
Acknowledgments
The authors would like to acknowledge the contributions of WYC Technologies and Peter Karl Studios in the engineering of the audio used in the CSQI and its online platform.
Conflict of interest
AL: Haystack Medical (Founder and Equity Owner).
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fauot.2024.1362810/full#supplementary-material
Supplementary Data 1. CI speech perception scores.
Supplementary Data 2. Columbia SQI raw data.
References
Amann, E., and Anderson, I. (2014). Development and validation of a questionnaire for hearing implant users to self-assess their auditory abilities in everyday communication situations: the Hearing Implant Sound Quality Index (HISQUI19). Acta Otolaryngol. 134, 915–923. doi: 10.3109/00016489.2014.909604
Boisvert, I., Reis, M., Au, A., Cowan, R., and Dowell, R. C. (2020). Cochlear implantation outcomes in adults: a scoping review. PLoS ONE 15:e0232421. doi: 10.1371/journal.pone.0232421
Chen, S. Y., Griffin, B. M., Mancuso, D., Shiau, S., DiMattia, M., Cellum, I., et al. (2018). The development and validation of the speech quality instrument. Laryngoscope 128, 1622–1627. doi: 10.1002/lary.27041
Fu, Q. J., Chinchilla, S., Nogaki, G., and Galvin, J. J. (2005). Voice gender identification by cochlear implant users: the role of spectral and temporal resolution. J. Acoust. Soc. Am. 118, 1711–1718. doi: 10.1121/1.1985024
Gatehouse, S., and Noble, W. (2004). The speech, spatial and qualities of hearing scale (SSQ). Int. J. Audiol. 43, 85–99. doi: 10.1080/14992020400050014
Holder, J. T., Dwyer, N. C., and Gifford, R. H. (2020). Duration of processor use per day is significantly correlated with speech recognition abilities in adults with cochlear implants. Otol. Neurotol. 41, e227–e231. doi: 10.1097/MAO.0000000000002477
Luo, X., Fu, Q. J., and Galvin, J. J. (2007). Cochlear implants special issue article: vocal emotion recognition by normal-hearing listeners and cochlear implant users. Trends Amplif. 11, 301–315. doi: 10.1177/1084713807305301
Vongphoe, M., and Zeng, F. G. (2005). Speaker recognition with temporal cues in acoustic and electric hearing. J. Acoust. Soc. Am. 118, 1055–1061. doi: 10.1121/1.1944507
Zeng, F. G., Tang, Q., and Lu, T. (2014). Abnormal pitch perception produced by cochlear implant stimulation. PLoS ONE 9:e88662. doi: 10.1371/journal.pone.0088662
Keywords: cochlear implantation, cochlear implant, hearing loss, speech recognition, word recognition, speech perception, speech quality, validated instrument
Citation: Lalwani AK, Chun MB, Hwa TP, Chern A, Tian L, Chen SY, Stewart MG, Mancuso D and Cellum IP (2024) Examining the psychometric properties of the Columbia Speech Quality Instrument in cochlear implant users. Front. Audiol. Otol. 2:1362810. doi: 10.3389/fauot.2024.1362810
Received: 29 December 2023; Accepted: 29 April 2024;
Published: 12 June 2024.
Edited by:
Dayse Tavora-Vieira, Fiona Stanley Hospital, Australia
Reviewed by:
Anna Rita Fetoni, Universita' degli Studi di Napoli Federico II, Italy
Andre Wedekind, University of Western Australia, Australia
Copyright © 2024 Lalwani, Chun, Hwa, Chern, Tian, Chen, Stewart, Mancuso and Cellum. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Anil K. Lalwani, akl2144@cumc.columbia.edu