A New Berlin Questionnaire Simplified by Machine Learning Techniques in a Population of Italian Healthcare Workers to Highlight the Suspicion of Obstructive Sleep Apnea

De Nunzio, Giorgio; Conte, Luana; Lupo, Roberto; Vitale, Elsa; Calabrò, Antonino; Ercolani, Maurizio; Carvello, Maicol; Arigliani, Michele; Toraldo, Domenico Maurizio; De Benedetto, Luigi

doi:10.3389/fmed.2022.866822

ORIGINAL RESEARCH article

Front. Med., 25 May 2022

Sec. Pulmonary Medicine

Volume 9 - 2022 | https://doi.org/10.3389/fmed.2022.866822

This article is part of the Research Topic: Obstructive Sleep Apnea Syndrome (OSAS). What's New?


Giorgio De Nunzio¹,²*†

Luana Conte¹,²†

Roberto Lupo³

Elsa Vitale⁴

Antonino Calabrò⁵

Maurizio Ercolani⁶

Maicol Carvello⁷

Michele Arigliani⁸

Domenico Maurizio Toraldo⁹

Luigi De Benedetto¹⁰

¹Laboratory of Biomedical Physics and Environment, Department of Mathematics and Physics “E. De Giorgi”, University of Salento, Lecce, Italy
²Laboratory of Interdisciplinary Research Applied to Medicine, University of Salento, Local Health Authority, Lecce, Italy
³“San Giuseppe da Copertino” Hospital, Local Health Authority, Lecce, Italy
⁴Department of Mental Health, Local Health Authority, Bari, Italy
⁵“Nuovo Ospedale degli Infermi” Hospital, Local Health Authority, Biella, Italy
⁶Local Health Authority Marche Area Vasta 2 Health Department, Ancona, Italy
⁷Brisighella Community Hospital, Local Health Authority, Romagna, Italy
⁸Ear, Nose, and Throat Unit, “Vito Fazzi” Hospital, Local Health Authority, Lecce, Italy
⁹Cardio-Respiratory Unit Care, Department of Rehabilitation, “Vito Fazzi” Hospital, Local Health Authority, Lecce, Italy
¹⁰Integrated Therapies in Otolaryngology, Campus Bio-Medico University, Rome, Italy

Obstructive sleep apnea (OSA) syndrome is a condition characterized by repeated complete or partial collapse of the upper airways during sleep, associated with episodes of intermittent hypoxia, leading to fragmentation of sleep, sympathetic nervous system activation, and oxidative stress. To date, one of the major aims of research is to find a simplified non-invasive screening system for this still underdiagnosed disease. The Berlin questionnaire (BQ) is the most widely used questionnaire for OSA and is a screening tool devised to select subjects with a high likelihood of having OSA. We administered the original ten-question Berlin questionnaire, enriched with a set of questions purposely prepared by our team to complete the socio-demographic, clinical, and anamnestic picture, to a sample of Italian professional nurses in order to investigate the possible impact of OSA on healthcare systems. According to the Berlin questionnaire, respondents were categorized as being at high or low risk of having OSA. For both risk groups, baseline characteristics, work information, clinical factors, and symptoms were assessed. Anthropometric data, work information, health status, and symptoms were significantly different between OSA high-risk and low-risk groups. Through supervised feature selection and Machine Learning, we also reduced the original BQ to a very limited set of items which seem capable of reproducing the outcome of the full BQ: this reduced group of questions may be useful to determine the risk of sleep apnea in screening cases where questionnaire compilation time must be kept as short as possible.

Introduction

Obstructive Sleep Apnea (OSA) is a syndrome characterized by partial or complete obstruction of the upper airways during sleep. This obstruction, in turn, causes numerous repeated arousals from sleep to restore airway patency, leading to disrupted sleep, daytime hypersomnolence, and sympathetic activation. The obstruction of the airways may also lead to blood oxygen desaturation (1) during sleep, and cardiovascular lesions (2). OSA is associated with numerous conditions including stroke, hypertension and death (3, 4). These comorbidities are particularly evident in obese patients, and vary in severity according to gender and age.

The prevalence of OSA varies widely in the general population, ranging from 9 to 38%, with older age, male gender, and obesity as known risk factors (1, 5, 6). In advanced age groups, prevalence can even increase to 84% (1).

According to a worldwide epidemiological prevalence study (5) there are an estimated 936 million OSAS patients aged 30–69 years with mild-moderate OSA and 425 million patients aged 30–69 years with severe OSA who need Continuous Positive Airway Pressure (CPAP) treatment. In Italy, one study estimated the prevalence of moderate-to-severe OSA in 27% of the general population, with an overall prevalence of mild and moderate-to-severe OSA of more than 24 million people in the ages 15–74 years (54% adult population), while from a practical perspective, Italian NHS physicians diagnosed only 460,000 moderate-to-severe patients (4% of estimated prevalence) and 230,000 patients were treated (2% of estimated prevalence), highlighting a substantial gap between diagnosis and treatment. Considering that each patient is diagnosed many years after the onset of the disease, the direct and indirect healthcare costs determine a significant burden for the National Health System (NHS), which affects every single citizen. Prevention and early diagnosis are the only ways to achieve cost containment and improved quality of life.

Although the number of studies has increased considerably in recent years, OSA is still a highly underdiagnosed disease. The gold standard for OSA diagnosis is nocturnal polysomnography (PSG) in the sleep laboratory. However, since this is not practical for large numbers of patients, the Home Sleep Test (HST) is also an accepted, validated ambulatory diagnostic method. Among non-invasive screening tools for OSA in the general population, the Berlin questionnaire (BQ) (7) is the most widely used to identify patients at risk for OSA. It was employed for the first time in the US: it contains ten questions related to risk factors and symptoms of OSA, with the purpose of selecting high-risk patients who may undergo polysomnography, thereby increasing the number of diagnosed patients.

The main purpose of this study was to find possible risk factors that are best correlated with being at high risk for OSA—according to the BQ—in professional nurses in order to investigate the possible impact of OSA on healthcare systems by considering one of the most important categories in health and assistance fields. We also assessed the capabilities of a reduced BQ of predicting a high-risk OSA group according to the result of the standard BQ. For this purpose, we used techniques related to supervised feature selection and Machine Learning.

Methods

Design

From May 2020 to September 2021 a cross-sectional, multicenter study was conducted among professional nurses. Four hundred and five Italian subjects agreed to participate in the study. No eligibility criteria were applied to the volunteers. The survey was conducted by means of an anonymous electronic questionnaire distributed on a voluntary basis. All subjects were asked to answer the BQ (7) and an additional set of 38 questions including items about baseline socio-demographic characteristics, work information, clinical status, and symptoms category. In particular, socio-demographic characteristics included gender, age, BMI, smoking, and neck circumference. Work information comprised years of work experience, working hours, work shift, and work shift regularity. For health status, we assessed the presence of arrhythmias, sleep disturbances, hypo/hyperthyroidism, hypertension, transient ischemic attack or stroke, diabetes mellitus, chronic obstructive pulmonary disease (COPD), asthma, anxiety, depression, frequent confusion or agitation, craniofacial morphological alterations, and alcohol and drug abuse. Symptoms category included difficulty staying awake during an activity, difficulty concentrating, difficulty in expressing oneself, use of stimulants, interference with work, interference with social relationships, slow reactions and difficulty keeping attention up, difficulty in paying attention to several tasks at once, striving not to make mistakes, and need to doze off.

The Berlin Questionnaire

The BQ (7) is the most widely used non-invasive screening tool for OSA. It was devised to identify subjects with a high likelihood of having OSA based on the frequency, loudness, and disturbance of nocturnal snoring and on breathing interruptions, on daytime sleepiness, and on the presence of high blood pressure/obesity. The BQ consists of three categories of questions related to the risk of having sleep apneas. Patients can be classified as high-risk or low-risk based on their responses to the individual items and their overall scores in the symptom categories. Category 1 contains five items and incorporates questions about snoring; Category 2 contains three items investigating daytime somnolence; Category 3 contains one item assessing hypertension and information about the Body Mass Index (BMI). Scores from the first two categories were positive if the responses indicated frequent symptoms, such as more than 3–4 times per week, whereas the score from the third category was positive if there was a history of hypertension or a BMI > 30 kg/m² (7). The overall score was determined from the responses to the three categories. Patients were scored as being at high risk of OSA when they had a positive score in two or more categories; otherwise they were considered to be at low risk (7).
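The category-level decision rule described above can be sketched in a few lines of Python. This is only an illustration, not the scoring code used in the study: the function name `berlin_risk` and its arguments are hypothetical, and the item-level scoring inside Categories 1 and 2 is not detailed in this section, so their positivity is taken as an input.

```python
def berlin_risk(cat1_positive: bool, cat2_positive: bool,
                hypertension: bool, bmi: float) -> str:
    """Return 'high' or 'low' OSA risk from category-level results.

    cat1_positive / cat2_positive: whether the snoring and daytime
    somnolence categories were scored positive (assumed to be given).
    """
    # Category 3 is positive with a history of hypertension or BMI > 30 kg/m^2.
    cat3_positive = hypertension or bmi > 30.0
    positives = sum([cat1_positive, cat2_positive, cat3_positive])
    # High risk requires a positive score in two or more categories.
    return "high" if positives >= 2 else "low"


# Hypothetical respondents: positive snoring category only, differing in BMI.
print(berlin_risk(True, False, False, 31.2))  # high
print(berlin_risk(True, False, False, 24.0))  # low
```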

Statistical Analysis

The answers of all respondents to the BQ were analyzed using descriptive statistics. To identify items associated with being at high risk of OSA, baseline characteristics, work information, health status, and symptoms category were studied separately in the two OSA risk groups. Continuous variables were summarized by mean and standard deviation (SD) and categorical variables by frequencies and percentages. The Kruskal-Wallis test and the Mann-Whitney U-test were used to assess differences between the high and low OSA risk groups. Contingency tables were also analyzed, and chi-square and Fisher's exact tests were carried out to ascertain associations between each variable and the OSA risk group. A p-value <0.05 was considered statistically significant. BQ scoring and statistical analyses were conducted for all qualitative and quantitative variables using Matlab software.
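As an illustration of how a 2×2 contingency table (risk group vs. presence of a condition) can be tested, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test. It is not the Matlab code used for the analyses; the function name `fisher_exact_two_sided` and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    # (a small relative tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows = high/low risk, columns = condition present/absent.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

With the 0.05 threshold stated above, this toy table would not be considered significant.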

Predictive Value

Calculating group statistics is important to establish the statistical relevance of variables in a diagnostic problem so that risk factors or relationships with comorbidities can be assessed. Nonetheless, it is well known (8–10) that relevance is not a synonym for discriminant power, the latter being most useful in classification and prediction: significant variables in a statistical model do not guarantee prediction performance, and non-significant attributes might prove predictive. For this reason, we decided to also study both the Berlin questionnaire and our own questionnaire from the point of view of their prediction capabilities, using techniques related to supervised feature selection and Machine Learning.

It must be noted that prediction in this case is not related to actual OSA diagnosis, because the only data on which we worked is the response to the questionnaires: therefore, the target variable was simply the high risk of being affected by OSA according to the result of the BQ. As the latter is not a perfect test and can give FP and FN (11, 12), our conclusions are valid within the same limits.

XGBoost (13) in Python was chosen as the classifier model. A relevant reason was that the responses to the questionnaires unfortunately had a certain number of missing answers, and out-of-the-box XGBoost deals quite satisfactorily with missing data thanks to the algorithm called “sparsity-aware split finding”: therefore no explicit imputation mechanism (14) had to be implemented. Moreover, XGBoost is fast and reliable, as also witnessed by its frequent wins in Kaggle competitions¹.

After converting the ordered response scales to numeric values, the following analyses were performed. First, the Fisher score (15) was calculated for each variable. This index measures the ratio between the inter-class distance and the total intra-class variance, $F = (\bar{x}_{1} - \bar{x}_{2})^{2} / (\sigma_{1}^{2} + \sigma_{2}^{2})$, where $\bar{x}_{j}$ and $\sigma_{j}^{2}$ are the mean and the variance of a variable for class j. F is clearly related to the discrimination power of each attribute. Similarly, the area under the ROC (Receiver Operating Characteristic) curve (AUROC) was computed for each variable, directly measuring its predictive power. The Fisher score and the AUROC have similar meanings but they are independent, so they complement each other. However, although these two figures of merit are important because they assess the discriminant power of each feature individually, they only partially characterize the dataset, as they neglect combinations of features, i.e., the evaluation of two or more features together: it often happens that the scores for single features are low but their combination is strongly discriminant, so some mechanism for scoring groups of features is necessary. For this purpose, we employed the backward Sequential Feature Selector (bSSF) from scikit-learn², with XGBoost as the scorer, to build a plot of AUROC vs. the cardinality of the optimal subset of features, from which we could infer interesting conclusions on the prediction power of feature combinations. We finally performed some ad-hoc calculations on particular subsets of features, which we considered interesting.
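The two single-variable figures of merit can be illustrated with a short standard-library sketch. It is not the authors' implementation: the function names and the toy data are hypothetical, the population variance is assumed for $\sigma_{j}^{2}$ (the text does not say whether the sample or population form was used), and the AUROC is computed through its rank interpretation, i.e., the probability that a randomly chosen high-risk value exceeds a randomly chosen low-risk value, with ties counted as one half.

```python
from statistics import mean, pvariance


def fisher_score(x1, x2):
    """F = (mean1 - mean2)^2 / (var1 + var2) for one variable, two classes."""
    # Assumption: population variance; swap in statistics.variance if preferred.
    return (mean(x1) - mean(x2)) ** 2 / (pvariance(x1) + pvariance(x2))


def auroc(x_pos, x_neg):
    """Single-variable AUROC via pairwise comparisons (ties count 1/2)."""
    wins = 0.0
    for p in x_pos:
        for q in x_neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(x_pos) * len(x_neg))


# Hypothetical ordinal answers to one item, split by risk group.
high_risk = [3, 4, 4, 5]
low_risk = [1, 2, 2, 3]
print(fisher_score(high_risk, low_risk))  # 4.0
print(auroc(high_risk, low_risk))         # 0.96875
```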

The feature selection procedure based on bSSF was built as follows. We started from the whole dataset of feature vectors containing n attributes. The dataset was randomly split into two parts, one for feature selection (P1) and the other for quality assessment (P2) of each subset of selected features. Proportions between selection and quality assessment datasets were arbitrarily set to 70 and 30% of the whole dataset, respectively.

At the m-th step (m going from 0 to n – 2), feature selection by bSSF, from n – m to n – m – 1 features, was applied to the P1 dataset, followed by prediction quality measurement on the selected features. Therefore, each iteration took as its input the dataset containing the “best” features, as selected by the preceding iteration. At each iteration (with fixed m), instead of performing feature selection just once, we preferred to study the robustness of the selected subset of features by applying bSSF a given number of times (typically 100), each time recording which feature was considered the least important (downvoted). As the P1 vectors were shuffled before each bSSF application, there was a certain variability in the selected features and, at the end of this internal loop, we removed the feature that had been downvoted most often.

At this point, with a robust subset of features, we calculated the AUROC (arbitrarily with 50 iterations) on the quality assessment dataset P2 and assigned the average AUROC (with an uncertainty calculated as the standard deviation) to the feature set.

The loop on m then continued, until there was just one feature in the dataset.

The results of this process were:

• A graph showing AUROC as a function of the number of selected features.

• A list of features, ordered by importance (considering that the least predictive variables, in a multivariate framework, were discarded first).

The whole procedure was repeated many times, each time modifying the initial split between P1 and P2, so that the influence of random splitting might be judged.
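The voting logic of the elimination loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: `backward_elimination`, `toy_score`, and `WEIGHTS` are hypothetical names, the scorer is a noisy placeholder standing in for the XGBoost-based evaluation on shuffled P1 data, and the P1/P2 split and the AUROC measurement on P2 are omitted.

```python
import random
from collections import Counter


def backward_elimination(features, score_subset, n_votes=100, seed=0):
    """Drop one feature per step, chosen by majority vote over repeated runs.

    score_subset(subset, rng) -> float is a placeholder for a noisy quality
    measure. Returns all features ordered from least to most useful.
    """
    rng = random.Random(seed)
    remaining = list(features)
    removal_order = []
    while len(remaining) > 1:
        votes = Counter()
        for _ in range(n_votes):
            # Score every candidate subset obtained by dropping one feature.
            scores = {
                f: score_subset([g for g in remaining if g != f], rng)
                for f in remaining
            }
            # The feature whose removal hurts least is downvoted in this run.
            votes[max(scores, key=scores.get)] += 1
        dropped = votes.most_common(1)[0][0]
        remaining.remove(dropped)
        removal_order.append(dropped)
    return removal_order + remaining


# Hypothetical toy scorer: each feature adds a fixed amount of "signal",
# plus noise that mimics the variability introduced by shuffling.
WEIGHTS = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.0}


def toy_score(subset, rng):
    return sum(WEIGHTS[f] for f in subset) + rng.gauss(0.0, 0.1)


order = backward_elimination(list(WEIGHTS), toy_score, n_votes=25)
print(order)  # least useful feature first
```

Reading the returned list from the end gives the features ordered by importance, which mirrors the second output listed above.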

Ethical Considerations

The ethical aspects of the study were set out in the questionnaire presentation, which was designed in accordance with the principles of the Italian data protection authority (DPA). It was emphasized that participation was voluntary and that the participant could refuse participation in the protocol whenever he or she wished. Those who were interested in participating were given an informed consent form, which recalled the voluntary nature of participation, as well as the confidentiality and anonymous nature of the information.

Results

Sample Demographics

Out of 405 people to whom the BQ was administered, the response rate was 95% (n = 387). There were 292 women (75% of respondents), and 184 respondents (47.5%) were over 40 years old. The median BMI was 25.4 kg/m² (range 18–46 kg/m²).

Berlin Questionnaire Score and Metrics

The BQ was evaluated for all respondents and data were collected (Table 1). According to the questionnaire, the subjects were stratified into low vs. high OSA risk groups by means of a score calculation. Among all subjects, 76 (20%) were categorized as high likelihood of having OSA. Table 2 shows the BQ answer counts subdivided between low and high Berlin score subjects.


Table 1. The Berlin questionnaire evaluated for all respondents.


Table 2. Berlin questionnaire items between low and high Berlin score (low vs. high OSA risk groups).

Respondents were also asked if they had already been diagnosed with OSA through a gold standard test (e.g., polysomnography). Among the subjects identified as high-risk, 24% (n = 18, 5% of the complete sample) had already been diagnosed with OSA whereas 76% (n = 58, 15% of the sample) had not undergone any diagnostic test. Among the subjects categorized as low-risk for OSA, 1% (n = 2) had received a diagnosis of OSA (false negatives) whereas 99% had not been tested.

As reported in the literature (16), the dominant symptom of OSA is snoring with a prevalence of 75–90%. Accordingly, in our sample the high-risk OSA group had a significantly larger proportion of respondents reporting frequent snoring (95%) compared to the low-risk group (21%). Nocturnal snoring also increased in frequency and loudness in high-risk OSA cases compared with low-risk, and this difference was statistically significant (p < 0.001 for both). Specifically, 28% of the high-risk group report snoring very loudly compared with 3% of the low-risk group. The percentage of those who snore every night also increases from 10 to 63% in the high-risk group.

Nocturnal symptoms may also include apnea and dyspnea generally observed by bed partners and this was confirmed by the bothersome snoring percentage that passed from 22% in the low-risk to 80% in the high-risk group. These differences were statistically significant (p < 0.001).

The high-risk group also reported more breathing interruptions than the low-risk subjects (p < 0.001).

Fatigue, somnolence at awakening and during daytime are also symptoms significantly present in the high-risk group compared to the low risk group (p = 0.0018 and 0.0029, respectively). The percentage of those who reported falling asleep while driving a vehicle was also higher in the high-risk group (24%) than for the low-risk subjects (9%), with a statistically significant difference (p < 0.001).

This significance is also present in the frequency of episodes (p < 0.001).

High blood pressure was also reported in half of the high risk subjects (51%) compared with 5% of the low risk ones, and this difference was statistically significant.

Socio-demographic characteristics, work information, clinical factors, and symptoms category were compared between the two OSA risk groups. The results are summarized in Table 3.


Table 3. Baseline characteristics of nurses between low and high Berlin scores (low vs. high OSA risk groups).

Predictive Value of the Berlin Questionnaire Variables

Fisher Indices and AUROC for Single Variables

The ten variables from the BQ plus BMI were considered. The most discriminant variables were the four related to snoring (B1 to B4 in Table 1), with B1 being the most important overall (AUROC = 0.88, F = 1.9) and snoring loudness B2 being the least predictive. As to the two variables with relatively objective measurement, i.e., having high blood pressure, B10, and the body mass index (computed from the subject's physical data), the former had high predictivity (AUROC = 0.74, F = 0.80) while the latter showed lower discriminant power (AUROC = 0.64, F = 0.03). This result was quite surprising when compared with the one reported in (18), where BMI is found to be quite a strong predictor.

Sequential Feature Selection

The typical relationship between the number of features and AUROC that we obtained by the bSSF procedure is shown in Figure 1. Repeating the run with different random splits into P1 and P2 did not appreciably change the result, with AUROC for sets of ≥ 3 features always attaining values near 1. Reaching such a high AUROC with the full set of variables, of course, has no particular meaning because the target variable (high risk of OSA) is obtained from the BQ variables (the answers to the questions), so there exists a well-established a priori relationship between the variables and the target, which the classifier finds. On the other hand, what is surprising is the fact that a subset of three variables has predictive power comparable to that of the whole questionnaire.


Figure 1. Functional dependence of AUROC on the cardinality of the set of selected features.

The subset of three variables was reasonably robust and did not depend too much on the particular dataset split; after about 60 runs, the subset was found to contain the variables computed from B10 (selected at every run), B1 (present in 73% of the “best” feature subsets), B6 (presence in 38%), B7 (37%), B3 (25%), B4 (2%). We remark that hypertension B10 is always among the most useful features [which was already known from the single-variable calculations; this result confirms what was found in (18)]. Considering now the remaining five features, three concern snoring (B1, the most voted after B10; then B3 and B4) while two concern feeling tired in daytime, either at wake-up or along the day (B6 and B7), with similar presence in the subsets. By calculating the (normalized) co-occurrence matrix of these five variables in the “best” feature subsets:

$$\mathrm{COM} = \begin{matrix} B1 \\ B6 \\ B7 \\ B3 \\ B4 \end{matrix} \begin{pmatrix} 0.183 & 0.100 & 0.083 & 0 & 0 \\ 0.100 & 0.158 & 0 & 0.054 & 0.004 \\ 0.083 & 0 & 0.092 & 0.008 & 0 \\ 0 & 0.054 & 0.008 & 0.062 & 0 \\ 0 & 0.004 & 0 & 0 & 0.004 \end{pmatrix}$$

it is evident that B1 is always accompanied by B6 or B7, so two natural choices for three-variate subsets of features could be {B1, B6, B10}, immediately followed by {B1, B7, B10}. After selecting these “best” sets of variables, good practice would require verifying the conclusions on an independent test dataset. As this is currently impossible for lack of data, their predictive value was recalculated on the whole available dataset, with 5-fold cross-validation. The AUROC was 0.98 for both.
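A co-occurrence matrix of this kind can be built from the list of selected subsets, as in the sketch below. The normalization used for the published matrix is not stated in the text; the code assumes that each off-diagonal entry counts how often two variables appear together, that each diagonal entry counts the pairs in which the variable takes part, and that all entries are divided by their grand total. The function name `cooccurrence_matrix` and the example subsets are hypothetical.

```python
from itertools import combinations


def cooccurrence_matrix(subsets, variables, always_present=("B10",)):
    """Normalized co-occurrence of variables across feature subsets.

    Assumed normalization: entries are divided by the sum of all counts.
    Variables listed in always_present are ignored.
    """
    index = {v: i for i, v in enumerate(variables)}
    n = len(variables)
    counts = [[0] * n for _ in range(n)]
    for subset in subsets:
        others = [v for v in subset if v not in always_present]
        for u, v in combinations(others, 2):
            i, j = index[u], index[v]
            counts[i][j] += 1  # pair (u, v)
            counts[j][i] += 1  # symmetric entry
            counts[i][i] += 1  # u took part in one more pair
            counts[j][j] += 1  # v took part in one more pair
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


variables = ["B1", "B6", "B7", "B3", "B4"]
# Hypothetical "best" three-feature subsets from four runs.
subsets = [
    {"B10", "B1", "B6"},
    {"B10", "B1", "B6"},
    {"B10", "B1", "B7"},
    {"B10", "B6", "B3"},
]
com = cooccurrence_matrix(subsets, variables)
print(com[0][1])  # B1 with B6: 0.125
print(com[0][0])  # B1 diagonal: 0.1875
```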

Predictive Value for the Proprietary Questionnaire Variables

The proprietary questionnaire was also examined from the Machine Learning point of view, with a similar approach but very different results. The target variable was, as in the preceding analysis, the BQ output in terms of high vs. low risk of OSA. Global AUROC was not very high, at about 0.80, which attests to the relationship between the questions and the pathology, but also to the limited usefulness of the proprietary questionnaire in a ML context, at least with the data we possess. No variable derived from the questionnaire items proved to be strikingly discriminant per se. Moreover, the partially stochastic nature of the feature selection process (due to the different random choices of the selection and quality assessment sets, P1 and P2, respectively) led to quite different dependences of AUROC on the number of features at each run (in which AUROC slowly decreased from 0.80 to 0.60 with the progressive depletion of the feature set).

Discussion

Of the 387 subjects who completed the BQ, about 20% (n = 76) fell within the high-risk group. Socio-demographic characteristics, work information, clinical factors, and symptoms category were compared between the two groups and are reported in Table 3.

Socio-Demographic Baseline Characteristics

Age is a well-established risk factor for OSA (19, 20). The increase in the prevalence of OSA with age could be explained in part by the increase in comorbidities, menopause, hypertension, BMI, but also by the decrease of tongue and palate muscle functions and activities that occurs in older adults (21, 22). Regarding the age of the sample, in the high-risk group 67% (n = 51) was ≥41 years old compared to 41% (n = 133) in the low-risk group. We have to consider that our cohort is predominantly composed of young subjects, more than half being <40 years old and only <2% of subjects being more than 60 years old. In our cohort, age was also found to be a risk factor significantly associated with high risk of OSA (p < 0.0001).

With respect to gender, epidemiological studies reported a prevalence ranging from 13 to 31% in men and 4 to 21% in women (17, 23–27). It is difficult to confirm this prevalence in our analysis, considering that our sample is predominantly female (75%). Despite this, we found a statistically significant difference between low-risk and high-risk groups with respect to gender (p = 0.0011). In particular, the percentage of men increases from 21% in the low-risk group to 39% in the high-risk group. In contrast, the percentage of women is 79% in the low-risk group and decreases to 61% in the high-risk group.

Obesity is the most severe known risk factor for OSA. Generally, almost 60% of patients with OSA are obese (28). The risk of OSA increases progressively with BMI and also with neck circumferences (29). In our analysis, the mean of BMI was significantly higher in the high-risk group than in the low-risk group (p < 0.001). Regarding neck circumferences, half of subjects did not know their neck circumferences. However, neck circumferences were higher than the chosen cut-off in the high-risk group (14%) compared to the low-risk group (6%).

No association was found between smoking and OSA in our sample, and this reflects what is found in the literature (30). However, inhalation of cigarette smoke increases oxidative stress and systemic inflammation, which are typically present in OSA (30). Thus, the concomitant presence of OSA in smokers could worsen disease progression.

Work Information

Regarding work information, only the number of years of work experience seems to be associated with a high risk of OSA. However, rather than being a risk factor per se, this variable could be significant just because it is correlated with increasing age, an important risk factor previously discussed. Distribution of working time (full time/part time), work shift (day shift only or 24 h shift) and work shift regularity (yes/no) were not found to be associated with a high risk of OSA. Interestingly, professional category and education level appear to differ between the two groups (p = 0.039 and p = 0.049, respectively).

Health Status

Among all the clinical factors investigated, only the presence of craniofacial morphological alterations was not found to be associated with an elevated risk of OSA, contrary to what is reported in the literature (31). However, we must consider that only 8 subjects reported having these alterations, which limits the statistical power of this comparison. Sleep disorders, instead, were obviously significantly different between the two groups (p < 0.001), supporting the reliability of the sample.

Hypertension was already known to be associated with OSA (32, 33). Normally, 50% of hypertensive patients have OSA and this percentage rises to 85% in patients with hypertension who have at least one other OSA symptom (34, 35). Subjects with OSA have a 1.8-fold increased risk of resistant hypertension compared to non-OSA individuals (36). Our sample confirmed these data since 51% of high-risk persons were hypertensive compared with 5% of low-risk subjects.

Arrhythmias and transient ischemic attack or stroke were found to be associated with a high OSA risk score (p < 0.001 and p = 0.0013, respectively). This is in line with the literature, which attests that the prevalence of OSA is estimated to be between two and three times higher in patients with cardiovascular diseases (37).

The percentage of OSA patients who suffer from type 2 diabetes was about 30% (n = 118). The link between diabetes and OSA seems bidirectional but has not been fully evaluated yet. In our cohort, 14% of the high-risk group reported diabetes mellitus, compared to 2% of the low-risk group. The association between diabetes mellitus and being at high risk is statistically significant (p < 0.001).

OSA and asthma are closely related. Numerous studies have consistently reported higher OSA burden among subjects with asthma (38, 39) and in relation to asthma severity (38, 40). In our sample, the percentage of individuals with asthma in the low-risk group was 6% rising to 24% in high-risk group. Asthma was also found to be a strong risk factor for OSA (p = 0.0018).

Chronic obstructive pulmonary disease (COPD) is also highly associated with OSA. COPD is one of the most prevalent respiratory diseases worldwide. There exists what is called COPD-OSA overlap syndrome that represents a distinct clinical diagnosis, where clinical outcomes are even worse than in each disease alone (41). Based on this evidence, we found a significant difference between the low and high-risk groups (p < 0.001).

Recent systematic reviews and meta-analyses reported that OSA is linked to depression (42) and anxiety (43). Other longitudinal studies suggested that patients with OSA are about twice as likely to be depressed than those without OSA (44, 45). In our sample, the rate of depression increased from 10% in the low-risk group to 21% in the high-risk OSA group, while the rate of anxiety increased from 33 to 54%. We also found a strong correlation between being at high-risk of OSA and having both depression and anxiety (p = 0.0014 and p = 0.0082, respectively).

Frequent confusion and agitation also proved to be an important risk factor (p = 0.0022) in our cohort. In particular, 11% of the high-risk subjects reported confusion and agitation, compared to 3% of those in the low-risk group. This phenomenon could be related to anxious behavior, but further work is needed to understand this association.

Excessive alcohol consumption and drug abuse were also assessed between low vs. high score. Results from the literature revealed that alcohol consumption is associated with 25% increased risk of OSA (46). To the best of our knowledge, no data was shown for drug abuse. We found that 7% of the high-risk group declared alcohol and drug abuse, compared to 1% of patients found in the low-risk group. Alcohol and drug abuse were also found to be two independent risk factors for the high-risk group (p = 0.0022 and p = 0.0061, respectively).

Symptoms Category

Daytime OSA symptoms consist of unexplained fatigue and excessive sleepiness. Patients also report repetitive problems with concentration and memory as well as depressive symptoms (47) and impairment of cognitive functions (48). Moreover, a study of men and women aged 60 years and older showed memory impairment related to OSA and hypertension (49). All of this evidence is in line with our findings: difficulty staying awake during an activity, difficulty concentrating, difficulty in expressing oneself, use of stimulants, interference with work, interference with social relationships, slow reactions and difficulty keeping attention up, difficulty in paying attention to several tasks at once, striving not to make mistakes, and need to doze off are all significantly associated with a high risk of having OSA. These symptoms fully describe the OSA patient during his/her daily activity, including working and social activities.

Predictive Value of Questionnaire Items

As for the predictive value of the variables acquired by the BQ, we concluded that a reduced set of questions, i.e., a reduced set of selected features comprising only the items in Table 4, is sufficient to obtain an output close to that of the full BQ when fed to a trained XGBoost classifier.

Table 4. Simplified Berlin questionnaire.
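One way to quantify how close the output of a reduced questionnaire is to that of the full BQ is to compare the two risk labels subject by subject. The sketch below computes raw agreement and Cohen's kappa; it is an illustrative assumption rather than the evaluation procedure of this study, and the function name `agreement` and the label lists are hypothetical.

```python
def agreement(reference, predicted):
    """Raw agreement and Cohen's kappa between two binary label lists
    (1 = high risk, 0 = low risk)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Agreement expected by chance, from the marginal high-risk rates.
    p_ref = sum(reference) / n
    p_pred = sum(predicted) / n
    expected = p_ref * p_pred + (1 - p_ref) * (1 - p_pred)
    if expected == 1:
        # Both lists are constant and identical; kappa is formally
        # undefined, so 1.0 is returned by convention.
        return observed, 1.0
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# HYPOTHETICAL labels for ten respondents; NOT the study data.
full_bq = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # outcome of the full BQ
reduced = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]  # classifier output on the reduced set
acc, kappa = agreement(full_bq, reduced)
print(f"agreement = {acc:.2f}, kappa = {kappa:.2f}")
```

Kappa is reported alongside raw agreement because, when one risk class dominates the sample, a high raw agreement can be reached by chance alone.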

This reduced questionnaire shows some similarity with the one proposed in Arunsurat et al. (18), with the important difference that (as already remarked) BMI is not preserved in the reduced set. The discrepancy might partly stem from the different populations considered, i.e., the high percentage of young and prevalently female respondents in our sample compared to the all-male healthcare workers investigated in Arunsurat et al. (18).

From the Results section, it is also evident that the proprietary questionnaire is of interest for risk factor assessment, but the ML approach gave no indication of whether (parts of) it could replace or complement the original Berlin test. To clarify this possibility, a dataset with ground truth from PSG or HST is needed, along with the questionnaire itself.
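If such a ground-truth dataset were available, the screening performance of any questionnaire variant could be summarized by its sensitivity and specificity. The following is a minimal sketch under that assumption; the function name `sensitivity_specificity` and the example labels are hypothetical and do not come from this study.

```python
def sensitivity_specificity(truth, screen):
    """Sensitivity and specificity of a binary screening result against
    a binary ground truth (1 = OSA / high risk, 0 = no OSA / low risk)."""
    if len(truth) != len(screen):
        raise ValueError("truth and screen must have the same length")
    tp = fn = tn = fp = 0
    for t, s in zip(truth, screen):
        if t == 1 and s == 1:
            tp += 1
        elif t == 1 and s == 0:
            fn += 1
        elif t == 0 and s == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("ground truth must contain both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


# HYPOTHETICAL example: diagnosis from PSG/HST vs. questionnaire outcome.
psg_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
questionnaire = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(psg_truth, questionnaire)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```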

Limits

The results of our study must be interpreted in light of some limitations: the sample size, the lack of an actual disease diagnosis for most subjects, the absence of disease follow-up and long-term effect investigation for the subjects who reported suffering from OSA and, finally, the possible reluctance of the respondents to faithfully declare their health status since they are professional nurses. Moreover, our survey group does not fully represent the general population, because of the high percentage of young and prevalently female respondents. Finally, we are aware that the study might lead to different conclusions in different ethnic groups, depending on language, habits, lifestyles or physical conformation.

Conclusions

In conclusion, numerous risk factors are associated with a high risk of having OSA in a population of nurses. Given the high percentage of people in whom OSA is still undiagnosed and the lack of knowledge about this disease, our study helps to highlight an alarming result that may be just the tip of the iceberg. This study could help to expand awareness of OSA, especially among professional nurses, who are one of the most important professional categories in healthcare. It could also encourage more professionals to refer suspected patients for overnight polysomnography, as well as to explore possible alternative screening tests and treatments for this still largely hidden disease.

Further efforts should be made to increase the number of diagnoses but also, more importantly, to refer these subjects for screening. In this regard, our simplified test might also make the questionnaire easier to administer, facilitating the orientation of the subject at risk toward the diagnostic pathway. We are planning a prospective clinical trial that will use the simplified Berlin test together with our proprietary questions on the general population, with the aim of possibly creating a richer questionnaire with better sensitivity and specificity.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Ethics Statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author Contributions

GDN, LC, RL, and LDB contributed to conception and design of the study. LC, GDN, AC, ME, and MC organized the database. LC, GDN, and EV performed the statistical analysis. LC and GDN wrote the first draft of the manuscript. LC, GDN, MA, DT, and LDB wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The authors would like to thank the nursing professionals who contributed to this study. In particular, we would like to thank Dr. Simone Zacchino, Dr. Angelo Benedetto, and Dr. Maria Chiara Carriero for contributing to the realization of this work.

Footnotes

1. https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions

2. https://scikit-learn.org/

References

1. Senaratna CV, Perret JL, Lodge CJ, Lowe AJ, Campbell BE, Matheson MC, et al. Prevalence of obstructive sleep apnea in the general population: a systematic review. Sleep Med Rev. (2017) 34:70–81. doi: 10.1016/j.smrv.2016.07.002

2. Practice guidelines for the perioperative management of patients with obstructive sleep apnea. Anesthesiology. (2006) 104:1081–93. doi: 10.1097/00000542-200605000-00026

3. Marin JM, Carrizo SJ, Vicente E, Agusti AG. Long-term cardiovascular outcomes in men with obstructive sleep apnoea-hypopnoea with or without treatment with continuous positive airway pressure: an observational study. Lancet. (2005) 365:1046–53. doi: 10.1016/S0140-6736(05)71141-7

4. Dyken ME, Im KB. Obstructive sleep apnea and stroke. Chest. (2009) 136:1668–77. doi: 10.1378/chest.08-1512

5. Benjafield AV, Ayas NT, Eastwood PR, Heinzer R, Ip MSM, Morrell MJ, et al. Estimation of the global prevalence and burden of obstructive sleep apnoea: a literature-based analysis. Lancet Respir Med. (2019) 7:687–98. doi: 10.1016/S2213-2600(19)30198-5

6. Fietze I, Laharnar N, Obst A, Ewert R, Felix SB, Garcia C, et al. Prevalence and association analysis of obstructive sleep apnea with gender and age differences - results of SHIP-Trend. J Sleep Res. (2019) 28:e12770. doi: 10.1111/jsr.12770

7. Netzer NC, Stoohs RA, Netzer CM, Clark K, Strohl KP. Using the Berlin Questionnaire to identify patients at risk for the sleep apnea syndrome. Ann Intern Med. (1999) 131:485. doi: 10.7326/0003-4819-131-7-199910050-00002

8. Lo A, Chernoff H, Zheng T, Lo S-H. Why significant variables aren't automatically good predictors. Proc Natl Acad Sci. (2015) 112:13892–7. doi: 10.1073/pnas.1518285112

9. Bzdok D, Engemann D, Thirion B. Inference and prediction diverge in biomedicine. Patterns. (2020) 1:100119. doi: 10.1016/j.patter.2020.100119

10. Varga TV, Niss K, Estampador AC, Collin CB, Moseley PL. Association is not prediction: a landscape of confused reporting in diabetes – a systematic review. Diabetes Res Clin Pract. (2020) 170:108497. doi: 10.1016/j.diabres.2020.108497

11. Tan A, Yin JDC, Tan LWL, van Dam RM, Cheung YY, Lee C-H. Using the Berlin Questionnaire to predict obstructive sleep apnea in the general population. J Clin Sleep Med. (2017) 13:427–32. doi: 10.5664/jcsm.6496

12. Bernhardt L, Brady EM, Freeman SC, Polmann H, Réus JC, Flores-Mir C, et al. Diagnostic accuracy of screening questionnaires for obstructive sleep apnoea in adults in different clinical cohorts: a systematic review and meta-analysis. Sleep Breath. (2021) 18:1–26. doi: 10.1007/s11325-021-02450-9

13. Chen T, Guestrin C. XGBoost. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY: ACM. (2016). p. 785–94.

14. Jakobsen JC, Gluud C, Wetterslev J, Winkel P. When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts. BMC Med Res Methodol. (2017) 17:162. doi: 10.1186/s12874-017-0442-1

15. Duda RO, Hart PE, Stork DG. Pattern Classification. Hoboken, NJ: John Wiley & Sons (2001).

16. Heinzer R, Vat S, Marques-Vidal P, Marti-Soler H, Andries D, Tobback N, et al. Prevalence of sleep-disordered breathing in the general population: the HypnoLaus study. Lancet Respir Med. (2015) 3:310–8. doi: 10.1016/S2213-2600(15)00043-0

17. Peppard PE, Young T, Barnet JH, Palta M, Hagen EW, Hla KM. Increased prevalence of sleep-disordered breathing in adults. Am J Epidemiol. (2013) 177:1006–14. doi: 10.1093/aje/kws342

18. Arunsurat I, Luengyosluechakul S, Prateephoungrat K, Siripaupradist P, Khemtong S, Jamcharoensup K, et al. Simplified Berlin Questionnaire for screening of high risk for obstructive sleep apnea among Thai male healthcare workers. J UOEH. (2016) 38:199–206. doi: 10.7888/juoeh.38.199

19. Braley TJ, Dunietz GL, Chervin RD, Lisabeth LD, Skolarus LE, Burke JF. Recognition and diagnosis of obstructive sleep apnea in older Americans. J Am Geriatr Soc. (2018) 66:1296–302. doi: 10.1111/jgs.15372

20. Zamarrón C, Gude F, Otero Y, Alvarez JM, Golpe A, Rodriguez JR. Prevalence of sleep disordered breathing and sleep apnea in 50- to 70-year-old individuals. Respiration. (1999) 66:317–22. doi: 10.1159/000029401

21. Eikermann M, Jordan AS, Chamberlin NL, Gautam S, Wellman A, Lo Y-L, et al. The influence of aging on pharyngeal collapsibility during sleep. Chest. (2007) 131:1702–9. doi: 10.1378/chest.06-2653

22. Worsnop C, Kay A, Kim Y, Trinder J, Pierce R. Effect of age on sleep onset-related changes in respiratory pump and upper airway muscle function. J Appl Physiol. (2000) 88:1831–9. doi: 10.1152/jappl.2000.88.5.1831

23. Appleton SL, Gill TK, Lang CJ, Taylor AW, McEvoy RD, Stocks NP, et al. Prevalence and comorbidity of sleep conditions in Australian adults: 2016 Sleep Health Foundation national survey. Sleep Health. (2018) 4:13–9. doi: 10.1016/j.sleh.2017.10.006

24. Hiestand DM, Britz P, Goldman M, Phillips B. Prevalence of symptoms and risk of sleep apnea in the US population. Chest. (2006) 130:780–6. doi: 10.1378/chest.130.3.780

25. Huang T, Lin BM, Markt SC, Stampfer MJ, Laden F, Hu FB, et al. Sex differences in the associations of obstructive sleep apnoea with epidemiological factors. Eur Respir J. (2018) 51:1702421. doi: 10.1183/13993003.02421-2017

26. Sunwoo J-S, Hwangbo Y, Kim W-J, Chu MK, Yun C-H, Yang KI. Prevalence, sleep characteristics, and comorbidities in a population at high risk for obstructive sleep apnea: a nationwide questionnaire study in South Korea. PLoS ONE. (2018) 13:e0193549. doi: 10.1371/journal.pone.0193549

27. Matsumoto T, Murase K, Tabara Y, Minami T, Kanai O, Takeyama H, et al. Sleep disordered breathing and metabolic comorbidities across sex and menopausal status in East Asians: the Nagahama Study. Eur Respir J. (2020) 56:1902251. doi: 10.1183/13993003.02251-2019

28. Tufik S, Santos-Silva R, Taddei JA, Bittencourt LRA. Obstructive sleep apnea syndrome in the Sao-Paulo epidemiologic sleep study. Sleep Med. (2010) 11:441–6. doi: 10.1016/j.sleep.2009.10.005

29. Young T. Risk factors for obstructive sleep apnea in adults. JAMA. (2004) 291:2013. doi: 10.1001/jama.291.16.2013

30. Lin YN, Zhou LN, Zhang XJ, Li QY, Wang Q, Xu HJ. Combined effect of obstructive sleep apnea and chronic smoking on cognitive impairment. Sleep Breath. (2016) 20:51–9. doi: 10.1007/s11325-015-1183-1

31. Brevi BC, Toma L, Magri AS, Sesenna E. Use of the mandibular distraction technique to treat obstructive sleep apnea syndrome. J Oral Maxillofac Surg. (2011) 69:566–71. doi: 10.1016/j.joms.2010.09.007

32. Kario K. Obstructive sleep apnea syndrome and hypertension: mechanism of the linkage and 24-h blood pressure control. Hypertens Res. (2009) 32:537–41. doi: 10.1038/hr.2009.73

33. Chobanian AV, Bakris GL, Black HR, Cushman WC, Green LA, Izzo JL, et al. Seventh report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. Hypertens. (2003) 42:1206–52. doi: 10.1161/01.HYP.0000107251.49515.c2

34. Börgel J, Springer S, Ghafoor J, Arndt D, Duchna H-W, Barthel A, et al. Unrecognized secondary causes of hypertension in patients with hypertensive urgency/emergency: prevalence and co-prevalence. Clin Res Cardiol. (2010) 99:499–506. doi: 10.1007/s00392-010-0148-4

35. Gonçalves SC, Martinez D, Gus M, de Abreu-Silva EO, Bertoluci C, Dutra I, et al. Obstructive sleep apnea and resistant hypertension. Chest. (2007) 132:1858–62. doi: 10.1378/chest.07-1170

36. Hou H, Zhao Y, Yu W, Dong H, Xue X, Ding J, et al. Association of obstructive sleep apnea with hypertension: a systematic review and meta-analysis. J Glob Health. (2018) 8:010405. doi: 10.7189/jogh.08.010405

37. Peppard PE, Young T, Palta M, Skatrud J. Prospective study of the association between sleep-disordered breathing and hypertension. N Engl J Med. (2000) 342:1378–84. doi: 10.1056/NEJM200005113421901

38. Julien JY, Martin JG, Ernst P, Olivenstein R, Hamid Q, Lemière C, et al. Prevalence of obstructive sleep apnea–hypopnea in severe versus moderate asthma. J Allergy Clin Immunol. (2009) 124:371–6. doi: 10.1016/j.jaci.2009.05.016

39. Yigla M, Tov N, Solomonov A, Rubin AE, Harlev D. Difficult-to-control asthma and obstructive sleep apnea. J Asthma. (2003) 40:865–71. doi: 10.1081/JAS-120023577

40. Teodorescu M, Broytman O, Curran-Everett D, Sorkness RL, Crisafi G, Bleecker ER, et al. Obstructive sleep apnea risk, asthma burden, and lower airway inflammation in adults in the severe asthma research program (SARP) II. J Allergy Clin Immunol Pract. (2015) 3:566–75.e1. doi: 10.1016/j.jaip.2015.04.002

41. D'Cruz RF, Murphy PB, Kaltsakas G. Sleep disordered breathing and chronic obstructive pulmonary disease: a narrative review on classification, pathophysiology and clinical outcomes. J Thorac Dis. (2020) 12:S202–16. doi: 10.21037/jtd-cus-2020-006

42. Edwards C, Almeida OP, Ford AH. Obstructive sleep apnea and depression: a systematic review and meta-analysis. Maturitas. (2020) 142:45–54. doi: 10.1016/j.maturitas.2020.06.002

43. Aftab Z, Anthony AT, Rahmat S, Sangle P, Khan S. An updated review on the relationship of depressive symptoms in obstructive sleep apnea and continuous positive airway pressure. Cureus. (2021) 13:e15907. doi: 10.7759/cureus.15907

44. Chen Y-H, Keller JK, Kang J-H, Hsieh H-J, Lin H-C. Obstructive sleep apnea and the subsequent risk of depressive disorder: a population-based follow-up study. J Clin Sleep Med. (2013) 9:417–23. doi: 10.5664/jcsm.2652

45. Peppard PE, Szklo-Coxe M, Hla KM, Young T. Longitudinal association of sleep-related breathing disorder and depression. Arch Intern Med. (2006) 166:1709–15. doi: 10.1001/archinte.166.16.1709

46. Roure N, Gomez S, Mediano O, Duran J, Peña M de la, Capote F, et al. Daytime sleepiness and polysomnography in obstructive sleep apnea patients. Sleep Med. (2008) 9:727–31. doi: 10.1016/j.sleep.2008.02.006

47. Sforza E, Saint Martin M, Barthélémy JC, Roche F. Mood disorders in healthy elderly with obstructive sleep apnea: a gender effect. Sleep Med. (2016) 19:57–62. doi: 10.1016/j.sleep.2015.11.007

48. Saint Martin M, Sforza E, Roche F, Barthélémy JC, Thomas-Anterion C. Sleep breathing disorders and cognitive function in the elderly: an 8-year follow-up study. The Proof-Synapse Cohort. Sleep. (2015) 38:179–87. doi: 10.5665/sleep.4392

49. Kato K, Noda A, Yasuma F, Matsubara Y, Miyata S, Iwamoto K, et al. Effects of sleep-disordered breathing and hypertension on cognitive function in elderly adults. Clin Exp Hypertens. (2020) 42:250–6. doi: 10.1080/10641963.2019.1632338

Keywords: obstructive sleep apnea (OSA), Berlin questionnaire (BQ), risk factors, machine learning, simplified Berlin questionnaire, screening test

Citation: De Nunzio G, Conte L, Lupo R, Vitale E, Calabrò A, Ercolani M, Carvello M, Arigliani M, Toraldo DM and De Benedetto L (2022) A New Berlin Questionnaire Simplified by Machine Learning Techniques in a Population of Italian Healthcare Workers to Highlight the Suspicion of Obstructive Sleep Apnea. Front. Med. 9:866822. doi: 10.3389/fmed.2022.866822

Received: 18 March 2022; Accepted: 28 March 2022;
Published: 25 May 2022.

Edited by:

Barbara Ruaro, University of Trieste, Italy

Reviewed by:

Riccardo Pozzan, University of Trieste, Italy
Romeo Martini, University Hospital of Padua, Italy

Copyright © 2022 De Nunzio, Conte, Lupo, Vitale, Calabrò, Ercolani, Carvello, Arigliani, Toraldo and De Benedetto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Giorgio De Nunzio, giorgio.denunzio@unisalento.it

^†These authors have contributed equally to this work
