- 1Department of Medicine, University of Manitoba, Winnipeg, MB, Canada
- 2Department of Anesthesia, University of Toronto, Toronto, ON, Canada
- 3Manitoba Centre for Health Policy, University of Manitoba, Winnipeg, MB, Canada
- 4Research School of Population Health, Australian National University, Canberra, ACT, Australia
Background: Prediction of future critical illness could render it practical to test interventions seeking to avoid or delay the coming event.
Objective: Identify adults having >33% probability of near-future critical illness.
Research Design: Retrospective cohort study, 2013–2015.
Subjects: Community-dwelling residents of Manitoba, Canada, aged 40–89 years.
Measures: The outcome was a near-future critical illness, defined as intensive care unit admission with invasive mechanical ventilation, or non-palliative death occurring 30–180 days after 1 April each year. After dividing the data into training and test cohorts, we used Classification and Regression Tree analysis to identify subgroups with ≥33% probability of the outcome. We considered 72 predictors including sociodemographics, chronic conditions, frailty, and health care utilization. Sensitivity analysis used logistic regression methods.
Results: Approximately 0.38% of each yearly cohort experienced near-future critical illness. The optimal Tree identified 2,644 mutually exclusive subgroups. Socioeconomic status was the most influential variable, followed by nursing home residency and frailty; age was sixth. In the training data, the model performed well; 41 subgroups containing 493 subjects had ≥33% members who developed the outcome. However, in the test data, those subgroups contained 429 individuals, with 20 (4.7%) experiencing the outcome, which comprised 0.98% of all subjects with the outcome. While logistic regression showed less model overfitting, it likewise failed to achieve the stated objective.
Conclusions: High-fidelity prediction of near-future critical illness among community-dwelling adults was not successful using population-based administrative data. Additional research is needed to ascertain whether the inclusion of additional types of data can achieve this goal.
Introduction
The care of critically ill people in intensive care units (ICUs) is an important part of healthcare in all industrialized countries. Approximately 0.5–1.3% of all adults are admitted to ICUs every year, a rate that rises rapidly with age and reaches 2–5% among people 85 years of age or older (1, 2). In the United States, up to half of all people experience ICU care during their final year of life (3), and many die there (4, 5). In Canada, 11% of hospitalizations include time in an ICU (6) and 19% of people die there (7). Estimates from the United States indicate that ICU care comprises ~4% of total national health expenditures (8), equating to 0.7% of the national gross domestic product (9). Furthermore, ICU utilization is rising (6, 10). Critical illnesses also burden society by inhibiting the ability of survivors to work and earn (11, 12).
The risk of death from critical illness is high, but mortality is only one of its negative consequences. Many survivors experience ongoing physical, cognitive, and psychological problems (13). It would be a major advance to prevent or delay critical illness among community-dwelling adults. Prospectively identifying adults with a high probability of developing critical illness in the near future is a necessary first step toward designing and testing interventions to achieve this advance. In this work, we specifically sought to identify adults with a >33% probability of near-future critical illness.
For maximum value, the prediction of critical illness needs to be feasible using readily accessible data that are population-based and available on an ongoing basis. Administrative (health claims) data meet these criteria (14). While previous studies have attempted similar predictions, they have met with limited success (15–20). We hypothesized that applying advanced statistical methods to longitudinal information about medical resource utilization, coupled with information about demographics and serious health conditions, in a pre-COVID-19 era would: (a) identify subgroups with a high probability of near-future critical illness and (b) identify a consequential fraction of all people who develop that outcome.
Materials and Methods
Design, Setting, and Data Sources
This retrospective cohort study used administrative health data from the universal, single-payer healthcare system in the Canadian province of Manitoba. Available to all Manitoba residents, this system covers inpatient and outpatient care, practitioner fees, diagnostic testing, long-term care, and homecare. There is limited coverage for outpatient eye examinations, chiropractor, and physical therapy visits. An outpatient prescription drug benefit plan with an income-related deductible is available to low-income registrants. Services not covered include outpatient care by dentists, podiatrists, acupuncturists, psychologists, and dietitians; cosmetic surgery; and ambulance transport, with the exception of air ambulance transport for residents who live north of the 53rd parallel.
The databases used for this study (Supplementary Table 1) are held in the Manitoba Centre for Health Policy Research Data Repository (21). As previously described (22–24), they are linked via an anonymized version of the unique Personal Health Identification Number. New data are updated every 6 months, routinely cleaned, and checked. These data have been demonstrated to have high validity and reliability for investigating health and the use of healthcare (22).
The Discharge Abstract Database (DAD) captures detailed data for every hospitalization, including admission and discharge dates, up to 25 diagnoses reported using the International Classification of Disease (ICD)-10th edition Canadian format, and up to 15 procedures in the Canadian Classification of Interventions (CCI) format (25–27). Centrally trained data abstractors working in each acute care hospital collect these data using nationally uniform definitions, format, collection methods, and data entry software (28). DAD data are validated and reported to the Canadian Institute for Health Information by the provincial health authority. The DAD is highly accurate in identifying the delivery and timing of ICU care (29).
This study was approved by the University of Manitoba Health Research Ethics Board and Manitoba's Health Information Privacy Committee.
Study Population
The source for this study was the Manitoba population (30). We included three fiscal year cohorts (FY2013–2015, each running from April 1 to March 31). April 1 was the start date of each FY, at which inclusion and exclusion criteria were applied. We included individuals aged 40–89 years who were continuously registered with Manitoba Health from 5 years before the start date until 1 year after the start date or until the critical illness date, if it occurred.
We excluded individuals who had incident malignancies in the 5-year period preceding the start date; those who were in an acute care facility on the start date; or were enrolled in a palliative care program anytime during the 2 years preceding the start date. The rationale for excluding individuals with incident cancers derives from the fact that since ICU admission and death due to cancers are common (31, 32) and undiagnosed cancers are rare (33, 34), critical illness or death from cancer is unlikely to be avoidable. The 5-year interval is a common benchmark for cancer survival. Generalizing the finding of Lix et al. (35), we identified incident malignancy based on the presence of at least one inpatient or outpatient diagnosis code occurring within 5 years before the start date, and for which no other cancer diagnosis codes were identified during the 5–10 years prior to that code. We used accepted diagnosis codes [Supplementary Table 2; (36)]. We excluded individuals in an acute care facility because our goal was to identify individuals residing in the community who were presumably medically stable when they develop the outcome. Individuals enrolled in palliative care programs were excluded because they have a short life expectancy and would not seek aggressive and curative medical care at the end of life.
Outcome
Our outcome was a critical illness that occurred in the near future, defined as 30–180 days after the start date. Thirty days was chosen as the lower limit as it would require some time to locate, contact, and engage the individual in an intervention seeking to avoid the adverse outcome. The 180-day upper limit provides sufficient time for outcomes to occur, while expecting that the ability to predict such future events would degrade with the passage of time after the start date.
Following prior work, critical illness was defined as the presence of either of the following events: (i) non-elective hospital admission that included care in a high-intensity ICU with the use of artificial life support, or (ii) non-palliative death, in or out of the hospital (18, 37, 38). For non-elective admissions, we excluded hospitalizations for trauma or injury (Supplementary Table 2), as they are unforeseen and expected to be much more difficult to predict or prevent. The critical illness date was taken as the earlier of the two events within the 30- to 180-day interval. High-intensity ICUs are those capable of providing artificial life support for an unlimited period. During the study period, Manitoba had 10 such adult ICUs serving its population of 1.3 million (30).
Though we sought to include ICU admissions involving the use of any of the three most common types of artificial life support (invasive mechanical ventilation (IMV), intravenous vasoactive drugs, or renal replacement therapies), DAD coding has proved only sufficiently accurate for invasive mechanical ventilation (39). However, 81% of ICU patients in our cohort who received vasoactive drugs or renal replacement therapies were also mechanically ventilated (39).
Invasive mechanical ventilation was identified by CCI procedure codes (Supplementary Table 2). Enrollment in palliative care was defined as any of the following being present in the 2 years before the start date: (i) palliative care in any Manitoba hospital, identified by the presence of hospital diagnosis coding (Supplementary Table 2); (ii) DAD service codes indicating primary responsibility for hospital care under the palliative care service; (iii) outpatient palliative care identified by palliative care codes in the provincial Home Care database; or (iv) records in the outpatient pharmacy database indicating medication payment under the provincial palliative care program.
Analysis
Our primary analysis used Classification and Regression Trees (CART) (40, 41), seeking to identify subgroups of community-dwelling adults who experienced high rates of critical illness 30–180 days after the start date (Appendix A). CART divides a cohort into mutually exclusive subgroups, each defined by a given value/category of each input variable. The result is a ramified tree where each “terminal leaf” includes one homogeneous subgroup. Using CART to identify such individuals amounts to identifying terminal leaves in which a sufficiently high fraction of included persons experience the outcome. We chose 33% as being a sufficiently high fraction as it represents needing to intervene on three people to have a chance of avoiding one outcome.
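The leaf-selection step described above can be sketched in a few lines. This is a minimal illustration only, assuming a fitted tree has already assigned each subject to a terminal leaf; the record layout, function names, and toy data are hypothetical and are not taken from the study's SAS implementation.

```python
from collections import defaultdict

def leaf_outcome_rates(records):
    """Return {leaf_id: (n_subjects, n_events, event_rate)}.

    `records` is an iterable of (leaf_id, had_outcome) pairs, a
    hypothetical stand-in for the output of a fitted tree.
    """
    counts = defaultdict(lambda: [0, 0])
    for leaf_id, had_outcome in records:
        counts[leaf_id][0] += 1
        counts[leaf_id][1] += int(had_outcome)
    return {leaf: (n, k, k / n) for leaf, (n, k) in counts.items()}

def flag_high_risk_leaves(records, threshold=0.33):
    """Return the set of leaves whose observed event rate is >= threshold."""
    rates = leaf_outcome_rates(records)
    return {leaf for leaf, (_, _, rate) in rates.items() if rate >= threshold}

# Synthetic toy data for illustration only (not study data).
toy = [
    ("leaf_a", True), ("leaf_a", False), ("leaf_a", False),
    ("leaf_b", True), ("leaf_b", False), ("leaf_b", False), ("leaf_b", False),
    ("leaf_c", True), ("leaf_c", True),
]
flagged = flag_high_risk_leaves(toy)
```

Leaves flagged in the training data would then be applied unchanged to a later cohort, where the same rate calculation shows whether the high-risk label holds up.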
To create our CART model, we used data from FY2013 and 2014 as the Training data, randomly splitting the data (60:40) into two subcohorts, which were used to train the model. Subsequently, we assessed this model on the FY2015 cohort (Test data). We report the relative influence of each predictor variable in the final tree, calculated such that the top-ranked predictor variable is assigned a value of 1.0 (42). To evaluate predictive ability, we report LIFT (43), defined as the fraction of outcome events in the subgroup(s) divided by the fraction in the originating population. See Appendix A for more information, including the CART settings used.
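As a worked illustration of the LIFT definition, the sketch below applies it to figures reported elsewhere in this article: 20 of 429 flagged individuals with the outcome in the Test data, against a cohort-wide outcome rate of approximately 0.38%. The function name is hypothetical, and the result is approximate because the cohort-wide rate is rounded.

```python
def lift(subgroup_event_fraction, population_event_fraction):
    """LIFT: event fraction in the subgroup divided by the event
    fraction in the originating population."""
    if population_event_fraction <= 0:
        raise ValueError("population event fraction must be positive")
    return subgroup_event_fraction / population_event_fraction

# Figures reported in this article (the population rate is rounded).
test_leaf_fraction = 20 / 429   # flagged leaves, Test data
population_fraction = 0.0038    # approximately 0.38% of each yearly cohort

test_lift = lift(test_leaf_fraction, population_fraction)
print(f"LIFT in Test data: {test_lift:.1f}")
```

A subgroup that truly met the 33% target would, by the same calculation, have a LIFT of roughly 87 relative to a 0.38% baseline.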
In a sensitivity analysis, we assessed the performance of logistic regression for predicting the outcome, combining FY2013 and 2014 data for model development, and then applying that model to FY2015 data. All independent variables (next section) were included. Given the low fraction of outcomes, we used Firth's method of bias correction.
For the comparison of parameters between groups, t-test, χ2 test, or Fisher's exact test were used, as appropriate. All analyses were performed using SAS version 9.2 and SAS Enterprise Miner version 13.5 (SAS Institute Inc., Cary, NC).
Predictive Factors
We included 72 parameters (included as 93 input variables) encompassing measures of sociodemographics, chronic comorbid conditions, frailty, and prior health care use (Supplementary Table 3). Sociodemographic variables were age, sex, residing in a nursing home, awaiting placement in a nursing home, rurality of living status [assessed by Statistical Area Classification (44)], straight-line distance from residence location to the nearest high-intensity ICU, socioeconomic status [assessed by an area-level measure, the Socioeconomic Factor Index-2 (SEFI-2), where higher values represent lower socioeconomic status (45)], and having ever received public income assistance. Standard coding was used to identify 32 chronic, comorbid conditions (36). Three administrative data measures of frailty were included (46–48).
A motivating concept of this work was that substantial additional power for predicting near-future health events would be derived from longitudinal medical resource use data. For example, over and above the existence of chronic conditions, a pattern of rapidly rising use of medical resources might indicate a higher risk of near-future critical illness. We, therefore, included longitudinal information about the use of six types of medical care: (i) number of classes of prescription medications dispensed, (ii) hospital days, (iii) days in Alternative Level of Care [awaiting long-term placement] and rehabilitation facilities, (iv) outpatient visits, (v) outpatient laboratory tests performed, and (vi) separate days in which the individual made one or more calls to Manitoba Health Links, a phone-based system available around-the-clock, where registered nurses follow assessment guidelines to triage health issues (49). We originally planned to identify trajectories of utilization via group-based methods (50); however, in our very large cohorts, this approach proved unable to identify subject subsets that were substantial in absolute numbers but represented small fractions of the cohort (e.g., <2%, representing 10,000 people within a yearly cohort). Therefore, for each of the six measures, we instead included counts during each of four intervals before the start date: (A) 13–24, (B) 7–12, (C) 4–6, and (D) 0–3 months prior. Although this approach does not explicitly include patterns of use, CART can include counts from different intervals to relate the outcome to temporal patterns of resource use, if present.
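The interval-count construction can be sketched as follows. This is an illustrative sketch only, assuming non-overlapping look-back windows of 0–3, 4–6, 7–12, and 13–24 months and assuming each utilization event has already been converted to a whole number of months before the start date; the names and the example values are hypothetical.

```python
# Look-back windows in whole months before the start date
# (assumed non-overlapping; bounds are inclusive).
INTERVALS = {"A": (13, 24), "B": (7, 12), "C": (4, 6), "D": (0, 3)}

def interval_label(months_prior):
    """Return the window label for an event, or None if it is out of range."""
    for label, (low, high) in INTERVALS.items():
        if low <= months_prior <= high:
            return label
    return None

def counts_by_interval(months_prior_list):
    """Count events per window; each count would be one CART input variable."""
    counts = {label: 0 for label in INTERVALS}
    for months_prior in months_prior_list:
        label = interval_label(months_prior)
        if label is not None:
            counts[label] += 1
    return counts

# Hypothetical outpatient visits for one subject, in months before start.
visits = [1, 2, 2, 5, 8, 8, 8, 20, 30]
visit_counts = counts_by_interval(visits)
```

Because each window yields a separate input variable, a tree can split on, say, a low count in window A together with a high count in window D, which is one way a rising trajectory could be represented.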
Finally, we included the most recent use of intensive care, and three common, invasive diagnostic procedures (cardiac catheterization, upper or lower gastrointestinal endoscopy, bronchoscopy) prior to the start date. These were classified as: 0–1, 2–6, 7–12, 13–24, or >24 months prior to the start date.
Results
Study Populations
Approximately 536,000 individuals comprised each of the 3 yearly cohorts (Table 1; Supplementary Table 4). In all three, 0.38% of individuals experienced the outcome. Each CART input variable differed, both statistically and in absolute terms, between those who did and those who did not experience the outcome. People with the outcome were 2–9 times more likely to have had ICU care, cardiac catheterization, GI endoscopy, and bronchoscopy within the 1 month before the start date. They were 10–21 times more likely to live in a personal care home or to have an open homecare file. They were approximately 2.5-fold more likely to have frailty scores in the highest tercile. In the 3 months prior to the start dates, people with the outcome had, on average, 1.5 more hospital days, 0.8 more outpatient visits, 0.8 more outpatient laboratory tests, and filled prescriptions for 3.4 additional classes of drugs than individuals without the outcome.
Table 1. Selected characteristics of the datasets used for analysis (see Supplementary Table 4 for a complete list).
CART Analysis
The final optimal tree had 30 levels of branching and 2,644 terminal leaves (Supplementary Table 5). The initial branch point was by residence in a nursing home. All subjects residing in nursing homes as of the start date were included in a single terminal leaf; the larger (60%) subcohort of the Training data contained 5,954 subjects, of whom 470 (7.9%) experienced the outcome. Appendix B contains an example of how CART can combine input variables in complex combinations.
The input variable with the highest predictive value was socioeconomic status, followed by living in a nursing home (Table 2; Supplementary Table 6). The Segal and McIsaac frailty measures occupied the third and fifth slots, and age was sixth. Utilization of outpatient care and drug prescriptions were the highest-ranked parameters of medical resource use, although counts further back in time from the start date were generally more influential than those closer to it. The first appearance of a count of hospital days is in the 14th slot, with relative importance less than half that of socioeconomic status. Among the 32 specific chronic diagnoses, all had importance values <0.29 on this relative scale ranging from 0 to 1.
Table 2. Relative predictive value of top 25 variables in the optimal Classification and Regression Tree solution (see Supplementary Table 6 for a complete list).
In the Training data, the optimal tree performed well in identifying individuals with the outcome (Table 3); 41 terminal leaves, together containing 493 subjects, each had ≥33% of their members with the outcome. However, most of this performance in predicting near-future critical illness represented overfitting of the model to the Training data, as this performance was not reproduced when applying the same terminal leaf definitions to the Test data (Table 3). In the Test data, these 41 leaves contained 429 individuals, but only 20 (4.7%) of them had the outcome, representing 0.98% of all those with the outcome. Expanding the range of terminal leaves in the Training data to those with ≥20% or ≥10% outcomes likewise performed well in the Training data, but this was not reproduced in the Test data (Table 3).
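The two Test-data percentages above can be reproduced from the reported counts. The sketch below is a rough check, not part of the original analysis: the total number of people with the outcome is not stated directly, so it is approximated here from the rounded cohort size (about 536,000) and the rounded outcome rate (0.38%). Function names are hypothetical.

```python
def positive_predictive_value(flagged_with_outcome, flagged_total):
    """Fraction of flagged individuals who experienced the outcome."""
    return flagged_with_outcome / flagged_total

def sensitivity(flagged_with_outcome, all_with_outcome):
    """Fraction of all individuals with the outcome who were flagged."""
    return flagged_with_outcome / all_with_outcome

# Counts reported for the 41 flagged leaves in the Test data.
flagged_total = 429
flagged_with_outcome = 20

# Approximation from rounded figures in this article (an assumption).
approx_cohort_size = 536_000
outcome_rate = 0.0038
approx_all_with_outcome = approx_cohort_size * outcome_rate  # about 2,037

ppv = positive_predictive_value(flagged_with_outcome, flagged_total)
sens = sensitivity(flagged_with_outcome, approx_all_with_outcome)
print(f"PPV: {ppv:.1%}, sensitivity: {sens:.2%}")
```

Both values agree with the reported 4.7% and 0.98% to the stated precision, which suggests the rounded cohort figures are adequate for this kind of back-of-envelope check.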
Table 3. Performance of the optimal Classification and Regression Tree in identifying individuals with the outcome.
Sensitivity Analysis
In the sensitivity analysis, unlike CART, logistic regression modeling performed similarly in the Training and Test data (Table 4). Although the Test data logistic modeling correctly identified a larger percentage of those flagged as having ≥33% probability of the outcome (20.5 vs. 4.7% from CART), it identified a similarly low percentage of all those with the outcome (1.1 vs. 0.98% for CART).
Discussion
High-fidelity prediction of a substantial fraction of persons experiencing near-future critical illness was not possible using administrative healthcare data alone. Specifically, we did not succeed in prospectively identifying a substantial number of individuals belonging to subgroups of community-dwelling Manitobans with a ≥33% probability of developing critical illness in the following 6 months. We chose the 33% threshold to make it practical to design and test interventions seeking to avoid or delay the coming health event, assuming that these interventions would be resource-intensive. However, for individuals in those subgroups in our future (Test) data, that parameter was 4.7% and not 33%, and comprised ~1% of all those with the outcome. While applying logistic regression to these administrative data showed less overfitting compared to CART, it likewise failed to achieve the stated objective.
Two prior efforts sought to predict future critical illness among unselected, community-dwelling persons (16, 18). Neither was population-based; both used logistic regression with fewer input variables than our study. Among 4.7 million health plan enrollees in a validation cohort (16), 0.75% experienced ICU admission within the following 1 year. Among the 1% of subjects with the highest predicted risk, 35% experienced the outcome, though this represented only 0.49% of all those with the outcome. In comparison, 0.38% of our validation cohort experienced our outcome within 180 days, and among those with predicted risk exceeding 33%, 4.7% experienced the outcome, representing 0.98% of all those with the outcome. In what was evidently a very different substrate, among 9,742 people 65 years and older attending Mayo Clinic outpatient clinics, 8.8% in the cohort experienced critical illness within the following 2 years, and among the 11% with the highest risk score, 26% experienced the outcome, which was 33% of those with the outcome (18). Other studies have used regression methods but for different goals, including attempts to predict future critical illness among patients brought to hospital via ambulance, hospitalization and/or death among community-dwelling persons, and future need for mechanical ventilation among community-dwelling persons (15, 17, 37, 51). It is important to note that although efforts to identify people at high risk of outcomes such as future critical illness or death have reported good results using the c-statistic as the metric (52), the c-statistic is inappropriate for a purpose such as ours because it fails to account for the underlying prevalence of the disorder of interest (53).
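The dependence on prevalence can be made concrete with Bayes' rule. In the sketch below, the sensitivity and specificity values (0.80 and 0.90) are hypothetical and chosen only for illustration; the prevalences (0.38% and 8.8%) and the Mayo Clinic figures (11% flagged, 26% of them with the outcome) are taken from the text above. Function names are hypothetical.

```python
def ppv_from_sens_spec(sens, spec, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def share_of_outcomes_captured(flagged_fraction, ppv, prevalence):
    """Fraction of all outcome events that fall in the flagged group."""
    return flagged_fraction * ppv / prevalence

# Same hypothetical classifier (sens 0.80, spec 0.90), two prevalences.
ppv_low_prev = ppv_from_sens_spec(0.80, 0.90, 0.0038)   # our cohorts
ppv_high_prev = ppv_from_sens_spec(0.80, 0.90, 0.088)   # Mayo Clinic cohort

# Consistency check on the Mayo Clinic figures quoted above (18).
mayo_share = share_of_outcomes_captured(0.11, 0.26, 0.088)
print(f"{ppv_low_prev:.3f} {ppv_high_prev:.3f} {mayo_share:.3f}")
```

With discrimination held fixed, the fraction of flagged people who go on to have the outcome falls from roughly 44% to roughly 3% as prevalence drops from 8.8% to 0.38%, which is why a favorable c-statistic alone does not indicate that a 33% target is attainable.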
Potential methodologic limitations deserve discussion. First, we included numerous input variables representing a wide variety of concepts related to health and health care, including the novel aspect of incorporating prior medical resources in a way that allowed for accounting for trajectories of use. We did not include other administrative data such as immunizations, immigrant status, education, Emergency Department visits, or results of historical laboratory tests. While such additional information could plausibly add predictive power, it added only a small increment in an analysis of 1-year mortality among hospitalized patients (20). Second, we used CART analysis, a flexible and powerful statistical method that allows for arbitrarily complicated interactions among the input variables. Sensitivity analysis using logistic regression modeling likewise failed to achieve our goal. While it is possible that another machine-learning method might perform better, direct comparisons across a variety of clinical areas have not found any method to be consistently superior (54–58). Furthermore, a recent systematic review directly comparing machine learning methods to logistic regression reported no significant differences in predictive performance among studies with methodology at low risk of bias (59). Third, our choice of 30–180 days forward from the start date as constituting the “near future” was made a priori, but could be questioned. Fourth, we chose a composite outcome that included non-palliative, non-ICU death.
Recalling that we sought to identify critical illness that could be anticipated and possibly delayed or avoided, this composite derives from the following: (i) the idea that any death is associated with critical illness, even if that illness was very brief, by recognizing that if such a person had been close to death at the time of discovery, rather than being dead, they might have survived long enough to be admitted to an ICU; and (ii) including them helps address the facts that economically disadvantaged persons and those in remote communities have less access to timely care, causing higher rates of prehospital death. This concept has been previously used in assessing disparities in access to ICU care and found to demonstrate reassuring face validity (38). Previous studies have also included death as part of critical illness in prediction efforts (18, 37), though they did not distinguish between palliative and non-palliative deaths. A limitation of this concept is the inability to identify individuals who do not desire or receive ICU admission when they become critically ill but lack formal identification of palliative care. This describes many residents of nursing homes, who have standing Do Not Resuscitate orders but are not enrolled in formal palliative care programs. It would be best to identify such individuals and exclude them from our cohort; however, our data do not contain the information needed to do so. Their inclusion likely introduced misclassification in our outcome, potentially reducing the performance of our predictive model. Fifth, slight differences in predictive performance may have occurred by limiting critical illness onset from April 1 to September 30. Sixth, we excluded patients hospitalized for trauma; however, as we had no direct information about prehospital trauma deaths, we were unable to exclude them from our cohort. 
Combining Canadian age- and cause-specific death data (32, 60) with the knowledge that 51% of trauma deaths in our included age group occur prehospital (61), we estimate 119 such deaths yearly, indicating a 5.8% overestimation of the number of yearly outcomes experienced in our cohorts. Finally, an explanation for reporting on older data is provided in Appendix C.
The predictive importance of frailty was notable. Frailty may be defined as a “syndrome of age-related physiological decline, characterized by marked vulnerability to adverse health outcomes” (62); it is associated with mortality and morbidity, and with a reduced ability to benefit from aggressive medical interventions. The two predominant formal ways of measuring frailty relate to functioning (63, 64). As administrative data do not contain such information, claims-based frailty measures utilize surrogate parameters and/or lists of comorbid conditions (46–48). In light of this limitation, we chose to include three different administrative data measures of frailty. In our analysis, the frailty administrative data definitions of Segal et al. (47) and McIsaac et al. (48) were among the five most influential variables, indicating some non-overlap between what they are capturing. That they had relative importance almost 3-fold higher than even the most influential specific chronic condition (metastatic cancer) suggests, as have some prior findings (65), that much of the influence of chronic conditions on future outcome may be mediated by the frailty they cause, rather than the condition per se.
Although longitudinal measures of medical resource use were influential input variables for predicting the outcome, it was generally not their most recent values that were most important. This may indicate that our outcome relates more to longer-term processes than recent/sudden changes, and it may, in part, explain the poor performance of prior attempts to predict future clinical outcomes based primarily on recent data (18, 37, 66).
The failure of studies, including ours and the others mentioned above (16, 18, 20), to accurately predict future medical needs or outcomes leads us to a potentially important hypothesis. That hypothesis is that high-fidelity prediction, if possible at all, will require the inclusion of input parameters that tap into different types of information than do administrative and clinical data; these may include innate biologies such as genetics and epigenetics, health behaviors, environmental exposures, and other socioeconomic factors. We conclude that to achieve high-fidelity prediction of future critical illness, it is necessary to go back to the basics and develop a stronger conceptual framework to help identify the full range of variables that might be influential and to determine how they may be routinely captured at the population level.
Data Availability Statement
The datasets presented in this article were derived from administrative health data as a secondary use. The data custodian is Manitoba Health and Seniors Care, and the data were provided under specific data sharing agreements only approved for use at the Manitoba Centre for Health Policy. Where necessary, source data specific to this article or project may be reviewed at the Manitoba Centre for Health Policy with the consent of the original data providers along with the required privacy and ethical review bodies. Requests to access these datasets should be directed to Charles Burchill, charles_burchill@cpe.umanitoba.ca.
Ethics Statement
This study was approved by the University of Manitoba Health Research Ethics Board and Manitoba's Health Information Privacy Committee. Written informed consent from the patients was not required as it entirely used existing, de-identified data.
Author Contributions
Conceptualization and funding acquisition: AG. Data curation: AG, MY, and DC. Methodology, writing, review, editing, final approval, and formal analysis: all authors.
Funding
This work was supported by Manitoba Health and Seniors Care, which had no role in study design, analysis, data interpretation, writing, or the decision to submit. Manitoba Health is the provincial health authority and is the custodian of the data used for this study.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments
The authors acknowledge the Manitoba Centre for Health Policy for use of data contained in the Manitoba Population Research Data Repository under project # 2016-038 (HIPC# 2016/2017 – 47). The results and conclusions are those of the authors and no official endorsement by the Manitoba Centre for Health Policy, Manitoba Health and Seniors Care, or other data providers is intended or should be inferred. Data used in this study were derived from data provided by Manitoba Health and Seniors Care, and the Winnipeg Regional Health Authority. This contribution derives from a report commissioned and funded by the government of the Canadian province of Manitoba, which is available at: http://mchp-appserv.cpe.umanitoba.ca/reference/ICU_Report_Web.pdf. That report was not peer-reviewed, and is required to be publicly available as a result of the project having been government-funded.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fepid.2022.944216/full#supplementary-material
Keywords: critical illness, population health, administrative data, forecasting, cluster analysis, routinely collected health data
Citation: Garland A, Marrie RA, Wunsch H, Yogendran M and Chateau D (2022) Administrative Data Is Insufficient to Identify Near-Future Critical Illness: A Population-Based Retrospective Cohort Study. Front. Epidemiol. 2:944216. doi: 10.3389/fepid.2022.944216
Received: 14 May 2022; Accepted: 13 June 2022; Published: 25 July 2022.
Edited by:
Ciro Martins Gomes, University of Brazilia, Brazil
Reviewed by:
Camilla Wiuff, Statens Serum Institut (SSI), Denmark
Daniel Holanda Barroso, University of Brazilia, Brazil
Henry Maia Peixoto, University of Brazilia, Brazil
Copyright © 2022 Garland, Marrie, Wunsch, Yogendran and Chateau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Allan Garland, agarland@hsc.mb.ca