Utility of multimodal longitudinal imaging data for dynamic prediction of cardiovascular and renal disease: the CARDIA study

Nguyen, Hieu; Vasconcellos, Henrique D.; Keck, Kimberley; Carr, Jeffrey; Launer, Lenore J.; Guallar, Eliseo; Lima, João A. C.; Ambale-Venkatesh, Bharath

doi:10.3389/fradi.2024.1269023

ORIGINAL RESEARCH article

Front. Radiol., 27 February 2024

Sec. Cardiothoracic Imaging

Volume 4 - 2024 | https://doi.org/10.3389/fradi.2024.1269023

This article is part of the Research Topic: Artificial Intelligence and Multimodal Medical Imaging Data Fusion for Improving Cardiovascular Disease Care


Hieu Nguyen¹

Henrique D. Vasconcellos²

Kimberley Keck²

Jeffrey Carr³

Lenore J. Launer⁴

Eliseo Guallar⁵

João A. C. Lima²

Bharath Ambale-Venkatesh⁶*

¹Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
²Department of Cardiology, Johns Hopkins University, Baltimore, MD, United States
³Department of Radiology and Radiological Sciences, Vanderbilt University, Nashville, TN, United States
⁴Laboratory of Epidemiology and Population Science, National Institute on Aging, Bethesda, MD, United States
⁵Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, United States
⁶Department of Radiology, Johns Hopkins University, Baltimore, MD, United States

Background: Medical examinations contain repeatedly measured data from multiple visits, including imaging variables collected from different modalities. However, the utility of such data for the prediction of time-to-event is unknown, and only a fraction of the data is typically used for risk prediction. We hypothesized that multimodal longitudinal imaging data could improve dynamic disease prognosis of cardiovascular and renal disease (CVRD).

Methods: In a multi-centered cohort of 5,114 CARDIA participants, we included 166 longitudinal variables (151 imaging markers and 15 traditional risk factors). The imaging markers came from five modalities, namely Echocardiography (Echo), Cardiac and Abdominal Computed Tomography (CT), Carotid Ultrasonography (CARTD), Dual-Energy x-ray Absorptiometry (DEXA), and Brain Magnetic Resonance Imaging (MRI), and were collected from young adulthood to mid-life over 30 years (1985–2016). We performed dynamic survival analysis of CVRD events using three methods: Dynamic-DeepHit, LTRCforest, and Extended Cox for Time-varying Covariates. Risk probabilities were continuously updated as new data were collected. Model performance was assessed using integrated AUC and C-index and compared with models using only traditional risk factors.

Results: Longitudinal imaging data, even when irregularly collected with high missing rates, improved CVRD dynamic prediction from young adulthood through midlife (gain of 0.03 in integrated AUC and up to 0.05 in C-index compared to traditional risk factors; best model's C-index = 0.80–0.83 up to 20 years from baseline). Among imaging variables, Echo and CT variables contributed significantly to improved risk estimation. Echo measured in early adulthood predicted midlife CVRD risks almost as well as Echo measured 10–15 years later (0.01 C-index difference). The most recent CT exam provided the most accurate prediction for short-term risk estimation. Brain MRI markers provided information beyond cardiac Echo and CT variables, which led to a slightly improved prediction.

Conclusions: Longitudinal multimodal imaging data readily collected from follow-up exams can improve CVRD dynamic prediction. Echocardiography measured early can provide a good long-term risk estimation, while CT/calcium scoring variables carry atherosclerotic signatures that benefit more immediate risk assessment starting in middle-age.

Introduction

The rapidly expanding availability of large health data sets has fueled the growing research for more accurate risk prediction which holds much potential for preventive and monitoring strategies as well as improved disease understanding. In many scenarios, imaging data are collected over various modalities (multimodal) such as Echocardiography, Magnetic Resonance Imaging, and Computed Tomography, and repeatedly measured in multiple follow-up exams. Multimodal longitudinal imaging data could provide a more comprehensive description of the body and the development of organ functions and structures over time. In cardiology, numerous imaging markers for subclinical atherosclerosis have been demonstrated to be independently predictive of cardiovascular events (1–3) and cardiac dysfunction (4, 5). Many published works have focused on a few imaging variables that are low-dimensional, single-modal, (2, 4–6) or cross-sectional. (7) The utility of high-dimensional, multimodal, and longitudinal imaging data has not been investigated for risk prediction and early phenotyping of cardiovascular diseases.

Cox Proportional Hazards (Cox-PH) is among the most popular methods for survival analysis, but Cox-PH is not suitable for high-dimensional data with repeated measures. The extended version of Cox-PH that can work with time-varying covariates is still limited by the high number of variables, nonlinearity of variables, and the requirement of data with no missingness. (8) Machine learning (ML) approaches such as Random Survival Forest (9) can mitigate some of Cox's limitations, but many ML methods are limited to static prediction and cannot perform dynamic survival analysis. In static prediction, the model does not automatically update as new observations are collected; new data would require refitting an existing model or training a new model. Unlike static prediction, a dynamic survival analysis model automatically updates predicted survival probabilities as additional longitudinal observations are collected, and the model is trained only once. The ability to dynamically update risk as new information arrives makes dynamic survival analysis attractive (10, 11).

In this work, we demonstrated the utility of dynamic prediction of Cardiovascular and Renal Disease (CVRD) using high-dimensional, multimodal, longitudinal imaging. Data were collected in CARDIA, which is a large epidemiological study of Black and White young adults followed up over 30 years. We also identified the most important imaging predictors for CVRD in the CARDIA cohort.

Methods

Study population and outcome

The design of the CARDIA study (Coronary Artery Risk Development in Young Adults) has been described elsewhere. (12) Briefly, CARDIA is a prospective, observational cohort study of 5,114 (originally 5,115, one person withdrew consent) White and Black men and women aged 18 to 30 years, at four centers in the United States. The cohort is approximately balanced regarding age, race, sex, and educational level. Participants have been followed since 1985, with regular exam visits scheduled every 2–5 years. Each exam has collected a wide variety of variables believed to be related to heart disease. The institutional review board of each participating institution approved the study protocol and all participants gave informed consent.

The outcome of this study is cardiovascular and renal disease (CVRD), and the first CVRD event was used as the endpoint. These events were adjudicated through August 2019. The primary composite outcome was incident cardiovascular disease and renal disease, which included coronary heart disease (CHD, myocardial infarction, acute coronary syndrome, or CHD death, including fatal myocardial infarction), stroke, transient ischemic attack, hospitalization for heart failure, intervention for peripheral arterial disease, end-stage renal disease, or death from cardiovascular or renal causes. Participants who died from a non-CVRD cause were censored at the time of death in the survival models.

Imaging markers

CARDIA follow-up exams collect various imaging variables from different sources, such as Echocardiography (Echo), Computed Tomography (CT), Carotid Ultrasonography (CARTD), Dual-Energy x-ray Absorptiometry (DEXA), and Brain Magnetic Resonance Imaging (MRI). The extracted imaging variables have a high degree of sparsity and irregularity, reflecting real-world data. Echo was performed as part of the core study in Y5, Y25, and Y30 and as a substudy in Y10; CT was conducted in Y10 as a substudy and in Y15, Y20, and Y25 as the core study, and brain MRI was acquired in Y25 and Y30 as a substudy. Figure 1 shows an overview of imaging markers used in this study and Supplementary Table S1 shows a detailed list of when the measures were collected. Data collection protocols for each imaging modality are available on the CARDIA study website. (13) We used the longitudinal imaging CARDIA data from all exam years to develop prediction models.

Figure 1

Figure 1. (A) Overview of the multimodal data used for prediction. (B) Methodology framework visualization for dynamic survival analysis. Echo, echocardiography; CT, computed tomography; CARTD, carotid artery ultrasonography; DEXA, dual-energy x-ray absorptiometry; MRI, magnetic resonance imaging.

Variables were pre-filtered with help from domain experts (clinicians who performed image reading daily). Other exclusion criteria include removing duplicated variables across modalities, variables available in too few subjects and variables with poor documentation. Overall, we included a total of 151 longitudinal imaging markers. In addition, we also included 15 traditional risk factors: nine variables from the AHA/ACC ASCVD risk scores and six additional risk factors (diastolic blood pressure—DBP, body-mass index—BMI, taking cholesterol-lowering medications, low-density lipoproteins—LDL, triglycerides, and fasting glucose). These six additional risk factors are routinely collected laboratory tests that have been shown to link to cardiovascular disease.

Statistical analysis

Model training and evaluation

Figure 1 shows the schematic of the statistical analysis procedures. All the models were trained and evaluated on the same cohort by 5-fold × 20 times cross-validation scheme. For each time the whole data was split, 20% of the data was used for testing, and the remaining 80% was further divided into training and validation sets. The training sets were used to fit the models, the validation sets were for hyperparameter tuning, and the testing sets were for assessing model performance. Stratified sampling by event was conducted to ensure the same ratio of events to non-events across the splits.
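The splitting scheme described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name `repeated_stratified_splits`, the toy event vector, and in particular `val_fraction` (the share of the non-test data held out for tuning, which the text does not specify) are assumptions.

```python
import random
from collections import defaultdict


def repeated_stratified_splits(events, n_folds=5, n_repeats=20,
                               val_fraction=0.25, seed=0):
    """Yield (train, val, test) index lists, stratified by event indicator.

    val_fraction is an assumed share of the non-test data held out for
    hyperparameter tuning; the source does not state the ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, event in enumerate(events):
        by_class[event].append(idx)

    for _ in range(n_repeats):
        # Deal each class round-robin into folds so every fold keeps
        # roughly the same ratio of events to non-events.
        folds = [[] for _ in range(n_folds)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            for pos, idx in enumerate(shuffled):
                folds[pos % n_folds].append(idx)

        for k in range(n_folds):
            test = folds[k]  # one fold (about 20%) is held out for testing
            rest = defaultdict(list)
            for j in range(n_folds):
                if j != k:
                    for idx in folds[j]:
                        rest[events[idx]].append(idx)
            # Split the remaining data into training and validation sets,
            # again class by class to preserve the event ratio.
            train, val = [], []
            for members in rest.values():
                n_val = round(len(members) * val_fraction)
                val.extend(members[:n_val])
                train.extend(members[n_val:])
            yield train, val, test


# Toy cohort (hypothetical): 500 participants, 10% with an event.
toy_events = [1] * 50 + [0] * 450
splits = list(repeated_stratified_splits(toy_events))
print(len(splits))  # 5 folds x 20 repeats
```

Each yielded triple is one train/validation/test partition; a model would be fit on `train`, tuned on `val`, and scored on `test`, with scores then averaged over all partitions.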

Modeling methods

We used three algorithms to model data for dynamic survival analysis. Issuing dynamically updated survival predictions requires methods that can incorporate high-dimensional, longitudinal data comprising repeated measurements with varying degrees of missingness. The main algorithm we used is Dynamic-DeepHit. (11) Dynamic-DeepHit is a deep learning-based approach that issues dynamically updated survival predictions without making any assumptions about the underlying processes. Briefly, Dynamic-DeepHit consists of two subnetworks: a shared subnetwork with a recurrent neural network architecture that handles longitudinal measurements and predicts the next measurements of time-varying covariates using the past available measurements, and thus handles sparsity, and a second subnetwork with cause-specific survival networks of fully connected layers that relates the longitudinal data to the survival outcome. Dynamic-DeepHit is trained by minimizing a loss function that comprises three losses: a survival loss based on the log-likelihood of the joint time-to-event distribution, a ranking loss that adapts the idea of concordance and encourages correct ordering of participants based on their time-to-event, and a step-ahead prediction loss that encourages correct prediction of longitudinal covariates for the next time step. A detailed description of Dynamic-DeepHit can be found elsewhere (11).

We employed two additional methods for dynamic prediction, namely Left-Truncated-Right-Censored Forest (LTRCforest) (14) and Extended Cox for Time-dependent Covariates. (8) Briefly, the Extended Cox is an extension of the semi-parametric Cox-PH that assumes the variable values remain constant from the last observed value until updated. To handle a large number of input covariates, LASSO penalization (15) was employed. LTRCforest is an extension of the non-parametric ML method conditional forest (Cforest) for time-varying covariates. Since LTRCforest and Extended Cox require complete data, missing data were imputed using Multiple Imputation by Chained Equations (MICE) for multilevel data (16, 17) before being input into these models.
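The assumption that a covariate stays constant from its last observed value until it is updated corresponds to arranging each participant's record as (start, stop] intervals. The sketch below illustrates that layout only; it is not the authors' pipeline (which imputes missing values with MICE first), and the function name `to_start_stop`, the variable names, and the toy values are hypothetical.

```python
def to_start_stop(visits, end_time, had_event):
    """Turn one participant's visits into (start, stop] interval rows.

    visits: list of (exam_time, {variable: value or None}), sorted by time.
    end_time: time of the event or of censoring.
    Values are held constant from the last observation until updated.
    """
    rows, current = [], {}
    for i, (t, values) in enumerate(visits):
        if t >= end_time:
            break  # measurements at or after the end of follow-up are unused
        for name, value in values.items():
            # Overwrite only when a new value was actually observed.
            if value is not None or name not in current:
                current[name] = value
        next_t = visits[i + 1][0] if i + 1 < len(visits) else end_time
        stop = min(next_t, end_time)
        rows.append({"start": t, "stop": stop,
                     # The event flag is set only on the final interval.
                     "event": int(had_event and stop == end_time),
                     **current})
    return rows


# Hypothetical participant: exams at years 0, 5, 15; event at year 22.
toy_visits = [
    (0, {"sbp": 110, "cac": None}),
    (5, {"sbp": None, "cac": None}),   # missed measurements: carried forward
    (15, {"sbp": 128, "cac": 12.0}),
]
rows = to_start_stop(toy_visits, end_time=22, had_event=True)
for row in rows:
    print(row)
```

Variables never observed so far remain `None` in this layout, which is why a model that requires complete data still needs an imputation step before fitting.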

As a benchmark, we also fit static survival models at three time points (5 years, 15 years, and 25 years after baseline) to compare with the dynamic survival models. (18) The idea is similar to landmarking approaches, in which a survival model is fit to the subjects who are still at risk at the landmarking time. For consistency in comparison, we used Dynamic-DeepHit and made an artificial cut-off at the landmarking time (meaning, covariate measurements after Y5, Y15, and Y25 were excluded from the static model at landmarking time 5 years, 15 years, and 25 years after baseline, respectively). Only measurements collected before the landmarking time were included.
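The landmarking cut-off can be illustrated with a small data-preparation sketch. It assumes each participant is a dictionary with hypothetical keys `id`, `visits`, `time`, and `event`; measurements taken at the landmark exam itself are kept, following the statement that only measurements after the landmark are excluded.

```python
def landmark_dataset(participants, landmark):
    """Build the static-model input for one landmark time.

    Keeps only participants still at risk at `landmark` and discards
    covariate measurements collected after it.
    """
    selected = []
    for p in participants:
        if p["time"] <= landmark:
            continue  # event or censoring already occurred: not at risk
        history = [(t, x) for t, x in p["visits"] if t <= landmark]
        selected.append({"id": p["id"], "visits": history,
                         "time": p["time"], "event": p["event"]})
    return selected


# Hypothetical toy cohort: times are years since baseline.
toy_cohort = [
    {"id": 1, "visits": [(0, 1.0), (5, 1.1), (15, 1.3)], "time": 30, "event": 0},
    {"id": 2, "visits": [(0, 0.9), (5, 1.4)], "time": 12, "event": 1},
    {"id": 3, "visits": [(0, 1.2), (5, 1.2), (15, 1.6), (25, 2.0)],
     "time": 28, "event": 1},
]
at_15 = landmark_dataset(toy_cohort, landmark=15)
print([p["id"] for p in at_15])
```

Running this once per landmark (5, 15, and 25 years) gives three separate training sets, which is the key contrast with the dynamic model that is trained a single time.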

Importance of imaging subsets and variables

To evaluate the effect of each imaging variable subset on CVRD prediction, we built five Dynamic-DeepHit models, each with traditional risk factors and imaging markers from a single modality. We also built a model with all imaging markers from all modalities and a reference model with only traditional risk factors. Additionally, to assess the complementary effects of multiple imaging subsets, we built 25 additional models representing all possible combinations of five imaging subsets. In total, 32 models were built (Supplementary Table S2).
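The count of 32 models follows from enumerating every subset of the five imaging subsets, each added on top of the traditional risk factors. The short check below makes the arithmetic explicit; the modality labels are simply the abbreviations used in the text.

```python
from itertools import combinations

modalities = ["Echo", "CT", "CARTD", "DEXA", "MRI"]

# Every subset of the five imaging subsets, from none to all five.
subsets = [combo
           for size in range(len(modalities) + 1)
           for combo in combinations(modalities, size)]

# Number of models by how many imaging subsets they include.
counts = {size: sum(1 for s in subsets if len(s) == size)
          for size in range(len(modalities) + 1)}

reference = counts[0]                           # traditional risk factors only
single_modality = counts[1]                     # one imaging subset each
mixed = counts[2] + counts[3] + counts[4]       # two to four imaging subsets
all_modalities = counts[5]                      # every imaging subset

print(reference, single_modality, mixed, all_modalities, len(subsets))
```

The four groups correspond to the reference model, the five single-modality models, the 25 additional combinations, and the model with all imaging markers.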

To examine the effect of imaging variables collected at different ages, we also built separate models for each imaging subset that included variables collected from each exam and excluded measurements collected outside of the exam. Additionally, the importance of each imaging marker was quantified using permutation importance, similar to permutation testing, (19) for the model trained on all variables. The longitudinal trajectories were permuted among participants, and the drop in the C-index of permuted variables to the C-index of the original dataset was used as the ranking criteria for variable importance. A bigger drop in C-index indicated a more important variable, and a minimal drop suggested that the variable was not important, as changing the variable value did not change model performance. Variables with the same difference in C-index were assigned the same ranking.
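A minimal sketch of this ranking procedure follows. It is illustrative only: it uses Harrell's C-index without censoring weights rather than the time-dependent C-index used in the study, and the risk function, variable names (`cac`, `noise`), and toy data are hypothetical stand-ins for a trained model and the CARDIA variables. Each participant's entry for a variable is a whole trajectory, and trajectories are shuffled among participants as a unit.

```python
import random


def c_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly."""
    concordant = comparable = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def permutation_importance(X, times, events, risk_fn, n_repeats=20, seed=0):
    """Mean drop in C-index after shuffling each variable among participants."""
    rng = random.Random(seed)
    base = c_index(times, events, [risk_fn(row) for row in X])
    drops = {}
    for name in X[0]:
        total = 0.0
        for _ in range(n_repeats):
            column = [row[name] for row in X]   # one trajectory per participant
            rng.shuffle(column)
            permuted = [{**row, name: traj} for row, traj in zip(X, column)]
            total += base - c_index(times, events,
                                    [risk_fn(row) for row in permuted])
        drops[name] = total / n_repeats
    return drops


# Hypothetical toy data: higher final "cac" means an earlier event;
# "noise" is ignored by the toy risk function.
X = [{"cac": [0.0, float(i)], "noise": [float((7 * i) % 11)]} for i in range(40)]
times = [40 - i for i in range(40)]
events = [1] * 40
drops = permutation_importance(X, times, events, lambda row: row["cac"][-1])
ranking = sorted(drops, key=drops.get, reverse=True)
print(ranking)
```

A variable the model ignores produces no drop at all, while shuffling an informative variable pushes the C-index toward 0.5, which is the behavior the ranking criterion relies on.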

Performance evaluation

Model performance was quantified using the time-dependent area under the receiver-operating curve (AUC) accounting for censorship (20) and the time-dependent concordance index that accounted for censoring distribution. (21) In addition, the integrated AUC (iAUC) was used to quantify all time-varying AUCs as one number. (22) Statistical significance was evaluated using Wilcoxon rank sum test.
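One simple way to condense a curve of time-dependent AUC values into a single number is an unweighted time average computed with the trapezoidal rule, sketched below. This is a simplification for illustration: the iAUC definition cited in the text may weight time points differently, and the evaluation times and AUC values here are hypothetical, not results from the study.

```python
def integrated_auc(eval_times, aucs):
    """Trapezoidal time-average of AUC(t) over [eval_times[0], eval_times[-1]]."""
    area = 0.0
    for k in range(len(eval_times) - 1):
        width = eval_times[k + 1] - eval_times[k]
        area += 0.5 * (aucs[k] + aucs[k + 1]) * width
    return area / (eval_times[-1] - eval_times[0])


# Hypothetical AUC(t) values at 10, 20, and 30 years after baseline.
toy_times = [10, 20, 30]
toy_aucs = [0.80, 0.82, 0.78]
iauc = integrated_auc(toy_times, toy_aucs)
print(round(iauc, 3))
```

Because the summary is an average over the evaluation window, a gain confined to late follow-up moves it less than the same gain sustained throughout.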

Results

A total of 5,114 participants were included in the analysis. Table 1 describes the characteristics and number of remaining participants in the cohort over nine follow-up exams. The mean age was 25 years at the CARDIA Y0 Exam (baseline) and 55 years at the last exam (Y30). The cohort was 46% male, 52% Black, and 48% White. Of the original cohort, 3,358 participants returned for the Y30 Exam. Over 30 years of follow-up, mean systolic and diastolic blood pressure (SBP and DBP) increased, as did the use of antihypertensive medication. BMI, total cholesterol, high-density lipoprotein (HDL), and use of cholesterol-lowering medication increased. The prevalence of diabetes also increased, while the number of smokers decreased. By the end of follow-up, 375 participants (7.3%) had developed CVRD. The cumulative incidence of CVRD is shown in Supplementary Figure S1: very few events occurred before the Y10 Exam, whereas the cumulative incidence rose almost exponentially after the Y20 Exam.

Table 1

Table 1. Characteristics of the study population over time.

Dynamic vs. static prediction

Figure 2 shows the performance over time for dynamic prediction vs. static prediction. The dynamic survival model using Dynamic-DeepHit trained on all 166 variables had a C-index of 0.80–0.82 before Y20, which dropped slightly to 0.78 by the last time point, 33 years after baseline. The C-index of the dynamic survival model was higher than that of the static survival models across all time points, by a margin of 0.01–0.06. Unlike the static survival models, which required a separate model at each landmarking time, the dynamic model was trained only once and automatically updated survival probabilities as new measurements arrived from follow-up exams.

Figure 2

Figure 2. Dynamic vs. static prediction.

Comparison of modeling methods

Supplementary Figure S2 shows the performance of dynamic models trained with different algorithms (Dynamic-DeepHit, LTRCforest, Extended Cox, and Extended Cox penalized by LASSO). In both cases (trained on all variables and trained on traditional risk factors), Dynamic-DeepHit trained on unimputed data and LTRCforest trained on imputed data were consistently the best, whereas Extended Cox consistently underperformed (∼0.05–0.10 lower in C-index for the model with all variables and 3%–5% lower for the model with only traditional risk factors).

Predictive gain of modalities

Figure 3 shows the predictive gain from each imaging subset over time and on average over 25 cross-validation folds. The best model was the one that included all imaging subsets, while the worst-performing models were the one using only traditional risk factors (baseline) and the one using traditional risk factors plus CARTD variables (0.74 C-index at end of follow-up). Using all imaging markers resulted in up to a 5% increase in C-index and 3% in iAUC. The model utilizing only CT variables was only slightly below (1%) the model using all imaging variables; CT variables elevated performance from the Y10 Exam onward, with a more apparent gain after the Y20 Exam when more CT variables were collected. The model trained on Echo variables shows that the inclusion of Echo variables in addition to traditional risk factors helped increase prediction accuracy throughout the entire follow-up period by ∼1.5%–2% in C-index. DEXA variables improved performance very slightly up to Y25 by C-index (<0.01 absolute difference) and had negligible gain in terms of iAUC. Brain MRI variables, collected at the Y25 and Y30 Exams, helped boost CVRD prediction performance by a C-index gain of 0.01–0.02.

Figure 3

Figure 3. Predictive gain from imaging variables of different modalities. Top: performance over time. The colored texts at the top indicate which imaging modalities were collected at each exam. Bottom: integrated AUC (iAUC) Gain with respect to the baseline model trained on only traditional risk factors. All pairwise hypothesis tests are significant (including All vs. ECHO and All vs. CT), unless otherwise denoted with “ns” (“non-significant”).

Supplementary Table S2 shows the averaged iAUC gain from each of the 32 exhaustive combinations of 5 imaging variable subsets with respect to the baseline model. Aside from the best model using all imaging markers, the second-best subset is a combination of Echo, CT, and Brain MRI markers with an averaged iAUC gain of 0.027. A combination of Echo and CT variables resulted in a 0.022 iAUC gain, and the largest gain from a single imaging subset was from CT variables (0.014 iAUC gain).

Temporal importance

Figure 4 quantifies the importance of early vs. late measurements on CVRD prediction in the two imaging subsets with the most influence on prediction: Echo and CT. For Echo, the model trained on only early Echo measurements (collected in Y5 Exam as a core study and partially in Y10 Exam as a substudy) had just as good overall performance (iAUC = 0.78) as the model trained on Echo variables collected later in life (Y20 and Y25 Core Exams) (iAUC = 0.78). For longer-term risk prediction (25 years after Y0), the C-index of the model trained on early Echo was only 1% lower than that of the model trained on late Echo. Regarding CT variables, which were collected in Y10 as a substudy and in Y15, Y20, and Y25 as core studies, the most recent CT exam provided the most accurate prediction, as evidenced by the immediate bump in the C-index after each CT Exam. The most prominent bump is right after CT Y25, which resulted in a 3% increase in C-index compared to using CT variables from Y20 or earlier (p < 0.001).

Figure 4

Figure 4. Effects of early vs. late imaging measurement. (A) Early vs. Late Echo, (B) Early vs. Late CT. Early Echo provided good overall gain for long-term risk estimation, compared to Late Echo. For CT, the most recent CT exam provides the most accurate prediction, evident by immediate bumps after each CT Exam, especially CT Y25. P-values of significant pairwise hypothesis tests are shown.

Variable importance

In addition to quantifying the importance of variable subsets relative to each other, variable-level importance was quantified. Table 2 shows the top 20 ranked variables at three representative times: 15 years, 25 years, and 33 years (endpoint) after Y0. Total cholesterol and low-density lipoprotein cholesterol (LDL) were consistently the most important predictors of CVRD. Most of the top 20 variables were collected by either Echo or CT, attesting to their importance to CVRD prediction and consistent with the results in Figure 3. Chronological age was an important variable at year 15 (mean age = 40) but was no longer among the top 20 as participants got older. The top Echo and CT variables have similar relative variable importance in the rankings.

Table 2

Table 2. Variable importance ranking (top 20).

Discussion

In this work, we investigated the utility of high-dimensional longitudinal imaging data of five modalities, separately and together, for dynamic prediction of CVRD in young adults in a multi-centered cohort followed up over 30 years. We used the entirety of imaging variables over all exam years for continuously updated predictions of risk. The results suggest that longitudinal imaging data, even when irregularly collected and having high missing rates, improved CVRD dynamic prediction (3% iAUC, up to 5% C-index in midlife). Among different subsets of imaging markers, Echo and CT contributed to most of the improved risk estimation. Brain MRI variables contributed additional information that slightly improved prediction when they were collected. DEXA and Carotid IMT contributed little to none to CVRD prediction, even though they could be helpful in other aspects of clinical and epidemiological research. In terms of the effects of imaging markers measured early or late in life, the results suggested that Echo measured in early adulthood could predict long-term CVRD risks almost as well as Echo measured 10–15 years later. For CT, the most recent CT exam provides the most accurate prediction for short-term CVRD risk estimation. The results also suggest that the prediction ability of models decreased over time, particularly so between the ages of 40 and 50 years, when only traditional risk factors were included in the models. The addition of imaging variables helped maintain the prediction ability beyond middle age.

Multimodal imaging markers for dynamic prediction

This work is unique as it is among the first that incorporates high-dimensional longitudinal imaging markers from multiple modalities collected with high levels of sparsity (a high percentage of missing values) and irregularity (non-uniform time intervals between measurements) for dynamic prediction of CVRD. Many previous studies have limited the use of imaging data in prediction models, using only cross-sectional data or a few variables, or only including complete data. Simple imputation methods are often used to deal with sparsity and irregularity (mean/median imputation or last observation carried forward), (23) but these can introduce bias and do not fully capture the information in longitudinal data. In this work, we employed Dynamic-DeepHit which was capable of dealing with data of high sparsity and irregularity, (11) and thus could overcome the aforementioned challenges and better capture the rich information to improve risk estimation.

We showed that the inclusion of longitudinal multimodal imaging markers led to a 0.03–0.05 increase in C-index and iAUC compared to not using imaging markers. It is worth noting that the imaging data collected in CARDIA were highly sparse and irregular (only available in two or three follow-up exams and collected in a small subset of participants; Supplementary Table S1). The various missingness patterns reflected the nature of real-world data, as not all information from a past patient visit will be collected in the current visit. In addition, the general population of the CARDIA study is in young adulthood to mid-life, and so most people did not have major findings in imaging examinations. This may be one of the reasons that the increase in model performance from adding the new (longitudinal) data was not higher than the observed 0.03–0.05 range. We argue that, despite the variable missingness rates and the generally healthy population, longitudinal multimodal imaging markers still improved prediction by up to 5%. More complete and frequently collected data and a population with higher CVRD prevalence will likely yield greater improvement in risk estimation.

This study found that the predictive accuracy of a model for cardiovascular disease risk dropped after the mean age of the participants reached 45 years old, especially when using only traditional risk factors. The inclusion of multimodal imaging markers helped stabilize the predictive accuracy and prevented a decline of 6%–7% over 13 years. The decline in predictive performance when using only traditional risk factors may be due to several factors. First, traditional risk factors may be less effective for predicting 10-year cardiovascular risk in people entering midlife despite their demonstrated usage for longer-term prediction. Previous studies from our group have shown that traditional risk factors were not among the top predictors for short-term prediction in an older population (MESA cohort, mean age = 62), (7) and also in the CARDIA population. (24) Second, some non-traditional risk factors, such as mental health, alcohol abuse, and other lifestyle factors, were not included in our prediction models. Studies have shown that cumulative effects of stress and alcohol contributed to worsening cardiovascular health. (25–27) Third, health tends to decline starting in middle age, when many changes occur in the body, making it more challenging to predict cardiovascular disease risk at this age. For example, in this age range, menopause often begins and the aortic root could enlarge and dilate, both of which have been shown to negatively affect cardiovascular functions and metabolism. (28) More generally, metabolic syndrome was about three times as likely in those 40–59 years of age as in those 20–39 years old (29).

In this regard, the decline in predictive performance when using only traditional factors further highlighted the role of multimodal imaging markers. Even though the traditional risk factors are fundamental to the genesis and progression of CVRD, multimodal imaging markers can pick up physiologic signals that are closer to disease initiation and closer to adverse outcomes. Furthermore, imaging markers can capture signals from some of the cumulative effects of insults to the body that were not captured by traditional risk factors. For example, coronary calcification from CAC/CT has demonstrated the proatherogenic effects of heavy alcohol consumption since young adulthood. (30) CAC/CT variables consistently ranked in the top predictors of outcome in our models (Table 2). Signals signifying the changes in the body at middle age could also be recognized by longitudinal imaging; for example, aortic root enlargement was captured by Echo and was among the top 6 predictors of outcome at year 33, when the average participant's age was 58. In addition, the importance of age decreased over time while the importance of the imaging markers increased (Table 2), suggesting that vascular age captured by imaging may be more relevant than chronologic age. Overall, the included multimodal longitudinal imaging markers stabilized the decrease in prediction accuracy but may not have captured all relevant information. Adding more diverse, high-quality multimodal data may be necessary to further improve prediction in this age group.

Importance of imaging subsets

In this work, we quantified the importance of imaging markers as whole variable subsets/imaging modalities in addition to looking at variable-level importance. We also assessed spatial importance (in one exam) and temporal importance (across exams). We found that CT and Echo variables were consistently among the most important predictors. Specifically, within Echo variables, left ventricular dimensions, ventricular septal thickness, aortic root measurements, and circumferential peak strain were among the most important. These variables are also reportedly among the top predictors in other large-scale studies. (31) For CT variables, markers from coronary artery calcium (CAC) scans were consistently among the top predictors, adding to the growing evidence in the literature about the importance of CAC. Abdominal aortic calcium variables such as the number and size of lesions of the abdominal aorta and common iliac aorta are also among the top 30–50 predictors. Additionally, intermuscular adipose tissue (IMAT) measured by CT also consistently presented in the top 30, agreeing with previous reports showing IMAT associated with increased subclinical atherosclerosis independent of traditional cardiovascular disease risk factors and other adipose depots (5).

In the early years, Echo markers, specifically markers of hypertension (such as septal thickness, LV volume and dimension), contributed the most to outcome prediction. However, in later years, CT markers played a larger role in prediction (Figures 3, 4; Table 2). This suggests that at a young age, hypertension is the main driver of CVD, whereas at middle age, markers of atherosclerosis become the main driver, which can be more efficiently captured by CT/calcium scoring. Variables from the other subsets (e.g., DEXA, CARTD) contributed weakly to the prediction. Regarding brain MRI markers, total brain volume (including gray matter, white matter, and cerebrospinal fluid) and abnormal tissue volumes, primarily in white matter, were among the top 15–20 predictors, and overall brain markers helped improve CVRD prediction in the immediate years after they were added. Previous studies have reported that cardiovascular risk burden is associated with cognitive decline, structural brain differences, and brain age. (32–35) However, most studies show that CVD risk factors predict or are associated with brain structure and function, (33, 35, 36) and not the other direction. Therefore, the brain MRI measures may reflect already accumulated CVD risk factors and thus provide extra information on the severity of the risk factors, further improving CVRD prediction.

Algorithmic consideration

In our study, we compared several dynamic survival analysis algorithms to identify the best technique to handle sparse and irregular imaging data. Among the techniques tested, machine learning methods were superior to the Extended Cox model. The best-performing models were those using Dynamic-DeepHit trained on unimputed data, which can directly handle sparse, high dimensional, and irregular data and provide true dynamic prediction. LTRCforest trained on imputed data performed on par with Dynamic-DeepHit but was not a true dynamic prediction algorithm and required imputation and more computational time. Therefore, Dynamic-DeepHit may be the most suitable algorithm for dynamic prediction.

Limitations

Our study has several limitations. First, data collection started in 1985 in a biracial population that was followed for 30 years, so the results describe the experience of a specific cohort. Caution must be exercised when generalizing to other races and to the current population, as population characteristics may have shifted over time. Second, external validation is challenging because long-term follow-up studies of young adults with extensive phenotyping like CARDIA are scarce. Third, as noted, many imaging markers in CARDIA are highly sparse and irregularly collected; quantification of the utility of longitudinal multimodal imaging would improve with complete data. Despite this, the inclusion of sparse and irregular multimodal imaging data still significantly improved prediction. Finally, the collection of repeated multimodal imaging is practical mainly in well-resourced health facilities.

Conclusions

We show that longitudinal multimodal imaging data readily collected from follow-up exams in a population study can improve dynamic prediction of CVRD. Echocardiography measured early can capture hypertension status and supports long-term risk estimation, while CT/calcium scoring variables carry atherosclerotic signatures that benefit more immediate risk assessment starting in middle age.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://www.cardia.dopm.uab.edu/. CARDIA study data are available to affiliated and non-affiliated investigators. See the study website for further details: http://www.cardia.dopm.uab.edu/invitation-to-new-investigators.

Ethics statement

The studies involving humans were approved by Institutional Review Board for the overall CARDIA study at all sites (Northwestern University, University of Alabama Birmingham, University of Minnesota, and Kaiser Foundation Research Institute). Written informed consent was obtained from all subjects and/or their legal guardian(s). All methods were performed in accordance with the relevant guidelines and regulations. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

HN: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing. HV: Data curation, Formal Analysis, Writing – review & editing. KK: Data curation, Formal Analysis, Writing – review & editing. JC: Funding acquisition, Writing – review & editing. LL: Data curation, Funding acquisition, Resources, Writing – review & editing. EG: Writing – review & editing. JL: Conceptualization, Funding acquisition, Resources, Writing – review & editing. BA-V: Conceptualization, Data curation, Methodology, Validation, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article.

The Coronary Artery Risk Development in Young Adults Study (CARDIA) is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with the University of Alabama at Birmingham (HHSN268201800005I & HHSN268201800007I), Northwestern University (HHSN268201800003I), University of Minnesota (HHSN268201800006I), and Kaiser Foundation Research Institute (HHSN268201800004I). Y25 CT Exam was funded by NHLBI grant R01-HL098445 to Vanderbilt University and Wake Forest University. This manuscript has been reviewed by CARDIA for scientific content.

Acknowledgments

We thank the CARDIA committee for reviewing the scientific content of this manuscript. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fradi.2024.1269023/full#supplementary-material

References

1. Peters SAE, den Ruijter HM, Bots ML, Moons KGM. Improvements in risk stratification for the occurrence of cardiovascular disease by imaging subclinical atherosclerosis: a systematic review. Heart. (2012) 98:177–84. doi: 10.1136/heartjnl-2011-300747

2. Nwabuo CC, Moreira HT, Vasconcellos HD, Mewton N, Opdahl A, Ogunyankin KO, et al. Left ventricular global function index predicts incident heart failure and cardiovascular disease in young adults: the coronary artery risk development in young adults (CARDIA) study. Eur Heart J Cardiovasc Imaging. (2019) 20:533–40. doi: 10.1093/ehjci/jey123

3. Armstrong AC, Jacobs DR, Gidding SS, Colangelo LA, Gjesdal O, Lewis CE, et al. Framingham score and LV mass predict events in young adults: CARDIA study. Int J Cardiol. (2014) 172(2):350–5. doi: 10.1016/j.ijcard.2014.01.003

4. Yared GS, Moreira HT, Ambale-Venkatesh B, Vasconcellos HD, Nwabuo CC, Ostovaneh MR, et al. Coronary artery calcium from early adulthood to middle age and left ventricular structure and function. Circ Cardiovasc Imaging. (2019) 12:e009228. doi: 10.1161/CIRCIMAGING.119.009228

5. Terry JG, Shay CM, Schreiner PJ, Jacobs Jr DR, Sanchez OA, Reis JP, et al. Intermuscular adipose tissue and subclinical coronary artery calcification in midlife: the CARDIA study (coronary artery risk development in young adults). Arterioscler Thromb Vasc Biol. (2017) 37:2370–8. doi: 10.1161/ATVBAHA.117.309633

6. Ciuffo L, Nguyen H, Marques MD, Aronis KN, Sivasambu B, de Vasconcelos HD, et al. Periatrial fat quality predicts atrial fibrillation ablation outcome. Circ Cardiovasc Imaging. (2019) 12:e008764. doi: 10.1161/CIRCIMAGING.118.008764

7. Ambale-Venkatesh B, Yang X, Wu CO, Liu K, Hundley WG, McClelland R, et al. Cardiovascular event prediction by machine learning: the multi-ethnic study of atherosclerosis. Circ Res. (2017) 121:1092–101. doi: 10.1161/CIRCRESAHA.117.311312

8. Fisher LD, Lin DY. Time-dependent covariates in the Cox proportional-hazards regression model. Annu Rev Public Health. (1999) 20:145–57. doi: 10.1146/annurev.publhealth.20.1.145

9. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Annals of Applied Statistics. (2008) 2:841–60. doi: 10.1214/08-AOAS169

10. Rizopoulos D, Molenberghs G, Lesaffre EMEH. Dynamic predictions with time-dependent covariates in survival analysis using joint modeling and landmarking. Biometrical Journal. (2017) 59:1261–76. doi: 10.1002/bimj.201600238

11. Lee C, Yoon J, van der Schaar M. Dynamic-DeepHit: a deep learning approach for dynamic survival analysis with competing risks based on longitudinal data. IEEE Trans Biomed Eng. (2020) 67(1):122–33. doi: 10.1109/TBME.2019.2909027

12. Friedman GD, Cutter GR, Donahue RP, Hughes GH, Hulley SB, Jacobs DR, et al. CARDIA: study design, recruitment, and some characteristics of the examined subjects. J Clin Epidemiol. (1988) 41:1105–16. doi: 10.1016/0895-4356(88)90080-7

13. The Coronary Artery Risk Development in Young Adults (CARDIA) Study Homepage. Available online at: https://www.cardia.dopm.uab.edu/

14. Fu W, Simonoff JS. Survival trees for left-truncated and right-censored data, with application to time-varying covariate data. Biostatistics. (2017) 18:352–69. doi: 10.1093/biostatistics/kxw047

15. Tibshirani R. The lasso method for variable selection in the Cox model. Stat Med. (1997) 16:385–95. doi: 10.1002/(SICI)1097-0258(19970228)16:4%3C385::AID-SIM380%3E3.0.CO;2-3

16. Resche-Rigon M, White IR. Multiple imputation by chained equations for systematically and sporadically missing multilevel data. Stat Methods Med Res. (2018) 27:1634–49. doi: 10.1177/0962280216666564

17. van Buuren S, Groothuis-Oudshoorn K. Mice: multivariate imputation by chained equations in R. J Stat Softw. (2011) 45:1–67. doi: 10.18637/jss.v045.i03

18. Nguyen HT, Vasconcellos HD, Keck K, Reis JP, Lewis CE, Sidney S, et al. Multivariate longitudinal data for survival analysis of cardiovascular event prediction in young adults: insights from a comparative explainable study. BMC Med Res Methodol. (2023) 23:23. doi: 10.1186/s12874-023-01845-4

19. Ishwaran H. Variable importance in binary regression trees and forests. Electron J Stat. (2007) 1:519–37. doi: 10.1214/07-EJS039

20. Liang CJ, Heagerty PJ. A risk-based measure of time-varying prognostic discrimination for survival models. Biometrics. (2017) 73:725–34. doi: 10.1111/biom.12628

21. Gerds TA, Kattan MW, Schumacher M, Yu C. Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Stat Med. (2013) 32:2173–84. doi: 10.1002/sim.5681

22. Heagerty PJ, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. (2005) 61:92–105. doi: 10.1111/j.0006-341X.2005.030814.x

23. Zhao J, Feng QP, Wu P, Lupu RA, Wilke RA, Wells QS, et al. Learning from longitudinal data in electronic health record and genetic data to improve cardiovascular event prediction. Sci Rep. (2019) 9:1–10. doi: 10.1038/s41598-018-36745-x

24. Nguyen HT, Venkatesh BA, Reis JP, Wu CO, Carr J, Nwabuo C, et al. Lifetime vs 10-year cardiovascular disease prediction in young adults using statistical machine learning and deep learning: the CARDIA study. medRxiv. (2022). doi: 10.1101/2022.09.22.22280254

25. Chaddha A, Robinson EA, Kline-Rogers E, Alexandris-Souphis T, Rubenfire M. Mental health and cardiovascular disease. Am J Med. (2016) 129:1145–8. doi: 10.1016/j.amjmed.2016.05.018

26. Albert MA, Durazo EM, Slopen N, Zaslavsky AM, Buring JE, Silva T, et al. Cumulative psychological stress and cardiovascular disease risk in middle aged and older women: rationale, design, and baseline characteristics. Am Heart J. (2017) 192:1–12. doi: 10.1016/j.ahj.2017.06.012

27. Kiechl S, Willeit J, Rungger G, Egger G, Oberhollenzer F, Bonora E. Alcohol consumption and atherosclerosis: what is the relation? Prospective results from the Bruneck study. Stroke. (1998) 29:900–7. doi: 10.1161/01.STR.29.5.900

28. Rosano GMC, Vitale C, Marazzi G, Volterrani M. Menopause and cardiovascular disease: the evidence. Climacteric. (2007) 10:19–24. doi: 10.1080/13697130601114917

29. Ervin RB. Prevalence of metabolic syndrome among adults 20 years of age and over, by sex, age, race and ethnicity, and body mass index: United States, 2003–2006. National Health Statistics Reports. (2009). 13. https://stacks.cdc.gov/view/cdc/5448

30. Pletcher MJ, Varosy P, Kiefe CI, Lewis CE, Sidney S, Hulley SB. Alcohol consumption, binge drinking, and early coronary calcification: findings from the coronary artery risk development in young adults (CARDIA) study. Am J Epidemiol. (2005) 161:423–33. doi: 10.1093/aje/kwi062

31. Li Z, Yang Y, Zheng L, Sun G, Guo X, Sun Y. It’s time to add electrocardiography and echocardiography to CVD risk prediction models: results from a prospective cohort study. Risk Manag Healthc Policy. (2021) 14:4657. doi: 10.2147/RMHP.S337466

32. Song R, Xu H, Dintica CS, Pan K-Y, Qi X, Buchman AS, et al. Associations between cardiovascular risk, structural brain changes, and cognitive decline. J Am Coll Cardiol. (2020) 75:2525–34. doi: 10.1016/j.jacc.2020.03.053

33. Srinivasa RN, Rossetti HC, Gupta MK, Rosenberg RN, Weiner MF, Peshock RM, et al. Cardiovascular risk factors associated with smaller brain volumes in regions identified as early predictors of cognitive decline. Radiology. (2016) 278:198. doi: 10.1148/radiol.2015142488

34. Kharabian Masouleh S, Beyer F, Lampe L, Loeffler M, Luck T, Riedel-Heller SG, et al. Gray matter structural networks are associated with cardiovascular risk factors in healthy older adults. J Cereb Blood Flow Metab. (2018) 38:360–72. doi: 10.1177/0271678X17729111

35. Pase MP, Davis-Plourde K, Himali JJ, Satizabal CL, Aparicio H, Seshadri S, et al. Vascular risk at younger ages most strongly associates with current and future brain volume. Neurology. (2018) 91:e1479–86. doi: 10.1212/WNL.0000000000006360

36. Armstrong AC, Muller M, Ambale-Venkatesh B, Halstead M, Kishi S, Bryan N, et al. Association of early left ventricular dysfunction with advanced magnetic resonance white matter and gray matter brain measures: the CARDIA study. Echocardiography. (2017) 34:1617–22. doi: 10.1111/echo.13695

Keywords: multimodal, imaging, dynamic survival analysis, machine learning, dynamic prediction, cardiovascular disease, prognosis, CARDIA

Citation: Nguyen H, Vasconcellos HD, Keck K, Carr J, Launer LJ, Guallar E, Lima JAC and Ambale-Venkatesh B (2024) Utility of multimodal longitudinal imaging data for dynamic prediction of cardiovascular and renal disease: the CARDIA study. Front. Radiol. 4:1269023. doi: 10.3389/fradi.2024.1269023

Received: 28 July 2023; Accepted: 6 February 2024;
Published: 27 February 2024.

Edited by:

Douglas Sawyer, Maine Medical Center, Maine Health, United States

Reviewed by:

Salah Alheejawi, Northeastern University, United States
Ilies Ghanzouri, Stanford University, United States

© 2024 Nguyen, Vasconcellos, Keck, Carr, Launer, Guallar, Lima, Ambale-Venkatesh. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Bharath Ambale-Venkatesh, bambale1@jhmi.edu
