- Department of Clinical Physiology and Nuclear Medicine, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
Purpose: For glioma assessment, conventional magnetic resonance imaging (MRI) can be supplemented by positron emission tomography (PET) imaging with radiolabeled amino acids such as O-(2-[18F]fluoroethyl)-L-tyrosine ([18F]FET), which provides additional information on metabolic properties. In neuro-oncology, patients often undergo brain- and skull-altering treatment, which is known to challenge MRI-based attenuation correction (MR-AC) methods and thereby affect the simplified semi-quantitative measures, such as the tumor-to-brain ratio (TBR), used in clinical routine. The aim of the present study was to examine the applicability of our deep learning method, DeepDixon, for MR-AC in [18F]FET PET/MRI scans of a post-surgery glioma cohort with metal implants.
Methods: The MR-AC maps were assessed for all 194 included post-surgery glioma patients (318 studies). The subgroup of 147 patients (222 studies, 200 MBq [18F]FET PET/MRI) with a biological tumor volume (BTV) above 1 ml was subsequently reconstructed with DeepDixon, with the vendor-default atlas-based method, and with a low-dose computed tomography (CT) used as reference. The BTV was delineated in each study by isocontouring tracer uptake above a TBR threshold of 1.6. We evaluated the MR-AC methods using the recommended clinical metrics BTV and mean and maximum TBR on a patient-by-patient basis against the reference with CT-AC.
Results: Ninety-seven percent of the studies (310/318) had no major artifacts using DeepDixon, which achieved a Dice coefficient of 0.89/0.83 for tissue/bone, respectively, compared to 0.84/0.57 for the atlas method. The average difference between DeepDixon and CT-AC was within 0.2% across all clinical metrics, and no statistically significant difference was found. When using DeepDixon, only 3 of the 222 studies (1%) exceeded our acceptance criteria, compared to 72 of the 222 studies (32%) with the atlas method.
Conclusion: We evaluated the performance of a state-of-the-art MR-AC method on the largest post-surgical glioma patient cohort to date. We found that DeepDixon could overcome most of the issues arising from irregular anatomy and metal artifacts present in the cohort resulting in clinical metrics within acceptable limits of the reference CT-AC in almost all cases. This is a significant improvement over the vendor-provided atlas method and of particular importance in response assessment.
1. Introduction
Conventional magnetic resonance imaging (MRI) is the imaging modality of choice in neuro-oncology, both in clinical practice and in clinical trials (Holzgreve et al., 2021), and is used primarily for qualitative subjective interpretation with simplified measures such as size, number of lesions, and contrast enhancement patterns (Mullins et al., 2005; Galldiks et al., 2015a; Smits, 2021). When MRI is challenged in glioma assessment it may be supplemented by positron emission tomography (PET) imaging with radiolabeled amino acids, such as O-(2-[18F]fluoroethyl)-L-tyrosine ([18F]FET), that provide additional information on metabolic properties by visualizing the L-amino acid transporter (LAT) expression. The recommended clinical use of amino acid PET according to recent guidelines includes differential diagnosis in the primary evaluation of brain lesions, tumor grading, biopsy optimization and the differentiation of tumor relapse from treatment-related changes (Albert et al., 2016; Law et al., 2019). Pathological amino acid accumulation is estimated using simplified semi-quantitative measures such as tumor-to-brain ratio (TBR) on static [18F]FET-PET images, and by evaluating the tracer time-activity curve (TAC) extracted from dynamic images for 40–50 min following tracer administration (Suchorska et al., 2016; Law et al., 2019). Other indications such as response assessment and treatment planning of radiation and surgical intervention rely on the ability of amino acid PET to identify the extent of infiltrating glioma expressed as the biological tumor volume (BTV) (Law et al., 2019). The BTV is prognostic for overall survival in post-resection glioblastoma multiforme in multivariate analysis (Poulsen et al., 2017).
A prerequisite for diagnostic and prognostic accuracy and response assessment is quantitatively correct [18F]FET-PET images, which among other things requires accurate attenuation correction (AC) (Vander Borght et al., 2006). In simultaneous PET/MRI, AC is usually performed using an MRI-based method, which poses a challenge since MRI signal intensities are not related to electron density, unlike, e.g., the computed tomography (CT) scans used for AC of PET/CT examinations. Several MR-AC methods exist; however, only a few of them are made available by the vendors for clinical brain PET/MRI (Ladefoged et al., 2016; Catana, 2020). For both the Biograph mMR scanner (Siemens Healthineers, Erlangen, Germany) and the Signa PET/MRI scanner (GE Healthcare, Chicago, IL, USA), atlas-based methods are available, where a probabilistic atlas created from a database of CT images is aligned to the MRI (Wollenweber et al., 2013; Paulus et al., 2015; Koesters et al., 2016), and additionally the option of a segmentation-based method, where dedicated short echo time sequences are used to separate voxels corresponding to air, tissue and bone (Keereman et al., 2010; Wiesinger et al., 2016).
While the performance of these methods has been evaluated extensively (Dickson et al., 2014; Ladefoged et al., 2016; Sekine et al., 2016a,b, 2020), the majority of these studies are based on dementia cohorts consisting of patients with normal skull anatomy. In neuro-oncology, patients often undergo brain- and skull-altering treatment, which might challenge atlas-based methods due to a lack of representative cases in the atlas database. While segmentation-based approaches might be better suited to adapting to abnormal anatomy, these methods are in turn susceptible to metal implant-induced artifacts, which, depending on the sequence, can lead to distortions and/or partial or complete signal loss in the MR images (Hargreaves et al., 2011).
We have previously demonstrated that RESOLUTE (Ladefoged et al., 2015), a segmentation-based method, can reproduce the clinical metrics of CT-AC in neuro-oncology studies, albeit with some outliers caused by local artifacts around titanium alloy mesh implants (Ladefoged et al., 2017). The advances in deep learning have led to numerous methods utilizing convolutional neural networks (CNNs) for MRI-to-CT conversion (Teuho et al., 2020; Torrado-Carvajal, 2020). We proposed such a method for MR-AC of pediatric neuro-oncology patients, which was found to improve the performance over RESOLUTE even in cases with irregular anatomy such as post-operative subcutaneous soft tissue swelling (Ladefoged et al., 2019). Finally, we proposed DeepDixon, a CNN identical to our pediatric version but re-trained and evaluated using a non-surgical cohort consisting of 1,037 adult subjects, resulting in average PET bias below 1% in any region of the brain (Ladefoged et al., 2020).
The aim of the present study was to examine the applicability of DeepDixon for MR-AC in [18F]FET PET/MRI scans of a post-surgery glioma cohort with metal implants, which represents the group of patients with the most challenging abnormal anatomy and metal-induced artifacts in clinical brain PET imaging. DeepDixon and the vendor-standard atlas-based method were compared to CT-AC and evaluated using the recommended clinical metrics on a patient-by-patient basis.
2. Materials and methods
2.1. Patients
The department imaging archive was screened for patients aged 18 years or older with prior surgery for histologically proven glioma classified according to recent guidelines (Louis et al., 2016, 2021) and a simultaneous [18F]FET-PET and MRI scan performed using our hybrid PET/MRI system between October 2018 and January 2021. A total of 194 patients (mean age 54 years; range: 18–80 years) with a total of 318 PET/MRI scans met the inclusion criteria (Table 1).
Table 1. Patient characteristics for all studies as well as for a subgroup of patients with BTV above 1 ml which were included in the PET evaluation.
2.2. Acquisition of CT
A reference low-dose CT image of the head (120 kVp, 30 mAs, 74 slices, 0.6 × 0.6 × 3 mm³ voxels), acquired on a whole-body PET/CT system (Biograph TruePoint 40 and Biograph TruePoint 64, Siemens Healthineers, Knoxville, USA), was used. The CT images were acquired on the same day as the PET/MRI examination (n = 123) or at a previous examination (n = 195) a median of 112 days earlier (range 14–1,120 days), with no brain- or skull-altering surgery in between.
2.3. Acquisition of MRI
The scan protocol included a T1-weighted (T1w) MPRAGE and a Dixon-VIBE sequence (the vendor default for MR-AC) with repetition time (TR) 4.14 ms, echo time 1 (TE1) 1.28 ms, echo time 2 (TE2) 2.51 ms, flip angle 10°, coronal orientation, 39 s acquisition time, and voxel size of 1.3 × 1.3 × 2.0 mm³. The software version was VE11P for all subjects.
2.4. Acquisition of [18F]FET-PET
Patients were positioned head first with their arms down on the fully integrated PET/MRI system. Data were acquired in list mode over 20–40 min (or 0–40 min for a subset of patients) after injection of 200 MBq [18F]FET, using a single bed position of 25.8 cm covering the head and neck. For the purpose of this study, the PET data from the PET/MRI acquisition were reconstructed offline (E7tools, Siemens Healthineers, Knoxville, USA) using 3D ordinary Poisson ordered-subset expectation maximization (OP-OSEM) with 4 iterations, 21 subsets, zoom 2.5, and 5 mm Gaussian post-filtering on 344 × 344 matrices (0.8 × 0.8 × 2.0 mm³ voxels), in line with the clinical protocol used at our institution. We reconstructed the summed 20-min PET image for all patients (over 20–40 min for the patients imaged over 0–40 min); for the subset with 0–40 min dynamic imaging, we also reconstructed a dynamic series of 14 frames (5 × 1 min, 5 × 3 min, 4 × 5 min) similar to Galldiks et al. (2015b). Default randoms, scatter, and dead time corrections were applied to all images.
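As a quick consistency check of the dynamic framing, the frame boundaries can be computed directly from the schedule. This is a minimal Python sketch: the schedule is taken from the text, while the variable and function names are illustrative. It confirms that the 14 frames cover 0–40 min and that the last four 5-min frames coincide with the 20–40 min static window.

```python
import numpy as np

# Dynamic framing described above: 5 x 1 min, 5 x 3 min, 4 x 5 min (0-40 min).
FRAME_SCHEDULE = [(5, 1.0), (5, 3.0), (4, 5.0)]  # (number of frames, minutes per frame)


def frame_times(schedule):
    """Return frame start, end, and mid times in minutes post-injection."""
    durations = np.concatenate([np.full(n, d) for n, d in schedule])
    ends = np.cumsum(durations)
    starts = ends - durations
    mids = starts + durations / 2.0
    return starts, ends, mids


starts, ends, mids = frame_times(FRAME_SCHEDULE)
# Frames 10-13 (0-based) span 20-40 min, the same window as the static image.
print(len(starts), starts[10], ends[-1])  # prints: 14 20.0 40.0
```

The frame mid-times are a natural x-axis when plotting time-activity curves from this series.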
2.5. Attenuation correction methods
Three methods for AC were applied to the data: (1) the CT image rigidly aligned to the T1w MPRAGE, used as the gold standard AC reference (Andersen et al., 2014), (2) the vendor-provided atlas-based attenuation map, which incorporates spatially variant attenuation coefficients of the major bone structures into the Dixon-VIBE MR-AC (Paulus et al., 2015; Koesters et al., 2016), and (3) our deep learning-based method DeepDixon.
DeepDixon is a method to synthesize a brain CT image from only the fat- and water-weighted images of the Dixon-VIBE sequence. The method was developed and validated using more than 1,000 adult subjects primarily referred with suspicion of dementia, where a clinical evaluation using an independent test set showed no relevant differences compared to CT-AC (Ladefoged et al., 2020). The subjects used to develop DeepDixon all had normal anatomy; thus, the method was not specifically trained to overcome the challenges related to imaging patients with brain surgery, e.g., bone-modifying craniofacial surgical interventions, cranial defects, dysplasias, disfigurements, or metal implants other than dental implants. None of the patients included in the present study were part of the dataset used to develop or evaluate DeepDixon.
We did not re-train DeepDixon for the purpose of this study, but directly applied the model to generate synthetic CT images for the included patients. We hypothesize that the variation in the original training dataset is enough to overcome the post-surgery defects present in our current cohort.
We refer to the original publication for technical details regarding DeepDixon (Ladefoged et al., 2020), but in short, the method is a 3D CNN built on the U-Net architecture and trained end-to-end with paired MRI and CT patches consisting of 16 consecutive transaxial slices. The method was implemented in TensorFlow and trained using the Adam optimizer. The DeepDixon inference script and trained model weights have been made freely available¹.
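The patch-based design can be illustrated with a minimal sketch of slab-wise inference over a full volume. This is not the published DeepDixon inference script: the stride of 8 slices, the averaging of overlapping predictions, the channel layout, and the `predict_slab` callable (a hypothetical stand-in for the trained network) are all assumptions made for illustration.

```python
import numpy as np


def predict_in_slabs(dixon, predict_slab, slab=16, stride=8):
    """Run a slab-wise network along the axial axis and average overlapping outputs.

    dixon        : array (X, Y, Z, 2) with fat- and water-weighted channels,
                   assumed resampled so that axis 2 is the transaxial slice axis
    predict_slab : hypothetical callable mapping (X, Y, slab, 2) -> (X, Y, slab),
                   standing in for the trained network
    """
    nz = dixon.shape[2]
    if nz < slab:
        raise ValueError("volume has fewer slices than the slab size")
    starts = list(range(0, nz - slab + 1, stride))
    if starts[-1] != nz - slab:  # make sure the last slices are covered as well
        starts.append(nz - slab)
    out = np.zeros(dixon.shape[:3])
    hits = np.zeros(dixon.shape[:3])
    for z0 in starts:
        out[:, :, z0:z0 + slab] += predict_slab(dixon[:, :, z0:z0 + slab, :])
        hits[:, :, z0:z0 + slab] += 1.0
    return out / hits  # every slice is covered at least once
```

Averaging overlapping slabs is one common way to avoid seams at slab borders; other schemes (non-overlapping tiles, or keeping only central slices of each slab) are equally plausible.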
Any region not covered by the CT field-of-view was filled with the corresponding region of the atlas-based attenuation map. To ensure a fair comparison, the same region was also filled in the DeepDixon map.
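The field-of-view completion amounts to a voxel-wise selection between two maps. A minimal sketch, assuming all maps have been resampled to a common grid and that a boolean mask of the CT field-of-view is available (both assumptions; the names are hypothetical):

```python
import numpy as np


def fill_outside_fov(mu_map, atlas_mu, ct_fov_mask):
    """Use atlas attenuation values wherever the CT field-of-view has no data.

    mu_map      : attenuation map to complete (CT-AC or DeepDixon), in cm^-1
    atlas_mu    : vendor atlas-based attenuation map on the same voxel grid
    ct_fov_mask : boolean array, True inside the CT field-of-view
    """
    return np.where(ct_fov_mask, mu_map, atlas_mu)
```

Because the same atlas values are inserted into both the CT-AC and the DeepDixon maps, the two maps are identical outside the CT field-of-view, so any difference in the reconstructed PET originates from inside it.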
2.6. Image processing and analysis
2.6.1. Delineation of [18F]FET-active tumor
The [18F]FET-PET images were first normalized to a background region defined in healthy-appearing gray and white matter at a level above the insula, automatically extracted by an in-house tool designed to match the manual workflow (Ladefoged et al., 2017); see Supplementary material 1. The background region was used to extract individual [18F]FET-PET mean values for each of the three AC methods. We further performed skull-stripping on the T1w images using HD-BET (Isensee et al., 2019) and applied the resulting mask to all PET images to exclude any extracerebral uptake. The BTV of [18F]FET-PET was measured using a 3D iso-contour in Mirada XD (Mirada Medical, Oxford, UK), defining tumor tissue as uptake above 1.6 times the mean standardized uptake value (SUV) in the background ROI (Floeth et al., 2005), for each AC method separately. Physiological extratumoral areas with high [18F]FET uptake, e.g., vascular structures and the pineal body, were identified on either the T1w or PET image and removed from the evaluation. Only scans with a BTV above 1 ml measured with CT-AC were included in the evaluation of PET accuracy. A total of 222 scans from 147 patients were above this threshold (Table 1).
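The threshold-based delineation and the derived TBR metrics can be sketched as follows. This is a simplified illustration, not the Mirada XD iso-contour tool: it keeps every voxel above 1.6 times the background mean inside the brain mask rather than a connected iso-contour, and the function and argument names are hypothetical. With the 0.8 × 0.8 × 2.0 mm³ PET voxels, one voxel corresponds to 0.00128 ml.

```python
import numpy as np


def tumor_metrics(suv, background_mask, brain_mask, voxel_volume_ml,
                  threshold=1.6, exclude_mask=None):
    """Threshold-based BTV with TBRmean and TBRmax (simplified sketch).

    suv             : PET image (SUV or activity concentration)
    background_mask : boolean mask of the healthy-appearing background region
    brain_mask      : boolean skull-stripping mask (e.g., derived from HD-BET)
    exclude_mask    : optional boolean mask of physiological uptake to remove
    """
    tbr = suv / suv[background_mask].mean()  # normalize to background mean
    btv_mask = (tbr > threshold) & brain_mask
    if exclude_mask is not None:
        btv_mask &= ~exclude_mask
    n_vox = int(btv_mask.sum())
    if n_vox == 0:
        return {"btv_ml": 0.0, "tbr_mean": float("nan"), "tbr_max": float("nan")}
    return {
        "btv_ml": n_vox * voxel_volume_ml,
        "tbr_mean": float(tbr[btv_mask].mean()),
        "tbr_max": float(tbr[btv_mask].max()),
    }
```

Running the function once per AC map, with that map's own background mean, mirrors the per-method delineation described above.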
2.6.2. MR-AC map evaluation metrics
The DeepDixon MR-AC maps (n = 318 scans) were manually inspected side-by-side with CT-AC for the prevalence of artifacts, categorized into four categories: (1) no apparent artifacts, (2) minor, non-significant artifacts (e.g., small overestimation of a titanium clamp), (3) intermediate, potentially impactful artifacts (e.g., small metal-induced signal voids), and (4) major artifacts (e.g., large signal voids).
The quantitative performance of DeepDixon and atlas-based AC versus CT-AC was measured by calculating the Dice coefficient for the entire head, limited to the CT field-of-view, as well as for areas adjacent to the BTV, defined as a sphere of 5 cm radius centered on the center-of-max of each individual BTV (n = 222 scans). For both areas, we calculated the Dice coefficient for tissue (0.05–0.1 cm⁻¹) and bone (> 0.1 cm⁻¹) voxels. Mean absolute error (MAE) and structural similarity index measure (SSIM) were calculated for the entire head.
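The tissue and bone Dice coefficients can be sketched with NumPy. The attenuation thresholds are those stated above; assigning a value of exactly 0.1 cm⁻¹ to tissue, defining Dice as 1 for two empty masks, and the helper names are assumptions. `sphere_mask` builds the spherical region around a BTV given the voxel index of its center and the voxel size in mm.

```python
import numpy as np


def dice(a, b):
    """Dice coefficient of two boolean masks (defined as 1.0 if both are empty)."""
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else float(2.0 * np.logical_and(a, b).sum() / total)


def tissue_bone_dice(mu_test, mu_ref, region):
    """Tissue (0.05-0.1 cm^-1) and bone (>0.1 cm^-1) Dice inside a boolean region."""
    def tissue(mu):
        return (mu >= 0.05) & (mu <= 0.1) & region

    def bone(mu):
        return (mu > 0.1) & region

    return dice(tissue(mu_test), tissue(mu_ref)), dice(bone(mu_test), bone(mu_ref))


def sphere_mask(shape, center_vox, radius_mm, voxel_size_mm):
    """Boolean sphere of radius_mm around a voxel index on an anisotropic grid."""
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    dist2 = sum(((g - c) * s) ** 2 for g, c, s in zip(grids, center_vox, voxel_size_mm))
    return dist2 <= radius_mm ** 2
```

For the region adjacent to the BTV, the region argument would be, for example, `sphere_mask(mu_ref.shape, center_idx, 50.0, voxel_size_mm)` combined with the CT field-of-view mask.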
2.6.3. PET evaluation metrics
The accuracy of the different [18F]FET-PET AC methods was assessed on a patient-by-patient basis (n = 222 scans) using the guideline-recommended semi-quantitative clinical metrics, identical to previous studies (Ladefoged et al., 2017, 2019): mean and maximum SUV and tumor-to-background ratio (TBR) were measured within each BTV, as well as the size of the BTV. These metrics are commonly used as criteria to distinguish active tumor tissue from reactive changes (Pöpperl et al., 2006; Langen et al., 2011; Galldiks et al., 2015a,b; Ceccon et al., 2017) and are evaluated alongside other factors such as activity and MRI morphology, previous and current treatment, structural changes, and prior imaging results. We adopted the acceptance criteria used in our previous study (Ladefoged et al., 2017): absolute differences of < ± 0.05 and < ± 0.1 or a relative difference of 5% for mean and maximum TBR, respectively, and ± 2 ml or 10% for the BTV. Combining an absolute and a relative cut-off reflects that a larger absolute difference is acceptable in large or very active tumors.
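One reading of the acceptance criteria, in which a metric passes if its difference is within either the absolute or the relative tolerance, is sketched below. This either-or interpretation is an assumption consistent with the rationale given above; the dictionary keys and function names are hypothetical.

```python
def within_tolerance(test, ref, abs_tol, rel_tol):
    """True if |test - ref| is below the absolute OR the relative tolerance."""
    diff = abs(test - ref)
    return diff < abs_tol or diff / abs(ref) < rel_tol


# (absolute, relative) tolerances taken from the acceptance criteria above
CRITERIA = {"tbr_mean": (0.05, 0.05), "tbr_max": (0.10, 0.05), "btv_ml": (2.0, 0.10)}


def passes_all(test_metrics, ref_metrics):
    """True if every clinical metric of an MR-AC method meets its criterion."""
    return all(
        within_tolerance(test_metrics[k], ref_metrics[k], *CRITERIA[k]) for k in CRITERIA
    )
```

For example, the TBRmax case reported in the Results (2.05 with CT-AC vs. 1.87 with DeepDixon, a difference of 0.18 or about 9%) fails both tolerances, so `within_tolerance(1.87, 2.05, 0.10, 0.05)` returns `False`.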
The size and shape of the BTV has significance for both radiotherapy and surgical planning and the assessment of treatment response (Albert et al., 2016; Moller et al., 2016). Tumor contours relative to the CT-AC reference were analyzed using Dice coefficient and Hausdorff distance metrics, and with a measurement of shape deviations (Supplementary Figure 1), found by thresholding the smoothed tumor difference image:
where G is a Gaussian filter with 4 mm FWHM and X is atlas or DeepDixon. This reflects that the clinical impact of a volume change caused by a focal structure is larger than that of a volume change caused by a one-voxel displacement along the tumor contour.
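The shape-deviation equation itself is not reproduced in this text. The sketch below illustrates the idea under stated assumptions: the absolute difference between the binary BTV masks is smoothed with a 4 mm FWHM Gaussian (implemented here as a separable, zero-padded convolution in NumPy), and the volume above a threshold is reported. The threshold `level=0.5` is a hypothetical placeholder, not the value used in the study, and the sketch reports the total volume rather than per-structure volumes.

```python
import numpy as np


def gaussian_kernel(fwhm_mm, voxel_mm):
    """Normalized 1D Gaussian kernel sampled per voxel, truncated at 3 sigma."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm  # in voxels
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def smooth(img, fwhm_mm, voxel_size_mm):
    """Separable Gaussian smoothing with zero padding outside the volume."""
    out = img.astype(float)
    for axis, vox in enumerate(voxel_size_mm):
        k = gaussian_kernel(fwhm_mm, vox)
        if out.shape[axis] < k.size:
            raise ValueError("volume too small along axis %d for this kernel" % axis)
        out = np.apply_along_axis(np.convolve, axis, out, k, mode="same")
    return out


def shape_deviation_ml(btv_test, btv_ref, voxel_size_mm, fwhm_mm=4.0, level=0.5):
    """Volume (ml) where the smoothed absolute BTV difference exceeds `level`.

    `level` is a hypothetical placeholder, not the threshold used in the study.
    """
    diff = np.abs(btv_test.astype(float) - btv_ref.astype(float))
    voxel_ml = float(np.prod(voxel_size_mm)) / 1000.0
    return float((smooth(diff, fwhm_mm, voxel_size_mm) > level).sum() * voxel_ml)
```

With 1 mm voxels, a one-voxel rim added to a cube smooths to a peak of roughly 0.22 and is ignored, whereas a focal 5 × 5 × 5 voxel lump reaches roughly 0.65 at its center and is counted, which is the distinction motivating this measure.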
The location of the TBRmax, which is usually used for biopsy target planning as it identifies the biologically most aggressive component ("hot spots") in heterogeneous gliomas (Messing-Junger et al., 2002; Floeth et al., 2005; Ewelt et al., 2011; Kunz et al., 2011), was compared between methods. Our criterion was set at < 10 mm from the location on the reference PET with CT-AC, as this corresponds to the approximate size of an average stereotactic biopsy.
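The peak-location comparison reduces to a Euclidean distance between the TBRmax voxels, scaled by the voxel size. A minimal sketch, assuming both images share the same grid and using the single hottest voxel inside the BTV as the peak (the study's exact peak definition may differ; the names are hypothetical):

```python
import numpy as np


def peak_position_mm(tbr, btv_mask, voxel_size_mm):
    """Position (mm, in the voxel-index frame) of the TBRmax voxel within the BTV."""
    masked = np.where(btv_mask, tbr, -np.inf)
    idx = np.unravel_index(np.argmax(masked), tbr.shape)
    return np.asarray(idx, dtype=float) * np.asarray(voxel_size_mm, dtype=float)


def peak_shift_mm(tbr_test, btv_test, tbr_ref, btv_ref, voxel_size_mm):
    """Euclidean distance (mm) between the TBRmax locations of two reconstructions."""
    shift = peak_position_mm(tbr_test, btv_test, voxel_size_mm) - peak_position_mm(
        tbr_ref, btv_ref, voxel_size_mm
    )
    return float(np.linalg.norm(shift))
```

A shift of `peak_shift_mm(...) < 10.0` would then meet the biopsy-size criterion stated above.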
Finally, global similarity metrics [MAE, SSIM, and peak signal-to-noise ratio (PSNR)] were computed for each MR-AC method against the reference PET with CT-AC within the brain. The relative percent difference was calculated for the whole brain, the BTV, and anatomical regions pre-defined in a template in Montreal Neurological Institute and Hospital (MNI) space (Fonov et al., 2009).
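The MAE, PSNR, and relative percent difference can be sketched as below. The PSNR data range (the reference intensity range inside the mask) and the ratio-of-means form of the relative difference are assumptions, as the text does not specify them. SSIM is omitted for brevity; scikit-image provides it as `skimage.metrics.structural_similarity`.

```python
import numpy as np


def mae(test, ref, mask):
    """Mean absolute error inside a boolean mask."""
    return float(np.mean(np.abs(test[mask] - ref[mask])))


def psnr(test, ref, mask, data_range=None):
    """Peak signal-to-noise ratio (dB) inside a mask.

    data_range defaults to the reference intensity range in the mask (an assumption).
    """
    if data_range is None:
        data_range = float(ref[mask].max() - ref[mask].min())
    mse = float(np.mean((test[mask] - ref[mask]) ** 2))
    return float("inf") if mse == 0.0 else float(10.0 * np.log10(data_range ** 2 / mse))


def relative_percent_difference(test, ref, region):
    """100 * (mean_test - mean_ref) / mean_ref within a region (ratio of means)."""
    m_ref = float(ref[region].mean())
    return 100.0 * (float(test[region].mean()) - m_ref) / m_ref
```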
2.6.4. Longitudinal robustness
The robustness of the MR-AC methods over time, which is important for response assessment, was addressed by calculating the absolute and relative change in TBRmean, TBRmax, and BTV between baseline and follow-up examinations and comparing it to the reference change with CT-AC. A total of 56 patients had at least one follow-up examination 14 days to 17 months (average: 113 days) after the first. If more than one follow-up was available, only the first was used. The same clinical acceptance criteria were used for the longitudinal data as for the single-time-point data.
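The longitudinal comparison can be sketched per metric as follows, returning whether the MR-AC change agrees with the CT-AC change within tolerance and whether the direction of change matches. Applying the either-or reading of the tolerances here is an assumption carried over from the single-time-point criteria; the function names are hypothetical.

```python
def percent_change(baseline, follow_up):
    """Relative change from baseline to follow-up, in percent."""
    return 100.0 * (follow_up - baseline) / baseline


def sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def compare_change(base_test, fu_test, base_ref, fu_ref, pp_tol, abs_tol):
    """Compare an MR-AC change against the CT-AC change for one metric.

    pp_tol  : tolerance on the difference in percent change (percentage points)
    abs_tol : tolerance on the difference in absolute change (metric units)
    Returns (within_tolerance, same_direction).
    """
    d_test, d_ref = fu_test - base_test, fu_ref - base_ref
    pp_diff = percent_change(base_test, fu_test) - percent_change(base_ref, fu_ref)
    within = abs(pp_diff) < pp_tol or abs(d_test - d_ref) < abs_tol
    return within, sign(d_test) == sign(d_ref)
```

With the worst BTV case from the Results (CT-AC 5 to 5.6 ml, atlas 9 to 6 ml), `compare_change(9.0, 6.0, 5.0, 5.6, 10.0, 2.0)` returns `(False, False)`: outside tolerance and discordant in direction.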
2.6.5. Dynamic PET imaging
TACs were extracted from the 40-min dynamic PET data for the subset of n = 23 scans where these data were available. The pattern of the curves, which can be used for diagnosis and treatment response assessment (Dunkl et al., 2015), was visually inspected similarly to previous publications (Galldiks et al., 2013, 2015b), and the relative difference to the curves extracted with CT-AC was computed.
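Extracting a TAC and its per-frame relative absolute difference to CT-AC can be sketched as follows, assuming the dynamic frames are stacked into a (T, X, Y, Z) array and the ROI is a boolean mask on the same grid (the names are hypothetical):

```python
import numpy as np


def roi_tac(frames, roi_mask):
    """Mean ROI activity per frame; frames has shape (T, X, Y, Z), roi_mask (X, Y, Z)."""
    return frames[:, roi_mask].mean(axis=1)


def tac_rel_abs_diff(tac_test, tac_ref):
    """Per-frame relative absolute difference to the CT-AC curve, in percent."""
    return 100.0 * np.abs(tac_test - tac_ref) / np.abs(tac_ref)
```

Averaging the output of `tac_rel_abs_diff` over frames and patients gives a summary of the kind reported in the Results.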
2.7. Statistical analysis
The correlation between the clinical values (SUV and TBR) extracted using atlas or DeepDixon and those extracted using CT-AC was estimated by the coefficient of determination (R2), computed with the r2_score function in scikit-learn (version 1.2.1).
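Note that `r2_score(y_true, y_pred)` in scikit-learn computes 1 − SS_res/SS_tot against `y_true` without refitting a regression line, so it measures agreement with the unity line shown in Figure 3 rather than the squared Pearson correlation. The NumPy sketch below reproduces that definition, assuming the CT-AC values are passed as the reference (`y_true`); the function name is hypothetical.

```python
import numpy as np


def r2_against_reference(y_ref, y_test):
    """Coefficient of determination of y_test against y_ref without refitting.

    Same definition as sklearn.metrics.r2_score(y_true=y_ref, y_pred=y_test):
    1 - SS_res / SS_tot, so any departure from the unity line is penalized.
    """
    y_ref = np.asarray(y_ref, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    ss_res = np.sum((y_ref - y_test) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A constant offset illustrates the difference: reference values 1, 2, 3, 4 against 2, 3, 4, 5 give a squared Pearson correlation of 1 but an R2 of 0.2 with this definition.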
Descriptive statistics of the clinical metrics (TBRmean, TBRmax, and BTV) are provided as mean, 95% confidence intervals (CI), and limits of agreement. The differences were first tested for normality using the Shapiro–Wilk test, with a p-value < 0.05 implying that the data differ significantly from a normal distribution. As the data were not normally distributed, a log-transformation was applied, and normality was validated using QQ-plots. The results were exponentiated to express the differences as ratios on the original scale and reported as percentage differences:
where n is the number of scans, d is the mean difference, and SDd is the standard deviation of the difference on the log scale, where we corrected for repeated measurements from the follow-up examinations (Bland and Altman, 1999).
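The back-transformation formulas are not reproduced in this text. The sketch below shows the standard form for log-scale Bland–Altman analysis, which is an assumption about the exact expressions used: the mean log difference d and its standard deviation SDd are exponentiated into a percentage difference, a 95% CI (d ± 1.96·SDd/√n), and 95% limits of agreement (d ± 1.96·SDd). It uses the normal quantile rather than a t quantile and omits the repeated-measures correction; the function name is hypothetical.

```python
import numpy as np


def log_ratio_summary(test, ref, z=1.96):
    """Bland-Altman summary on the log scale, back-transformed to percentages.

    Assumes paired, strictly positive values and ignores the repeated-measures
    correction for follow-up scans.
    """
    d_i = np.log(np.asarray(test, dtype=float)) - np.log(np.asarray(ref, dtype=float))
    n, d, sd_d = d_i.size, d_i.mean(), d_i.std(ddof=1)

    def to_pct(x):
        return 100.0 * (np.exp(x) - 1.0)  # log ratio -> percentage difference

    half_ci = z * sd_d / np.sqrt(n)
    return {
        "mean_pct": to_pct(d),
        "ci95_pct": (to_pct(d - half_ci), to_pct(d + half_ci)),
        "loa95_pct": (to_pct(d - z * sd_d), to_pct(d + z * sd_d)),
    }
```

Because the interval limits are exponentiated separately, they are asymmetric around the mean percentage difference on the original scale.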
A paired t-test was performed on the log-transformed values of TBRmean, TBRmax, and BTV. A p-value < 0.05 was considered to indicate a statistically significant difference.
All statistical tests were performed using R version 4.2.1.
3. Results
DeepDixon MR-AC maps were produced for all 318 studies, with an average inference time of 4 s. Visual reading showed that 86% of the studies had no or minor, non-significant artifacts, with excellent representation of smaller surgical interventions (Figure 1). There were intermediate, potentially influential artifacts in 11% of the examinations, most often appearing as small signal voids at the location of a metal clip, and major artifacts in 3% of the examinations. All images from the major category are shown in Figure 2. The Dice coefficient for DeepDixon was 0.89/0.83 for tissue/bone in the whole head, compared to 0.84/0.57 for atlas. In the vicinity of the BTV, the Dice coefficient was similarly higher for DeepDixon (0.95/0.84) than for atlas (0.91/0.60) (Supplementary Figure 2).
Figure 1. Examples of magnetic resonance imaging (MRI) and corresponding attenuation maps for two post-surgery patients. The first patient (A) has a titanium alloy insert, which shows up as a thickening of tissue on DeepDixon and does not appear at all in the atlas method. The second patient (B) had part of the skull removed in the parietal region, which is well represented by DeepDixon. The facial and nasal regions in particular are challenging for the atlas method, leading to quantitative errors in brain regions directly bordering the skull base, anterior and posterior fossa, inferior temporal lobes, mesencephalon, pons, and cerebellum.
Figure 2. Magnetic resonance imaging (MRI) and attenuation maps for all six patients with DeepDixon artifacts categorized as major. Two patients had follow-up scans, both with consistent artifacts, resulting in a total of 8/318 scans with major artifacts. The patients in (A–C,E) had biological tumor volume (BTV) > 1 ml, and were therefore included in the positron emission tomography (PET) evaluation. The primary errors were caused by false bone formation along titanium implant/soft tissue (A,D,F), or directly on brain in postoperative pneumocephalus (B), and metal associated signal voids (C,E).
The correlation of clinical metrics between the AC methods is shown in Figure 3 (n = 222). Atlas and DeepDixon both recovered the mean and maximum tissue activity concentration, as well as the TBR, within the BTV nearly 100% compared to the reference CT-AC (R2 > 0.98 and > 0.99 for atlas and DeepDixon, respectively). Similar R2 values were found when only using the baseline examinations (n = 147). The difference was statistically significant between atlas and CT-AC but not between DeepDixon and CT-AC (Table 2). The same results were found when the statistical tests were performed on the original data (without log-transformation) as well as with only baseline examinations (no follow-up examinations, n = 147). When using DeepDixon, only 3 of the 222 studies (1%) exceeded our acceptance criteria of TBRmax difference < ± 0.1 or 5%, TBRmean < ± 0.05 or 5%, and BTV < ± 2 ml or 10% (Figure 4). These consisted of one patient with a TBRmax difference of −0.18 or −9% (2.05 with CT-AC, 1.87 with DeepDixon) caused by metal clips at the location of the lesion, and two patients with BTV differences, the worst of which is shown in Figure 2B. When using atlas, 72 of the 222 studies (32%) exceeded our acceptance criteria.
Figure 3. Plot of mean (left panels) and max (right panels) values within the tumor ROI of [18F]FET-PET tissue activity concentration (top panels) and TBR (bottom panels) for atlas and DeepDixon vs. CT-AC reference standard (n = 222). The black line indicates the unity line. The coefficient of determination (R2) was calculated including follow-up examinations.
Table 2. Mean relative differences relative to CT-AC of all investigated clinical values (n = 222 scans).
Figure 4. Bland-Altman plot of TBRmean (top), TBRmax (middle) and biological tumor volume (BTV) (bottom) for each of the two MR-AC methods against the reference standard CT-AC (n = 222 scans). To simulate the clinical impact of the metrics in evaluating reactive changes vs. tumor recurrence 3 intervals have been labeled along the x-axis for TBRmean and TBRmax. The gray shaded areas define an interval of ambiguity. The black lines indicate the acceptance criteria of TBRmean of ± 0.05 or 5%, TBRmax of ± 0.1 or 5%, or BTV of ± 2 ml or 10%, respectively. Points that exceed the criteria have been colored. The solid gray line indicates the mean value.
When using TBR to differentiate between reactive changes and tumor tissue, TBRmax < 2.0 is often considered indicative of reactive tissue, whereas TBRmax > 2.4 and TBRmean above approximately 2.0 are considered indicative of active tumor tissue. When applying these thresholds, atlas, DeepDixon, and CT-AC had concordant classifications (reactive vs. tumor tissue) in all studies. Minor differences caused an equivocal classification (TBRmean between 1.9 and 2.1, TBRmax between 2.0 and 2.4) to change to reactive or tumor tissue, or vice versa, in 26 studies using atlas and 10 studies using DeepDixon. For these 26 studies using atlas, the absolute differences relative to the reference ranged from 0 to 0.21 for TBRmax (n = 10) and from 0.02 to 0.05 for TBRmean (n = 16). In comparison, the corresponding ranges for DeepDixon were 0.03–0.13 for TBRmax (n = 5) and 0.01–0.06 for TBRmean (n = 5).
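The threshold logic described above can be written as a simple rule. This is an illustrative simplification, not a validated decision rule: it uses only the TBRmax cut-offs of 2.0 and 2.4 together with TBRmean > 2.0, and does not model the TBRmean equivocal band of 1.9–2.1 separately. In clinical practice, the classification also integrates MRI morphology, treatment history, and other factors. The function name is hypothetical.

```python
def classify_tbr(tbr_mean, tbr_max):
    """Coarse three-way classification from the TBR thresholds quoted above.

    'reactive' if TBRmax < 2.0; 'tumor' if TBRmax > 2.4 and TBRmean > 2.0;
    otherwise 'equivocal'. Illustrative assumption, not a clinical rule.
    """
    if tbr_max < 2.0:
        return "reactive"
    if tbr_max > 2.4 and tbr_mean > 2.0:
        return "tumor"
    return "equivocal"
```

Comparing `classify_tbr` for an MR-AC method with the CT-AC result per study yields the kind of concordance count reported above.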
The tumor delineation precision was improved, from Dice coefficient of 0.90 ± 0.1 and Hausdorff distance 2.7 ± 3.5 mm using atlas, to 0.95 ± 0.1 and 1.3 ± 2.9 mm, respectively, using DeepDixon (Supplementary Figure 3). The shape deviation analysis found that only one study had distinct warps in the outline of the BTV of more than 1 ml (1.5 ml) using atlas and none using DeepDixon. The peak location of TBRmax used for biopsy guidance was in general in agreement; 91% were within 10 mm compared to CT-AC using atlas and 94% using DeepDixon. While the peak location for the remaining studies was between 10 and 115 mm from the reference, TBRmax was nearly identical in all studies when comparing reference and identified peak location, with a maximum relative difference of 4.4% for atlas (n = 25) and 3.5% for DeepDixon (n = 7).
The percent change between baseline and follow-up examinations is shown for each of the AC methods in Supplementary Table 1 for the 56 patients with at least one follow-up examination. TBRmean was congruent in all patients using either MR-AC method. For TBRmax, the difference in percent change was within our 5-percentage-point acceptance limit for all patients with DeepDixon, but was exceeded in 8 patients using atlas (range: 5–24%, or 0.1–0.7 in absolute values), although all with the same direction of change as CT-AC. For BTV, the difference exceeded our acceptance limits of 10 percentage points and 2 ml in 3 patients with DeepDixon (range: 12–42% or 2–3 ml) and 10 patients with atlas (range: 10–260% or 2–7 ml). The direction of the volume change was congruent with CT-AC when using DeepDixon, whereas atlas resulted in a discordant direction in 5 patients; in the worst case, CT-AC showed an increase of 0.6 ml (5 to 5.6 ml), whereas atlas showed a 3 ml decrease (9 to 6 ml).
The relative absolute difference for the 23 patients with 40-min dynamic PET data was 2.0 ± 1.6% from CT-AC across all patients and time-points when using DeepDixon, compared to 3.4 ± 2.4% when using atlas (Figure 5), with the largest bias for a single patient and time-point being 10% for DeepDixon and 17% for atlas. All TACs for both methods ran parallel to those with CT-AC; thus, there was no change in TAC configuration.
Figure 5. Relative absolute difference between time-activity curves (TACs) across all subjects (n = 23) to the dynamic 40-min positron emission tomography (PET) with CT-AC reference for each of DeepDixon and atlas MR-AC, respectively.
Globally, DeepDixon had improved PSNR, SSIM, and MAE over atlas relative to CT-AC, for both the attenuation maps and the resulting PET images (Supplementary Table 2). Global and regional relative difference evaluation is shown in Supplementary Figure 4.
4. Discussion
This study evaluated the accuracy of our deep learning-based MR-AC method for [18F]FET-PET in a large group of post-surgery glioma patients. This category of patients is a challenge to MR-AC methods due to the presence of gross anatomical deformations and metal implant-induced susceptibility artifacts. At the same time the use of well-established and recommended (semi-) quantitative metrics makes these patients ideal for a clinical evaluation.
Overall, we found that DeepDixon, despite being trained on data without irregular anatomy or metal artifacts, robustly reproduced the clinical metrics found with CT-AC. Irregular anatomy and titanium alloy clamps were accurately represented by DeepDixon in most patients, with only minor deviations that did not impact the surrounding PET tracer uptake. The vendor-provided atlas MR-AC method achieved acceptable performance on average, but patient-by-patient evaluation revealed significant outliers that might compromise accurate diagnosis in border-line cases.
Visual inspection of attenuation maps for artifacts, consistency and plausibility is always recommended in PET/MRI (Law et al., 2019). We found eight scans from six patients with major artifacts in DeepDixon that might change the tracer uptake and thereby the clinical metrics, of which four scans had BTV > 1 ml and were therefore included in the PET evaluation (Figure 2). All metrics were within the acceptable limits in three of these patients despite the artifacts, in part due to distances between artifact and BTV of 2, 7, and 10 cm. The final patient (Figure 2B) had an increase in BTV from 2.9 ml with CT-AC to 5.4 ml with DeepDixon due to the appearance of an elongated region, 1–2 voxels wide, above threshold in the frontal cortex (Supplementary Figure 5). This patient had frontal postsurgical pneumocephalus, which the network likely interpreted as frontal sinuses, leading to erroneous bone being generated. The same patient had a considerably larger error using atlas MR-AC, with a BTV of 18.9 ml, as the air space was filled with the attenuation value of water. While visual inspection of the MR-AC maps remains important, considering the distance between the tumor and the artifact can help estimate the introduced bias. Using atlas MR-AC, the metrics exceeded the acceptable limits in three of the four patients. Overall, the prevalence of patients with metrics exceeding our acceptance criteria using DeepDixon (n = 3, 1%) is even lower than in our previous findings using RESOLUTE for MR-AC, where 5 out of 68 (7%) exceeded the acceptance criteria (Ladefoged et al., 2017).
The results of the follow-up analysis showed that both MR-AC methods robustly reproduced the magnitude and direction of the change between scans, with DeepDixon being the most accurate, thus leading to similar conclusions on treatment response regardless of the AC method applied. This indicates that DeepDixon can replace CT-AC even in neuro-oncological response assessment. In a separate analysis (results not shown), we evaluated the impact of switching to DeepDixon for the follow-up examination when using CT-AC for the baseline examination and found results similar to using DeepDixon for both baseline and follow-up. Low between-scanner variability is important for the reliability and practical use of the method, as [18F]FET PET/MRI and PET/CT will often be used interchangeably in the clinic for response assessment.
On average, the vendor-provided atlas-based MR-AC method produced clinical metrics that are comparable to the CT-AC reference, with TBRmean and TBRmax relative differences of 1–3%. These results confirm the findings of Rausch et al. (2017) in 24 patients, that found a relative difference of 0 ± 2 and 0 ± 5% between atlas MR-AC and CT-AC for TBRmean and TBRmax, respectively. When evaluated on a patient-by-patient basis, however, almost a third of our scans had clinical metrics outside the acceptable limits, which underlines the importance of evaluation on a single subject level and in a large patient group to embrace the variation in artifacts. The errors were most often caused by metal implant-induced signal voids and irregular anatomy challenging the registration accuracy, also supported by the poor Dice score for bone in areas near the BTV often affected by surgical intervention (Supplementary Figure 2).
In 6–9% of patients, the peak location deviated by more than 10 mm, which would affect biopsy planning. These were large (BTV > 10 ml), irregular tumors, often close to the resection cavity, where the peak could “slide” along a ridge. The TBRmax did not change, but we have no way of assessing the consequences for the underlying sampled histology. Surgical biopsies are mostly performed under more ideal conditions, in preoperative patients with more homogeneous, intact tumors. These tumors lack surgical fragmentation, postoperative treatment-related changes, and cranial modifications, so precision may be even higher. Postsurgical biopsies will usually not be directed toward recurrence in a resection cavity. They target distinct, deeper-seated, and smaller localized suspected recurrences instead. Thus, although our patient group is not representative of typical biopsy candidates, it does give an indication of performance under difficult conditions.
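The “sliding” peak can be reproduced with a toy ridge of nearly equal values. On such a ridge, a 1% bias at one end is enough to move the hottest voxel across the whole ridge. The sketch below takes the peak as the single hottest voxel and measures its Euclidean displacement in mm. This definition, the function names, and the 2 mm voxel size are assumptions for illustration. The study's exact peak definition (e.g., any smoothing or restriction to the BTV) is not given here.

```python
import numpy as np


def peak_position_mm(img, voxel_size_mm):
    """Position (mm) of the hottest voxel; ties resolve to the first in C order."""
    idx = np.unravel_index(np.argmax(img), img.shape)
    return np.asarray(idx, dtype=float) * np.asarray(voxel_size_mm, dtype=float)


def peak_shift_mm(img_ref, img_test, voxel_size_mm):
    """Euclidean distance (mm) between the peak voxels of two reconstructions."""
    return float(np.linalg.norm(peak_position_mm(img_ref, voxel_size_mm)
                                - peak_position_mm(img_test, voxel_size_mm)))


# A flat "ridge" along one axis: a 1% change at the far end moves the peak.
voxel = (2.0, 2.0, 2.0)                      # illustrative voxel size in mm
ref = np.zeros((10, 10, 10))
ref[5, 5, 1:9] = 3.0
ref[5, 5, 1] = 3.03                          # peak at one end with CT-AC
test = ref.copy()
test[5, 5, 1] = 3.0
test[5, 5, 8] = 3.03                         # MR-AC bias tips it to the other end
shift = peak_shift_mm(ref, test, voxel)
print(shift, shift > 10.0)
```

In this toy case, the maximum value is identical in both images, so TBRmax is unchanged, yet the peak moves by 7 voxels (14 mm).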
The performance of the atlas MR-AC method for dynamic data was also evaluated by Rausch et al. (2019). The authors classified the TACs into three categories (increasing, plateau, and decreasing) according to the shape of the curve. Compared with a CT-AC reference, they found a change in the TAC pattern in one of the 17 cases evaluated. We did not find any difference in TAC shape between AC methods in either the BTV or the reference region. Direct comparison between that study and ours is challenging due to differences in BTV delineation and number of frames. Overall, however, our evaluation confirms the previous finding that the TACs are robust to the choice of MR-AC method, whether atlas or DeepDixon. This is important for both diagnosis (Galldiks et al., 2015b; Ceccon et al., 2017) and prognostic prediction (Bauer et al., 2020).
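To compare TAC shapes between AC methods, each curve can be assigned to one of the three categories and the labels checked for agreement. The sketch below uses a simple late-frame slope rule. It is not the classification used by Rausch et al. (2019) or in this study. The start of the late window (20 min), the relative-slope tolerance (1% per minute), and the function name `classify_tac` are all hypothetical.

```python
import numpy as np


def classify_tac(mid_times_min, activity, late_start_min=20.0, rel_slope_tol=0.01):
    """Hypothetical slope-based TAC classifier (increasing/plateau/decreasing).

    Fits a straight line to the frames with mid-time >= late_start_min and
    normalizes the slope by the mean late activity (fraction per minute).
    Both cut-offs are illustrative assumptions.
    """
    t = np.asarray(mid_times_min, dtype=float)
    y = np.asarray(activity, dtype=float)
    late = t >= late_start_min
    if late.sum() < 2:
        raise ValueError("need at least two late frames")
    slope = np.polyfit(t[late], y[late], 1)[0]   # coefficients: slope first
    rel_slope = slope / y[late].mean()
    if rel_slope > rel_slope_tol:
        return "increasing"
    if rel_slope < -rel_slope_tol:
        return "decreasing"
    return "plateau"


times = [5, 15, 25, 35]                       # frame mid-times (min), illustrative
print(classify_tac(times, [1.0, 1.5, 2.0, 2.6]))
print(classify_tac(times, [2.5, 2.2, 2.0, 1.7]))
```

Agreement between AC methods then reduces to checking whether `classify_tac` returns the same label for the CT-AC and MR-AC curves of the same region.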
We have previously evaluated our deep learning method in a post-surgery pediatric brain tumor cohort (Ladefoged et al., 2019). We concluded that it could robustly represent even the irregular anatomy present in that dataset, resulting in clinical metrics on par with CT-AC. The pediatric model was trained using fourfold cross-validation on 79 scans, including cases with severely abnormal anatomy. Since our current dataset included cases with major artifacts (Figure 2), we attempted transfer learning from the adult DeepDixon model to the current oncology dataset. We used fourfold cross-validation, in which each hold-out fold contained two of the scans with identified major DeepDixon artifacts. We did not observe any meaningful improvement (results not shown), most likely owing to the low prevalence of gross artifacts.
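The fold design described above has four folds, each hold-out fold containing two of the eight artifact scans. It can be sketched as a stratified round-robin split. This is a minimal illustration and not the training code used for DeepDixon. The function name, seed, scan IDs, and round-robin scheme are assumptions.

```python
import random


def assign_folds(scan_ids, artifact_ids, n_folds=4, seed=0):
    """Hypothetical fold split spreading artifact scans evenly over the folds.

    Artifact scans are dealt round-robin first, then the remaining scans
    continue the same round-robin so fold sizes stay balanced.
    """
    rng = random.Random(seed)
    artifact = [s for s in scan_ids if s in artifact_ids]
    other = [s for s in scan_ids if s not in artifact_ids]
    rng.shuffle(artifact)
    rng.shuffle(other)
    folds = [[] for _ in range(n_folds)]
    for i, scan in enumerate(artifact + other):
        folds[i % n_folds].append(scan)
    return folds


scans = [f"scan{i:03d}" for i in range(40)]   # illustrative IDs only
flagged = set(scans[:8])                      # eight scans with major artifacts
for k, fold in enumerate(assign_folds(scans, flagged)):
    print(k, len(fold), sum(s in flagged for s in fold))
```

This sketch does not group repeat scans of the same patient. In practice, one would usually keep all scans of a patient in the same fold to avoid leakage between training and hold-out data.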
In clinical practice, a pragmatic and cost-effective strategy can be employed, similar to the one we currently use in PET/MRI of our dementia patients (Ladefoged et al., 2020). A DeepDixon MR-AC map is generated quickly. This allows early artifact screening by technologists trained on the examples in this paper. A low-dose CT for CT-AC can then be acquired in relevant patients without compromising clinical quality compared with PET/CT. In our unit, we have used DeepDixon for primary brain MR-AC of [18F]FDG PET/MRI evaluations of dementia patients for more than 3 years, generating over 1,500 attenuation maps. The results of this study substantiate that DeepDixon is ready for routine clinical implementation for MR-AC, even under more challenging conditions such as in post-surgical glioma patients.
4.1. Limitations
Our patient cohort predominantly consisted of [18F]FET PET scans with clear indications of viable tumor tissue, of which 114 (50%) had a TBRmax above 3.0. For these patients, even large deviations would not lead to a change in the clinical reading. Only 23 scans had data acquired over the full 40-min uptake period, and more patients should be included to determine the impact of AC on the dynamic biomarkers. Finally, our study does not compare the performance of DeepDixon against other state-of-the-art atlas- or segmentation-based methods.
5. Conclusion
This is the largest study to date evaluating the performance of a state-of-the-art MR-AC method in post-surgical glioma patients scanned with simultaneously acquired [18F]FET PET/MRI. We found that DeepDixon could overcome most of the issues arising from the irregular anatomy and metal artifacts present in the cohort. This resulted in clinical metrics within acceptable limits of the CT-AC reference in almost all cases, which is an improvement over the vendor-provided atlas method. Using follow-up scans, we found DeepDixon to be robust in neuro-oncological response assessment.
Data availability statement
The datasets presented in this article are not readily available because the data contain patient identifiable information. Requests to access the datasets should be directed to IL, ian.law@regionh.dk.
Ethics statement
Ethical review and approval were not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.
Author contributions
CL designed the method, did the data analysis, and prepared the manuscript. LA and CE aided in the data analysis, revised, and approved the manuscript. FA, TA, KM, LH, OH, and IL aided in the data acquisition, data analysis, revised, and approved the manuscript. All authors contributed to the article and approved the submitted version.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2023.1142383/full#supplementary-material
References
Albert, N. L., Weller, M., Suchorska, B., Galldiks, N., Soffietti, R., Kim, M., et al. (2016). Response assessment in neuro-oncology working group and European association for neuro-oncology recommendations for the clinical use of PET imaging in gliomas. Neuro. Oncol. 18, 1199–1208. doi: 10.1093/neuonc/now058
Andersen, F. L., Ladefoged, C., Beyer, T., Keller, S., Hansen, A., Højgaard, L., et al. (2014). Combined PET/MR imaging in neurology: MR-based attenuation correction implies a strong spatial bias when ignoring bone. Neuroimage 84, 206–216. doi: 10.1016/j.neuroimage.2013.08.042
Bauer, E. K., Stoffels, G., Blau, T., Reifenberger, G., Felsberg, J., Werner, J., et al. (2020). Prediction of survival in patients with IDH-wildtype astrocytic gliomas using dynamic O-(2-[18F]-fluoroethyl)-l-tyrosine PET. Eur. J. Nucl. Med. Mol. Imaging 47, 1486–1495. doi: 10.1007/s00259-020-04695-0
Bland, J. M., and Altman, D. G. (1999). Measuring agreement in method comparison studies. Stat. Methods Med. Res. 8, 135–160. doi: 10.1177/096228029900800204
Catana, C. (2020). Attenuation correction for human PET/MRI studies. Phys. Med. Biol. 65:23TR02. doi: 10.1088/1361-6560/abb0f8
Ceccon, G., Lohmann, P., Stoffels, G., Judov, N., Filss, C., Rapp, M., et al. (2017). Dynamic O-(2-18F-fluoroethyl)-L-tyrosine positron emission tomography differentiates brain metastasis recurrence from radiation injury after radiotherapy. Neuro. Oncol. 19, 281–288. doi: 10.1093/neuonc/now149
Dickson, J. C., O’Meara, C., and Barnes, A. (2014). A comparison of CT- and MR-based attenuation correction in neurological PET. Eur. J. Nucl. Med. Mol. Imaging 41, 1176–1189. doi: 10.1007/s00259-013-2652-z
Dunkl, V., Cleff, C., Stoffels, G., Judov, N., Sarikaya-Seiwert, S., Law, I., et al. (2015). The usefulness of dynamic O-(2-[18F]fluoroethyl)-L-tyrosine-PET in the clinical evaluation of brain tumors in children and adolescents. J. Nucl. Med. 56, 88–92. doi: 10.2967/jnumed.114.148734
Ewelt, C., Floeth, F., Felsberg, J., Steiger, H., Sabel, M., Langen, K., et al. (2011). Finding the anaplastic focus in diffuse gliomas: the value of Gd-DTPA enhanced MRI, FET-PET, and intraoperative, ALA-derived tissue fluorescence. Clin. Neurol. Neurosurg. 113, 541–547. doi: 10.1016/j.clineuro.2011.03.008
Floeth, F. W., Pauleit, D., Wittsack, H., Langen, K., Reifenberger, G., Hamacher, K., et al. (2005). Multimodal metabolic imaging of cerebral gliomas: Positron emission tomography with [18F]fluoroethyl-L-tyrosine and magnetic resonance spectroscopy. J. Neurosurg. 102, 318–327. doi: 10.3171/jns.2005.102.2.0318
Fonov, V., Evans, A., McKinstry, R., Almli, C., and Collins, D. (2009). Unbiased nonlinear average age-appropriate brain templates from birth to adulthood. Neuroimage 47, S102. doi: 10.1016/S1053-8119(09)70884-5
Galldiks, N., Dunkl, V., Stoffels, G., Hutterer, M., Rapp, M., Sabel, M., et al. (2015a). Diagnosis of pseudoprogression in patients with glioblastoma using O-(2-[18F]fluoroethyl)-L-tyrosine PET. Eur. J. Nucl. Med. Mol. Imaging 42, 685–695. doi: 10.1007/s00259-014-2959-4
Galldiks, N., Stoffels, G., Filss, C., Rapp, M., Blau, T., Tscherpel, C., et al. (2015b). The use of dynamic O-(2-18F-fluoroethyl)-l-tyrosine PET in the diagnosis of patients with progressive and recurrent glioma. Neuro. Oncol. 17, 1293–1300. doi: 10.1093/neuonc/nov088
Galldiks, N., Stoffels, G., Ruge, M., Rapp, M., Sabel, M., Reifenberger, G., et al. (2013). Role of O-(2-18F-fluoroethyl)-L-tyrosine PET as a diagnostic tool for detection of malignant progression in patients with low-grade glioma. J. Nucl. Med. 54, 2046–2054. doi: 10.2967/jnumed.113.123836
Hargreaves, B. A., Worters, P. W., Pauly, K. B., Pauly, J. M., Koch, K. M., and Gold, G. E. (2011). Metal-induced artifacts in MRI. Am. J. Roentgenol. 197, 547–555. doi: 10.2214/AJR.11.7364
Holzgreve, A., Albert, N. L., Galldiks, N., and Suchorska, B. (2021). Use of PET imaging in neuro-oncological surgery. Cancers 13, 1–12. doi: 10.3390/cancers13092093
Isensee, F., Schell, M., Pflueger, I., Brugnara, G., Bonekamp, D., Neuberger, U., et al. (2019). Automated brain extraction of multisequence MRI using artificial neural networks. Hum. Brain Mapp. 40, 4952–4964. doi: 10.1002/hbm.24750
Keereman, V., Fierens, Y., Broux, T., De Deene, Y., Lonneux, M., and Vandenberghe, S. (2010). MRI-based attenuation correction for PET/MRI using ultrashort echo time sequences. J. Nucl. Med. 51, 812–818. doi: 10.2967/jnumed.109.065425
Koesters, T., Friedman, K., Fenchel, M., Zhan, Y., Hermosillo, G., Babb, J., et al. (2016). Dixon sequence with superimposed model-based bone compartment provides highly accurate PET/MR attenuation correction of the brain. J. Nucl. Med. 57, 918–924. doi: 10.2967/jnumed.115.166967
Kunz, M., Thon, N., Eigenbrod, S., Hartmann, C., Egensperger, R., Herms, J., et al. (2011). Hot spots in dynamic (18)FET-PET delineate malignant tumor parts within suspected WHO grade II gliomas. Neuro. Oncol. 13, 307–316. doi: 10.1093/neuonc/noq196
Ladefoged, C. N., Andersen, F. L., Kjær, A., Højgaard, L., and Law, I. (2017). RESOLUTE PET/MRI attenuation correction for O-(2-18F-fluoroethyl)-L-tyrosine (FET) in brain tumor patients with metal implants. Front. Neurosci. 11:453. doi: 10.3389/fnins.2017.00453
Ladefoged, C. N., Benoit, D., Law, I., Holm, S., Kjær, A., Højgaard, L., et al. (2015). Region specific optimization of continuous linear attenuation coefficients based on UTE (RESOLUTE): Application to PET/MR brain imaging. Phys. Med. Biol. 60, 8047–8065. doi: 10.1088/0031-9155/60/20/8047
Ladefoged, C. N., Hansen, A., Henriksen, O., Bruun, F., Eikenes, L., Øen, S., et al. (2020). AI-driven attenuation correction for brain PET/MRI: Clinical evaluation of a dementia cohort and importance of the training group size. Neuroimage 222:117221. doi: 10.1016/j.neuroimage.2020.117221
Ladefoged, C. N., Law, I., Anazodo, U., St Lawrence, K., Izquierdo-Garcia, D., Catana, C., et al. (2016). A multi-centre evaluation of eleven clinically feasible brain PET/MRI attenuation correction techniques using a large cohort of patients. Neuroimage 147, 346–359. doi: 10.1016/j.neuroimage.2016.12.010
Ladefoged, C. N., Marner, L., Hindsholm, A., Law, I., Højgaard, L., and Andersen, F. L. (2019). Deep learning based attenuation correction of PET/MRI in pediatric brain tumor patients: Evaluation in a clinical setting. Front. Neurosci. 12:1005. doi: 10.3389/fnins.2018.01005
Langen, K. J., Bartenstein, P., Boecker, H., Brust, P., Coenen, H., Drzezga, A., et al. (2011). German guidelines for brain tumour imaging by PET and SPECT using labelled amino acids. Nuklearmedizin 50, 167–173. doi: 10.3413/nuk-2011041
Law, I., Albert, N., Arbizu, J., Boellaard, R., Drzezga, A., and Galldiks, N. (2019). Joint EANM/EANO/RANO practice guidelines/SNMMI procedure standards for imaging of gliomas using PET with radiolabelled amino acids and [18F]FDG: version 1.0. Eur. J. Nucl. Med. Mol. Imaging 46, 540–557. doi: 10.1007/s00259-018-4207-9
Louis, D. N., Perry, A., Reifenberger, G., von Deimling, A., Figarella-Branger, D., Cavenee, W., et al. (2016). The 2016 World Health Organization classification of tumors of the central nervous system: A summary. Acta Neuropathol. 131, 803–820. doi: 10.1007/s00401-016-1545-1
Louis, D. N., Perry, A., Wesseling, P., Brat, D., Cree, I., Figarella-Branger, D., et al. (2021). The 2021 WHO classification of tumors of the central nervous system: A summary. Neuro. Oncol. 23, 1231–1251. doi: 10.1093/neuonc/noab106
Messing-Junger, A. M., Floeth, F., Pauleit, D., Reifenberger, G., Willing, R., Gärtner, J., et al. (2002). Multimodal target point assessment for stereotactic biopsy in children with diffuse bithalamic astrocytomas. Childs Nerv. Syst. 18, 445–449. doi: 10.1007/s00381-002-0644-6
Moller, S., Law, I., Munck Af Rosenschold, P., Costa, J., Poulsen, H., and Engelholm, S. (2016). Prognostic value of 18F-FET PET imaging in re-irradiation of high-grade glioma: Results of a phase I clinical trial. Radiother. Oncol. 121, 132–137. doi: 10.1016/j.radonc.2016.08.014
Mullins, M. E., Barest, G. D., Schaefer, P. W., Hochberg, F. H., Gonzalez, R. G., and Lev, M. H. (2005). Radiation necrosis versus glioma recurrence: conventional MR imaging clues to diagnosis. AJNR Am. J. Neuroradiol. 26, 1967–1972.
Paulus, D. H., Quick, H., Geppert, C., Fenchel, M., Zhan, Y., Hermosillo, G., et al. (2015). Whole-body PET/MR imaging: Quantitative evaluation of a novel model-based MR attenuation correction method including bone. J. Nucl. Med. 56, 1061–1066. doi: 10.2967/jnumed.115.156000
Pöpperl, G., Götz, C., Rachinger, W., Schnell, O., Gildehaus, F., Tonn, J., et al. (2006). Serial O-(2-[(18)F]fluoroethyl)-L-tyrosine PET for monitoring the effects of intracavitary radioimmunotherapy in patients with malignant glioma. Eur. J. Nucl. Med. Mol. Imaging 33, 792–800. doi: 10.1007/s00259-005-0053-7
Poulsen, S. H., Urup, T., Grunnet, K., Christensen, I., Larsen, V., Jensen, M., et al. (2017). The prognostic value of FET PET at radiotherapy planning in newly diagnosed glioblastoma. Eur. J. Nucl. Med. Mol. Imaging 44, 373–381. doi: 10.1007/s00259-016-3494-2
Rausch, I., Rischka, L., Ladefoged, C., Furtner, J., Fenchel, M., Hahn, A., et al. (2017). PET/MRI for oncologic brain imaging: A comparison of standard MR-based attenuation corrections with a model-based approach for the Siemens mMR PET/MR system. J. Nucl. Med. 58, 1519–1525. doi: 10.2967/jnumed.116.186148
Rausch, I., Zitterl, A., Berroterán-Infante, N., Rischka, L., Prayer, D., Fenchel, M., et al. (2019). Dynamic [18F]FET-PET/MRI using standard MRI-based attenuation correction methods. Eur. Radiol. 29, 4276–4285. doi: 10.1007/s00330-018-5942-9
Sekine, T., Buck, A., Delso, G., Kemp, B., Ter Voert, E., Huellner, M., et al. (2020). The impact of atlas-based MR attenuation correction on the diagnosis of FDG-PET/MR for Alzheimer’s diseases- A simulation study combining multi-center data and ADNI-data. PLoS One 15:e0233886. doi: 10.1371/journal.pone.0233886
Sekine, T., Buck, A., Delso, G., Ter Voert, E., Huellner, M., Veit-Haibach, P., et al. (2016a). Evaluation of atlas-based attenuation correction for integrated PET/MR in human brain: Application of a head atlas and comparison to true CT-based attenuation correction. J. Nucl. Med. 57, 215–220. doi: 10.2967/jnumed.115.159228
Sekine, T., Ter Voert, E., Warnock, G., Buck, A., Huellner, M., Veit-Haibach, P., et al. (2016b). Clinical evaluation of zero-echo-time attenuation correction for brain 18F-FDG PET/MRI: Comparison with atlas attenuation correction. J. Nucl. Med. 57, 1927–1932. doi: 10.2967/jnumed.116.175398
Smits, M. (2021). MRI biomarkers in neuro-oncology. Nat. Rev. Neurol. 17, 486–500. doi: 10.1038/s41582-021-00510-y
Suchorska, B., Albert, N. L., and Tonn, J. C. (2016). Usefulness of PET imaging to guide treatment options in gliomas. Curr. Treat. Options Neurol. 18, 1–11. doi: 10.1007/s11940-015-0384-z
Teuho, J., Torrado-Carvajal, A., Herzog, H., Anazodo, U., Klén, R., Iida, H., et al. (2020). Magnetic Resonance-Based Attenuation Correction and Scatter Correction in Neurological Positron Emission Tomography/Magnetic Resonance Imaging—Current Status With Emerging Applications. Front. Phys. 7:243. doi: 10.3389/fphy.2019.00243
Torrado-Carvajal, A. (2020). Importance of attenuation correction in PET/MR image quantification: Methods and applications. Rev. Esp. Med. Nucl. Imagen. Mol. 39, 163–168. doi: 10.1016/j.remnie.2020.03.002
Vander Borght, T., Asenbaum, S., Bartenstein, P., Halldin, C., Kapucu, O., Van Laere, K., et al. (2006). EANM procedure guidelines for brain tumour imaging using labelled amino acid analogues. Eur. J. Nucl. Med. Mol. Imaging 33, 1374–1380. doi: 10.1007/s00259-006-0206-3
Wiesinger, F., Sacolick, L., Menini, A., Kaushik, S., Ahn, S., Veit-Haibach, P., et al. (2016). Zero TE MR bone imaging in the head. Magn. Reson. Med. 75, 107–114. doi: 10.1002/mrm.25545
Keywords: AI, attenuation correction, deep learning, DeepDixon, glioma, post-surgery, PET/MRI
Citation: Ladefoged CN, Andersen FL, Andersen TL, Anderberg L, Engkebølle C, Madsen K, Højgaard L, Henriksen OM and Law I (2023) DeepDixon synthetic CT for [18F]FET PET/MRI attenuation correction of post-surgery glioma patients with metal implants. Front. Neurosci. 17:1142383. doi: 10.3389/fnins.2023.1142383
Received: 11 January 2023; Accepted: 08 March 2023;
Published: 06 April 2023.
Edited by:
Yilong Ma, Feinstein Institute for Medical Research, United States
Reviewed by:
Arastoo Vossough, Children’s Hospital of Philadelphia, United States
Petra Tomse, Ljubljana University Medical Centre, Slovenia
Copyright © 2023 Ladefoged, Andersen, Andersen, Anderberg, Engkebølle, Madsen, Højgaard, Henriksen and Law. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Claes Nøhr Ladefoged, claes.noehr.ladefoged@regionh.dk