- Department of Psychiatry and Psychotherapy, Medical University of Vienna, Vienna, Austria
The accurate segmentation of in vivo magnetic resonance imaging (MRI) data is a crucial prerequisite for the reliable assessment of disease progression, patient stratification or the establishment of putative imaging biomarkers. This is especially important for the hippocampal formation, a brain area involved in memory formation and often affected by neurodegenerative or psychiatric diseases. FreeSurfer, a widely used automated segmentation software, offers hippocampal subfield delineation with multiple input options. While most studies use a single T1-weighted (T1) sequence, it is also possible, and advised, to use a high-resolution T2-weighted (T2H) sequence or multispectral information. In this investigation, we determined whether volume estimations differ depending on the input images and which combination of inputs delivers the most reliable results in each hippocampal subfield. Forty-one healthy participants (age = 25.2 years ± 4.2 SD) underwent two structural MRI sessions at three Tesla (time between scans: 23 days ± 11 SD) using three different structural MRI sequences to test five different input configurations (T1, T2, T2H, T1 and T2, and T1 and T2H). We compared the different processing pipelines in a cross-sectional manner and assessed reliability using test-retest variability (%TRV) and the Dice coefficient. Our analyses showed pronounced significant differences and large effect sizes between the processing pipelines in several subfields, such as the molecular layer (head), CA1 (head), hippocampal fissure, CA3 (head and body), fimbria and CA4 (head). The longitudinal analysis revealed that T1 and multispectral analysis (T1 and T2H) showed overall higher reliability across all subfields than T2H alone. However, the specific subfields had a substantial influence on the performance of segmentation results, regardless of the processing pipeline.
Although T1 showed good test-retest metrics, results must be interpreted with caution, as a standard T1 sequence relies heavily on prior information of the atlas and does not take the actual fine structures of the hippocampus into account. For the most accurate segmentation, we advise the use of multispectral information by using a combination of T1 and high-resolution T2-weighted sequences or a T2 high-resolution sequence alone.
Introduction
Following the seminal findings of Scoville and Milner studying the “patient H. M.” (Scoville and Milner, 1957), the hippocampus has become one of the most investigated brain regions related to memory processing (Bird and Burgess, 2008), learning (Brasted et al., 2003), or spatial navigation (O’Keefe and Nadel, 1978; O’Keefe et al., 1998). It is one of the few brain regions where adult neurogenesis occurs (Eriksson et al., 1998; Toda et al., 2019) and is highly susceptible to neuroplastic processes (MacQueen and Frodl, 2011; Kraus et al., 2017). Conversely, it is one of the first brain structures affected in dementia by the accumulation of neurofibrillary tangles (Braak and Braak, 1991) and is highly vulnerable to chronic stress such as in psychiatric conditions (Geuze et al., 2005), as repeatedly demonstrated in major depressive disorder (Frodl et al., 2002; Campbell et al., 2004; Videbech and Ravnkilde, 2004; Arnone et al., 2013; Schmaal et al., 2016). Interestingly, therapeutic intervention seems to restore gray matter configurations back to regular levels (Sartorius et al., 2016).
The hippocampus is not a homogeneous brain structure, as it consists of distinct subfields with specific cell properties, which are functionally segregated (Duvernoy et al., 2013) as reflected in the trisynaptic circuit (Samuels et al., 2015). Input from the entorhinal cortex reaches granule cells in the dentate gyrus via the perforant pathway. Mossy fibers from the dentate gyrus project to CA3 pyramidal cells, while CA3 neurons send their information to pyramidal cells of CA1 via the Schaffer collaterals, from where information is sent back to the subiculum and the entorhinal cortex (Andersen et al., 2006; Stepan et al., 2015). It has been shown that the dentate gyrus is involved in pattern separation, CA3 in pattern completion, CA1 in input integration and the subiculum in memory retrieval (Zeineh et al., 2003; Lee et al., 2004; Leutgeb et al., 2004; Bakker et al., 2008; Small et al., 2011). Each subfield is specifically affected by certain diseases, as outlined in Small et al. (2011). For example, while in Alzheimer’s disease (AD) the entorhinal cortex and to some extent CA1 and the subiculum are most affected, in depression predominantly the subiculum and to some extent CA1 are most susceptible. Interestingly, the dentate gyrus seems to be largely unaffected by AD. The same is seen in temporal lobe epilepsy (TLE) with mesial temporal sclerosis (TLE-MTS), where an overall decline in the subfields is observed, but not in the subiculum, which is quite different from the pattern seen in AD or depression (West et al., 1994; Gomez-Isla et al., 1996; Posener et al., 2003; Ballmaier et al., 2008; Mueller et al., 2009; Small, 2014). These findings provide valuable information for monitoring disease onset and progression, and for the putative development of biomarkers or predictors of treatment outcome that are specifically tailored to the respective disease.
Therefore, reliable assessment of these hippocampal substructures and high reliability of processing strategies are of utmost importance for human in vivo neuroscientific investigations.
Rapid progress has been made in the development of automated hippocampal segmentation methods within different software packages enabling distinct subfield segmentations (Pipitone et al., 2014; Iglesias et al., 2015; Yushkevich et al., 2015). Currently, the FreeSurfer software (see text footnote 1) with its dedicated hippocampal subfield tool is most frequently used (Iglesias et al., 2015). Several studies have already applied this approach in different contexts; for recent examples see Gryglewski et al. (2019); Kraus et al. (2019); Dounavi et al. (2020); and van Eijk et al. (2020). The latest hippocampus segmentation tool, now part of the FreeSurfer 7 release, uses a probabilistic atlas built from ex vivo magnetic resonance imaging (MRI) data recorded at 0.13 mm isotropic resolution from 15 autopsy brains and from in vivo information. The in vivo data, recorded at standard resolution, were used to account for structures neighboring the hippocampus. A generative framework is used to handle MRI data with different contrast properties; hence, either mono- or multispectral information can be taken as input. The final estimation of the hippocampal subfield volumes is carried out using a Bayesian inference approach (Iglesias et al., 2015).
Usually, a T1-weighted sequence is used for whole-brain image analysis techniques such as voxel-based morphometry or cortical thickness assessments (Hutton et al., 2009). However, due to the complex structure of the hippocampus and its composition of different cell compartments, high-resolution T2 images have been shown to deliver better and more suitable contrast properties for hippocampal subfield delineation (Winterburn et al., 2013). This has been corroborated by a recent study in which high-resolution T2 scans outperformed T1 images in terms of disease status detection (Mueller et al., 2018). Furthermore, some hippocampal regions, such as the molecular layer, fimbria or the fissure, show low test-retest reliability or cannot be delineated based on the contrast properties of a T1 sequence alone (Whelan et al., 2016; Brown et al., 2020), whereas a high-resolution T2-weighted sequence should deliver better results for these regions (Iglesias et al., 2015).
The hippocampus tool in FreeSurfer offers the possibility to use a single (e.g., T1) or multispectral input (e.g., T1 and T2H) for delineation of the hippocampus. Despite the putative benefits of T2H and multispectral processing, some issues must be considered. First, recording an additional structural sequence is time consuming, which is especially relevant in clinical settings where patients are measured. A T1-weighted scan is nevertheless mandatory in FreeSurfer, as all subjects have to be processed with the regular T1-weighted recon-all stream prior to hippocampal subfield analysis. Consequently, at least two sequences have to be recorded whenever the subfield tool is used with an additional scan or with a scan other than the T1-weighted one. Second, setting up high-resolution T2-weighted sequence parameters requires technical expertise, which is not available in all laboratories. In addition, the T2H field of view (FOV) has to be positioned precisely prior to the measurement, as the sequence’s FOV barely covers the hippocampal structure along its main axis due to scanner constraints imposed by the high-resolution sequence.
To investigate whether the improved signal quality justifies this additional effort and these drawbacks, we conducted a systematic comparison of the different processing modes available within FreeSurfer. All participants were measured twice in a longitudinal setting with three different MR sequences at each time point (TP) (T1, T2 and high-resolution T2). We assessed the five possible input configurations available within the recently released FreeSurfer 7: whole-brain T1-weighted (T1), whole-brain T2-weighted (T2), T2-weighted high-resolution for the hippocampus only (T2H), and multispectral combinations of T1 and T2 and of T1 and T2H. These configurations were compared cross-sectionally. Within the same population of subjects, we assessed whether these different combinations of sequences deliver different volume estimations for each subfield. Subsequently, test-retest analyses were performed within the same subjects for all subfields within FreeSurfer to assess which sequence or sequence combination delivers the most reliable values.
Materials and Methods
Subjects and Study Design
41 right-handed healthy participants (age = 25.2 years ± 4.2 SD, 30 females) were included in this investigation. All subjects underwent two structural MRI measurements approximately 3 weeks apart (23 days ± 11 SD). Screening for general health was carried out prior to study inclusion and comprised medical history assessment, a physical examination and the structured clinical interview for DSM-IV (SCID) to rule out physical and mental disorders. Exclusion criteria comprised any medical, psychiatric or neurological illness, current or former substance abuse, MRI contraindications, pregnancy, first degree relatives with a history of psychiatric illness and smoking. Recruitment was conducted through flyers at the Department of Psychiatry and Psychotherapy at the Medical University of Vienna. This study was approved by the ethical committee of the Medical University of Vienna and was performed in accordance with the Declaration of Helsinki (1964). All participants gave written informed consent to participate in this study. Data is taken from a study registered at clinicaltrials.gov with the identifier NCT02753738.
Data Acquisition
Structural MRI scans were recorded with a 3T Siemens Prisma scanner using a 64-channel head coil and the following three sequences: a whole-brain T1-weighted scan [repetition time (TR) = 2,300 ms; echo time (TE) = 2.95 ms; inversion time (TI) = 900 ms; flip angle (α) = 9°; matrix = 240 × 256, 176 slices; 1.05 × 1.05 × 1.20 mm³; acquisition time (TA) = 5:09 min], a whole-brain T2-weighted scan (TR = 3,200 ms; TE = 408 ms; α = 120°; matrix = 256 × 256, 192 slices; 0.9 × 0.9 × 0.9 mm³; TA = 5:01 min) and a high-resolution T2-weighted scan covering both hippocampi (TR = 8,000 ms; TE = 51 ms; α = 133°; matrix = 448 × 448, 40 slices; 0.39 × 0.39 × 1.20 mm³; TA = 7:52 min).
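To make the resolution differences between the three sequences concrete, the following minimal sketch computes the voxel volume of each sequence from the dimensions listed above, together with the extent covered by the T2H slab along its slice direction. The slab calculation assumes contiguous slices without gaps, which is not stated in the acquisition parameters; the function and variable names are illustrative only.

```python
# Voxel dimensions in mm, copied from the acquisition parameters.
SEQUENCES = {
    "T1": (1.05, 1.05, 1.20),
    "T2": (0.90, 0.90, 0.90),
    "T2H": (0.39, 0.39, 1.20),
}


def voxel_volume_mm3(dims):
    """Volume of a single voxel in cubic millimetres."""
    x, y, z = dims
    return x * y * z


def slab_thickness_mm(n_slices, slice_thickness_mm):
    """Extent along the slice direction.

    Assumption: slices are contiguous (no inter-slice gap).
    """
    return n_slices * slice_thickness_mm


if __name__ == "__main__":
    for name, dims in SEQUENCES.items():
        print(f"{name}: {voxel_volume_mm3(dims):.3f} mm^3 per voxel")
    # 40 slices of 1.20 mm for the high-resolution T2 scan
    print(f"T2H slab: {slab_thickness_mm(40, 1.20):.1f} mm")
```

Under these assumptions a T2H voxel (about 0.18 mm³) is roughly seven times smaller than a T1 voxel (about 1.32 mm³), while the T2H slab spans only about 48 mm along the slice direction, which illustrates why positioning of the FOV is critical.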
Data Processing
After a visual quality check of the MRI data, subjects were initially processed with the FreeSurfer 6.0 (see text footnote 1) “recon-all” standard pipeline (Dale et al., 1999; Fischl et al., 1999) for the cross-sectional comparison. In general, Talairach registration (Talairach and Tournoux, 1988), correction for bias field and skull stripping (Ségonne et al., 2004) are performed. This is followed by segmentation of white and gray matter areas (Fischl et al., 2002, 2004) and calculation of white and pial surfaces (Fischl and Dale, 2000).
In addition, all subjects were processed with the longitudinal recon-all stream (Reuter et al., 2012) for the assessment of the test-retest metrics. Here, a subject-specific template is created from the two TPs using robust, inverse consistent registration (Reuter et al., 2010). Information from this within-subject template is then utilized for the initialization of further processing steps (Reuter et al., 2012). For applications and a detailed description of both “recon-all” processing pipelines please see prior publications by our group (Seiger et al., 2016, 2018).
This was followed by the new hippocampal subfield segmentation approach. In this investigation, the hippocampal tool from the development version (20191217) was used, which is now available in FreeSurfer 7 (Iglesias et al., 2015). This tool segments the different subfields using a Bayesian inference approach based on image intensities and prior knowledge from a probabilistic atlas, which was generated from in vivo manual segmentations and ultra-high resolution ex vivo MRI data (Van Leemput, 2009; Iglesias et al., 2016). Subsequently, subfield volumes were calculated using five different input configurations. First, the standard T1 image was used, followed by the high-resolution T2 scan alone (T2H) and the standard T2 scan alone. In addition, multispectral analysis was performed by calculating the subfields from the combined information of T1 and T2H and of T1 and T2. Finally, 22 regions of interest (ROIs) (19 subfields with head and body subdivisions and the whole hippocampus with head and body subdivisions) per hemisphere were extracted. After processing, the data of all subjects were visually inspected to check for putative misclassifications or processing errors in general. No processing errors were detected during this inspection, and all data could be used for subsequent analyses. Detailed processing steps are depicted in Figure 1.
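The five input configurations can be sketched as five command lines for the subfield tool. The example below only assembles the argument lists and does not run anything. It assumes the FreeSurfer 7 script names `segmentHA_T1.sh` and `segmentHA_T2.sh` and the argument order subject, additional scan, analysis ID, USE_T1 flag, optional subjects directory; these should be checked against the documentation of the installed version. The subject name, file paths and analysis IDs are hypothetical placeholders and are not taken from the study.

```python
def build_commands(subject, t2_path, t2h_path, subjects_dir=None):
    """Return one argv list per input configuration.

    Assumed interface (verify against the installed FreeSurfer version):
      segmentHA_T1.sh SUBJECT [SUBJECTS_DIR]
      segmentHA_T2.sh SUBJECT ADDITIONAL_SCAN ANALYSIS_ID USE_T1 [SUBJECTS_DIR]
    """
    tail = [subjects_dir] if subjects_dir else []
    commands = {"T1": ["segmentHA_T1.sh", subject] + tail}

    # (analysis ID, additional scan, USE_T1 flag)
    # USE_T1 = 0: additional scan alone; USE_T1 = 1: multispectral with T1.
    extra_runs = [
        ("T2", t2_path, 0),
        ("T2H", t2h_path, 0),
        ("T1_T2", t2_path, 1),
        ("T1_T2H", t2h_path, 1),
    ]
    for analysis_id, scan, use_t1 in extra_runs:
        commands[analysis_id] = (
            ["segmentHA_T2.sh", subject, scan, analysis_id, str(use_t1)] + tail
        )
    return commands


if __name__ == "__main__":
    # Hypothetical subject ID and file names, for illustration only.
    for label, argv in build_commands("sub01", "t2.nii.gz", "t2h.nii.gz").items():
        print(label, "->", " ".join(argv))
```

Keeping the commands as plain lists makes it easy to log them or pass them to a job scheduler, and a distinct analysis ID per configuration keeps the five outputs from overwriting each other.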
Figure 1. (A) Three different MRI sequences (T1: T1-weighted, T2H: T2-weighted high-resolution and T2: T2-weighted standard resolution) were recorded for each subject at baseline and after approximately 3 weeks. (B) Depiction of the processing scheme. The T1-weighted sequence is used for the standard pipeline within FreeSurfer. All data was subsequently processed with the longitudinal stream. Hippocampus segmentation was then performed with the five different input configurations using the cross-sectionally as well as the longitudinally processed data to conduct the cross-sectional comparison and the test-retest analysis. (C) Representative hippocampal segmentation of a study participant using the high-resolution T2-weighted sequence.
Statistical Analysis
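Statistical analyses were carried out with the R software (R Core Team, 2019) and MATLAB R2014a (The MathWorks, Natick, MA, United States). To assess significant differences and effect sizes between the five processing types (T1, T2, T2H, T1 and T2, and T1 and T2H), cross-sectionally processed subfields were analyzed using non-parametric Friedman tests with Kendall’s W. Pairwise Wilcoxon signed-rank tests with Bonferroni correction were further used for post hoc analyses. All these tests were performed for each of the 22 ROIs. For test-retest performance, percentage test-retest variability (%TRV):

$$\%\mathrm{TRV} = \frac{\left| V_{\mathrm{TP1}} - V_{\mathrm{TP2}} \right|}{\left( V_{\mathrm{TP1}} + V_{\mathrm{TP2}} \right) / 2} \times 100$$

and Dice coefficients:

$$\mathrm{Dice} = \frac{2 \left| S_{\mathrm{TP1}} \cap S_{\mathrm{TP2}} \right|}{\left| S_{\mathrm{TP1}} \right| + \left| S_{\mathrm{TP2}} \right|}$$

were calculated using the longitudinally processed “recon-all” data of the two TP within FreeSurfer. Here, V denotes the volume of a subfield and S the set of voxels assigned to that subfield at the respective TP.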
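As an illustration of the two reliability metrics, the following minimal sketch assumes their conventional definitions: %TRV as the absolute volume difference between the two time points divided by their mean and multiplied by 100, and the Dice coefficient as twice the number of overlapping voxels divided by the summed voxel counts of both segmentations. The function names and the flat label-list representation are assumptions made for the example and do not reproduce the authors' R or MATLAB code.

```python
def percent_trv(volume_tp1, volume_tp2):
    """Percentage test-retest variability of one subfield volume."""
    mean_volume = (volume_tp1 + volume_tp2) / 2.0
    if mean_volume == 0:
        raise ValueError("mean volume is zero; %TRV is undefined")
    return abs(volume_tp1 - volume_tp2) / mean_volume * 100.0


def dice_coefficient(labels_tp1, labels_tp2, label):
    """Dice overlap of one label between two co-registered segmentations.

    labels_tp1 and labels_tp2 are equal-length flat sequences of integer
    voxel labels (for example, flattened label volumes in the same space).
    """
    if len(labels_tp1) != len(labels_tp2):
        raise ValueError("segmentations must have the same number of voxels")
    size_tp1 = sum(1 for value in labels_tp1 if value == label)
    size_tp2 = sum(1 for value in labels_tp2 if value == label)
    overlap = sum(
        1 for a, b in zip(labels_tp1, labels_tp2) if a == label and b == label
    )
    if size_tp1 + size_tp2 == 0:
        raise ValueError("label absent from both segmentations")
    return 2.0 * overlap / (size_tp1 + size_tp2)


if __name__ == "__main__":
    # Toy numbers for illustration, not study data.
    print(percent_trv(100.0, 110.0))
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 1], label=1))
```

Note that %TRV depends only on the two volumes, whereas the Dice coefficient also penalizes spatial shifts of a subfield whose volume is unchanged, so the two metrics can rank the pipelines differently.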
Results
The Friedman tests with Wilcoxon post hoc tests conducted for the cross-sectional analysis comprising the five different processing pipelines (T1, T2, T2H, T1 and T2, and T1 and T2H) revealed widespread significant differences between the input configurations in several subfields. The greatest volume differences between processing types in terms of effect sizes were observed in the head of the molecular layer, head of CA1, hippocampal fissure, head and body of CA3, fimbria and head of CA4 (for detailed results of Friedman tests, post hoc analyses and boxplots see Figure 2 and Table 1). Further analysis of these regions showed subfield-specific differences regarding the mode of processing. For example, while T2H led to the lowest volume estimations in the head of the molecular layer (192.11 ± 24.74; mean ± SD, T1: 349.80 ± 34.61), T2 showed the highest values in the hippocampal fissure (180.88 ± 21.98, T1: 140.26 ± 19.97) and was significantly different from the other processing modes within this subfield. However, as for the molecular layer, all processing modes differed strongly from each other in this subfield. The CA3 body also showed the lowest values for T2H (68.89 ± 9.87, T1: 86.45 ± 12.22), similar to the head of the molecular layer. Furthermore, significant differences were observed for the whole hippocampus, with the highest values for T1 in comparison to the other processing types (3,588.62 ± 336.41) (see Figure 2).
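The relationship between the Friedman statistic and the Kendall’s W effect size can be sketched as follows. This is a minimal pure-Python illustration that omits the tie correction applied by statistical packages such as R, so it is not a substitute for the analysis performed by the authors. It also shows the Bonferroni-adjusted significance threshold for all pairwise comparisons among k configurations; whether the correction in the study was applied per ROI or across ROIs is not stated, so the per-ROI version shown here is an assumption.

```python
def rank_row(values):
    """Rank one subject's measurements (1 = smallest), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def friedman_chi2(data):
    """Friedman chi-square (no tie correction).

    data: one row per subject, one column per input configuration.
    """
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for column, rank in enumerate(rank_row(row)):
            rank_sums[column] += rank
    return (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))


def kendalls_w(chi2, n, k):
    """Kendall's W effect size derived from the Friedman statistic."""
    return chi2 / (n * (k - 1))


def bonferroni_alpha(alpha, k):
    """Per-comparison threshold for all k * (k - 1) / 2 pairwise tests."""
    return alpha / (k * (k - 1) // 2)


if __name__ == "__main__":
    # Toy volumes (3 subjects x 3 configurations), not study data.
    toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [10.0, 20.0, 30.0]]
    chi2 = friedman_chi2(toy)
    print(chi2, kendalls_w(chi2, n=3, k=3), bonferroni_alpha(0.05, 5))
```

With five configurations there are ten pairwise comparisons per ROI, so a nominal threshold of 0.05 corresponds to 0.005 per comparison under this assumption. Kendall’s W ranges from 0 (no agreement in the ranking of configurations across subjects) to 1 (identical ranking in every subject).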
Figure 2. Boxplots showing volume estimations of the cross-sectional hippocampal subfield investigation using five different input configurations (T1, T1 and T2, T1 and T2H, T2, and T2H). Subfields are arranged according to the magnitude of the effect sizes of the Friedman test (χ²) using Kendall’s W. In addition to the 19 ROIs, whole hippocampal head and body as well as whole hippocampal volume are presented. All subfields showed significant differences according to the Friedman tests (see Table 1). T2H was excluded for the hippocampal tail, as the structure was not entirely covered due to the limited size of the field of view. T2, T2-weighted standard resolution; T2H, T2-weighted high resolution; GC-ML-DG, granule cell and molecular layer of the dentate gyrus; HATA, hippocampus-amygdala-transition-area.
Table 1. Friedman tests with pairwise Wilcoxon post hoc comparisons. Subfields are arranged according to the magnitude of the effect sizes of the Friedman test (χ²) using Kendall’s W.
The test-retest metrics indicated the best %TRV results (Figure 3A) across all subfields for T1 (3.24 ± 1.33) and T1 and T2H (3.30 ± 1.13), followed by T2H (3.47 ± 1.60). Higher variability was found for T1 and T2 (4.60 ± 1.61) and T2 alone (5.14 ± 2.01). However, these values differed drastically between the investigated ROIs, and each area showed its own specific profile. For example, while T2H alone performed better than or at least as well as T1 and T2H in several subfields, poor results were found in the presubiculum head (T2H: 6.94 ± 4.58, for comparison: T1 and T2H: 3.85 ± 2.57). On the other hand, T2 and the combination of T1 and T2 showed the worst performance measures in almost all subfields. Especially weak %TRV results for T2 were found in the fimbria (9.24 ± 5.89), the presubiculum body (8.68 ± 6.33) and the parasubiculum (6.20 ± 4.31).
Figure 3. Longitudinal test-retest performance measurements. (A) Test-retest variability in percent and (B) Dice coefficient metrics for each subfield and processing mode. T2H was excluded for the hippocampal tail, as the structure was not entirely covered due to the limited size of the field of view. Error bars indicate standard error of the mean. T2, T2-weighted standard resolution; T2H, T2-weighted high resolution; GC-ML-DG, granule cell and molecular layer of the dentate gyrus; HATA, hippocampus-amygdala-transition-area.
These results mainly coincided with the Dice similarity coefficients (Figure 3B), where the best metrics were found for T1 (0.81 ± 0.09) followed by T1 and T2H (0.77 ± 0.12). Slightly inferior, but almost identical results were observed for T2H (0.76 ± 0.10) and T1 and T2 (0.76 ± 0.09). As for the %TRV results, T2 did not perform as well as the other approaches (0.74 ± 0.09). However, differences to the other pipelines, except for T1, were not severe. Again, results varied strongly across the specific subfields. While in some regions, such as CA4, almost no differences were observed between the processing modes, T1 clearly showed better Dice coefficients than all other approaches, and especially than the overall second-best approach (T1 and T2H), in, for example, the molecular layer body (T1: 0.78 ± 0.05; T1 and T2H: 0.67 ± 0.03), molecular layer head (T1: 0.77 ± 0.05; T1 and T2H: 0.59 ± 0.03), parasubiculum (T1: 0.83 ± 0.05; T1 and T2H: 0.78 ± 0.04), presubiculum body (T1: 0.89 ± 0.03; T1 and T2H: 0.83 ± 0.03) and presubiculum head (T1: 0.86 ± 0.03; T1 and T2H: 0.80 ± 0.04). All results are presented as mean values averaged across the left and right hemispheres. In addition, to achieve the high resolution of the T2H condition, the FOV was kept small, and for some participants the hippocampal tail was not entirely covered. Hence, this area was not included for the T2H condition in the summary statistics described above.
Discussion
In this investigation, five different hippocampal subfield processing configurations were assessed and compared in a cross-sectional and longitudinal manner. Our results showed significant volume estimation differences between the used modes (T1, T2, T2H, T1 and T2, and T1 and T2H) in several subfields when compared cross-sectionally. Differences were most pronounced in the molecular layer (head), CA1 (head), hippocampal fissure, CA3 (head and body), fimbria and CA4 (head). In some of those areas, volume estimations between the processing types differed drastically, particularly in the head of the molecular layer with significant results between all pairwise comparisons, except for T1 vs. T1 and T2. Our results indicate a strong influence of the chosen pipeline on hippocampal subfield segmentation volume estimations.
The longitudinal analysis using %TRV and Dice coefficient measurements revealed that T1 and multispectral analysis (T1 and T2H) showed better performance than T2H alone when all subfields are taken into consideration. However, the specific subfields had a substantial influence on the performance of segmentation results, regardless of the processing mode. For example, CA1, CA4, hippocampal tail (note that T2H was excluded from this region) and subiculum delivered excellent test-retest metrics for %TRV and Dice coefficient measurements across the processing modes, as observed in Whelan et al. (2016). Nevertheless, as observed in the cross-sectional investigation, subfield-specific differences regarding the processing modes are highly apparent. The lowest test-retest performances were observed in the hippocampal fissure and the fimbria across all possible input variations, corroborating results from prior studies, where unispectral T1-weighted input at a standard resolution of around 1 mm³ had been used (Marizzoni et al., 2015; Whelan et al., 2016; Worker et al., 2018; Brown et al., 2020). In general, the volume estimations in these subfields must be interpreted with caution, as especially small hippocampal regions are harder to detect by the segmentation algorithm. It has been shown that larger hippocampal structures, such as CA1, lead to more robust results (Marizzoni et al., 2015), also in comparison to manual delineations (Van Leemput et al., 2009). Our analyses suggest that even high-resolution T2 and the combination of T1 and T2H face difficulties in these smaller regions. Nevertheless, T2H exhibits better overall contrast properties and can detect even subtle differences between the hippocampal structures, which cannot be accomplished with standard T1 resolution (Wisse et al., 2014). This was also corroborated by a recent study indicating that high-resolution T2 outperforms T1 in detecting atrophy in terms of effect sizes (Mueller et al., 2018).
However, we could not detect better performance for high-resolution T2 in our reliability analysis when overall performance across all subfields was investigated. T2 and T1 and T2 showed the overall worst reliability measures, especially in the fimbria, HATA, parasubiculum, and the presubiculum compared to the other options. In general, our results indicate no benefit of using either the standard resolution T2 sequence or the combination of T1 and T2 compared to the default T1 processing stream.
Although the T1-weighted sequence with a standard resolution of 1 mm³ delivered overall better test-retest metrics than T2H and T1 and T2H, several hippocampal substructures are only reliably detected using high-resolution T2 or multispectral contrasts (T1 and T2H). Therefore, the obtained segmentation results should be interpreted with caution, as they do not always reflect the underlying structures of the hippocampus (Wisse et al., 2020). In our analysis, an interesting observation was made for the head and the body of the molecular layer, where T1 showed the best results for both test-retest metrics in comparison to all the other modes. A possible explanation for the fairly good test-retest results in this region is that the algorithm relies heavily on prior information of the atlas when only the T1 sequence is used (Iglesias et al., 2015). At the standard T1 resolution, the internal boundaries are not reliably detected and the segmentation relies heavily on prior information of the atlas. This is especially true for the molecular layer, which cannot be detected reliably and therefore depends on prior information (Iglesias et al., 2015; Giuliano et al., 2017). In addition, partial volume effects and signal variations also have to be taken into consideration in the hippocampus, especially in such small substructures (Tohka, 2014; Worker et al., 2018). For the whole hippocampus, slightly better results were observed for T1 and T2H in comparison to T1 regarding %TRV.
FreeSurfer was used in this investigation, as it is freely available and widely used for brain segmentations including subfield parcellation of several subcortical structures. However, other hippocampal subfield segmentation tools exist, and a recently published approach (LASHiS) seems to be a reasonable alternative, especially at ultra-high fields, as it specifically supports longitudinal multispectral processing (Shaw et al., 2020). It is a drawback of FreeSurfer that longitudinal hippocampal processing is only possible using a T1-weighted image and is not available for multispectral contrast inputs. This should be addressed in future releases of this software package, as it was recently shown that the longitudinal approach outperformed cross-sectional hippocampal processing (Chiappiniello et al., 2020). In that investigation, the authors also used a multispectral approach, however, focusing on the recon-all stream and not directly on the hippocampal subfield tool, as we did in our analysis. Furthermore, how hippocampal subfield borders are defined, and based on which criteria they are delineated, is the subject of a lively and ongoing debate. No unified segmentation scheme is used by the scientific community. This is also problematic when several subfield tools are compared to each other or to postmortem measurements, as borders are defined according to different protocols. However, efforts are being made by the Hippocampal Subfields Group (HSG) to unify the protocols and to develop a standardized method (Olsen et al., 2019). In addition, integrating cytoarchitecture, neuroreceptor information, and connectivity-based parcellations will deliver a more profound picture of this very heterogeneous brain structure (Plachti et al., 2019; Palomero-Gallagher et al., 2020).
If time is a limiting factor, acquiring only a T1 and running the parcellation with this sequence is a viable option, which might even be beneficial in certain subfields. However, our results indicate that one needs to be aware that the type of input images drastically changes the output. Regarding reliability, T1 with standard resolution outperformed the other sequences in distinct subfields; however, this carries the risk that results are biased, as mainly a priori information of the atlas is used (Iglesias et al., 2015).
Of note, given the small FOV of the high-resolution T2 sequence, the hippocampal tail was not entirely covered in some of our subjects. We accounted for this by not including T2H in the tail subfield. One should be aware of this issue, as it may occur with any sequence that uses a small FOV to gain higher resolution. Furthermore, no manual segmentation was carried out in addition to the automatic assessment. Manual delineation is highly time consuming and, especially in large datasets, not an option. In addition, anatomical expertise is needed and rater bias plays a role, leading to problems of reproducibility across different centers (Wisse et al., 2016; Mueller et al., 2018).
Taken together, we provide a systematic comparison of the hippocampal processing input sets available within the new FreeSurfer tool and assessed their performance in healthy young individuals. Future work may also investigate the performance in older cohorts or in patients with neurological conditions. Although T1 alone showed reliable results for the test-retest measurements, we advise using high-resolution T2 or multispectral information combining T1 and high-resolution T2, as this better reflects the underlying biological substrate owing to the high resolution and improved contrast properties.
Conclusion
In this study, we measured a relatively large study cohort of 41 participants with three different MRI sequences (T1-, T2- and high-resolution T2-weighted) to assess the performance of five hippocampal segmentation modes within FreeSurfer. Our results revealed strong differences in subfield volume estimations between the pipelines, which have to be taken into account when segmentation results are compared between studies that used different approaches. The greatest differences according to effect sizes were observed in the head of the molecular layer, CA1 head, hippocampal fissure, head and body of CA3 and fimbria. Our reliability analysis indicated overall good results for T1, T1 and T2H, and T2H. However, the use of T1 at standard resolution relies heavily on prior information of the atlas and hardly reflects the complex underlying neurobiological structure of the hippocampus. Finally, and as expected, T2 or the use of multispectral T1 and T2 did not bring any beneficial effect and showed the worst test-retest results. These findings are of particular importance when comparing results of previous studies using different segmentation schemes and once again call for detailed reports on data acquisition and processing, as well as a unified state-of-the-art approach.
Data Availability Statement
The datasets presented in this article are not readily available because raw MRI data of participants used in this manuscript cannot be shared due to ethical reasons. However, analyzed data sets are available. Requests to access the datasets should be directed to RL, rupert.lanzenberger@meduniwien.ac.at.
Ethics Statement
The studies involving human participants were reviewed and approved by the Ethics Committee of the Medical University of Vienna. The participants provided their written informed consent to participate in this study.
Author Contributions
RS conducted the analyses, performed MR measurements, and wrote the manuscript. GMG, PH, JU, GG, and TV were responsible for the medical aspects of this study. FH, MR, BS-D, and MK were involved in data analyses and/or conducted the MR measurements. RL supervised the entire procedures and served as principal investigator. All authors read and commented on the manuscript and gave approval for publication in its current form.
Funding
This work was supported by the Austrian Science Fund (FWF) grant number KLI 516 to RL, the Medical Imaging Cluster of the Medical University of Vienna, and by the grant “Interdisciplinary translational brain research cluster (ITHC) with highfield MR” from the Federal Ministry of Science, Research and Economy (BMWFW), Austria. RS received funding from the Hochschuljubilaeumsstiftung of the City of Vienna. MK and MR are recipients of a DOC Fellowship of the Austrian Academy of Sciences (OeAW).
Conflict of Interest
With no relevance to this work, RL received travel grants and/or conference speaker honoraria within the last 3 years from Bruker BioSpin MR and Heel, as well as support from Siemens Healthcare regarding clinical research using PET/MR. RL has been a shareholder of the start-up company BM Health GmbH since 2019.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments
We thank E. Sittenberger and V. Ritter for administrative support, A. Basaran, M. Hienert, A. Kautzky, S. Kasper, and P. Michenthaler for clinical support and A. Hahn and M. Murgas for technical support. We further thank C. Kraus for preparation of an early draft of the general study proposal and the students of the Neuroimaging Lab (NIL) at the Department of Psychiatry and Psychotherapy for general study support.
References
Andersen, P., Morris, R., Amaral, D., Bliss, T., and O’Keefe, J. (2006). The Hippocampus Book. Oxford: Oxford University Press.
Arnone, D., Mckie, S., Elliott, R., Juhasz, G., Thomas, E. J., Downey, D., et al. (2013). State-dependent changes in hippocampal grey matter in depression. Mol. Psychiatry 18, 1265–1272. doi: 10.1038/mp.2012.150
Bakker, A., Kirwan, C. B., Miller, M., and Stark, C. E. L. (2008). Pattern separation in the human hippocampal CA3 and dentate gyrus. Science 319, 1640–1642. doi: 10.1126/science.1152882
Ballmaier, M., Narr, K. L., Toga, A. W., Elderkin-Thompson, V., Thompson, P. M., Hamilton, L., et al. (2008). Hippocampal morphology and distinguishing late-onset from early-onset elderly depression. Am. J. Psychiatry 165, 229–237. doi: 10.1176/appi.ajp.2007.07030506
Bird, C. M., and Burgess, N. (2008). The hippocampus and memory: insights from spatial processing. Nat. Rev. Neurosci. 9, 182–194. doi: 10.1038/nrn2335
Braak, H., and Braak, E. (1991). Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol. 82, 239–259. doi: 10.1007/BF00308809
Brasted, P. J., Bussey, T. J., Murray, E. A., and Wise, S. P. (2003). Role of the hippocampal system in associative learning beyond the spatial domain. Brain 126, 1202–1223. doi: 10.1093/brain/awg103
Brown, E. M., Pierce, M. E., Clark, D. C., Fischl, B. R., Iglesias, J. E., Milberg, W. P., et al. (2020). Test-retest reliability of freesurfer automated hippocampal subfield segmentation within and across scanners. Neuroimage 210:116563. doi: 10.1016/j.neuroimage.2020.116563
Campbell, S., Marriott, M., Nahmias, C., and MacQueen, G. M. (2004). Lower hippocampal volume in patients suffering from depression: a meta-analysis. Am. J. Psychiatry 161, 598–607. doi: 10.1176/appi.ajp.161.4.598
Chiappiniello, A., Tarducci, R., Muscio, C., Bruzzone, M. G., Bozzali, M., Tiraboschi, P., et al. (2020). Automatic multispectral MRI segmentation of human hippocampal subfields: an evaluation of multicentric test–retest reproducibility. Brain Struct. Funct. 226, 137–150. doi: 10.1007/s00429-020-02172-w
Dale, A. M., Fischl, B., and Sereno, M. I. (1999). Cortical surface-based analysis. Neuroimage 9, 179–194.
Dounavi, M.-E., Mak, E., Wells, K., Ritchie, K., Ritchie, C. W., Su, L., et al. (2020). Volumetric alterations in the hippocampal subfields of subjects at increased risk of dementia. Neurobiol. Aging 91, 36–44. doi: 10.1016/j.neurobiolaging.2020.03.006
Duvernoy, H. M., Cattin, F., and Risold, P.-Y. (2013). The Human Hippocampus. Berlin: Springer Berlin Heidelberg.
Eriksson, P. S., Perfilieva, E., Bjork-Eriksson, T., Alborn, A.-M., Nordborg, C., Peterson, D. A., et al. (1998). Neurogenesis in the adult human hippocampus. Nat. Med. 4, 1313–1317. doi: 10.1038/3305
Fischl, B., and Dale, A. M. (2000). Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proc. Natl. Acad. Sci. U.S.A. 97, 11050–11055. doi: 10.1073/pnas.200033797
Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., et al. (2002). Whole brain segmentation. Neuron 33, 341–355. doi: 10.1016/S0896-6273(02)00569-X
Fischl, B., Sereno, M. I., and Dale, A. M. (1999). Cortical surface-based analysis. Neuroimage 9, 195–207.
Fischl, B., van der Kouwe, A., Destrieux, C., Halgren, E., Segonne, F., Salat, D., et al. (2004). Automatically parcellating the human cerebral cortex. Cereb. Cortex 14, 11–22. doi: 10.1093/cercor/bhg087
Frodl, T., Meisenzahl, E. M., Zetzsche, T., Born, C., Groll, C., Jäger, M., et al. (2002). Hippocampal changes in patients with a first episode of major depression. Am. J. Psychiatry 159, 1112–1118. doi: 10.1176/appi.ajp.159.7.1112
Geuze, E., Vermetten, E., and Bremner, J. D. (2005). MR-based in vivo hippocampal volumetrics: 2. findings in neuropsychiatric disorders. Mol. Psychiatry 10, 160–184. doi: 10.1038/sj.mp.4001579
Giuliano, A., Donatelli, G., Cosottini, M., Tosetti, M., Retico, A., and Fantacci, M. E. (2017). Hippocampal subfields at ultra high field MRI: an overview of segmentation and measurement methods. Hippocampus 27, 481–494. doi: 10.1002/hipo.22717
Gomez-Isla, T., Price, J. L., McKeel, D. W. Jr., Morris, J. C., Growdon, J. H., Hyman, B. T., et al. (1996). Profound loss of layer II entorhinal cortex neurons occurs in very mild Alzheimer’s disease. J. Neurosci. 16, 4491–4500. doi: 10.1523/JNEUROSCI.16-14-04491.1996
Gryglewski, G., Baldinger-Melich, P., Seiger, R., Godbersen, G. M., Michenthaler, P., Klöbl, M., et al. (2019). Structural changes in amygdala nuclei, hippocampal subfields and cortical thickness following electroconvulsive therapy in treatment-resistant depression: longitudinal analysis. Br. J. Psychiatry 214, 159–167. doi: 10.1192/bjp.2018.224
Hutton, C., Draganski, B., Ashburner, J., and Weiskopf, N. (2009). A comparison between voxel-based cortical thickness and voxel-based morphometry in normal aging. Neuroimage 48, 371–380. doi: 10.1016/j.neuroimage.2009.06.043
Iglesias, J. E., Augustinack, J. C., Nguyen, K., Player, C. M., Player, A., Wright, M., et al. (2015). A computational atlas of the hippocampal formation using ex vivo, ultra-high resolution MRI: application to adaptive segmentation of in vivo MRI. Neuroimage 115, 117–137. doi: 10.1016/j.neuroimage.2015.04.042
Iglesias, J. E., Van Leemput, K., Augustinack, J., Insausti, R., Fischl, B., and Reuter, M. (2016). Bayesian longitudinal segmentation of hippocampal substructures in brain MRI using subject-specific atlases. Neuroimage 141, 542–555. doi: 10.1016/j.neuroimage.2016.07.020
Kraus, C., Castrén, E., Kasper, S., and Lanzenberger, R. (2017). Serotonin and neuroplasticity – links between molecular, functional and structural pathophysiology in depression. Neurosci. Biobehav. Rev. 77, 317–326. doi: 10.1016/j.neubiorev.2017.03.007
Kraus, C., Seiger, R., Pfabigan, D. M., Sladky, R., Tik, M., Paul, K., et al. (2019). Hippocampal subfields in acute and remitted depression—an ultra-high field magnetic resonance imaging study. Int. J. Neuropsychopharmacol. 22, 513–522. doi: 10.1093/ijnp/pyz030
Lee, I., Yoganarasimha, D., Rao, G., and Knierim, J. J. (2004). Comparison of population coherence of place cells in hippocampal subfields CA1 and CA3. Nature 430, 456–459. doi: 10.1038/nature02739
Leutgeb, S., Leutgeb, J. K., Treves, A., Moser, M. B., and Moser, E. I. (2004). Distinct ensemble codes in hippocampal areas CA3 and CA1. Science 305, 1295–1298. doi: 10.1126/science.1100265
MacQueen, G., and Frodl, T. (2011). The hippocampus in major depression: evidence for the convergence of the bench and bedside in psychiatric research. Mol. Psychiatry 16, 252–264. doi: 10.1038/mp.2010.80
Marizzoni, M., Antelmi, L., Bosch, B., Bartrés-Faz, D., Müller, B. W., Wiltfang, J., et al. (2015). Longitudinal reproducibility of automatically segmented hippocampal subfields: a multisite European 3T study on healthy elderly. Hum. Brain Mapp. 36, 3516–3527. doi: 10.1002/hbm.22859
Mueller, S. G., Laxer, K. D., Barakos, J., Cheong, I., Garcia, P., and Weiner, M. W. (2009). Subfield atrophy pattern in temporal lobe epilepsy with and without mesial sclerosis detected by high-resolution MRI at 4 Tesla: preliminary results. Epilepsia 50, 1474–1483. doi: 10.1111/j.1528-1167.2009.02010.x
Mueller, S. G., Yushkevich, P. A., Das, S., Wang, L., Van Leemput, K., Iglesias, J. E., et al. (2018). Systematic comparison of different techniques to measure hippocampal subfield volumes in ADNI2. NeuroImage Clin. 17, 1006–1018. doi: 10.1016/j.nicl.2017.12.036
O’Keefe, J., Burgess, N., Donnett, J. G., Jeffery, K. J., and Maguire, E. A. (1998). Place cells, navigational accuracy, and the human hippocampus. Philos. Trans. R. Soc. B Biol. Sci. 353, 1333–1340. doi: 10.1098/rstb.1998.0287
Olsen, R. K., Carr, V. A., Daugherty, A. M., La Joie, R., Amaral, R. S. C., Amunts, K., et al. (2019). Progress update from the hippocampal subfields group. Alzheimer’s Dement 11, 439–449. doi: 10.1016/j.dadm.2019.04.001
Palomero-Gallagher, N., Kedo, O., Mohlberg, H., Zilles, K., and Amunts, K. (2020). Multimodal mapping and analysis of the cyto- and receptorarchitecture of the human hippocampus. Brain Struct. Funct. 225, 881–907. doi: 10.1007/s00429-019-02022-4
Pipitone, J., Park, M. T. M., Winterburn, J., Lett, T. A., Lerch, J. P., Pruessner, J. C., et al. (2014). Multi-atlas segmentation of the whole hippocampus and subfields using multiple automatically generated templates. Neuroimage 101, 494–512. doi: 10.1016/j.neuroimage.2014.04.054
Plachti, A., Eickhoff, S. B., Hoffstaedter, F., Patil, K. R., Laird, A. R., Fox, P. T., et al. (2019). Multimodal parcellations and extensive behavioral profiling tackling the hippocampus gradient. Cereb. Cortex 29, 4595–4612. doi: 10.1093/cercor/bhy336
Posener, J. A., Wang, L., Price, J. L., Gado, M. H., Province, M. A., Miller, M. I., et al. (2003). High-dimensional mapping of the hippocampus in depression. Am. J. Psychiatry 160, 83–89. doi: 10.1176/appi.ajp.160.1.83
R Core Team (2019). R: A Language and Environment for Statistical Computing. R Found. Stat. Comput. Vienna: R Core Team.
Reuter, M., Rosas, H. D., and Fischl, B. (2010). Highly accurate inverse consistent registration: a robust approach. Neuroimage 53, 1181–1196. doi: 10.1016/j.neuroimage.2010.07.020
Reuter, M., Schmansky, N. J., Rosas, H. D., and Fischl, B. (2012). Within-subject template estimation for unbiased longitudinal image analysis. Neuroimage 61, 1402–1418. doi: 10.1016/j.neuroimage.2012.02.084
Samuels, B. A., Leonardo, E. D., and Hen, R. (2015). Hippocampal subfields and major depressive disorder. Biol. Psychiatry 77, 210–211. doi: 10.1016/j.biopsych.2014.11.007
Sartorius, A., Demirakca, T., Böhringer, A., von Hohenberg, C., Aksay, S. S., Bumb, J. M., et al. (2016). Electroconvulsive therapy increases temporal gray matter volume and cortical thickness. Eur. Neuropsychopharmacol. 26, 506–517. doi: 10.1016/j.euroneuro.2015.12.036
Schmaal, L., Veltman, D. J., Van Erp, T. G. M., Smann, P. G., Frodl, T., Jahanshad, N., et al. (2016). Subcortical brain alterations in major depressive disorder: findings from the ENIGMA major depressive disorder working group. Mol. Psychiatry 21, 806–812. doi: 10.1038/mp.2015.69
Scoville, W. B., and Milner, B. (1957). Loss of recent memory after bilateral hippocampal lesions. J. Neurol. Neurosurg. Psychiatry 20, 11–21. doi: 10.1136/jnnp.20.1.11
Ségonne, F., Dale, A. M., Busa, E., Glessner, M., Salat, D., Hahn, H. K., et al. (2004). A hybrid approach to the skull stripping problem in MRI. Neuroimage 22, 1060–1075. doi: 10.1016/j.neuroimage.2004.03.032
Seiger, R., Ganger, S., Kranz, G. S., Hahn, A., and Lanzenberger, R. (2018). Cortical thickness estimations of freesurfer and the CAT12 toolbox in patients with Alzheimer’s disease and healthy controls. J. Neuroimaging 28, 515–523. doi: 10.1111/jon.12521
Seiger, R., Hahn, A., Hummer, A., Kranz, G. S., Ganger, S., Woletz, M., et al. (2016). Subcortical gray matter changes in transgender subjects after long-term cross-sex hormone administration. Psychoneuroendocrinology 74, 371–379. doi: 10.1016/j.psyneuen.2016.09.028
Shaw, T., York, A., Ziaei, M., Barth, M., and Bollmann, S. (2020). Longitudinal automatic segmentation of hippocampal subfields (LASHiS) using multi-contrast MRI. Neuroimage 218:116798. doi: 10.1016/j.neuroimage.2020.116798
Small, S. A. (2014). Isolating pathogenic mechanisms embedded within the hippocampal circuit through regional vulnerability. Neuron 84, 32–39. doi: 10.1016/j.neuron.2014.08.030
Small, S. A., Schobel, S. A., Buxton, R. B., Witter, M. P., and Barnes, C. A. (2011). A pathophysiological framework of hippocampal dysfunction in ageing and disease. Nat. Rev. Neurosci. 12, 585–601. doi: 10.1038/nrn3085
Stepan, J., Dine, J., and Eder, M. (2015). Functional optical probing of the hippocampal trisynaptic circuit in vitro: network dynamics, filter properties, and polysynaptic induction of CA1 LTP. Front. Neurosci. 9:160. doi: 10.3389/fnins.2015.00160
Talairach, J., and Tournoux, P. (1988). Coplanar Stereotaxic Atlas of the Human Brain. New York, NY: Thieme Medical Publishers.
Toda, T., Parylak, S. L., Linker, S. B., and Gage, F. H. (2019). The role of adult hippocampal neurogenesis in brain health and disease. Mol. Psychiatry 24, 67–87. doi: 10.1038/s41380-018-0036-2
Tohka, J. (2014). Partial volume effect modeling for segmentation and tissue classification of brain magnetic resonance images: a review. World J. Radiol. 6:855. doi: 10.4329/wjr.v6.i11.855
van Eijk, L., Hansell, N. K., Strike, L. T., Couvy-Duchesne, B., de Zubicaray, G. I., Thompson, P. M., et al. (2020). Region-specific sex differences in the hippocampus. Neuroimage 215:116781. doi: 10.1016/j.neuroimage.2020.116781
Van Leemput, K. (2009). Encoding probabilistic brain atlases using bayesian inference. IEEE Trans. Med. Imaging 28, 822–837. doi: 10.1109/TMI.2008.2010434
Van Leemput, K., Bakkour, A., Benner, T., Wiggins, G., Wald, L. L., Augustinack, J., et al. (2009). Automated segmentation of hippocampal subfields from ultra-high resolution in vivo MRI. Hippocampus 19, 549–557. doi: 10.1002/hipo.20615
Videbech, P., and Ravnkilde, B. (2004). Hippocampal volume and depression: a meta-analysis of MRI studies. Am. J. Psychiatry 161, 1957–1966. doi: 10.1176/appi.ajp.161.11.1957
West, M. J., Coleman, P. D., Flood, D. G., and Troncoso, J. C. (1994). Differences in the pattern of hippocampal neuronal loss in normal ageing and Alzheimer’s disease. Lancet 344, 769–772.
Whelan, C. D., Hibar, D. P., Van Velzen, L. S., Zannas, A. S., Carrillo-Roa, T., McMahon, K. Z., et al. (2016). Heritability and reliability of automatically segmented human hippocampal formation subregions. Neuroimage 128, 125–137. doi: 10.1016/j.neuroimage.2015.12.039
Winterburn, J. L., Pruessner, J. C., Chavez, S., Schira, M. M., Lobaugh, N. J., Voineskos, A. N., et al. (2013). A novel in vivo atlas of human hippocampal subfields using high-resolution 3T magnetic resonance imaging. Neuroimage 74, 254–265. doi: 10.1016/j.neuroimage.2013.02.003
Wisse, L. E. M., Biessels, G. J., and Geerlings, M. I. (2014). A critical appraisal of the hippocampal subfield segmentation package in freesurfer. Front. Aging Neurosci. 6:261. doi: 10.3389/fnagi.2014.00261
Wisse, L. E. M., Chételat, G., Daugherty, A. M., de Flores, R., La Joie, R., Mueller, S. G., et al. (2020). Hippocampal subfield volumetry from structural isotropic 1 mm3 MRI scans: a note of caution. Hum. Brain Mapp 42, 539–550. doi: 10.1002/hbm.25234
Wisse, L. E. M., Kuijf, H. J., Honingh, A. M., Wang, H., Pluta, J. B., Das, S. R., et al. (2016). Automated hippocampal subfield segmentation at 7T MRI. Am. J. Neuroradiol. 37, 1050–1057. doi: 10.3174/ajnr.A4659
Worker, A., Dima, D., Combes, A., Crum, W. R., Streffer, J., Einstein, S., et al. (2018). Test-retest reliability and longitudinal analysis of automated hippocampal subregion volumes in healthy ageing and Alzheimer’s disease populations. Hum. Brain Mapp. 39, 1743–1754. doi: 10.1002/hbm.23948
Yushkevich, P. A., Pluta, J. B., Wang, H., Xie, L., Ding, S. L., Gertje, E. C., et al. (2015). Automated volumetry and regional thickness analysis of hippocampal subfields and medial temporal cortical structures in mild cognitive impairment. Hum. Brain Mapp. 36, 258–287. doi: 10.1002/hbm.22627
Keywords: hippocampus, subfields, FreeSurfer, MRI, reliability
Citation: Seiger R, Hammerle FP, Godbersen GM, Reed MB, Spurny-Dworak B, Handschuh P, Klöbl M, Unterholzner J, Gryglewski G, Vanicek T and Lanzenberger R (2021) Comparison and Reliability of Hippocampal Subfield Segmentations Within FreeSurfer Utilizing T1- and T2-Weighted Multispectral MRI Data. Front. Neurosci. 15:666000. doi: 10.3389/fnins.2021.666000
Received: 14 February 2021; Accepted: 28 May 2021;
Published: 08 September 2021.
Edited by:
Federico Giove, Centro Fermi - Museo Storico della Fisica e Centro Studi e Ricerche Enrico Fermi, Italy
Reviewed by:
Alberto Redolfi, Centro San Giovanni di Dio Fatebenefratelli (IRCCS), Italy
Dan Wu, Zhejiang University, China
Copyright © 2021 Seiger, Hammerle, Godbersen, Reed, Spurny-Dworak, Handschuh, Klöbl, Unterholzner, Gryglewski, Vanicek and Lanzenberger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Rupert Lanzenberger, rupert.lanzenberger@meduniwien.ac.at