Neurology News

Revealing Individual Neuroanatomical Heterogeneity in Alzheimer Disease Using Neuroanatomical Normative Modeling


Background and Objectives Alzheimer disease (AD) is highly heterogeneous, with marked individual differences in clinical presentation and neurobiology. To explore this, we used neuroanatomical normative modeling to index regional patterns of variability in cortical thickness. We aimed to characterize individual differences and outliers in cortical thickness in patients with AD, people with mild cognitive impairment (MCI), and controls. Furthermore, we assessed the relationships between cortical thickness heterogeneity and cognitive function, β-amyloid, phosphorylated-tau, and ApoE genotype. Finally, we examined whether cortical thickness heterogeneity was predictive of conversion from MCI to AD.

Methods Cortical thickness measurements across 148 brain regions were obtained from T1-weighted MRI scans from 62 sites of the Alzheimer’s Disease Neuroimaging Initiative. AD was determined by clinical and neuropsychological examination with no comorbidities present. Participants with MCI had reported memory complaints, and controls were cognitively normal. A neuroanatomical normative model indexed cortical thickness distributions using a separate healthy reference data set (n = 33,072), which used hierarchical Bayesian regression to predict cortical thickness per region using age and sex, while adjusting for site noise. Z-scores per region were calculated, resulting in a Z-score brain map per participant. Regions with Z-scores <−1.96 were classified as outliers.

Results Patients with AD (n = 206) had a median of 12 outlier regions (out of a possible 148), with the highest proportion of outliers (47%) in the parahippocampal gyrus. For 62 regions, over 90% of these patients had cortical thicknesses within the normal range. Patients with AD had more outlier regions than people with MCI (n = 662) or controls (n = 159) (F(2, 1,022) = 95.39, p = 2.0 × 10−16). They were also more dissimilar to each other than people with MCI or controls (F(2, 1,024) = 209.42, p = 2.2 × 10−16). A greater number of outlier regions were associated with worse cognitive function, CSF protein concentrations, and an increased risk of converting from MCI to AD within 3 years (hazard ratio 1.028, 95% CI 1.016–1.039, p = 1.8 × 10−16).

Discussion Individualized normative maps of cortical thickness highlight the heterogeneous effect of AD on the brain. Regional outlier estimates have the potential to be a marker of disease and could be used to track an individual’s disease progression or treatment response in clinical trials.


AD = Alzheimer disease;
ADNI = Alzheimer’s Disease Neuroimaging Initiative;
FDR = false discovery rate;
IQR = interquartile range;
MCI = mild cognitive impairment;
MMSE = Mini-Mental State Examination;
OASIS = Open Access Series of Imaging Studies;
tOC = total outlier count;
UCSF = University of California, San Francisco;
UKB = UK Biobank

Alzheimer disease (AD) is the commonest cause of dementia, being characterized by a progressive deterioration in cognitive functioning and independence.1 The AD spectrum comprises substantial clinical and biological differences between patients recognized in clinical and research criteria.2 These differences include variations in genetic basis,3 symptom profile, age at onset, trajectory and severity,4,5 biomarker readouts (e.g., CSF β-amyloid [Aβ] levels),6 comorbidities,7 and in atrophy patterns.8 Despite this, conventional statistical analyses focus on group averages. This fundamental statistical assumption posits that AD will affect different patients in similar ways,9 characterizing the average patient. To reach the goal of precision medicine for AD, we need to look beyond the average and design statistical approaches that reflect patient heterogeneity at the individual level.

Neuroimaging has revealed that differences in brain structure are very common in patients with AD.10 Neuroimaging methods are the gold standard of understanding the in vivo brain11; specifically, structural imaging has been described as the imaging workhorse of neurodegeneration, being commonly recommended in AD diagnostic guidelines.12 With this in mind, large structural neuroimaging data sets are increasingly available for dementia, such as Alzheimer’s Disease Neuroimaging Initiative (ADNI), Open Access Series of Imaging Studies (OASIS), and National Alzheimer’s Coordinating Center and in the general population (e.g., UK Biobank [UKB] and the Human Connectome Project). These data sets provide the ability to chart variation across cohorts and facilitate individual prediction.

Furthermore, large neuroimaging data sets have supported the development and application of data-driven methods in AD research. This has revealed that differences in brain structure are very common in patients.8,13 Moreover, they have enabled the estimation of disease subtypes from neuroimaging data, as a way to disentangle heterogeneity by grouping patients by distinctive neurobiological and cognitive characteristics8,10,13,14 and disease progression.15 Such subtypes have the potential to stratify patient groups for clinical decision making, such as regarding treatment strategy, services and therapies tailored to clinico-radiologic phenotype, and/or trial enrollment.16,17

Nevertheless, there are challenges associated with the clinical translation of neuroimaging-derived subtypes.10 These include the validity of subtypes, how distinct subtypes are from each other, and how stable subtypes are over the disease course.13,18 Moreover, by design, clustering assumes homogeneity within each cluster, clouding the individual-level variation present, therefore limiting the representation of heterogeneity in the sample.19 For instance, individual-level variation is seen in atypical, nonamnestic AD (who comprise up to a third of young-onset AD), which results in challenges to diagnosis and appropriate care.17 Arguably, assessing the neurobiology of AD at the individual patient level will provide a precise understanding of their disease, likely outcomes and facilitate tailored treatment strategies. However, although this concept of patient-centered, individualized precision medicine for AD is well established, current research efforts are limited.

Neuroanatomical normative modeling is an emerging technique that captures individual-level variability in the brain. This can provide individual statistical inferences with respect to an expected normative distribution or trajectory over time. Specifically, this is done by modeling the relationship between neurobiological variables (e.g., neuroimaging features) and covariates (e.g., demographic variables such as age and sex) to map centiles of variation across a cohort (i.e., Z-scores). An individual can then be located within the normative distribution to establish to what extent they deviate from the expected pattern in each measure, and a map can be generated of where and to what extent an individual’s brain differs from the norm.20,21 This technique has been shown to be suitable for precise mapping of individual patterns of variation in brain structure across multiple psychiatric and neurodevelopmental disorders.20,22–24 Such findings motivate the first application of neuroanatomical normative modeling to AD.2

Here, we examine individual patterns of variation in brain structure in patients with AD using neuroanatomical normative modeling. Using the well-characterized, multisite ADNI data set, we applied a recent implementation of the normative modeling framework, hierarchical Bayesian regression. This technique has been shown to efficiently accommodate intersite variation and provides computational scaling, which is useful when using large studies, or combining smaller studies, that are acquired across multiple sites in a federated learning framework.25–27 Our main objective was to quantify spatial patterns of neuroanatomical heterogeneity using cortical thickness measures in patients with AD, people with mild cognitive impairment (MCI), and cognitively normal controls by calculating deviations from normative ranges for each brain region and then identifying statistical outliers. Specifically, we aimed to (1) assess the extent of neuroanatomical variability between individual patients based on overlapping or distinct patterns of outliers, (2) quantify group differences in between-participant dissimilarity, (3) relate the quantity of neuroanatomical outliers to cognitive performance and AD biomarkers, and (4) examine whether the number of outliers relates to subsequent disease progression from MCI to AD.



Participants were derived from 2 data sets: (1) a reference data set that comprised healthy people across the human lifespan and (2) a clinical target data set, which included people with AD or MCI in addition to age-matched cognitively normal controls. The reference data set was made by combining data on healthy people from multiple publicly available sources,27 including OASIS, the Adolescent Brain Cognitive Development study, and UKB, detailed in eTable 1. The clinical data used in the preparation of this article were obtained from the ADNI database.28 The criterion for study inclusion was the availability of a baseline T1-weighted MRI that passed quality control. Furthermore, AD participants had to meet the National Institute of Neurological and Communicative Disorders and Stroke-AD and Related Disorders Association criteria for probable AD and were screened to exclude genetic risk for familial AD. Participants with MCI reported a subjective memory concern either autonomously or via an informant or clinician and had no significant levels of impairment in other cognitive domains.

Standard Protocol Approvals, Registrations, and Patient Consents

Written informed consent was obtained from all participants before experimental procedures were performed. Approval was received from an ethical standards committee for ADNI study data use.

MRI Acquisition

For the clinical data set, T1-weighted images were acquired at multiple sites using 3T MRI scanners. Detailed MRI protocols for T1-weighted sequences are available online.29 The quality of raw scans was evaluated by University of California, San Francisco (UCSF) before our exclusion criteria were applied. Scans were excluded based on technical problems, significant motion artifacts, and clinical abnormalities.30

Estimation of Cortical Thickness

T1-weighted scans from both the reference and ADNI data sets were processed using a mix of FreeSurfer versions 5 and 6. Cortical thickness values were generated using the recon-all cross-sectional approach.31 This cortical thickness algorithm calculates the mean distance between vertices of a corrected, triangulated estimated gray/white matter surface and gray matter/CSF (pial) surface,32 which generated the cortical thickness of each region of the Destrieux atlas.33 This yielded the mean cortical thickness and 148 regional cortical thickness values for each participant.

Quality control of FreeSurfer processing for the reference data set relied on automated filtering of scans with a median-centered absolute Euler number higher than 25, as used in prior work.26,27 The exclusion of outliers based on Euler numbers has been shown to be a reliable quality control strategy in large neuroimaging cohorts.34,35 For the ADNI, quality control was based on a visual review of each cortical region performed by UCSF. Only scans that passed this quality control were used.

Neuroanatomical Normative Modeling

A hierarchical Bayesian regression model was trained on multisite data to generate normative models per region using the covariates age and sex. This was based on the population variation in the reference data set (training data), and the model adaptively pools parameter estimates across sites via a shared prior over regression parameters.27 This simultaneously accounts for intersite variation and allows sites to borrow strength from one another in a fully Bayesian framework. The advantage of training the models on the large independent data set, compared with just using the ADNI, is that the ADNI consists of many sites with small sample sizes. Training on the ADNI alone would result in unstable estimates of normative distributions that could be strongly influenced by outliers or sampling bias. Here, by training on over 33,000 scans from 9 data sets (with 60 sites), the model produces a stable distribution of estimates across the entire lifespan. Next, these estimates were conditioned to our specific context, using an adapted transfer learning approach.27 The parameters of the reference normative model were recalibrated to the ADNI data set using 70% of healthy controls per ADNI site, where 70% was used to give stable estimates of the transferred model parameters, given that many of the scan sites in the ADNI have quite small sample sizes. The remaining 30% of healthy controls plus people with MCI and patients with AD were used to assess the heterogeneity in neuroanatomical presentation. This process generated regional and mean cortical thickness Z-scores for each participant in the clinical data set, relative to the normative range of the reference data set. All modeling steps were performed using PCNtoolkit (version 0.20).
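As a minimal sketch of the final step only, the deviation score for one region can be illustrated as a standard Z-score: the observed thickness minus the predicted normative mean, divided by the predictive standard deviation. This is not the hierarchical Bayesian regression itself and does not use PCNtoolkit; the function name and all numbers below are hypothetical and chosen purely for illustration.

```python
from statistics import NormalDist


def deviation_z(observed, predicted_mean, predictive_sd):
    """Z-score of an observed thickness relative to a normative prediction.

    Assumption: the normative model supplies a predicted mean and a
    predictive standard deviation for this participant's age, sex, and site.
    """
    if predictive_sd <= 0:
        raise ValueError("predictive_sd must be positive")
    return (observed - predicted_mean) / predictive_sd


# Hypothetical values (in mm) for a single region of a single participant.
z = deviation_z(observed=2.10, predicted_mean=2.45, predictive_sd=0.15)

# Centile of the normative distribution implied by this Z-score.
centile = NormalDist().cdf(z) * 100

print(f"Z = {z:.2f}, centile = {centile:.2f}%")
```

In this toy case the region falls below the lower 2.5% of the normative distribution, so it would count as an outlier under the threshold used in this study.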

Statistical Analysis

Group Cortical Thickness Differences

Cortical thickness group comparisons were conducted using t tests at each region and corrected for multiple comparisons using the false discovery rate (FDR). Significant p values were mapped onto the Destrieux atlas using the R package ggseg.36
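To illustrate what correcting 148 regional tests for the false discovery rate involves, the sketch below implements the Benjamini-Hochberg step-up adjustment in plain Python. The text does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order.

    Assumption: Benjamini-Hochberg is used as the FDR procedure; the study
    only states that an FDR correction was applied.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values for 4 regions (the study tested 148).
raw = [0.001, 0.02, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```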

Outlier Definition and Statistics

Outliers in terms of low cortical thickness were identified for each region, defined as Z < −1.96 (corresponding to the bottom 2.5% of the normative distribution of cortical thickness). We only used the lower bound threshold for outliers as we were interested in cortical thinning associated with neurodegeneration. The number of outliers was summed across 148 regions for each participant to give a total outlier count (tOC) across regions. Linear regression tested for group differences in mean cortical thickness Z-score and tOC. In addition, group comparisons at each region were conducted using χ2 (FDR corrected). The Hamming distance, which counts the regions at which 2 binary thresholded cortical thickness outlier vectors differ, was used to measure dissimilarity between individuals. Median Hamming distances were compared between groups. To explore spatial patterns of cortical thickness outliers per group, the proportion of participants within each group whose cortical thickness was an outlier (i.e., Z < −1.96) was calculated for each region. This enabled visualization of the extent to which patterns of outlier regions overlap or are distinct. This was mapped using the Destrieux atlas via the R package ggseg. All statistical analyses were implemented in R version 3.6.2.
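The outlier definition, the total outlier count, and the Hamming distance can be sketched in a few lines. The study's analyses were run in R; this Python version is only an illustration, the function names are hypothetical, and the toy Z-scores cover 5 regions rather than 148.

```python
THRESHOLD = -1.96  # lower-bound Z threshold used to define an outlier


def outlier_vector(z_scores, threshold=THRESHOLD):
    """Binary vector: 1 where a region's Z-score is below the threshold."""
    return [1 if z < threshold else 0 for z in z_scores]


def total_outlier_count(z_scores, threshold=THRESHOLD):
    """tOC: number of regions classed as outliers for one participant."""
    return sum(outlier_vector(z_scores, threshold))


def hamming_distance(vec_a, vec_b):
    """Number of regions at which two binary outlier vectors differ."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must cover the same regions")
    return sum(a != b for a, b in zip(vec_a, vec_b))


# Hypothetical Z-scores for two participants across 5 regions.
participant_1 = [-2.5, -0.3, -2.0, 0.4, -1.0]
participant_2 = [-0.1, -2.2, -2.1, 0.0, -3.0]

vec_1 = outlier_vector(participant_1)
vec_2 = outlier_vector(participant_2)
toc_1 = total_outlier_count(participant_1)
toc_2 = total_outlier_count(participant_2)
distance = hamming_distance(vec_1, vec_2)

print(vec_1, vec_2, toc_1, toc_2, distance)
```

Two participants can have similar tOCs yet a large Hamming distance, which is why the two measures are reported separately.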

Outlier Associations With Cognitive Function and CSF Markers

Linear regression adjusting for age, sex, years of education, and Clinical Dementia Rating (sum of boxes) examined the relationship between tOC and cognitive composite scores (memory using ADNI MEM or executive function using ADNI EF).37 We assessed the interaction effects of the diagnostic group within a subsequent regression. Furthermore, linear regression adjusting for age and sex only examined the relationship between tOC and CSF markers (Aβ and phosphorylated-tau [p-tau]). Here, we also assessed the interaction effects of the diagnostic group within a subsequent regression. To stratify outlier maps in both the MCI and AD groups, we used total scores from the Mini-Mental State Examination (MMSE).

MCI to AD Conversion Analysis

Follow-up diagnosis status data, up to 3 years from the baseline scan, were obtained from 454 people with MCI. In total, 76 people with MCI at baseline had converted to AD within 3 years. We then ran a survival analysis using Cox proportional hazards regression to assess whether tOC related to the risk of converting from MCI to AD, controlling for age and sex. We used a Kaplan-Meier plot to illustrate how either a low or high tOC (split via median) can contribute to the risk of converting.
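The median split and Kaplan-Meier estimate described above can be sketched as follows. This is a plain-Python illustration of the product-limit estimator, not the R code used in the study, and it omits the Cox regression; the function name and all follow-up data are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times  : follow-up time for each participant
    events : 1 if the participant converted at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        converted = 0
        leaving = 0
        # Group all participants sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            converted += data[i][1]
            leaving += 1
            i += 1
        if converted:
            survival *= 1 - converted / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical data: tOC, months of follow-up, conversion flag.
tocs = [1, 2, 3, 5, 9, 20]
months = [36, 36, 24, 12, 6, 12]
converted = [0, 0, 1, 1, 1, 0]

cutoff = median(tocs)  # median split into low and high tOC
low = [i for i, t in enumerate(tocs) if t < cutoff]
high = [i for i, t in enumerate(tocs) if t >= cutoff]

km_low = kaplan_meier([months[i] for i in low], [converted[i] for i in low])
km_high = kaplan_meier([months[i] for i in high], [converted[i] for i in high])
print(km_low, km_high)
```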



In the reference data set, a total of n = 33,072 T1-weighted MRI scans were collated across 60 sites (this sample is described in detail in Kia et al.27 and summarized in eTable 1). The clinical ADNI data set amounted to 1,492 participants who were scanned across 62 sites (Table 1). Here, 70% of controls were removed from the clinical data set and were used as a calibration data set to adapt the normative model to the new sites. These controls were randomly selected and stratified across sites and gender to make sure all sites and genders were present in the adaptation set. This left a total of 1,027 participants in the final clinical data set.

Table 1

Demographics of the ADNI Sample

Patients With AD Have Smaller Cortical Thicknesses Than People With MCI or With Normal Cognition

Mean cortical thicknesses were compared across participant groups. Age- and sex-adjusted mean cortical thickness significantly differed between groups overall (F(2, 1,487) = 137.9, p = 2.0 × 10−16). Pairwise comparisons (Tukey post hoc) were all significant (p < 0.001), with mean cortical thickness being lowest in AD (mean 2.28, SD 0.13, 95% CI 0.161 to −0.124) and highest in controls (mean 2.42, SD 0.11, 95% CI 2.415–2.433), with MCI being intermediate (mean 2.38, SD 0.12, 95% CI −0.054 to −0.029) (eFigure 1). Region-level pairwise group comparisons (total of 148 regions, FDR corrected) provided evidence that cortical thickness measures were on average lower in 133 regions in AD vs controls, in 111 regions in AD vs MCI, and in 78 regions in MCI vs controls (eFigure 1).

Next, cortical thickness Z-scores, derived from comparison to the normative model, were compared across participant groups. In this way, we could compare the degree to which each group differed from the separate reference cohort used to define the normative model. Consistent with comparisons of mean cortical thickness, age- and sex-adjusted Z-scores differed between groups overall (F(2, 1,022) = 69.49, p = 2.0 × 10−16). Pairwise comparisons (Tukey post hoc) were all significant (p ≤ 0.003), with Z-scores being lowest in AD (mean −1.27, SD 1.41, 95% CI −1.630 to −1.130), highest in controls (mean 0.07, SD 1.04, 95% CI −1.053 to 0.374), and intermediate in MCI (mean −0.28, SD 1.17, 95% CI −0.600 to −0.180) (eFigure 2A).

Furthermore, age- and sex-adjusted tOCs differed between groups overall (F(2, 1,022) = 95.39, p = 2.0 × 10−16). Pairwise comparisons (Tukey post hoc) were all significant (p ≤ 0.003), with tOCs being highest in AD (median 12, interquartile range [IQR] 28, 95% CI 14.38–19.88), lowest in controls (median 2, IQR 6, 95% CI 2.780–18.494), and intermediate in MCI (median 4, IQR 9, 95% CI 1.56–6.18) (eFigure 2B).

Region-level pairwise group comparisons (total of 148 regions—FDR corrected) showed higher numbers of outliers in cortical thickness in 79 regions in AD vs controls, in 63 regions in AD vs MCI, and 1 region in MCI vs controls. Region-level group differences in outlier count were most evident within temporoparietal and to a lesser extent frontal and occipital regions (Figure 1A).

Figure 1 Regional Maps of Heterogeneity

(A) Mapped are significant group differences of outliers. The color bar indicates effect size as Phi φ (0.1 is considered to be a small effect, 0.3 a medium effect, and 0.5 a large effect). (B) Mapped is the percentage of outliers present within each participant group. The color bar reflects outlier proportion from 2.5% to 100% (thresholding of z-scores). Gray represents that no participants have outliers in those respective regions. AD = Alzheimer disease; MCI = mild cognitive impairment.

Patients With AD Are Less Similar to Each Other Than People With MCI or With Normal Cognition

Hamming distance matrices indicated greater within-group dissimilarity in patients with AD, relative to MCI or control participants, who were most similar to each other in spatial patterns of outliers (Figure 2). The median Hamming distance significantly differed between groups overall (F(2, 1,024) = 209.42, p = 2.2 × 10−16). Pairwise comparisons (Tukey post hoc) were all significant (p < 0.001), with the median Hamming distance being highest in AD (median 32, IQR 32, 95% CI 26.29–29.43) and lowest in controls (median 6, IQR 8, 95% CI −24.37 to −19.61), with MCI being intermediate (median 10, IQR 14, 95% CI −18.52 to −14.92).

Figure 2 Outlier Dissimilarity

(A) Outlier distance heatmaps: both x and y axes represent participants within each group. Yellow indicates a higher Hamming distance (greater dissimilarity between participants); if 2 participants have identical outlier patterns, the Hamming distance is 0, represented by white in the color bar. (B) Outlier distance density: illustrates the spread of outlier dissimilarity (calculated by the Hamming distance) within each group. AD = Alzheimer disease; MCI = mild cognitive impairment.

Patients With AD Have Spatially Higher Proportions of Cortical Thickness Outliers

The proportion of outliers defined within each group differed in regional patterns between AD, MCI, and control groups. This is illustrated in Figure 1B and in eFigure 3. For a breakdown of proportions, see eTable 2; for individual maps of outliers, see the accompanying videos. A greater number of regions and a higher proportion of the group were outliers in patients with AD, as expected. In fact, in 145 regions, more than the expected 2.5% of patients with AD had an outlier (based on the Z < −1.96 threshold). The left parahippocampal gyrus was the region with the highest outlier percentage (47% of the AD group). In the MCI group, 138 regions had outliers in more than the expected 2.5% of the group. The left parahippocampal gyrus was again the region with the highest outlier percentage (14% of the MCI group). For the control group, only 66 regions had outliers above the expected 2.5%. The left occipital temporal lateral sulcus was the region with the highest outlier percentage (6% of controls).
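The regional proportions reported here can be sketched as a column-wise average of the binary outlier vectors within a group, followed by a comparison against the 2.5% expected under the normative distribution. The function names and the toy data (4 participants, 3 regions) are hypothetical.

```python
def regional_outlier_proportions(outlier_vectors):
    """Proportion of participants with an outlier in each region."""
    n_participants = len(outlier_vectors)
    n_regions = len(outlier_vectors[0])
    return [
        sum(vec[r] for vec in outlier_vectors) / n_participants
        for r in range(n_regions)
    ]


def regions_above_expected(proportions, expected=0.025):
    """Indices of regions where outliers exceed the expected proportion."""
    return [r for r, p in enumerate(proportions) if p > expected]


# Hypothetical binary outlier vectors for one group.
group = [
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
    [1, 0, 0],
]

proportions = regional_outlier_proportions(group)
flagged = regions_above_expected(proportions)
print(proportions, flagged)
```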

Outliers Are Associated With Cognitive Function and CSF Aβ and p-Tau

tOC across the whole sample was significantly associated with memory performance (β = −0.01, p = 2.2 × 10−16) and executive function (β = −0.02, p = 2.2 × 10−16) in a linear regression model. To check whether these associations differed between diagnostic groups, we also modeled a group by tOC interaction term, which was not significant for memory performance (F(2, 849) = 2.28, p = 0.103) or executive function (F(2, 849) = 2.534, p = 0.07) (Figure 3, A and B). Lower MMSE scores showed different spatial patterns of outliers in both the MCI and AD groups (Figure 4A). However, total MMSE score and age did explain some of the variance in tOC (adjusted R2 = 0.1793, p = 2.2 × 10−16). In addition, tOC was significantly associated with Aβ (β = 0.002, p = 0.022) and p-tau (β = 0.1301, p = 1.04 × 10−8); neither association was influenced by a group interaction for Aβ (F(2, 576) = 0.96, p = 0.38) or p-tau (F(2, 576) = 1.362, p = 0.257) (Figure 3, C and D).

Figure 3 Cognitive Function and CSF Marker Association With tOC

Fitted lines are from a linear regression model per diagnostic group for (A) memory function, (B) executive function, (C) CSF β-amyloid, and (D) phosphorylated-tau. AD = Alzheimer disease; MCI = mild cognitive impairment; tOC = total outlier count.

Figure 4 Regional Maps of Outliers, Stratified by the MMSE Score

Mapped is the percentage of region outliers proportional to the MMSE scoring subgroup in (A) participants with MCI and (B) patients with AD. The color bar reflects outlier proportion from 2.5% to 100% (thresholding of z-scores). Gray represents that no participants have outliers in those respective regions. AD = Alzheimer disease; MCI = mild cognitive impairment; MMSE = Mini-Mental State Examination.

Case Studies Suggest That Variability in Cortical Thickness Is Not Solely Due to Disease Stage or Other Clinical Factors

To explore whether individual differences in outlier maps were driven by disease-related characteristics (such as ApoE genotype and demographics) or by disease stage, we examined sets of participants closely matched for ApoE genotype status, age, sex, and MMSE score. Figure 4B presents 4 individual female patients with AD all aged 71–72 years, heterozygous for ApoE ε4, with similar MMSE scores, all of whom were CSF amyloid positive, with no underlying comorbidities. Furthermore, clinical impressions confirm that these individuals all have mild dementia, with further confirmation of no depressive symptoms. These individual patients might be considered similar from biological or clinical perspectives, yet their patterns of outliers in cortical thickness are markedly variable; for example, variously suggesting lateralized (patient 3) and occipital atrophy (patient 1).

Greater Numbers of Outliers Are Associated With Risk of Conversion From Mild Cognitive Impairment to AD

A survival analysis indicated that for every 10 points of tOC, the risk of converting from MCI to AD within 3 years increased by 31.4% (hazard ratio 1.028, 95% CI 1.016–1.039, p = 1.8 × 10−16) (Figure 5A). This is illustrated within a Kaplan-Meier plot, which shows how a high tOC can contribute to the risk of converting in comparison to a low tOC (Figure 5B).
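The step from a per-region hazard ratio to a 10-region effect can be checked with a short calculation: under the proportional hazards model, a hazard ratio per unit of tOC compounds multiplicatively, so the ratio for 10 units is the per-unit ratio raised to the 10th power. The sketch below uses the rounded hazard ratio of 1.028 from the text; the function name is hypothetical.

```python
import math


def scale_hazard_ratio(hr_per_unit, units):
    """Hazard ratio for a change of `units`, given the per-unit ratio.

    Equivalent to hr_per_unit ** units, written via the log hazard
    (the Cox regression coefficient) to make the reasoning explicit.
    """
    log_hazard = math.log(hr_per_unit)
    return math.exp(units * log_hazard)


hr_10 = scale_hazard_ratio(1.028, 10)
percent_increase = (hr_10 - 1) * 100
print(f"HR per 10 outlier regions = {hr_10:.3f} (+{percent_increase:.1f}%)")
```

With the rounded hazard ratio, this gives an increase of roughly 31.8% per 10 outlier regions, close to the 31.4% reported; the small gap is presumably due to rounding of the published hazard ratio.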

Figure 5 Conversion From Mild Cognitive Impairment to AD

(A) Kaplan-Meier plot of MCI to AD conversion: the 2 lines represent a median split of tOC, with <4 classed as low tOC (blue) and ≥4 classed as high tOC (red). Crosses indicate censoring points (i.e., age at last diagnosis assessment). The filled color represents the 95% confidence intervals. (B) Mapped is the proportion of regional outliers among people with MCI who converted to patients with AD. AD = Alzheimer disease; MCI = mild cognitive impairment; tOC = total outlier count.


In this study, we defined individual spatial patterns of cortical thickness outliers and illustrated that AD does not affect different people in a uniform way. Moreover, our analysis quantified and visualized these individual differences in patterns of cortical atrophy. Overall, the results of the present study provide evidence of (1) heterogeneous patterns of cortical thickness between patients with AD, (2) associations of cortical thickness heterogeneity with cognitive performance and CSF Aβ and p-tau, and (3) the potential of individualized markers of cortical thickness heterogeneity to predict survival time before conversion from the MCI stage to diagnosed AD.

Our findings both complement and add to the established understanding of AD. We observed a high tOC in patients with AD, consistent with the evidence of cortical thinning as a consequence of AD neuropathology.38 Moreover, we also observed significant associations between cortical thinning and poor cognitive performance, a decrease in CSF Aβ, and an increase in CSF p-tau (Figure 3), which is also consistent with previous findings.39,40 Atrophy has also been associated with the risk of progression from MCI to AD41 (Figure 5), alongside a combination of other biomarkers.42 Importantly, these previous studies examined the correlates of common patterns of cortical atrophy, whereas we considered individual variability in patterns of cortical thickness rather than group-average relationships. This highlights that individualized measures of neuroanatomy are sensitive to both nonimaging disease markers and disease progression.

The tOC has the potential to be used as an individual patient metric of poor brain health to help inform clinical decisions. Indeed, similar measures have recently been adopted as a clinical measure, that is, brain volume/thickness patient Z-scores. However, these have been calculated using different normative modeling techniques,43,44 which base their normative population on smaller reference samples; limit modeling to just the whole brain or to specific regions; and do not account for site-related variation (i.e., site effects). These studies also did not fully relate these measures to clinical outcomes and cognitive scores. Our tOC may provide an improved measure here and could translate to clinical applications for precision medicine.

When assessing regional heterogeneity of the ADNI sample, we observed more outliers in patients with AD in temporal regions such as the hippocampus and the cingulate cortex. These are areas known to be sensitive to neurodegeneration in AD45 and are responsible for clinical symptoms in AD.46 However, looking beyond these group-average regional differences, we observe that the highest proportion of outliers in a single region was less than 50% in the AD group (Figure 1). This suggests that the individual spatial patterns of outliers in AD only partially overlap between patients; if atrophy were homogeneous (as assumed within group averages), we might expect 100% of participants to have outliers here.

The observed variation in atrophy in the temporal lobe is consistent with subtyping studies.8,47 Also, a recent study used normative modeling to estimate neuroanatomical heterogeneity within the ADNI cohort, which shows similarities of variation in atrophy within the temporal regions.48 However, in comparison to these studies, our specific application of neuroanatomical normative modeling has enabled the creation of an individual metric of neuroanatomical heterogeneity, characterized the spatially distributed nature of alterations in MCI and AD, and assessed how neuroanatomical variability relates to cognitive performance, CSF biomarkers, disease progression, or genetic factors. Furthermore, our study employs a normative modeling technique (hierarchical Bayesian regression), which crucially accounts for the confounding effects of multiple scanning sites when evaluating neuroanatomical heterogeneity in AD.

Going further, our study reveals that each patient not only differs in the number of outliers they have, but the regional patterns of outliers markedly differ. The latter is reflected in large levels of dissimilarity between individuals with AD (Figure 2). Potentially, one reason for the variable patterns of atrophy is simply disease stage, whereby more atrophy appears with greater disease progression. However, our results indicate that this is not the case, as when closely examining patients of very similar demographics and clinical characteristics, being at a comparable disease stage (e.g., based on MMSE score), heterogeneous patterns of cortical atrophy were still present (Figure 4).

It is surprising to observe that cognitively normal controls also showed some outliers, suggesting a degree of within-group heterogeneity (Figures 1 and 2). Therefore, the assumption of homogeneity in case-control studies should be made with caution, even in control groups. Statistical designs for basic research and clinical trials should better reflect this heterogeneity in brain structure.

A few considerations can be made regarding the data sets used within the study. Although the reference data set includes over 30,000 individuals, we should be cautious in assuming that it is representative of a healthy population. Also, patients who volunteer for research studies (e.g., ADNI) do not necessarily reflect the clinical population. Future neuroanatomical normative modeling studies could supplement the reference data set with MRI scans acquired from routine clinical visits, community cohorts, or other less selective sources. Finally, the reference data were processed with a variety of FreeSurfer versions. While it is impractical to unify the image processing retrospectively, the different versions of FreeSurfer may add noise to the normative models. This represents an important caveat to consider and investigate further.

As the ADNI comprises more participants with early-stage dementia, examining late-stage patients with AD may offer insights into the heterogeneity in spatial patterns of atrophy across the disease course. Clinical observations have suggested that late-stage patients with AD have widespread atrophy across the brain; therefore, we may hypothesize such patients will have less heterogeneous patterns of atrophy. However, regardless of the heterogeneous patterns of atrophy, the tOC can still provide information about the extent of cortical atrophy in a given individual.

Another limitation of the ADNI data set is the underrepresentation of cognitive domains beyond memory, executive function, and language. Between a quarter and a third of the AD group exhibit parieto-occipital outliers, comparable to separate parieto-occipital predominant subtypes associated with prominent visuospatial dysfunction.10 Further characterization of how outlier distribution relates to nonmemory/executive symptoms may be of particular clinical relevance, for example, given the implications of visuospatial dysfunction for diminished autonomy, falls risk, and appropriate services.17,49,50

Future efforts to apply neuroanatomical normative modeling to AD data should incorporate serial neuroimaging across multiple time points. This will help define patient-level longitudinal trajectories. Mapping neuroanatomical variability using neuroanatomical normative modeling at different time points has the potential to improve predictions of disease progression or treatment response at the level of the individual patient. Apart from our MCI to AD analysis, the sample in this study is cross-sectional, reflecting a snapshot in time, yet heterogeneity has been shown to differ temporally.51 Data-driven staging methods (e.g., SuStaIn15) may also provide clinically useful information on longitudinal trends of individual heterogeneity while taking account of an individual’s disease stage.

Furthermore, it will also be valuable to map variation using other neuroanatomical metrics, such as subcortical volumes. Our methodology can be extended to include subcortical volumes by using a reference data set that has such data available.23 Future efforts that adopt this could enrich our understanding of regional anatomic heterogeneity between patients.

We provide a quantitative approach to estimate variability in brain atrophy at the regional level for individual patients. Individualized maps of neuroanatomical outliers were related to cognitive performance and CSF biomarkers. Furthermore, the number of outliers, based on individual patterns, helped predict conversion from MCI to AD. These individual neuroanatomical maps, derived from normative models, have the potential to be a marker of AD state. These could index disease progression or even evaluate the effectiveness of potential disease-modifying treatments tailored to the individual patient.

Study Funding

This work was supported by the EPSRC-funded UCL Centre for Doctoral Training in Intelligent, Integrated Imaging in Healthcare (i4health) (EP/S021930/1) and the Department of Health’s National Institute for Health Research funded University College London Hospitals Biomedical Research Centre. In addition, A.F. Marquand gratefully acknowledges funding from the Dutch Organization for Scientific Research via a VIDI fellowship (grant number 016.156.415). J.M. Schott acknowledges the support of Alzheimer’s Research UK, Brain Research UK, Weston Brain Institute, Medical Research Council, and the British Heart Foundation. K.X.X. Yong is an Etherington PCA Senior Research Fellow and is funded by the Alzheimer’s Society (grant number 453 AS-JF-18-003).


The authors report no relevant disclosures.


Data collection and sharing for this project was funded by the ADNI (NIH grant U01 AG024904) and DOD ADNI (Department of Defence award number W81XWH-12-2-0012). The ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC; Johnson & Johnson Pharmaceutical Research & Development LLC; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Appendix Authors



  • Funding information and disclosures deemed relevant by the authors, if any, are provided at the end of the article.

  • The Article Processing Charge was funded by METAFORA Biosystems.

  • Previously published at medRxiv doi: 10.1101/2022.06.30.22277053.

  • Submitted and externally peer reviewed. The handling editors were Associate Editors Bradford Worrall, MD, MSc, FAAN, and Andrea Schneider, MD, PhD.

  • Editorial, page 1125

  • Received September 1, 2022.
  • Accepted in final form March 2, 2023.