By William Collins aka MRA-UK: This is a rapid response to the 20 February 2024 publication by the Office for National Statistics (ONS) of their new estimates of excess deaths in the UK based on a revised methodology. My intention is to run their model(s) independently but this will take a few days so I thought I’d commit some initial thoughts to a post immediately.
This link describes the ONS methodology; while this link is the corresponding dataset; and this link is their R-based code.
Giving public access to their code is unusual. Probably the ONS anticipate being deluged with hard questions from statistically competent people, and this is the best way of addressing such queries – essentially “see for yourself”. This is welcome and sensible. However, I shall use an independent code as any issues might lie within the code itself.
The key issue – which has not escaped public attention already – is that the new methodology estimates far smaller numbers of excess deaths in 2023 (though less so in 2020 and 2021, and actually rather more in 2022). My immediate thoughts as to how this comes about are as follows (bearing in mind that I have done no independent calculations as yet)…
There are two factors which I suspect contribute. Probably the most important is the change in age profile over the last 8 to 17 years. In my September-23 post on excess deaths I alluded to population increases being something that should be accounted for before concluding anything significant about an increase in the absolute numbers of deaths (per year, say). Accordingly I made an adjustment to the numbers of deaths of 2.5% to account for population increase over a six-year period (from mid 2015-2019 to mid-2023). I was aware that this figure may not be appropriate for the oldest age range in which most deaths occur. However, I failed to find adequate age-specific population data over the required period at that time.
The ONS methodology statement summarises UK population changes between 2006 and 2023 in their Figure 3. Over the same six-year period this indicates a larger total population increase than I used, namely 4.3%. However, even more importantly, the increase in the population of those over 70 has been huge, some 14.1% over the same six-year period (and 35.4% since 2006). My initial guess is that this is the predominant cause of the changes in the estimates of excess deaths. If so, this component of the changed estimate is valid.
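To show why the choice of growth figure matters so much, here is a minimal sketch of the arithmetic. The three growth percentages are the ones quoted above; the two death counts are hypothetical round numbers chosen for illustration, not ONS data, and the function name is my own.

```python
# Illustrative only: the death counts below are hypothetical, not ONS data.
BASELINE_DEATHS = 100_000   # hypothetical average annual deaths in the baseline period
OBSERVED_DEATHS = 110_000   # hypothetical deaths in the later year

# Six-year population growth figures quoted in the text.
GROWTH_SCENARIOS = {
    "2.5% (my September-23 adjustment)": 0.025,
    "4.3% (total population, ONS Figure 3)": 0.043,
    "14.1% (over-70s, ONS Figure 3)": 0.141,
}


def excess_after_growth(baseline_deaths, observed_deaths, growth):
    """Scale the baseline by population growth, then subtract it from observed deaths."""
    expected = baseline_deaths * (1.0 + growth)
    return observed_deaths - expected


for label, growth in GROWTH_SCENARIOS.items():
    excess = excess_after_growth(BASELINE_DEATHS, OBSERVED_DEATHS, growth)
    print(f"{label}: apparent excess = {excess:,.0f}")
```

Applying the over-70 growth figure to all deaths is deliberately crude, since not every death occurs in that age group. A proper treatment would scale each age stratum by its own population change, which is what working with age-specific death rates achieves.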
(I note that the ONS’s Figure 3 indicates an increased death rate in all post-covid years – but a decreased death rate after age-standardisation, implying that the increase in the proportion of those over 70 is indeed the cause of the increased death rate).
However, there is another factor on which my suspicions alight. In my September-23 post I was rather measured in my commentary about the ONS’s (previous) methodology. I merely said it was “inappropriate”. In their previous methodology, the ONS defined the expected number of deaths in 2022 as the average of deaths registered in years 2016, 2017, 2018, 2019 and 2021. For year 2023, they defined expected deaths as the average of years 2017, 2018, 2019, 2021 and 2022. If one wishes to examine the hypothesis that there has been an increase in excess deaths NOT directly attributable to covid-19 itself (as opposed to associated interventions), then the baseline must be taken prior to the period in which the hypothesised factor applies. The ONS’s “contaminating” the baseline with data from post-covid years is therefore indisputably inappropriate for this purpose. (One is tempted to add “baffling” or “incomprehensible” or “preposterous”, though readers may prefer more accusatory terms).
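The two baselines quoted above are consistent with a simple rule: take the five most recent years before the target year, skipping 2020. That rule is my inference from the two examples, not a statement of the ONS's documented procedure, and the function name is hypothetical. A minimal sketch:

```python
def previous_baseline_years(target_year, excluded=(2020,), n_years=5):
    """Return the n_years most recent years before target_year, skipping excluded years.

    Assumption: this rule is inferred from the 2022 and 2023 baselines quoted
    in the text; it is not taken from ONS documentation.
    """
    years = []
    year = target_year - 1
    while len(years) < n_years:
        if year not in excluded:
            years.append(year)
        year -= 1
    return sorted(years)


for target in (2022, 2023):
    years = previous_baseline_years(target)
    post_2020 = [y for y in years if y > 2020]
    print(f"{target}: baseline years {years}, of which post-2020: {post_2020}")
```

Under this rule the number of post-2020 years in the baseline grows by one each year, so the problem worsens over time rather than staying fixed.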
Unfortunately, the ONS’s revised methodology has compounded this problem: they now also include the peak covid year, 2020, in their baseline, although they omit the months April, May, November and December due to the covid spikes. Their regression models include a variable “Trend” which may be the manner in which this effect is manifest within the model fits. Simply put, if there is a genuine increase in excess deaths, including post-covid years in the baseline absorbs part of that increase into the expected figure, so the subtraction which yields the apparent excess death figure understates it.
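A toy calculation makes the absorption effect concrete. The annual death counts below are hypothetical (flat before covid, then a genuine step up), and for simplicity the contaminated baseline uses the five-year-average years of the previous methodology rather than the new regression with its “Trend” term, so this sketches the mechanism only and does not reproduce the ONS model.

```python
from statistics import mean

# Hypothetical annual deaths: flat at 100 before covid, a genuine step to 110 afterwards.
deaths = {
    2015: 100, 2016: 100, 2017: 100, 2018: 100, 2019: 100,
    2021: 110, 2022: 110, 2023: 110,
}


def excess(target_year, baseline_years):
    """Observed deaths in target_year minus the mean over the baseline years."""
    expected = mean(deaths[y] for y in baseline_years)
    return deaths[target_year] - expected


# Baseline frozen before covid versus a baseline that includes 2021 and 2022.
pre_covid = excess(2023, [2015, 2016, 2017, 2018, 2019])
contaminated = excess(2023, [2017, 2018, 2019, 2021, 2022])

print(f"Excess with pre-covid baseline:    {pre_covid}")
print(f"Excess with contaminated baseline: {contaminated}")
print(f"Share of true excess absorbed:     {1 - contaminated / pre_covid:.0%}")
```

With these made-up numbers, two elevated years out of five are enough to hide a substantial share of the step increase, and the share grows as more post-covid years enter the baseline.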
The criterion for exclusion of “peak covid periods” was when covid deaths were “given as the underlying cause of death for at least 15% of all deaths registered in the period across the UK”. This is rather arbitrary.
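To illustrate how sensitive the set of excluded periods is to the chosen cut-off, here is a minimal sketch of the criterion as quoted. The period labels and counts are hypothetical, as is the function name; only the 15% default comes from the ONS statement.

```python
def peak_covid_periods(periods, threshold=0.15):
    """Return labels of periods where covid is the underlying cause of at least
    `threshold` of all registered deaths.

    `periods` is a list of (label, all_deaths, covid_deaths) tuples.
    """
    flagged = []
    for label, all_deaths, covid_deaths in periods:
        if covid_deaths / all_deaths >= threshold:
            flagged.append(label)
    return flagged


# Hypothetical periods, not ONS data.
periods = [
    ("P1", 1000, 50),
    ("P2", 1000, 140),
    ("P3", 1000, 160),
    ("P4", 1000, 400),
]

for threshold in (0.10, 0.15, 0.20):
    print(f"threshold {threshold:.0%}: excluded {peak_covid_periods(periods, threshold)}")
```

Periods sitting just either side of the cut-off (P2 and P3 here) move in or out of the baseline with small changes to the threshold, which is why re-running the models with alternative thresholds would be a useful check.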
Because one of the factors I have identified above is valid but the other is invalid, it will be necessary to re-run the ONS models, using their dataset, to examine the relative size of the contributions from these two factors. In particular, I intend to freeze the baseline at December 2019 (or possibly February 2020) and/or omit the “Trend” variable, whilst also using death rates rather than absolute numbers of deaths in order to account for population changes. I will now proceed to do so.
Detailed Observations (for Statisticians)
The ONS regression models are fairly “plain vanilla” with linear terms and two quadratic interaction terms (age x sex and age x Trend). The log of the monthly, or weekly, numbers of deaths is regressed, but the inclusion of the log(population) in the model means that this is equivalent to regressing the log of the death rates, per age-sex-geography stratum.
This is a quasi-Poisson regression (variance not constrained to equal the mean). Poisson type regressions are often used with data which counts events (as here). It is not immediately obvious (to me, anyway) whether the regression of log(data) rather than the raw data will lead to a greater, or reduced, extrapolation (i.e., of expected deaths one year later). This needs to be explored.
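The structure described above can be written compactly as follows. The notation is my own: D_i is the number of deaths in stratum-period i, N_i the corresponding population, x_i the vector of covariates (including the age x sex and age x Trend interactions), and phi the dispersion parameter. I am assuming that log(population) enters as an offset, i.e. with its coefficient fixed at 1; if the ONS code estimates that coefficient freely, the equivalence with a rate model is only approximate.

```latex
\log \mathbb{E}[D_i] = \log N_i + \mathbf{x}_i^{\top}\boldsymbol{\beta}
\quad\Longleftrightarrow\quad
\log \frac{\mathbb{E}[D_i]}{N_i} = \mathbf{x}_i^{\top}\boldsymbol{\beta},
\qquad
\operatorname{Var}(D_i) = \phi \, \mathbb{E}[D_i]
```

Setting phi = 1 recovers ordinary Poisson regression. Note also that a log link models the log of the expected count rather than the expectation of the logged data, and that a term which is linear in Trend on the log scale corresponds to a multiplicative (exponential) change in the expected death rate when extrapolated forward.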
95% confidence intervals (±1.96 sigma) are found from the regressions in the usual way. The ONS argue that the actual number of deaths is a “known quantity and therefore has zero variance”. Hmm. First data I’ve ever seen with zero error. However, I don’t doubt that the error in the actual death data is negligible compared with the error in the model prediction of expected deaths – which is all that matters.
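If the observed count is treated as having zero variance, the uncertainty in the excess is simply the uncertainty in the expected figure carried through the subtraction. A minimal sketch, assuming a symmetric normal interval on the count scale (the ONS code may instead form the interval on the log scale and transform back); the input numbers and the function name are hypothetical.

```python
Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def excess_with_interval(observed, expected, expected_se, z=Z_95):
    """Return (excess, lower, upper), treating `observed` as exact.

    With zero variance on the observed count, the standard error of the
    excess equals the standard error of the expected count.
    """
    excess = observed - expected
    return excess, excess - z * expected_se, excess + z * expected_se


# Hypothetical values, not ONS output.
excess, lower, upper = excess_with_interval(observed=1000, expected=950, expected_se=20)
print(f"excess = {excess:.1f}, 95% interval = ({lower:.1f}, {upper:.1f})")
```

If the observed count were given its own error, the two variances would add, but as noted above that contribution should be small next to the model uncertainty.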