By William Collins aka MRA-UK: This post should be read in conjunction with Part 1.
I have now carried out some initial runs of the ONS model. There is much I still wish to explore, but I thought I'd publish this Part 2 as quickly as possible because it contains the essential refutation of the ONS position.
I have not used the ONS code, nor have I used R at all. Instead I have used Python with the OLS functionality of statsmodels.api.
However, I have used the same dataset as ONS. As explained in Part 1, this offers an improvement on what I have done previously (e.g., here) because it contains improved data on the variation of the UK age profile over the last 19 years. (At least, I am assuming it’s “improved”).
I use the same regression model as ONS, i.e., one based on the log of the dependent variable (quasi-Poisson), and the same independent variables. However, I have so far restricted my re-analysis to monthly data (not weekly) and have fitted the model to the UK as a whole. This contrasts with the ONS approach, in which fits were conducted for the four UK nations separately and then summed to obtain the UK total of excess deaths. Hence the model I have used has 111 independent variables.
The ONS fit the model to a baseline period that varies with the month for which excess deaths are being estimated, with a 12-month lag. For example, to estimate the excess deaths in January 2023 they fitted the model to data from February 2017 to January 2022 (inclusive), whereas for February 2023 they fitted the model to data from March 2017 to February 2022 (inclusive), and so on. This requires the model to be refitted many times, and would require even more refits with weekly data.
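The ONS window rule can be written as a small function. The helper name `ons_style_window` is hypothetical. The 60-month length and 12-month lag are taken from the January and February 2023 examples in the text, and pandas monthly `Period` arithmetic handles the date shifting. The sketch does not apply the exclusion of peak covid months.

```python
import pandas as pd

def ons_style_window(target, lag_months=12, length_months=60):
    """Return (start, end) monthly Periods of the fitting window for a target month.

    The window ends `lag_months` before the target month and spans
    `length_months` months inclusive, e.g. Jan 2023 -> Feb 2017 .. Jan 2022.
    """
    end = target - lag_months
    start = end - (length_months - 1)
    return start, end

# One refit per target month of 2023, each on a different window.
for target in pd.period_range("2023-01", "2023-12", freq="M"):
    start, end = ons_style_window(target)
    print(target, "->", start, "to", end)
```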
I have adopted a shortcut for expediency and speed of response. For the whole of 2023 I have fitted the model to the middle of the range that was used by ONS, namely July 2017 to June 2022 (inclusive).
(NB: the peak covid months of April, May, November, December 2020 and January, February 2021 are excluded from all baselines, both ONS and my own).
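The fixed July 2017 to June 2022 window used here can be built the same way, with the peak covid months removed. The helper `baseline_months` is hypothetical. The excluded months are the six listed above.

```python
import pandas as pd

# Peak covid months excluded from every baseline, as listed in the text.
EXCLUDED = pd.PeriodIndex(
    ["2020-04", "2020-05", "2020-11", "2020-12", "2021-01", "2021-02"], freq="M"
)

def baseline_months(start, end):
    """Monthly periods from start to end inclusive, minus the excluded peak months."""
    months = pd.period_range(start, end, freq="M")
    return months[~months.isin(EXCLUDED)]

fixed = baseline_months("2017-07", "2022-06")
print(len(fixed))  # 54: the 60 months in the window less the 6 excluded ones
```

The same helper gives the pre-covid baseline `baseline_months("2015-01", "2019-12")`. None of the excluded months fall in that range, so all 60 months are kept.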
From Part 1, readers will appreciate that my concern over the ONS analysis relates to the inclusion of the covid and post-covid years in the baseline, as the above range of fitted dates illustrates. To reiterate the key issue: if one wishes to examine the hypothesis that there has been an increase in excess deaths NOT directly attributable to covid-19 itself (as opposed to associated interventions), then the baseline MUST be taken from before the period in which the hypothesised factor applies. The ONS “contaminate” the baseline with data from post-covid years, and this is indisputably inappropriate for the purpose of examining said hypothesis. If the ONS approach to the baseline is adopted, any signal in the data that would align with the hypothesis is fully or partially “subtracted out” when the difference is calculated between actual deaths and predicted (i.e., “expected”) deaths.
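A noise-free toy example makes the "subtracting out" mechanism concrete. Everything in it is hypothetical:

- a log death rate falling 1% a year;
- a step of +0.05 on the log scale from 2022 onward, standing in for a post-covid factor.

A straight-line trend fitted to 2015 to 2019 extrapolates the underlying rate exactly. A fit to 2017 to 2022 instead bends upward to follow the shifted final year. Its 2023 prediction already contains two-thirds of the step, so the estimated excess shrinks accordingly.

```python
import numpy as np

years = np.arange(2010, 2024)
t = years - 2010  # years since 2010, keeps the fit well conditioned
# Hypothetical underlying log death rate: a steady 1%-per-year decline.
underlying = np.log(0.010) - 0.01 * t
# Hypothetical post-covid factor: +0.05 on the log scale from 2022 onward.
observed = underlying + np.where(years >= 2022, 0.05, 0.0)  # noise-free toy

def expected_2023(first_year, last_year):
    """Fit a linear trend to log rates in [first_year, last_year]; extrapolate to 2023."""
    mask = (years >= first_year) & (years <= last_year)
    slope, intercept = np.polyfit(t[mask], observed[mask], deg=1)
    return np.exp(intercept + slope * (2023 - 2010))

actual_2023 = np.exp(observed[years == 2023][0])
excess_clean = actual_2023 - expected_2023(2015, 2019)         # pre-covid baseline
excess_contaminated = actual_2023 - expected_2023(2017, 2022)  # includes the shifted year
print(excess_clean, excess_contaminated)
```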
Consequently, I have also deployed the same regression model but fitted it to the five years of data 2015 to 2019, i.e., pre-covid, as the appropriate means of predicting the expected deaths in years 2020 and thereafter.
Actually, with the use of a regression model including time-dependent terms (both the age profile and the “Trend” variable), the implicit rationale for using only a five-year period as the baseline no longer applies. A reasonably short baseline was necessary when a simple average death rate over those years was used, because the baseline then had to avoid being too strongly influenced by long-term trends. However, since trends are included in the regression model, it is equally (or perhaps more) appropriate to use a baseline from a fit to all the pre-2020 data, i.e., from 2005 to 2019 in the available dataset. I have therefore also looked at this definition of the baseline.
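The point about baseline length can be checked with synthetic numbers. This sketch assumes a hypothetical death rate falling 1.5% a year, with no noise. Under that assumption:

- a simple average over 2015 to 2019 overstates the 2023 rate, because it is centred several years earlier;
- a simple average over 2005 to 2019 overstates it even more;
- a log-linear trend fitted to 2005 to 2019 recovers it.

```python
import numpy as np

years = np.arange(2005, 2020)
# Hypothetical log death rate declining 1.5% per year, with no noise.
log_rate = np.log(0.011) - 0.015 * (years - 2005)
rate = np.exp(log_rate)
target = 2023
true_2023 = np.exp(np.log(0.011) - 0.015 * (target - 2005))

avg_5yr = rate[years >= 2015].mean()  # simple average, 2015-2019
avg_15yr = rate.mean()                # simple average, 2005-2019
slope, intercept = np.polyfit(years - 2005, log_rate, deg=1)
trend_15yr = np.exp(intercept + slope * (target - 2005))  # trend fit, 2005-2019

for name, est in [("5-yr avg", avg_5yr), ("15-yr avg", avg_15yr), ("15-yr trend", trend_15yr)]:
    print(f"{name}: {est / true_2023 - 1:+.1%} relative to the true 2023 rate")
```

Once a trend term is in the model, a longer baseline is no longer penalised for spanning a period of changing mortality. It simply gives the trend more data to be estimated from.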
For speed of response I am reporting now on excess deaths calculated for (the whole of) 2023 only. Clearly I need to look also at 2020, 2021 and 2022. I will do so shortly. There are also a number of variations on the model I wish to explore, e.g., fitting directly to the dependent variable (death rates) rather than logarithmic quantities, and a number of other things. These will follow ultimately in Part 3.
Excess Deaths Estimated for 2023
ONS “current” method: 31,442
ONS new method: 10,994
My approximate re-run of ONS new method: 10,210
Using the 2015 – 2019 baseline and the ONS model: 41,291
Using the 2005 – 2019 baseline and the ONS model: 50,528
Note that these are totals and so include deaths directly attributable to covid.
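The figures above can be compared against the ONS new method as a reference. This is simple arithmetic on the listed totals, nothing more.

```python
# 2023 excess-death estimates as listed above.
estimates = {
    "ONS current method": 31_442,
    "ONS new method": 10_994,
    "Re-run of ONS new method": 10_210,
    "ONS model, 2015-2019 baseline": 41_291,
    "ONS model, 2005-2019 baseline": 50_528,
}

reference = estimates["ONS new method"]
diffs = {label: value - reference for label, value in estimates.items()}
for label, value in estimates.items():
    print(f"{label:32s} {value:>7,d}  {diffs[label]:+8,d}  ({value / reference:.2f}x)")
```

On these figures, the two pre-covid baselines add about 30,300 and 39,500 deaths to the ONS new-method estimate. The approximate re-run comes in about 7% below it.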
Conclusions
Both the current and new methods used by the ONS fail to address the hypothesis that there might be excess deaths post-covid, not attributable directly to covid, due to the use of an inappropriate baseline.
Inclusion of covid and post-covid years in the ONS baseline (potentially) contaminates the baseline with any post-covid factors via the “Trend” variable (which occurs in the model in both linear and interaction terms).
The ONS model itself, together with the associated dataset, is potentially the basis of an improved method for estimating excess deaths. In particular, the inclusion of time-dependent age profiles is important.
Total excess deaths in 2023, including potential post-covid factors, have been estimated using the ONS model with baselines ending on 31 December 2019, giving about 41,000 to 50,000 excess deaths.
This contrasts with the ONS “new method”, which effectively excludes, or minimises, post-covid factors and indicates total excess deaths in 2023 of about 11,000 (a result approximately confirmed here).