Abstract
COVID-19 raised tension both within China and internationally. Here, we used mathematical modeling to predict the future trend of patient diagnoses outside China, with the aim of easing anxiety regarding the emergent situation. A mathematical model was fitted to the diagnosis numbers published on the WHO website, taking into account the transmission mode of infectious diseases, to predict the future trend of the outbreak. Daily diagnosis numbers from countries outside China were downloaded from WHO situation reports. The data used for this analysis start on January 21, 2020 and currently end on February 28, 2020. A simple regression model was developed based on these numbers, as follows: $\log_{10}(N_t+34)=0.0515t+2.075$, where $N_t$ is the total number of patients diagnosed up to the t-th day and t=1 on February 1, 2020. Based on this model, we estimate that there were approximately 34 undetected founder patients at the beginning of the spread of COVID-19 outside China. The global trend was approximately exponential, with an increase rate of 10-fold every 19 days. Based on this model, we call for strong public health actions worldwide, with reference to the experiences learned from China and Singapore.
Significance of this study
What is already known about this subject?
A novel coronavirus was verified and identified as the seventh member of the enveloped RNA coronaviruses as the cause of the disease, which is referred to as COVID-19.
COVID-19 raised tension both within China and internationally.
Considering the complexity of the real-life situation, a simple model is expected to be more accurate for describing the spread of the virus.
What are the new findings?
A “log-plus” model has been established to predict the situation, which only requires the daily number of total diagnoses outside China.
There were about 34 unobserved founder patients of COVID-19 at the beginning of the spread outside China.
The global trend is approximately exponential, at a rate of 10-fold every 19 days.
How might these results change the focus of research or clinical practice?
As a 10-fold increase in patient numbers of COVID-19 every 19 days has been estimated, we call for strong public health actions worldwide.
Introduction
In early December of 2019, pneumonia cases of unknown cause emerged in Wuhan, the capital of Hubei province, China.1 A novel coronavirus (now named SARS-CoV-2) was verified and identified as the seventh member of the enveloped RNA coronaviruses (subgenus, Sarbecovirus; subfamily, Orthocoronavirinae) using high-throughput sequencing2–4 as the cause of the disease, which is referred to as COVID-19. Evidence from early transmission dynamics showed that human-to-human transmission had been accumulating in hospital and family settings5–7 and had occurred among close contacts since the middle of December 2019.8 According to WHO statistics, the accumulated number of diagnosed patients in China on August 8, 2020 was 89,057.9
COVID-19 raised tension both within China and internationally. After the first case of COVID-19 pneumonia was reported from Wuhan, COVID-19 was rapidly diagnosed in patients in other Chinese cities and in neighboring countries, including Thailand, South Korea, and Japan, and even in a few Western countries.10–12 On January 13, 2020, the Ministry of Public Health of Thailand reported the first imported laboratory-confirmed case of the novel coronavirus disease (COVID-19).13 After that, surges in cases of COVID-19 in Italy, Japan, and Iran heightened fears that the world was on the brink of a pandemic. Therefore, on February 28, the WHO raised its assessment of the risk of spread and impact of COVID-19 to very high at the global level. As of August 8, 2020, 19 187 943 cases and 716 075 deaths from COVID-19 had been reported.9 The USA, Brazil, and India are currently the three most affected countries.14
Recently, considerable research effort has been devoted to detailed analysis of the spread of the COVID-19 epidemic.15 16 Several parallel studies have reported that the estimated reproductive number (R0) of COVID-19 is higher than that of SARS, based on different models.17–19 By incorporating superspreader (P), hospitalized (H), and fatality (F) classes, an ad hoc compartmental mathematical model of COVID-19 has been established to describe the Wuhan outbreak and predict the daily number of confirmed cases.20 Several studies used deep learning to forecast COVID-19 infections.21 22 A disease transmission model based on long short-term memory (LSTM) networks predicted the gravity of COVID-19 in Canada.23 Data-driven estimation methods such as LSTM and curve fitting were also used to estimate the number of COVID-19 cases in India for the next 30 days and the effect of preventive measures.24
Given the limited number of data points and the complexity of the real-life situation, a simple model is expected to be more accurate for describing the spread of the virus (see Discussion section). In this study, we propose a “log-plus” model to predict the situation, which only requires daily number of total diagnoses outside China. This model assumes that there were some unobserved founder patients at the beginning of viral spread outside China and subsequent exponential growth. Despite the simplicity of our model, it fits the data well (R2=0.991). This prediction has potential practical and socially applicable significance and provides evidence that can enhance public health interventions to avoid severe outbreaks.
Methods
Data
Daily numbers of COVID-19 diagnoses in countries outside China were downloaded from WHO situation reports (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports). The data used in this analysis start on January 21, 2020 and end on February 28, 2020.
Model
Data were first explored by plotting log-transformed daily case numbers. A linear trend was observed in more recent data, while the fit was relatively poor for earlier time points. The presence of some undetected founder patients at the early time points was therefore considered. Based on exploratory analysis and mathematical intuition, we proposed the following model:
$$\log_{10}(N_t + u) = a t + b$$
where $N_t$ is the total number of patients diagnosed outside China, according to WHO, up to the t-th day, with t=1 on February 1; u is the number of unobserved founder patients at the beginning of spread outside China; and a and b are simple linear regression parameters. We enumerated u from 0 to 100, with step size 1. For each u, we calculated Pearson’s correlation (R2) between t and $\log_{10}(N_t + u)$, selected the $\hat{u}$ that maximized R2, and estimated the corresponding $\hat{a}$ and $\hat{b}$ using a simple linear regression between t and $\log_{10}(N_t + \hat{u})$.
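The enumeration described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' source code: the function names `least_squares` and `fit_log_plus` are hypothetical, and the input is synthetic data generated from the model itself (not the WHO counts), so that the search can be checked against known parameters.

```python
import math

def least_squares(xs, ys):
    """Ordinary least squares for y = slope*x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2

def fit_log_plus(ts, counts, u_max=100):
    """Try u = 0..u_max and keep the u whose log10(N_t + u) is most linear in t."""
    best = None
    for u in range(u_max + 1):
        if min(counts) + u <= 0:
            continue  # log10 is undefined for non-positive values
        ys = [math.log10(n + u) for n in counts]
        a, b, r2 = least_squares(ts, ys)
        if best is None or r2 > best[3]:
            best = (u, a, b, r2)
    return best

# Synthetic check (assumption: NOT the WHO data). t = -10 is January 21
# and t = 28 is February 28 when t = 1 falls on February 1.
ts = list(range(-10, 29))
synthetic = [10 ** (0.0515 * t + 2.075) - 34 for t in ts]
u_hat, a_hat, b_hat, r2 = fit_log_plus(ts, synthetic)
print(u_hat, round(a_hat, 4), round(b_hat, 3), round(r2, 4))
```

On noise-free synthetic input the search should recover the generating parameters; with real counts, the selected u depends on how noisy the early time points are.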
Availability of source code
The source code of the model is available at: https://github.com/wangyi-fudan/COVID-19_Global_Model
Results
Data table
The WHO daily count of numbers of diagnoses outside China and ‘log-plus’ transformed data, as well as model fit data, are presented in table 1.
Parameter estimation
According to the February 28 data, u, a, and b were estimated as 34, 0.0515, and 2.075, respectively (figure 1).
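As a rough illustration, the fitted curve can be evaluated for any calendar date by inverting the log-plus transform. The sketch below assumes a base-10 logarithm (consistent with the 10-fold growth rate reported in this article) and uses the three estimates above; the helper names `day_index` and `fitted_total` are hypothetical and do not come from the published source code.

```python
import datetime

# Estimates reported in the text (u, a, b).
U, A, B = 34, 0.0515, 2.075

def day_index(date):
    """Day number t used by the model, with t = 1 on February 1, 2020."""
    return (date - datetime.date(2020, 1, 31)).days

def fitted_total(t):
    """Model value of cumulative diagnoses outside China: 10**(a*t + b) - u."""
    return 10 ** (A * t + B) - U

# Example: evaluate the fitted curve on the last day of the data window.
t_last = day_index(datetime.date(2020, 2, 28))
print(t_last, fitted_total(t_last))
```

Because the model is fitted on the log scale, the value returned for a single day can differ noticeably from the observed count even when R2 is high.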
Global trend model
Next, we plotted $\log_{10}(N_t + u)$ against time to visualize model fitting (figure 2). The R2 value for the model was 0.991, indicating an excellent fit.
Future number prediction
The number of COVID-19 diagnoses outside China as of February 28 was 4691. Our model predicts that the number of diagnoses outside China will expand exponentially at a rate of 10-fold every 19 days in the absence of strong public health interventions.
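The 19-day figure follows directly from the estimated slope. As a short worked derivation, assuming the base-10 form of the model, the growth factor over T days is

$$
N_t + u = 10^{a t + b} \quad\Longrightarrow\quad \frac{N_{t+T} + u}{N_t + u} = 10^{aT},
$$

so the time needed for a 10-fold increase, and the corresponding doubling time, are

$$
T_{10} = \frac{1}{a} = \frac{1}{0.0515} \approx 19.4 \text{ days}, \qquad T_{2} = \frac{\log_{10} 2}{a} \approx \frac{0.3010}{0.0515} \approx 5.8 \text{ days}.
$$

The doubling time is not reported in the article and is given here only as an arithmetic consequence of the same slope. Both values describe the growth of $N_t + u$, which approximates the growth of $N_t$ itself once $N_t$ is much larger than u=34.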
Discussion
In this report, only the total number of diagnoses outside China was analyzed. Country-scale data are also available, but are less complete than the total numbers; hence, we limited our analysis to capturing the global trend.
This model is a minimal extension of the “default” exponential growth model, using an estimate of 34 undetected founder patients outside China. An almost perfect model fit (R2=0.991) indicates that the spread of disease does follow our model.
A simple and straightforward linear model has some advantages: (1) it works for small sample sizes, which result from limited observation or somewhat imperfect data; (2) it is relatively robust in complex situations, and because the virus spreading pattern is complex and varies across the world, a simple model can provide coarse-grained trend estimation; and (3) a linear model is easier to extrapolate than more complex models (eg, neural networks).
The existence of 34 undetected founder patients is not surprising. Actually, founder patients are those patients who are not reported at the beginning (January 22) of WHO reports. Thus, most of them are not under control and continually contribute to the pandemic. These individuals may have had mild symptoms and thus did not attend hospital; however, we do not preclude that they were already present before, or parallel with, the outbreak in Wuhan.
Based on this model, we estimate that there were approximately 34 undetected founder patients at the beginning of the spread of COVID-19 outside China. This suggests that the disease stably followed an approximate exponential growth model at the very beginning. This situation is dangerous, as we expect a 10-fold increase in patient numbers every 19 days, in the absence of strong intervention. We call for strong public health actions worldwide, referring to the experiences learned from China and Singapore.
The manuscript has been posted as a preprint on medRxiv (doi: https://doi.org/10.1101/2020.03.01.20029819). We are pleased that our manuscript has drawn the attention of many researchers and social media users to the outbreak trend outside China. The results of this article have been read more than 9000 times, picked up by seven news outlets, and cited more than 10 times.25–29 We reproduced the disease’s initial spread to the world, which may encourage other countries to pay attention to the development of COVID-19 and take strong measures in time.
Acknowledgments
We thank the Fudan University High-End Computing Center for supporting computations involved in this study.
Footnotes
YL and ML contributed equally.
Contributors YW conceived the idea and wrote the source code. YW, YL, ML, and LJ contributed to the data analysis, generating of tables and figures, and manuscript writing. YL, ML, XY, XL, MH, ZH, YW, and LJ contributed to the theoretical analysis and manuscript revision. All authors contributed to the final revision of the manuscript.
Funding Data in this study are publicly available and were downloaded from the WHO Website. Our research was supported by the Postdoctoral Science Foundation of China (2018M640333) and Shanghai Municipal Science and Technology Major Project (2017SHZDZX01).
Competing interests The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information. The source code of the model is available at https://github.com/wangyi-fudan/COVID-19_Global_Model.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, an indication of whether changes were made, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.