- ¹Beijing National Research Center for Information Science and Technology (BNRist), Department of Electronic Engineering, Tsinghua University, Beijing, China
- ²Meituan-Dianping Group, Beijing, China
- ³Data Science Group, Institute for Basic Science, Daejeon, South Korea
- ⁴School of Computing, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
The recent outbreak of the novel coronavirus (COVID-19) has infected millions of citizens worldwide and claimed many lives. This paper examines the impact of COVID-19 on Chinese e-commerce by analyzing behavioral changes observed on a large online shopping platform. We first conduct a time series analysis to identify product categories that faced the most extensive disruptions. The time-lagged analysis shows that behavioral patterns of shopping actions are highly responsive to the epidemic's development. Based on these findings, we present a consumer demand prediction method by encompassing the epidemic statistics and behavioral features of COVID-19-related products. Experimental results demonstrate that our predictions outperform existing baselines and further extend to long-term and province-level forecasts. Finally, we discuss how our market analysis and prediction can help better prepare for future pandemics by gaining extra time to launch preventive measures.
1. Introduction
Coronavirus disease 2019 (COVID-19) had a massive outbreak in Wuhan during the Spring Festival in 2020, followed by a global health emergency¹. Since it is a novel virus, the world is still learning about its exact transmission routes and cures. The disease dynamics were particularly rapid in China; the spread of COVID-19 from Wuhan to all other regions was nearly contained within only 2 months. Studying China's case can bring insights into how other countries can cope with the pandemic and how the world can better prepare for future diseases (Tian et al., 2020).
The novel virus has had a significant influence on people's daily lives. Given that vaccine development is still underway, governments worldwide and the World Health Organization (WHO) have recommended that people stay at home and avoid crowded locations. According to a McKinsey report (Arora et al., 2020), global citizens increased their reliance on online shopping and delivery of essential goods during this period compared to the prepandemic period. Under these circumstances, people's shopping behaviors have changed massively, and epidemic-related products such as face masks and disinfectants are in short supply, failing to meet demand. Such a disruption in supply and demand could reshape online shopping patterns, not only for COVID-19-related products but also for ordinary products. Understanding disruptions in e-commerce under such a sudden and uncertain epidemic could benefit all stakeholders (i.e., retailers, consumers, suppliers, delivery systems, and local governments) and help prepare for the next pandemic. However, due to the lack of data and scenarios of time-concentrated outbreaks, little is known about consumer demand in a pandemic context.
This paper conducts an extensive analysis of online shopping trends before and during the COVID-19 pandemic from the perspective of a popular e-commerce platform in China, Beidian² (Cao et al., 2020), with the hope of understanding the impact of such an immediate risk on the market and determining the disruptions in product supplies. We characterize the pandemic's impact on the market from changes in product-level demand and supply. First, our analysis reveals which products had increased or decreased sales (after discounting seasonal variation), thereby showing how households are coping with the pandemic. Second, the analysis compares the differences in browsing, searching, and purchasing activities for pandemic-related goods, such as face masks, disinfectants, and thermometers, revealing the intricate relationship between supply and demand. We also identify which products experienced the most significant decrease in sales to explore what influences sales and why the unexpected decline happened. Additionally, we use time-lagged cross-correlation to quantify how shopping actions respond to the pandemic by investigating product supply shortages. To the best of our knowledge, no other research has examined disruptions in e-commerce at such a fine-grained level.
Based on these observations, we propose EnCod, an encoder-decoder model that leverages online shopping behaviors and COVID-19 pandemic statistics to predict changes in the demand for critical goods. Experiments show that both shopping and epidemic features gathered from the past weeks are important for predicting behavior related to pandemic-relevant goods in the upcoming days. Our model achieves better prediction performance than the baselines, and it can be fine-tuned at the province level: each province or city may adopt our model to understand the needs of its citizens during a pandemic. Moreover, we make long-term predictions, which further verify the effectiveness of the proposed model. In summary, our main contributions are as follows:
1. We operationalize and release a dataset³ of people's online shopping behaviors during the COVID-19 pandemic, whose multiple features characterize the market changes during this period.
2. We investigate the changes in different online shopping behaviors, including purchasing, browsing, and searching on the platform, and examine the interplay between the COVID-19 pandemic and consumer behavior.
3. We conduct a time-lagged cross-correlation analysis to reveal which products exhibit a demand pattern that coaligns well with pandemic dynamics.
4. We propose a model to forecast consumer demand on essential product categories, and demonstrate our model's effectiveness with regional and long-term forecasts.
The remainder of the paper is organized as follows. We first review related works in section 2. We then introduce the datasets used in our work and data analysis tools in section 3. We perform data analysis to model market disruptions in section 4 and conduct demand forecasting experiments with our prediction method in section 5. Finally, we provide a discussion in section 6 and conclude the paper in section 7.
2. Related Work
2.1. Effect of the COVID-19 Pandemic
Various studies have appeared since the outbreak of COVID-19 due to its unprecedented challenges for industry and society. Researchers have studied the impact of epidemics from multiple aspects, including transportation (Huang et al., 2020; Lee et al., 2020), control (Gill et al., 2020; Song et al., 2020), inequality (Alon et al., 2020; Gozzi et al., 2020; King et al., 2020), global poverty (Buheji et al., 2020; Sumner et al., 2020; Valensisi, 2020), and stock markets (Baker et al., 2020; Cepoi, 2020; Gormsen and Koijen, 2020). Some studies examine online shopping food services during a government's stay-at-home order (Chang and Meyerhoefer, 2020; Mehrolia et al., 2020). Others have examined the economic impacts of a large epidemic (Schoenbaum, 1987; Meltzer et al., 1999). However, little is known about consumer actions under a health risk, due to the lack of data encompassing such a time-concentrated outbreak (WTO, 2020). This paper investigates the impact of the pandemic on online shopping behaviors and demand forecasting.
2.2. E-Commerce and COVID-19
COVID-19 has exerted a significant influence on e-commerce worldwide and has changed the nature of business (Bhatti et al., 2020; Elrhim and Elsayed, 2020; Nakhate and Jain, 2020). Researchers have found that the effects of the COVID-19 pandemic differ across products: COVID-19 strongly affects several products but may have less impact on others (Andrienko, 2020). As people avoid going out and are required to keep social distance and avoid crowds, an increase in the overall demand and sales of e-commerce has been observed (Ali, 2020). As a result, the virus compels customers to use the internet and make it a habit in their daily life (Abiad et al., 2020). Meanwhile, e-commerce faces many challenges (Hasanat et al., 2020; Leone et al., 2020), such as extended delivery times and the need to improve the shipment process. Some pandemic-related products, e.g., masks, disinfectants, and disposable gloves, are in very high demand. Under these circumstances, an insufficient supply of key products may have negative effects, giving rise to disruptions in e-commerce and, consequently, consumer panic.
2.3. Sequential Demand Forecasting
Several classic regression models have been applied to demand forecasting, such as the autoregressive integrated moving average (ARIMA) (Contreras et al., 2003). However, these models produce accurate forecasting results only when the sequence patterns are linearly correlated and stationary over time (Mills, 1991; Omar et al., 2016). Newer approaches adopt machine learning and deep learning algorithms, such as XGBoost (Chen and Guestrin, 2016) and sequence-to-sequence (seq2seq) (Sutskever et al., 2014) models, for prediction. Despite their potential, most models require large amounts of training data, and how they perform under a sudden disruption remains to be investigated. This paper presents an encoder-decoder model for the prediction of near-future demands for epidemic-related products.
During the outbreak of COVID-19, efforts are being made to understand COVID-19 from the perspectives of structural biology (Wrapp et al., 2020), genetics (Hoffmann et al., 2020), economics (Cornwall, 2020), policy (Maier and Brockmann, 2020; Tian et al., 2020), and trend prediction (Cohen, 2020). In addition to these efforts, the current study aims to provide a picture of the impact of COVID-19 on Chinese e-commerce and leverages both epidemic-related and behavior-related information to forecast the demand for essential goods.
3. Data
We use two data sources. The first is from a mobile-based shopping platform, Beidian, one of the largest in China with a monthly user base of 3.44 million and an aggregate 187 million app downloads. Beidian is a one-stop shop and offers products in nearly two thousand categories. We received anonymized session logs that spanned from January 1, 2019, to April 30, 2020. Each session record contained, for every action, the action type, product ID, product category, and time. Specifically, we look into three main types of shopping actions: (1) searching by category or keyword, (2) browsing the details of a product, and (3) purchasing specific items. At the user level, we were also given information about the cities in which users reside. In the current analysis, logs originating from Hubei Province were removed since the delivery of goods was prohibited during this area's lockdown. Table 1 displays a summary of the data. The pre-pandemic period data in 2019 and early 2020 are used to adjust for seasonality patterns in the post-pandemic period analysis.
The second data source is the daily epidemic statistics of newly confirmed cases within China. We combine two sources: (1) official reports from the National Health Commission of China⁴ are used for all data up to January 22, 2020, and (2) COVID-19 dashboard data from the Center for Systems Science and Engineering at Johns Hopkins University⁵ are used from January 22, 2020 onward. We use SQL to aggregate the original records at the product level. To implement the deep model, we use PyTorch⁶.
4. Modeling Market Disruptions
Demand for specific health-related products, such as face masks and hand sanitizer, is bound to increase during a health crisis, leading to a temporary surge in shopping actions. To quantify the degree of the change in purchase popularity of a product category c over a given period t, we define the Relative Popularity (RP) as follows:
where t0 is the reference time to which a target period's popularity is compared. We consider the first week of 2020 as the reference point and compute the RP value for every week in 2020. The advantage of such a metric is threefold. First, compared with other indexes such as the sales amount, a popularity ranking can decouple potential confounders, such as the number of weekly active users on the platform. Second, the choice of reference week presents popularity without the influence of COVID-19, since it considers the ranking before the epidemic. Third, the logarithm maps the variations of products with different popularity onto a fairer nonlinear scale (e.g., the difference between 5th and 1st place is more significant than the difference between 500th and 496th). In addition, the Spring Festival, which is the most important traditional festival in China, may substantially impact consumers' behaviors. The festival was on January 25 in 2020 and on February 5 in 2019. Therefore, week t0 starts on December 30 and January 15 for 2020 and 2019, respectively, so that the festival days fall in the same week (i.e., t0 + 3) in both years. In conclusion, this value trajectory largely captures the disruptions in purchasing behavior before and after COVID-19. We can also observe the change in product popularity due to the Spring Festival with this metric. Comparing the 2020 results with the RP trend in 2019 allows the impact of COVID-19 to be identified.
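The RP computation can be illustrated with a small sketch. The displayed definition is not reproduced in this text, so the code assumes a form that matches the surrounding description (a logarithm of rank change relative to the reference week t0): RP(c, t) = log10(rank(c, t0) / rank(c, t)), where rank 1 is the most purchased category. The base-10 logarithm, the function names, and the toy counts are all assumptions made for illustration.

```python
import math

def rank_by_count(counts):
    """Map each category to its 1-based rank (1 = most purchased)."""
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    return {c: i + 1 for i, c in enumerate(ordered)}

def relative_popularity(counts_t0, counts_t):
    """Assumed form: RP(c, t) = log10(rank(c, t0) / rank(c, t))."""
    r0 = rank_by_count(counts_t0)
    rt = rank_by_count(counts_t)
    return {c: math.log10(r0[c] / rt[c]) for c in r0 if c in rt}

# Toy weekly purchase counts (made-up numbers, not from the Beidian data).
week_t0 = {"masks": 10, "utensils": 90, "snacks": 50, "wine": 70}
week_t = {"masks": 95, "utensils": 20, "snacks": 50, "wine": 60}

rp = relative_popularity(week_t0, week_t)
for category in sorted(rp, key=rp.get, reverse=True):
    print(f"{category:10s} RP = {rp[category]:+.3f}")
```

Under this assumed form, a category that climbs from rank 4 to rank 1 gets a positive RP, one that falls from rank 1 to rank 4 gets the same magnitude with a negative sign, and a category whose rank is unchanged gets zero.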
4.1. Popularity Change by Products
4.1.1. The Most Affected Products
Figure 1 presents the signature products with the highest and the lowest RP values. The top seven items, from masks to vitamins, are within the top 20 in terms of the increase in purchase rank change; their sales surged compared to the final week of 2019. In contrast, the bottom five items from utensils to flavored milk are within the bottom 20 and mark the most significant decrease in purchase rank change; their sales have decreased to the greatest extent during the pandemic.
The top list includes epidemic-related products such as face masks, disinfectants, hand sanitizers, and thermometers. The list also includes online course programs for children, disposable utensils, and vitamins. These items are relevant during the pandemic, given that homeschooling, hygiene, and immunity have become either mandatory or essential. Furthermore, (nondisposable) utensils, Chinese wine, cotton clothes, mixed nuts, and flavored milk are at the bottom of the list. These items were mostly within the top 100 before the pandemic, but their purchase ranks dropped by more than a factor of ten during the pandemic. We provide the complete list in Tables 2, 3, which display the top 20 and bottom 20 products with the highest and lowest relative popularity. As we aggregate data by week, the week of January 20 contains events related to both the Chinese New Year holiday and the COVID-19 lockdown impact.
The relative ranking RP(c, t) of COVID-19-related products became very high at the end of January. In the peak period in particular, the corresponding absolute rankings of masks, disinfectants, daily necessities, and hand sanitizers were very high among thousands of items on the Beidian platform: top-1 during the week of January 20, top-2 during the week of February 3, top-1 during the week of January 27, and top-8 during the week of January 27, respectively.
4.1.2. Weekly Fluctuations
Figure 2 shows the week-by-week RP values of the 12 prominent goods along with their fitted lines. Here t0 is again set to the first week of 2020. The product rank is shown for all three shopping actions: browsing, searching, and purchasing. The figure also shows the popularity trajectory of the same items in 2019. We shift the timeline for 2019 and sync New Year's holiday week to appear as the fourth data point to discount seasonal effects. Note that the studied popularity measure, RP, is stable and applicable to all popularity levels since it represents the relative rank changes on a logarithmic scale.
Figure 2. Weekly popularity dynamics during the COVID-19 period. Most products in (A,C,E) are increasing in their relative popularity ranking, whereas most products in (B,D,F) show decreasing popularity ranks.
Products in Figures 2A,C, such as face masks, disinfectants, hand sanitizers, and thermometers, show a surge in all shopping actions from January 20th⁷, the date that the lockdown of Hubei Province was enforced. Face masks, in particular, remain the top-ranked item throughout the pandemic period. The rank difference is substantial and exceeds 2.0 for the search action. Note that an RP of 2.0 indicates a rank increase of several hundred. The dashed lines, representing the identical item rank changes in 2019, do not show any notable increase. This confirms that the surges are not seasonal but unique to 2020 (i.e., epidemic-related).
A sudden rank change for children's online courses is also notable during the pandemic's first week. Nonetheless, the search action of this product is not as high as that of other top products. We also consider products such as thermometers, vitamins, and disposable utensils, whose peaks in demand occur a week or two after the disease outbreak. Owing to news about epidemic prevention circulating on online social media, products such as vitamins are also in high demand. As people rely more on disposable utensils, normal utensils in contrast become less popular. The times at which each product shows the highest rank change in demand could be used to understand what popular health practices households adopt during the pandemic and how concerned they are about the disease.
In contrast, products in Figures 2B,D show decreased ranks in all shopping actions. Some changes, however, are seasonal and can also be observed in 2019. For instance, the rank order of cotton clothes and Chinese wine gradually decreases in both 2019 and 2020, indicating a seasonal effect (e.g., warmer weather and a decrease in wine demand after the New Year's celebration). However, products such as flavored milk and utensils show decreased popularity only in 2020, indicating that households consume certain items (e.g., flavored milk) less frequently and cut unnecessary costs during the epidemic period. It is interesting to contrast the decreasing demand for utensils against the increased demand for disposable utensils; this contrast likely arises due to increased efforts to ensure proper hygiene.
To summarize, we observe that people's online shopping preferences for a series of products, such as disinfectants, were significantly disrupted during the epidemic period. While the decline of less popular products such as clothing may also reflect seasonal effects, the epidemic strongly raises shoppers' demand for related products such as protection supplies (e.g., masks), sanitation supplies (e.g., disinfectants), health care products (e.g., vitamins), and online services (e.g., online courses, disposable supplies). These effects appear in browsing, searching, and purchasing, each of which responds to the severity of the epidemic in a related but different way.
4.1.3. Session Counts by Action Type
The above popularity variations also differ across action types, i.e., browsing, searching, and purchasing, by category. For example, masks were usually sold out during the first few weeks of the epidemic. Thus, the number of purchases depends on the supply (e.g., January 27 in Figure 2A). However, shoppers are still able to search and browse such products. Thus, the masks' ranking in Figures 2C,E does not drop as much. From Figure 3A, we also observe how the mask supply is disrupted by the epidemic [many searches in week 5 (from January 27) without purchases] and the effective reaction of the industry (a significant rise in browsing and purchasing from February 3).
Figure 3. Weekly records of browsing, searching, and purchasing on different kinds of products. Those of searching and purchasing are multiplied by 10 (i.e., 1 × 10⁵ in these figures represents 1 × 10⁴ of searching and purchasing). (A) Masks (week scale), (B) disinfectants (week scale), (C) hand sanitizers (week scale), (D) mixed nuts (week scale).
To examine the dynamics across shopping actions, Figure 3 compares the session counts for four representative products. The data show that shoppers engage the most frequently in browsing actions, up to ten times more often than searching and purchasing actions. Thus, we rescale the records of searching and purchasing in the figure. Additionally, the weekly number of confirmed cases in China is shown in the figure. Usually, for the same kind of product, the relative session counts across the three action types remain similar over time, such as for mixed nuts in Figure 3D. In contrast, the search and purchasing action rates may differ for the other three epidemic-related goods because products may be sold out. Interestingly, the browsing and purchasing actions show similar fluctuations. These distinctive patterns are the most obvious for face masks in Figure 3A, where browsing and purchasing are approximately synchronized, while the searching action demonstrates continued high demand. The plot for disinfectants shows a similar trend, whereas the plot for hand sanitizers shows smaller differences across the three shopping actions. This may be because hand sanitizers were more widely available than the other two items. These results also show that the periods in which these products are in high demand do not coincide exactly with the severity of the epidemic.
4.2. Time-Lagged Analysis
The temporal analysis so far has revealed the dependency of online shopping actions on COVID-19; the demand for some products is immediately dependent upon a health risk, but other products stagger in their rank change. Therefore, to identify more explanatory factors, this subsection examines the time difference in detail by computing the lagged correlation between the two sequences. We compute the time-lagged cross-correlation (TLCC) (Shen, 2015), which quantifies the correlation between two non-stationary time series at different time scales. The method shifts two sequences relatively in time (time-lagged) and calculates the correlation (cross-correlation). Therefore, this approach can analyze non-stationary time series and quantify how the three shopping actions are affected by a pandemic.
We use a rolling window of 21 days to analyze different periods, and the lagged time between the two series is varied from 0 to 6 days. Figure 4 presents the results of the TLCC for selected products based on the purchase log. The y-axis represents the time-difference offset, denoting the number of days that the behavioral response falls behind the epidemic spread. The color represents the Pearson correlation coefficient from negative (blue) to positive (red).
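The rolling, lagged correlation described above can be sketched as follows. This is a simplified illustration using plain Pearson correlation on each 21-day window with lags of 0 to 6 days; it is not necessarily the exact procedure of Shen (2015), and the alignment convention (behavior trailing the case curve by `lag` days), the function names, and the synthetic series are assumptions.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation; returns nan if either window is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else float("nan")

def rolling_tlcc(cases, behavior, window=21, max_lag=6):
    """Return an array of shape (max_lag + 1, n_windows).

    Entry [lag, s] correlates cases[s : s + window] with
    behavior[s + lag : s + lag + window], i.e. behavior trailing by `lag` days.
    """
    cases = np.asarray(cases, dtype=float)
    behavior = np.asarray(behavior, dtype=float)
    n_windows = len(cases) - window - max_lag + 1
    out = np.full((max_lag + 1, n_windows), np.nan)
    for lag in range(max_lag + 1):
        for s in range(n_windows):
            out[lag, s] = pearson(cases[s:s + window],
                                  behavior[s + lag:s + lag + window])
    return out

# Synthetic check: the behavior series is the case curve delayed by 3 days.
days = np.arange(60)
cases = np.exp(-((days - 25) / 8.0) ** 2)
behavior = np.roll(cases, 3)

tlcc = rolling_tlcc(cases, behavior)
best_lag = int(np.argmax(tlcc[:, 10]))
print("lag with the highest correlation in window 10:", best_lag)
```

Each column of the returned array corresponds to one rolling window and each row to one lag, which mirrors the layout of a heat map with the offset on the y-axis.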
Figure 4. Time-lagged cross-correlation results of different products with a rolling time window. The color denotes the Pearson correlation coefficient between the two time series of corresponding behavior and newly confirmed cases. (A) Masks, (B) disinfectants, (C) hand sanitizers, and (D) vitamins.
The correlations are not uniform across products. Figure 4A indicates that the strong correlation between mask purchases and epidemic development does not last long due to falling mask supplies. This trend is shown by dark red blocks on January 14, followed by blue shaded regions. Figure 4B, in contrast, shows that disinfectant purchases closely follow epidemic development throughout the entire period, resulting in a high correlation (i.e., red shaded areas). These results indicate that there is no shortage of disinfectant supplies on the platform.
On the other hand, the demand for hand sanitizer (Figure 4C) and vitamins (Figure 4D) remained high until the end of February. For example, as shown in Figures 3C, 4C, sales of hand sanitizer skyrocket following epidemic development in the early stages of COVID-19, with a week lagged response to epidemic development. However, the lagged effect decreases over time, implying that demand becomes neutral to epidemic development. Furthermore, the positive relationship becomes weak and then negative. This change indicates that hand sanitizer is still in high demand, even after the number of new confirmed cases decreases. Toward the end of the time frame, the lagged positive correlation appears again, which means consumers have stocked enough hand sanitizer and no longer purchase this product. Compared with the result of disinfectant, which is demanded more during the pandemic, the strong correlations of hand sanitizer and vitamin last for a shorter time. In February, the two product categories are observed to have almost no correlation with the epidemic, suggesting that they are not relevant to the epidemic as closely as the disinfectant.
Our data show correlations across shopping actions. Masks are a representative product that faced supply shortages during the pandemic in China. The search actions in Figure 5A show a negative correlation in the later pandemic period, indicating that people are continuing to search for masks. However, the number of confirmed COVID-19 cases is decreasing, as indicated by the blue-shaded regions. The positive correlation in the latter part of Figure 5B can be understood in the same context: only when the number of confirmed patients decreases do the sales and the number of confirmed cases show a positive correlation. In contrast, shopping actions show similar trends for products that are less pandemic related. For example, the trends seen in browsing and searching are similar to those of purchasing snacks in Figure 6. Because the supply has not been affected, the behavioral response during the epidemic is similar across different action types.
Figure 5. Even when the epidemic subsided, consumer interest in searching for masks remained. To prepare for future epidemics, demand forecasting and inventory planning are vital to avoid supply shortages. (A) Searching on masks, (B) purchasing on masks.
Figure 6. The TLCC results of purchasing, browsing, and searching behaviors on snacks. (A) Purchasing on snacks, (B) browsing on snacks, (C) searching on snacks.
In summary, our analysis shows that dynamic correlations exist between online shopping behavior and epidemic development. We find that behavior responds to the epidemic in a lagged manner, but the correlation can be reversed when there is a shortage or continued caution. These patterns and observations inspire us to design an accurate and explainable predictor for forecasting consumer demand for key product categories.
5. Demand Forecasting and Evaluations
The analysis thus far has demonstrated how COVID-19 impacted product popularity and behavioral patterns. The distinctive patterns of purchasing and searching for critical items, such as facial masks, suggest that the pandemic disrupted the supply of essential goods and led to an imbalance between demand and supply. The analysis also confirmed a significant correlation between shopping actions and epidemic sequences. Together, these findings suggest that the purchase intent of many products is directly affected by epidemic development during a health crisis.
Based on these insights, we present a gated recurrent unit (GRU)-based encoder-decoder model named EnCod that leverages epidemic information and historical shopping behavior to predict the demand for critical goods. GRU (Bahdanau et al., 2014) is an effective state-of-the-art tool for processing sequential data. It addresses the vanishing gradient problem of the original recurrent neural network (RNN) (Mikolov et al., 2010) and can retain long-term information through its gating mechanism. Moreover, GRU has far fewer parameters than long short-term memory (LSTM) (Sundermeyer et al., 2012) with nearly comparable performance. We combine it with the encoder-decoder architecture (Cho et al., 2014) to implement the sequence-to-sequence prediction task. Specifically, we use the data of daily confirmed cases and searching behavior in the past 2 weeks to predict the number of searches in the following n days. Figure 7 shows the details of the prediction model. The EnCod model takes the concatenation of the sequences of daily confirmed cases and searches as input. The encoder module extracts the historical sequence features and outputs the last hidden states, which serve as the decoder's input. Then, the decoder module generates the prediction results.
Figure 7. The overall architecture of our proposed model EnCod, which takes in the concatenation of epidemic and searching sequences in the past m days and outputs the daily number of searches in the next n days.
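A GRU encoder-decoder of this kind could be sketched in PyTorch as follows. This is a minimal illustration and not the authors' implementation: the class name, the linear output head, the autoregressive decoding that feeds each prediction back as the next decoder input, and the feature order [cases, searches] are all assumptions. Only the overall shape (m past days of two features in, n future days of searches out, one GRU layer with hidden size 4) follows the description in the text.

```python
import torch
from torch import nn

class EncoderDecoderGRU(nn.Module):
    """Hypothetical sketch of a GRU encoder-decoder forecaster."""

    def __init__(self, n_features=2, hidden_size=4, horizon=7):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.GRU(n_features, hidden_size, batch_first=True)
        self.decoder = nn.GRU(1, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, history):
        # history: (batch, m, n_features); assumed column order [cases, searches]
        _, hidden = self.encoder(history)   # last hidden state: (1, batch, hidden)
        step = history[:, -1:, 1:2]         # last observed searches: (batch, 1, 1)
        outputs = []
        for _ in range(self.horizon):
            out, hidden = self.decoder(step, hidden)
            step = self.head(out)           # next-day prediction: (batch, 1, 1)
            outputs.append(step)
        return torch.cat(outputs, dim=1).squeeze(-1)  # (batch, horizon)

model = EncoderDecoderGRU()
dummy = torch.randn(10, 14, 2)  # minibatch of 10, m = 14 days, 2 features
forecast = model(dummy)
print(forecast.shape)
```

Changing `horizon` gives the different values of n, and dropping the case column (with `n_features=1` and an adjusted slice for `step`) would give a variant without epidemic information.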
5.1. Performance Evaluation Protocols
We compare our model with several baselines: classic time series forecasting algorithms, including the autoregressive (AR) model (Mills, 1991) and the ARIMA model (Contreras et al., 2003), and machine learning and deep learning algorithms such as XGBoost (Chen and Guestrin, 2016) and Seq2Seq. These methods use only historical shopping behavior to predict future behavior, without epidemic statistics. For a fair comparison, we also use a variant of the XGBoost model that takes both the shopping history and epidemic statistics as inputs (which we call XGBoost-C).
We evaluate the prediction performance of our model and baseline methods by widely used metrics in regression tasks, including mean absolute error (MAE) (Willmott and Matsuura, 2005; Mkhabela et al., 2011) and normalized root mean square error (NRMSE) (Rocha et al., 2007; Peng et al., 2013). Specifically, we use NRMSE to show the models' relative performance when comparing the results across different categories or different provinces, where the relative value is more informative. For long-term forecasting, we also include MAE due to its robustness to outliers, which is vital for evaluating long-term performance. The definitions of selected metrics are as follows:
where N is the number of test samples, yi, i ∈ [1, N] represents the ground truth, and ŷi, i ∈ [1, N] represents the predicted values.
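The two metrics can be computed as in the sketch below. MAE is the usual mean of the absolute errors |yi − ŷi| over the N test samples. For NRMSE the normalizer is not stated in this text, and several conventions exist (mean, range, or standard deviation of the ground truth); the code assumes division by the mean of the ground truth, which is only one possible choice.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over the test samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def nrmse(y_true, y_pred):
    """RMSE divided by the mean of the ground truth (assumed normalizer)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / np.mean(y_true))

# Made-up daily search counts and predictions, for illustration only.
y = [100.0, 200.0, 300.0]
y_hat = [110.0, 190.0, 330.0]
print(f"MAE = {mae(y, y_hat):.3f}, NRMSE = {nrmse(y, y_hat):.4f}")
```

Because NRMSE is scale-free, it allows comparison across categories or provinces with very different search volumes, while MAE stays in the original unit of daily searches.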
The experimental period is from January 1 to March 31, 2020, covering the main COVID-19 epidemic development in China. We use historical searching logs from the Beidian platform and the daily confirmed cases over the past 2 weeks to predict consumer demand (i.e., product searches) in the following week. We split the data chronologically into training and testing sets at a ratio of 3:1. We train the model parameters with the Adam optimizer, regularized by early stopping, and set the minibatch size to 10. Considering the limited training data, we use only a one-layer GRU network and set its hidden size to 4. The learning rate is initialized to 1e-2 and is gradually reduced by a factor of 0.1.
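The data preparation in this setup can be sketched as sliding windows followed by a chronological split. The helper names and the placeholder series are hypothetical, and whether the 3:1 split is applied to raw days or to windowed samples is not stated in the text; the sketch splits the windowed samples.

```python
import numpy as np

def make_windows(cases, searches, m=14, n=7):
    """X[i]: m days of [cases, searches]; Y[i]: the next n days of searches."""
    features = np.stack([cases, searches], axis=1).astype(float)  # (T, 2)
    X, Y = [], []
    for start in range(len(features) - m - n + 1):
        X.append(features[start:start + m])
        Y.append(features[start + m:start + m + n, 1])
    return np.array(X), np.array(Y)

def chronological_split(X, Y, train_fraction=0.75):
    """Earlier samples go to training, later ones to testing (3:1 by default)."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], Y[:cut]), (X[cut:], Y[cut:])

T = 91  # days from January 1 to March 31, 2020
cases = np.arange(T)          # placeholder series, not real epidemic data
searches = np.arange(T) * 10  # placeholder series, not real search logs

X, Y = make_windows(cases, searches)
(train_X, train_Y), (test_X, test_Y) = chronological_split(X, Y)
print(X.shape, Y.shape, len(train_X), len(test_X))
```

Note that adjacent windows overlap in time, so the first test inputs share days with the last training targets; a stricter protocol would leave a gap of m + n − 1 days around the cut.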
5.2. Demand Forecasting by Category
Table 4 shows the prediction performance of EnCod along with that of the baselines. We choose ten categories of two groups, including both the COVID-19-related and COVID-19-unrelated product categories. Products with the highest RP, such as face masks, are considered COVID-19-related products, and those with low RP values are unrelated products.
The COVID-19-related group results show that the addition of epidemic statistics contributes to substantially improved demand forecasting performance. This finding is consistent for both the XGBoost and encoder-decoder models. (Seq2Seq is a variant of our model without epidemic information.) Compared to the best performing baseline, EnCod reduces the NRMSE by 6.9–34.3% in the prediction task for COVID-19-related products. However, XGBoost-C, which considers epidemic information, is not necessarily the second-best performing alternative. Sometimes, the second-best method was the AR, ARIMA, or Seq2Seq model.
Next, for COVID-19-unrelated products, EnCod no longer produces the best results in all prediction tasks. EnCod is only marginally better in some cases, and the XGBoost-C model produces the best results for flavored milk and vegetables. Therefore, considering epidemic statistics is not beneficial for some items with low RP values. Among these items, we present the results for daily necessities (e.g., toilet paper, storage bags, kitchen supplies) and home decorations.
The comparison between the two groups validates that our method can capture the relationship between epidemic development and the change in demand for essential goods. Moreover, the results verify the usability of the RP metric, which is defined to characterize the market: the prediction results indicate that RP can be used to determine a product's relevance to an epidemic.
5.3. Regional Forecasting
Now, we predict the demand at the province level to investigate the model's regional applicability. Specifically, we focus on a single product category, face masks, which is the most typical pandemic-related product. To examine province-level results, we choose nine representative provinces in China considering their geography, distance from Hubei, and number of confirmed cases. Beijing is the capital of China; Shanxi and Shaanxi are in northwestern China; Jilin is in northeastern China; Zhejiang, Jiangsu, and Shanghai are in southeastern China; and Sichuan is in southwestern China. We also choose two provinces, Hunan and Henan, which neighbor Hubei. We train each province-specific model with the province's own confirmed cases and search records to predict citizens' needs in that province.
Table 5 compares the regional forecasting results. Even when learning is fine-tuned over province-level data, EnCod still outperforms the baselines in most cases. Compared to that in other provinces, the EnCod model adopted in Hunan, Henan, Sichuan, and Zhejiang, which are located near Hubei Province, achieves relatively low prediction error. This could be explained by the more significant influence of the pandemic in these areas. Thus, adding COVID-19 statistics is more helpful there and leads to better predictions of e-commerce behavior.
Table 5. The NRMSE performance of baseline methods and our model measured in nine representative provinces in China.
5.4. Long-Term Forecasting
Next, to test how well EnCod performs over a longer prediction period, we increase the forecast horizon by setting n to each value in {1, 3, 5, 7, 10, 14}, again focusing on face masks. Table 6 displays the results of these long-term predictions. The AR model performs better for next-day prediction with n = 1. For n > 1, EnCod consistently outperforms the baselines by a substantial margin in terms of both MAE and NRMSE. In contrast, the AR model performs poorly for longer-term prediction, with an NRMSE exceeding 0.5 for horizons of a week or longer. The ability to look beyond a week makes the proposed EnCod model practical and applicable to the study of future epidemics.
In summary, the above results confirm that our method can be applied to global, local, and long-term forecasts of the demand for essential goods, which is crucial and meaningful during a pandemic. Overall, our method improves upon the forecasting performance of baseline methods, regardless of whether the historical records of the time series are sparse or dense or how far into the future the predictions are made, which shows the model's utility and robustness.
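A horizon sweep of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the persistence forecaster is a hypothetical stand-in for any model, the 14-day lookback and the mean-normalized RMSE are assumptions, and all n forecast days are scored together because this section does not say whether only the n-th day is evaluated.

```python
import math
from typing import Callable, Dict, List, Sequence

HORIZONS = [1, 3, 5, 7, 10, 14]  # values of n listed in the text


def persistence_forecast(history: Sequence[float], n: int) -> List[float]:
    """Hypothetical stand-in model: repeat the last observed value n times."""
    return [history[-1]] * n


def nrmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE divided by the mean of the observed values (assumed normalizer)."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / (sum(actual) / len(actual))


def sweep_horizons(series: Sequence[float],
                   forecaster: Callable[[Sequence[float], int], List[float]],
                   lookback: int = 14,
                   horizons: Sequence[int] = HORIZONS) -> Dict[int, float]:
    """Return the NRMSE for each horizon n, pooled over all sliding windows."""
    results: Dict[int, float] = {}
    for n in horizons:
        actual: List[float] = []
        predicted: List[float] = []
        for start in range(len(series) - lookback - n + 1):
            history = series[start:start + lookback]
            future = series[start + lookback:start + lookback + n]
            actual.extend(future)
            predicted.extend(forecaster(history, n))
        results[n] = nrmse(actual, predicted)
    return results


# Placeholder series: a steadily rising demand curve over 60 days.
demand = [float(d) for d in range(1, 61)]
errors = sweep_horizons(demand, persistence_forecast)
for n in HORIZONS:
    print(n, round(errors[n], 4))
```

On a rising series, the error of the stand-in forecaster grows with n, which is the kind of degradation a horizon sweep is meant to expose.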
6. Discussion and Limitations
The findings of this research have multiple implications. First, the product-level detailed shopping logs will be anonymized and released to the research community to serve as a critical data source for understanding market disruptions during a pandemic. Second, health professionals and e-commerce marketers can utilize the model to predict surges in the short-term demand for particular goods at risk. Third, our research contributes to policy management. For example, policymakers can review the most relevant products identified in this research to understand households' needs and prepare earlier. Also, our study reveals that people's demands for different products change with the development of the epidemic, which offers suggestions for product supply in different epidemic periods. Fourth, EnCod can be utilized in domains outside of e-commerce (such as trade data) to assess the impact of COVID-19 in other sectors.
In China, the share of the population that uses mobile e-commerce is relatively high, and the country went through the main COVID-19 epidemic development during the first quarter of 2020. However, it remains to be determined how this analysis can be repeated in other countries, where the proportion of the population that uses mobile e-commerce may be lower or where COVID-19 is still in progress. In the post-COVID-19 era, inventory planning and pricing of goods will have to be decided based on multiple data sources, including user demand. Moreover, incorporating other modalities of mobile e-commerce to assist demand forecasting, while accounting for cost-effectiveness, would be an excellent direction in which to extend this work (Cao et al., 2020; Chen et al., 2020). As the first to identify the impact of COVID-19 from the perspective of mobile e-commerce, we believe that this study makes an essential contribution to the community.
While our findings build on extensive data analysis of real-world e-commerce and our model outperforms baseline methods in demand forecasting, it is essential to consider them in the context of several limitations. First, the dataset is drawn from a single company, one of China's largest e-commerce platforms. As a result, the findings may differ on other platforms because of varying platform types and characteristics. Second, behavioral economics theories suggest that consumers' decisions are not purely rational and are often irrational (Kahneman, 2011). In the real world, we often see irrational behavioral choices that do not maximize utility and can cause a loss of economic welfare. Especially in the painful period of COVID-19, whether epidemic development and limited supplies increase the irrationality of people's decisions remains unclear. We believe that combining related theories and incorporating additional influencing factors will bring more exciting insights. Nevertheless, we believe that this work presents the most comprehensive analysis of disruptions in Chinese e-commerce during COVID-19, and we leave the abovementioned limitations as future work.
7. Conclusion and Future Work
The COVID-19 pandemic has exacerbated difficulties in the supply of essential products to meet demand and has influenced people's activities on e-commerce platforms. This paper conducts extensive data analysis to investigate how online shopping behavior responds to the pandemic and to discover different behavioral patterns. We find that the disruption in the supply of essential goods in this period led to changes in shopping actions (e.g., a positive rank change of the search action for face masks and other epidemic-relevant products). Therefore, we incorporate epidemic development statistics into demand forecasting and present the EnCod model. The model is simple yet effective in forecasting the demand for COVID-19-related items during the pandemic. Based on our findings and model, several extensions could be made. For example, we can exploit advanced encoder-decoder designs and use other aggregators to combine epidemic statistics and behavioral records. Leveraging online footprints in epidemic forecasting is also an exciting research direction. Furthermore, supporting managerial decision-making would be an excellent direction in which to extend our work.
Data Availability Statement
The original contributions presented in the study are included in the article/supplementary materials; further inquiries can be directed to the corresponding author/s.
Author Contributions
MG and YY performed the modeling of marketing disruptions. YY and ZZ conducted the time-lagged analysis. YY and SK developed the prediction model. YY performed the corresponding experiments. All authors jointly analyzed the results and wrote the paper.
Funding
This work was supported in part by the National Key Research and Development Program of China under grant SQ2018YFB180012, the National Natural Science Foundation of China under grants 61971267, 61972223, 61861136003, and 61621091, the Beijing Natural Science Foundation under grant L182038, the Beijing National Research Center for Information Science and Technology under grant 20031887521, and the research fund of the Tsinghua University-Tencent Joint Laboratory for Internet Innovation Technology. SK and MC were supported by the Institute for Basic Science (IBS-R029-C2). MC was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF-2017R1E1A1A01076400).
Conflict of Interest
MG was employed by the company Meituan-Dianping Group, but this work was done while he was a Ph.D. student at Tsinghua University.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Footnotes
1. ^https://covid19.who.int/explorer.
2. ^www.beidian.com.
3. ^https://drive.google.com/drive/folders/17nVCa_dUgR9kNkXVznd6D1Ry8xFD8lxj.
5. ^https://github.com/CSSEGISandData/COVID-19.
7. ^The surge appears at the fourth data point; the fitted lines are for visual aid and do not represent a gradual increase in popularity.
References
Abiad, A., MiaDagli, R., Ferrarini, S., Noy, B., Osewe, I., Pagaduan, P., et al. (2020). Economics Health; COVID-19; Industry and Trade, 16.
Ali, B. (2020). Impact of covid-19 on consumer buying behavior toward online shopping in Iraq. Econ. Stud. J. 18, 267–280.
Alon, T. M., Doepke, M., Olmstead-Rumsey, J., and Tertilt, M. (2020). The Impact of COVID-19 on Gender Equality. Technical report, National Bureau of Economic Research. doi: 10.3386/w26947
Arora, N., Charm, T., Grimmelt, A., Ortega, M., Robinson, K., Sexauer, C., et al. (2020). A Global View of How Consumer Behavior Is Changing Amid COVID-19. McKinsey and Company.
Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv[Preprint]. arXiv:1409.0473.
Baker, S. R., Bloom, N., Davis, S. J., Kost, K. J., Sammon, M. C., and Viratyosin, T. (2020). The Unprecedented Stock Market Impact of COVID-19. Technical report, National Bureau of Economic Research. doi: 10.3386/w26945
Bhatti, A., Akram, H., Basit, H. M., Khan, A. U., Raza, S. M., and Naqvi, M. B. (2020). E-commerce trends during COVID-19 pandemic. Int. J. Fut. Gener. Commun. Netw. 13, 1449–1452.
Buheji, M., da Costa Cunha, K., Beka, G., Mavric, B., de Souza, Y., da Costa Silva, S. S., et al. (2020). The extent of covid-19 pandemic socio-economic impact on global poverty. A global integrative multidisciplinary review. Am. J. Econ. 10, 213–224. doi: 10.5923/j.economics.20201004.02
Cao, H., Chen, Z., Xu, F., Wang, T., Xu, Y., Zhang, L., et al. (2020). “When your friends become sellers: an empirical study of social commerce site beidian,” in Proceedings of the 14th International Conference on Web and Social Media (Atlanta, GA), 83–94.
Cepoi, C.-O. (2020). Asymmetric dependence between stock market returns and news during COVID19 financial turmoil. Financ. Res. Lett. 36:101658. doi: 10.1016/j.frl.2020.101658
Chang, H.-H., and Meyerhoefer, C. (2020). COVID-19 and the demand for online food shopping services: empirical evidence from Taiwan. Technical report, National Bureau of Economic Research. doi: 10.3386/w27427
Chen, T., and Guestrin, C. (2016). “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, CA), 785–794. doi: 10.1145/2939672.2939785
Chen, Z., Cao, H., Xu, F., Cheng, M., Wang, T., and Li, Y. (2020). “Understanding the role of intermediaries in online social e-commerces: an exploratory study of beidian,” in Proceedings of the ACM on Human Computer Interaction. doi: 10.1145/3415185
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., et al. (2014). Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv[Preprint]. arXiv:1406.1078. doi: 10.3115/v1/D14-1179
Cohen, J. (2020). Scientists are racing to model the next moves of a coronavirus that's still hard to predict. Science 7. doi: 10.1126/science.abb2161
Contreras, J., Espinola, R., Nogales, F. J., and Conejo, A. J. (2003). Arima models to predict next-day electricity prices. IEEE Trans. Power Syst. 18, 1014–1020. doi: 10.1109/TPWRS.2002.804943
Cornwall, W. (2020). Social scientists scramble to study pandemic, in real time. Science 368:6487. doi: 10.1126/science.abc1536
Elrhim, M. A., and Elsayed, A. (2020). The Effect of COVID-19 Spread on the E-Commerce Market: The Case of the 5 Largest E-Commerce Companies in the World. Available online at: https://ssrn.com/abstract=3621166
Gill, B. S., Jayaraj, V. J., Singh, S., Mohd Ghazali, S., Cheong, Y. L., Md Iderus, N. H., et al. (2020). Modelling the effectiveness of epidemic control measures in preventing the transmission of COVID-19 in Malaysia. Int. J. Environ. Res. Publ. Health 17:5509. doi: 10.3390/ijerph17155509
Gormsen, N. J., and Koijen, R. S. (2020). Coronavirus: Impact on Stock Prices and Growth Expectations. University of Chicago, Becker Friedman Institute for Economics Working Paper. doi: 10.3386/w27387
Gozzi, N., Tizzoni, M., Chinazzi, M., Ferres, L., Vespignani, A., and Perra, N. (2020). Estimating the effect of social inequalities in the mitigation of covid-19 across communities in Santiago de Chile. medRxiv [Preprint]. doi: 10.1101/2020.10.08.20204750
Hasanat, M. W., Hoque, A., Shikha, F. A., Anwar, M., Hamid, A. B. A., and Tat, H. H. (2020). The impact of coronavirus (COVID-19) on e-business in Malaysia. Asian J. Multidisc. Stud. 3, 85–90.
Hoffmann, M., Kleine-Weber, H., Schroeder, S., Krüger, N., Herrler, T., Erichsen, S., et al. (2020). SARS-CoV-2 cell entry depends on ace2 and tmprss2 and is blocked by a clinically proven protease inhibitor. Cell 181, 271-280.e8. doi: 10.1016/j.cell.2020.02.052
Huang, J., Wang, H., Fan, M., Zhuo, A., Sun, Y., and Li, Y. (2020). “Understanding the impact of the covid-19 pandemic on transportation-related behaviors with human mobility data,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 3443–3450. doi: 10.1145/3394486.3412856
King, T., Hewitt, B., Crammond, B., Sutherland, G., Maheen, H., and Kavanagh, A. (2020). Reordering gender systems: can COVID-19 lead to improved gender equality and health? Lancet 396, 80–81. doi: 10.1016/S0140-6736(20)31418-5
Lee, M., Lee, S., Kim, S., and Park, N. (2020). Human mobility during COVID-19 in the context of mild social distancing: implications for technological interventions. arXiv[Preprint]. arXiv:2006.16965.
Leone, L. A., Fleischhacker, S., Anderson-Steeves, B., Harper, K., Winkler, M., Racine, E., et al. (2020). Healthy food retail during the COVID-19 pandemic: challenges and future directions. Int. J. Environ. Res. Publ. Health 17:7397. doi: 10.3390/ijerph17207397
Maier, B. F., and Brockmann, D. (2020). Effective containment explains subexponential growth in recent confirmed COVID-19 cases in China. Science 368, 742–746. doi: 10.1126/science.abb4557
Mehrolia, S., Alagarsamy, S., and Solaikutty, V. M. (2020). Customers response to online food delivery services during COVID-19 outbreak using binary logistic regression. Int. J. Consum. Stud. doi: 10.1111/ijcs.12630. [Epub ahead of print].
Meltzer, M. I., Cox, N. J., and Fukuda, K. (1999). The economic impact of pandemic influenza in the United States: priorities for intervention. Emerg. Infect. Dis. 5:659. doi: 10.3201/eid0505.990507
Mikolov, T., Karafiát, M., Burget, L., Černocký, J., and Khudanpur, S. (2010). “Recurrent neural network based language model,” in Eleventh Annual Conference of the International Speech Communication Association. doi: 10.1109/ICASSP.2011.5947611
Mkhabela, M., Bullock, P., Raj, S., Wang, S., and Yang, Y. (2011). Crop yield forecasting on the Canadian prairies using Modis NDVI data. Agric. For. Meteorol. 151, 385–393. doi: 10.1016/j.agrformet.2010.11.012
Nakhate, S. B., and Jain, N. (2020). The effect of coronavirus on E commerce. Stud. Indian Place Names 40, 516–518.
Omar, H., Hoang, V. H., and Liu, D.-R. (2016). A hybrid neural network model for sales forecasting based on arima and search popularity of article titles. Comput. Intell. Neurosci. 2016:9656453. doi: 10.1155/2016/9656453
Peng, H., Liu, F., and Yang, X. (2013). A hybrid strategy of short term wind power prediction. Renew. Energy 50, 590–595. doi: 10.1016/j.renene.2012.07.022
Rocha, M., Cortez, P., and Neves, J. (2007). Evolution of neural networks for classification and regression. Neurocomputing 70, 2809–2816. doi: 10.1016/j.neucom.2006.05.023
Schoenbaum, S. C. (1987). Economic impact of influenza: the individual's perspective. Am. J. Med. 82, 26–30. doi: 10.1016/0002-9343(87)90557-2
Shen, C. (2015). Analysis of detrended time-lagged cross-correlation between two nonstationary time series. Phys. Lett. A 379, 680–687. doi: 10.1016/j.physleta.2014.12.036
Song, S., Zong, Z., Li, Y., Liu, X., and Yu, Y. (2020). Reinforced epidemic control: saving both lives and economy. arXiv[Preprint]. arXiv:2008.01257.
Sumner, A., Hoy, C., Ortiz-Juarez, E., et al. (2020). Estimates of the Impact of COVID-19 on Global Poverty. Technical report, WIDER Working Paper 2020/43. doi: 10.35188/UNU-WIDER/2020/800-9
Sundermeyer, M., Schlüter, R., and Ney, H. (2012). “LSTM neural networks for language modeling,” in Thirteenth Annual Conference of the International Speech Communication Association (Portland, OR).
Sutskever, I., Vinyals, O., and Le, Q. V. (2014). “Sequence to sequence learning with neural networks,” in Advances in Neural Information Processing Systems, 3104–3112.
Tian, H., Liu, Y., Li, Y., Wu, C.-H., Chen, B., Kraemer, M. U., et al. (2020). An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China. Science 368, 638–642. doi: 10.1126/science.abb6105
Valensisi, G. (2020). COVID-19 and global poverty: are LDCS being left behind? Eur. J. Dev. Res. 32, 1–23. doi: 10.35188/UNU-WIDER/2020/830-6
Willmott, C. J., and Matsuura, K. (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 30, 79–82. doi: 10.3354/cr030079
Wrapp, D., Wang, N., Corbett, K. S., Goldsmith, J. A., Hsieh, C., Abiona, O., et al. (2020). Cryo-em structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260–1263. doi: 10.1126/science.abb2507
Keywords: COVID19, disruption, online shopping, time-lagged analysis, demand forecasting
Citation: Yuan Y, Guan M, Zhou Z, Kim S, Cha M, Jin D and Li Y (2021) Disruption in Chinese E-Commerce During COVID-19. Front. Comput. Sci. 3:668711. doi: 10.3389/fcomp.2021.668711
Received: 17 February 2021; Accepted: 26 February 2021;
Published: 25 March 2021.
Edited by:
Kum Fai Yuen, Nanyang Technological University, Singapore
Reviewed by:
Jiang Yi, Chung-Ang University, South Korea
Xue Li, Nanyang Technological University, Singapore
Copyright © 2021 Yuan, Guan, Zhou, Kim, Cha, Jin and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Sundong Kim, sundong@ibs.re.kr; Yong Li, liyong07@tsinghua.edu.cn