Autoregressive Models Applied to Time-Series Data in Veterinary Science

Ward, Michael P.; Iglesias, Rachel M.; Brookes, Victoria J.

doi:10.3389/fvets.2020.00604

METHODS article

Front. Vet. Sci., 17 September 2020

Sec. Veterinary Epidemiology and Economics

Volume 7 - 2020 | https://doi.org/10.3389/fvets.2020.00604

This article is part of the Research Topic Principles and Challenges of Fundamental Methods in Veterinary Epidemiology and Economics

Michael P. Ward¹*

Rachel M. Iglesias²

Victoria J. Brookes³,⁴

¹Sydney School of Veterinary Science, The University of Sydney, Sydney, NSW, Australia
²Australian Government Department of Agriculture, Water and the Environment, Canberra, ACT, Australia
³School of Animal and Veterinary Sciences, Faculty of Science, Charles Sturt University, Wagga Wagga, NSW, Australia
⁴Graham Centre for Agricultural Innovation, NSW Department of Primary Industries, Charles Sturt University, Wagga Wagga, NSW, Australia

A time-series is any set of N time-ordered observations of a process. In veterinary epidemiology, our focus is generally on disease occurrence (the “process”) over time, but animal production, welfare or other traits might also be of interest. Common sources of time-series datasets are animal disease monitoring and surveillance systems. Here, we scan the application of methods to analyse time-series data in the peer-reviewed, published literature. Based on this literature scan we focus on autocorrelation and illustrate the recommended steps using ARIMA (Autoregressive Integrated Moving Average) methods via analysis of a time-series of canine parvovirus (CPV) events in a pet dog population in Australia, 2009 to 2015. We conclude by identifying the barriers to the application of ARIMA methods in veterinary epidemiology and suggest some possible solutions. In the literature scan, the 37 selected studies focused mostly on infectious and parasitic diseases, predominantly for analytical, rather than descriptive or predictive, purposes. Trends and seasonality were investigated, and autocorrelation analyzed, in most studies, most commonly using R software. An approach to analyzing autocorrelation using ARIMA methods was then illustrated using a time-series (week and month units) of CPV events in a pet dog population in Australia, reported to a national companion animal disease surveillance system. This time-series was derived by summing veterinarian reports of confirmed CPV diagnoses. We present data analysis output generated via the R statistical environment, and make this code available for the reader to apply to this or other time-series datasets. We also illustrate prediction of CPV events using rainfall as a covariate. Time-series analysis using ARIMA methods to understand and explore autocorrelation appears to be relatively uncommon in veterinary epidemiology. Some of the reasons might include limited availability of time-series data of sufficient length, lack of familiarity with analytical methods and available software, and uncertainty about how best to use the information generated. We recommend that wherever feasible, such time-series data be made available both for analysis and for methods development.

Introduction

A time-series is any set of N time-ordered observations of a process (1). Within the discipline of epidemiology, our goal is often to understand the underlying processes that generate time-series of disease events. These processes can be explored as part of a time-series analysis, particularly when potential explanatory variables are included as covariates. This can provide insights into disease causation, and thus contribute to the formulation of disease prevention and control programs. However, time-series analysis can also be predictive, with or without covariates. This facilitates the development of forecasting systems to anticipate disease occurrence or detect changes in disease occurrence. Here, we focus on the former goal of understanding disease occurrence.

A key property of time-series is non-independence of values at consecutive time periods. This results in a statistical relationship between values at consecutive time periods and sometimes at different time lags, known as autocorrelation. Temporal autocorrelation is a fundamental characteristic of observations recorded over extended periods of time. We can appreciate that daily rainfall data, for example, recorded over a period of months will show autocorrelation: if it rains on a specific day, it is more likely to rain the following day. In addition, rainfall might be more common during certain months, or seasons. Perhaps less obvious is autocorrelation in time-series of disease occurrence. Diseases can be clustered in time due to causes that are autocorrelated (such as climate), due to the methods used to detect disease and the surveillance programs used (for example, certain diagnostic tests only being performed on Mondays, or inspectors at abattoirs working fixed 6-day shifts), and (for infectious diseases) because the number of infected individuals at one time period directly affects the number of infected individuals at a subsequent time period due to disease transmission. Rather than searching for evidence of temporal clustering (2), autocorrelation methods assume it is present and seek to describe and understand it. Whilst temporal autocorrelation might be expected, often it is subtle.
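To make the idea of autocorrelation at a given lag concrete, the following is a minimal Python sketch (standard library only, and not the R code accompanying this article) of the usual sample autocorrelation estimator. The function name and the weekly counts are hypothetical and chosen only for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative `lag`."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: products of deviations that are `lag` time units apart.
    numer = sum((series[t] - mean) * (series[t - lag] - mean)
                for t in range(lag, n))
    return numer / denom


# Hypothetical weekly event counts with a repeating 4-week pattern.
counts = [2, 5, 9, 5] * 6
acf = [autocorrelation(counts, k) for k in range(5)]
```

In this constructed series the autocorrelation is strongly positive at lag 4 (one full cycle) and strongly negative at lag 2 (half a cycle), which is the kind of structure an ACF plot is intended to reveal.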

Autocorrelation makes common statistical approaches inappropriate, and alternative techniques are needed. Time-series analysis invariably begins with descriptive analyses of the dataset under consideration. This consists of separating out (“decomposing”) the time-scale dependent characteristics which make up the observed temporal pattern of disease or event occurrence. Broadly, these patterns are the long-term (secular), periodic cyclical (if time-independent), and seasonal trends. The aim of this analysis is to characterize temporal patterns. There are a variety of methods for decomposition, including decomposition based on locally weighted scatterplot smoothing [“seasonal and trend decomposition using locally weighted scatterplot smoothing (loess),” STL]; we demonstrate this method in the context of the CPV events. The process of decomposition, whilst attempting to remove autocorrelation from a time-series, also allows an understanding of the autocorrelation itself and its potential causes.

As part of the process of exploring a time-series, autoregressive models can be used to determine how much of the observed time-series can be explained by previous observations in the time-series itself. Characterization of temporal patterns, such as trend and seasonality, can be used to understand potential causes of disease. Autoregressive models to describe the occurrence of events based on prior observations include simple autoregressive (AR) models, autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models, which differ in the way previous values in the time-series are used to describe future values. AR models are essentially linear regression models for which each regression term is a time-lagged value (i.e., a value measured at a previous time point, the “lag”) of the same time-series. MA models instead use lagged values of forecast errors, and ARMA models combine both. ARIMA models can also include differencing (i.e., the value at one time point is subtracted from the value at another time point) of the series. Causation can be further investigated by multivariate models. For example, autoregressive models can be extended to include covariates, and in a further extension, information from more than one time-series can be used in vector autoregressive models to forecast future values of each time-series. We demonstrate the way in which visual exploration of autocorrelation function (ACF) and partial autocorrelation function (PACF) plots can provide insights into how to fit a model, and how to select the best model fit for ARMA and ARIMA models.
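The description of an AR model as a regression on lagged values of the same series can be sketched as follows. This is an illustrative Python example under simplifying assumptions (a single lag, ordinary least squares, and a hypothetical noise-free series), not the estimation method used by the R packages referred to in this article.

```python
def fit_ar1(series):
    """Least-squares estimate of (intercept, phi) in y[t] = c + phi * y[t-1] + error."""
    lagged = series[:-1]    # y[t-1], the regression term
    current = series[1:]    # y[t], the outcome
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    phi = sxy / sxx
    intercept = mean_y - phi * mean_x
    return intercept, phi


# Hypothetical series generated exactly by y[t] = 2 + 0.5 * y[t-1], starting at 20.
series = [20.0]
for _ in range(10):
    series.append(2 + 0.5 * series[-1])

intercept, phi = fit_ar1(series)
```

Because the example series contains no noise, the fit recovers the generating values (an intercept of 2 and a lag-1 coefficient of 0.5); with real surveillance data the estimates would carry uncertainty.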

We begin our discussion of the analysis of time-series data in veterinary epidemiology from our perspective that ARIMA methods are not commonly applied within the discipline. In situations in which methods to analyse time-series data have been applied, we investigate the more commonly used methods and data sources reported via a scan of recent literature. This is motivated by an appraisal of current usage and gaps in the field, rather than a comprehensive, systematic review, to provide the reader with a range of literature in which methods for analysis of time-series data have been used. We then demonstrate the application of autoregressive models using ARIMA methods on a surveillance dataset, and make recommendations to increase the use of such methods in veterinary science.

Literature Scan

CAB Abstracts Index via Web of Science was searched using TOPIC: (time-series) and TOPIC: (analysis) and TOPIC: (veterinary) during the timespan 1980 to present (31 August 2019), restricted to English language journal articles only. The titles of all articles returned by this search were screened for scope [time-series analysis methods applied to animal (including zoonotic) diseases]. Note that studies in which time-series data were reported, but which did not describe the application of time-series analysis methods, were excluded.

A template was developed—via discussion between the authors—to extract information from each article (see Supplementary Table 1). Full versions of the subset of articles were then obtained and randomly assigned to one of the three authors.

In total, 60 articles (see Supplementary Table 2) were identified. Of these, five were unavailable for review and 18 were out-of-scope. The latter included articles in which the primary event was a disease in humans only (for example, dengue fever, Crimean-Congo haemorrhagic fever, tick-borne encephalitis, Ross River fever), or the focus was on detection of aberrations within a time-series [for example, (3)]. We applied these exclusions because our aim is to introduce readers to autoregressive models and applications to animal diseases.

Of the remaining 37 articles, publication year ranged from 1990 to 2019 and studies were conducted in 19 different countries (Supplementary Table 2). One study was conducted at the global scale [highly pathogenic avian influenza; (4)]. Data used in these studies were derived from surveillance systems (including internet searches) (14); monitoring systems (11), for example slaughterhouse recording systems; clinical records (6); laboratory records (3); and bespoke research projects (3). These studies were focused mostly on livestock (26). The temporal unit of data collection was most commonly day (17) or month (16), and the median period (years) covered by the datasets analyzed was 10 (IQR 5–16).

The studies identified focused on a wide range of events, but mostly either specific infectious diseases (e.g., rabies) or defined syndromes (e.g., pleurisy and pneumonia).

The purpose of the time-series analysis performed was either analysis (18), description (12), or prediction (7). Studies were considered descriptive if they included only visualization of the time-series or descriptive statistics, whereas those that also included decomposition of the series, or developed models of the time-series, were considered analytical. Those that used the models to predict trends beyond the range of the time-series were considered predictive. Data analyzed was most commonly counts of events. Where data was manipulated before analysis, aggregation to a coarser temporal unit was most common.

Analysis of trends was performed in most (27) studies, mainly using regression models (13). Autoregression was analyzed in the majority of studies (23). In six of these, autocorrelation and partial autocorrelation functions (ACF and PACF; see section An Example of Time-Series Analysis Methods—Canine Parvovirus Reports for definitions and methods) were used, and in other studies (10) modeling approaches were used, including autoregressive models. ARMA or ARIMA models were described in 13 of the 23 studies in which autoregression was analyzed. Seasonality was analyzed in 28 studies; however, the methods used varied greatly and included visual assessment, ACF and PACF, seasonal autoregressive models, automated exponential smoothing state space models, periodograms, and seasonal and trend decomposition using loess (STL).

Forecasting was undertaken in 12 studies. R was the software most commonly used to analyse time-series data (18 studies).

We observed that the most often cited advantage of using time-series analysis methods was the ability to predict disease occurrence, contributing to early warning and therefore disease prevention. Some of the barriers discussed include the scarcity of long-term, computerized, automatically collected, and publicly available data; identifying outbreak or disease-free baselines; event data sparseness (excessive zeros); data aggregation (temporal scale); time gaps in the data; lack of constant population at-risk; and model validation.

In summary, in this literature scan, time-series analysis methods in veterinary science were mostly focused on infectious and parasitic diseases, analyzed by decomposing and modeling the time-series. This approach most often involves investigation of trends and seasonality, and analysis of autocorrelation, usually aided by the use of R software. Based on this, we next illustrate methods that can be used to investigate and analyse trends, seasonality and autocorrelation in veterinary science by presenting a step-by-step guide to analysis of a canine parvovirus time-series using R.

We focus on ARIMA methods because beyond a description of the trend and seasonality of time-series data, ARIMA models are an accessible method to describe autocorrelations within data and assess the influence of covariates such as climate variables. These methods can be considered a foundation in autoregressive methods for time-series analysis. Other methods—such as aberration detection algorithms, stochastic modeling approaches and machine-learning methods—can then be investigated for applications requiring long-term prediction (5–7).

An Example of Time-Series Analysis Methods—Canine Parvovirus Reports

Prior to embarking on autoregressive modeling, we need to consider when it is appropriate to apply these methods—and when it is not. For such analysis, a dataset of sufficient length and completeness needs to be available. Without sufficient data, it is difficult to identify trends and patterns, to build models, and determine statistical significance. In veterinary science, data generated by monitoring and surveillance systems are often analyzed by autoregressive modeling (see section Literature Scan). However, missing data can be an issue (see section Results of Analyzing a Time-series of Canine Parvovirus Reports), as can data gaps in the time-series caused by temporary interruptions to data collection. Assuming a stable population at-risk simplifies analysis and interpretation of results, but such assumptions need to be plausible. Other more general epidemiological issues—such as selection, ascertainment and measurement bias—also are applicable to autoregressive modeling and need to be considered.

Here we describe an analysis using autoregressive methods as an example that readers can use to guide their own analyses (8). The data and R code used for the analysis are available at https://zenodo.org/record/3738684#.X1HOYNZuLIU (accessed 04/09/2020).

Our time-series analysis begins with a description of the data, including the source, results of initial data checking and any manipulation required to make it suitable for time-series analysis. The time-series is then plotted, and secular and seasonal trends are assessed using decomposition then linear regression. Before fitting an autoregressive model, the series is assessed for stationarity using graphical and statistical methods. Stationarity is a key requirement to fit models to time-series data. A series is considered stationary if it is not changing systematically over time. A method for inducing stationarity—differencing—is also explained and demonstrated. We then fit a number of ARIMA models and use these to forecast disease cases beyond the range of the dataset. Finally, we investigate the influence of a covariate (rainfall) on the time-series and give a brief example of how cross-correlation and vector autoregressive models can be used to investigate relationships in time-series. We present the analysis of the example dataset in a stepwise guide to assist the reader to replicate the approach on this or other, similar datasets.

We have used the R statistical environment (9) for all analysis described. For readers not familiar with this platform, introductory courses and tutorials are widely available online and we recommend spending some time familiarizing yourself with the program before attempting this analysis. The code provided in the Zenodo repository will work if you have R correctly installed and operating on your computer and have installed the packages listed below.

The following packages were used for data visualization, manipulation and analysis of time-series data: ggplot2 (10), plyr (11), dplyr (12), lubridate (13), tseries (14), vars (15, 16), and forecast (17, 18).

To align readers to the associated R code, the corresponding “chunk” (C) in the code (https://zenodo.org/record/3738684#.X1HOYNZuLIU, accessed 04/09/2020) is included in the methods below. Chunks C1-C3 initiate and load the required packages.

Here, we present a series of six steps to guide the reader in applying time-series analysis to the example dataset.

Step 1: Describing the Data

This worked example uses data from the Disease Watchdog system, which operated in Australia from 2010 and was initiated to collect information on infectious diseases of dogs and cats (19–22). By 2015, nearly 25,000 disease cases and 19,000 reports had been submitted. The system was deactivated in early 2017.

Veterinarians and veterinary clinic staff were the contributors of data within this system. Besides disease diagnoses and their date of occurrence and postcode of residence, a range of other patient data was also collected, including age, sex, neuter status, breed, diagnostic method, and vaccination status. To encourage timely reporting, data was used to produce near-real time disease maps which veterinarians accessed to educate their clients (19). In this example, canine parvovirus (CPV) is used as the event of interest. CPV is a highly contagious disease of dogs and an important cause of morbidity and mortality in young dogs (23). It has a worldwide distribution and occurs as endemic disease or as local outbreaks.

Records of all CPV cases reported Australia-wide between October 2009 and November 2015 were extracted from the Disease Watchdog database. For analysis, cases which were reported to have been vaccinated at any time were excluded. Furthermore, only those cases in which the diagnosis of parvovirus had been confirmed by diagnostic testing were included. To illustrate approaches to analyzing time-series data, we applied these methods to events only, where an event consists of one or more cases reported by the same veterinarian with the same date of occurrence. We also restricted analysis to events reported from the state of New South Wales.

The dataset was loaded (C4) and checked for duplicated or missing data (C5). The number of events, and minimum and maximum dates of occurrence were reported (C6). The number of parvovirus events was then aggregated by week and by month (based on the reported date of occurrence) to create two time-series datasets (weekly and monthly) for subsequent analyses (C7–9).
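The aggregation step can be illustrated with a short Python sketch using only the standard library. It is not the authors' R code (chunks C7–9); the use of ISO weeks is an assumption made here for illustration, and the dates are hypothetical rather than taken from the Disease Watchdog data.

```python
from collections import Counter
from datetime import date


def aggregate_events(event_dates):
    """Count events per ISO week and per calendar month."""
    weekly = Counter()
    monthly = Counter()
    for d in event_dates:
        iso_year, iso_week, _ = d.isocalendar()
        weekly[(iso_year, iso_week)] += 1
        monthly[(d.year, d.month)] += 1
    return weekly, monthly


# Hypothetical dates of occurrence, one entry per event.
events = [date(2010, 1, 4), date(2010, 1, 6), date(2010, 1, 12), date(2010, 2, 1)]
weekly, monthly = aggregate_events(events)
```

Weeks or months with no events do not appear in these counters, so before analysis the series would need to be filled with explicit zeros for every time unit between the first and last date.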

Step 2: Visualization

Summary information on CPV events was calculated for the time series at both the weekly and monthly aggregation, and each dataset was plotted with a smoothed curve of events overlaid to visually assess trend (C11). The smoothing process in R is achieved by loess regression (see section Step 4 for a technical explanation of this method). This is exploratory analysis that can be used to inform further analytical approaches. Smoothed curves for both events/week (Figure 1) and events/month (Figure 2) demonstrate a decreasing trend over time, with the frequency of events being relatively stable during the period 2010 to 2013. If the aim of the analysis was to investigate risk factors for the pattern of events observed, this might suggest that the time-series can be truncated to the period 2010 to 2013, inclusive. If changes in CPV surveillance are of interest, further analysis might include the entire time-series. In addition, these initial plots and smoothed curves can inform the temporal scale of analysis. Visual assessment of Figures 1 and 2 suggests that monthly aggregation of events is sufficient to preserve the patterns present in the data. However, if the aim of analysis is to identify covariates associated with these patterns, the temporal units used to collect covariate data would also need to be considered.

Figure 1. Confirmed canine parvovirus events/week reported from New South Wales in a surveillance system in Australia, 2009–2015. Blue line, loess smoothed curve of events/week with 95% CI (gray).

Figure 2. Confirmed canine parvovirus events/month reported from New South Wales in a surveillance system in Australia, 2009–2015. Blue line, loess smoothed curve of events/month with 95% CI (gray).

Step 3: Linear Regression

After conversion of the events series to a computer-recognized “time-series object” (C12), linear regression analysis was used to further explore and quantify secular and seasonal trends (C13). The outcome was the number of events per week (or per month) and the predictors were time in weeks (or months) to assess trend, and week (or month) of the year to assess seasonality. Linear regression is used to confirm impressions from time-series plots and smoothed curves (step 2), to test the statistical significance and to quantify these trends. Identifying such trends is a major component of analysis of time-series data, and can lead to hypothesis-generation regarding potential causes of such patterns.
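As a rough illustration of quantifying trend and seasonality, the Python sketch below fits a straight line to a series and then averages the residuals by month of the year. This two-stage approach is a simplification assumed here for brevity; the analysis described above enters time and month of the year together in one regression model. The monthly counts are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def month_effects(residuals, period=12):
    """Mean residual at each position in the seasonal cycle."""
    sums = [0.0] * period
    counts = [0] * period
    for t, r in enumerate(residuals):
        sums[t % period] += r
        counts[t % period] += 1
    return [s / c for s, c in zip(sums, counts)]


# Hypothetical monthly counts: a decline of 0.5 events/month plus a seasonal pattern.
pattern = [3, 1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 3]
time_index = list(range(36))
events = [40 - 0.5 * t + pattern[t % 12] for t in time_index]

intercept, slope = fit_line(time_index, events)
residuals = [y - (intercept + slope * t) for t, y in zip(time_index, events)]
effects = month_effects(residuals)
```

Here the slope estimates the secular trend and `effects` estimates the seasonal departures from it; statistical significance would still need to be assessed with a full regression model.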

Step 4: Decomposition

The time-series were then decomposed to separately visualize temporal components including trend and seasonality and the remainder component (also known as “random” or “white noise”). Again, such visualization facilitates the identification and characterization of patterns and potentially what might be causing such patterns. For example, the trend or seasonal pattern might dominate. Alternatively, removing trend and season might still result in the remainder time-series showing a discernable pattern. This suggests greater complexity in the time-series (or the incorrect choice of window size to calculate trend and seasonal components).

Two methods were used: moving averages and “seasonal and trend decomposition using loess” (STL; C14). Both are additive models of the form Y[t] = T[t] + S[t] + e[t] in which Y[t] is the model output at time t, T[t] is the trend component at time t (which includes cyclical and longer trend patterns, the “trend-cycle” component), S[t] is the seasonal component at time t and e[t] is the remainder (or residual i.e., what remains in the time-series after removing seasonal and trend components) at time t. If the variance of the trend or seasonal components of the time-series is not constant throughout the time-series, a multiplicative decomposition is likely to be more appropriate than an additive model.

In the moving average method, the trend component is determined using a moving average window of an appropriate width. This trend component is then subtracted from the original values, and the data grouped by the seasonal element and averaged for each season. The seasonal component is determined by subtracting the average of the seasonal averages from each seasonal average.

A challenge is the choice of an appropriate moving average width. A default width of three time units can be chosen, meaning that for every observation in the time-series, the observation immediately preceding and immediately following that observation is used to calculate an average value. If data within a time-series have been collected with a known periodicity (for example, observation of disease conditions at an abattoir collected every Monday and Tuesday), this could also be used to inform the moving average width.
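The moving average decomposition described above can be sketched in Python as follows. This is a minimal illustration of the additive form Y[t] = T[t] + S[t] + e[t], not the R implementation used in C14; the window is set equal to the seasonal period, a half-weighted end point is used when that period is even (an assumption made here to keep the window centred), and the example series is hypothetical.

```python
def centred_moving_average(series, period):
    """Centred moving average; end points get half weight when period is even."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # undefined where the window does not fit
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(series, period):
    """Return (trend, seasonal, remainder) for an additive decomposition."""
    trend = centred_moving_average(series, period)
    detrended = [None if tr is None else y - tr for y, tr in zip(series, trend)]
    # Average the detrended values at each position in the seasonal cycle.
    seasonal_means = []
    for pos in range(period):
        values = [d for t, d in enumerate(detrended)
                  if d is not None and t % period == pos]
        seasonal_means.append(sum(values) / len(values))
    # Subtract the average of the seasonal averages from each seasonal average.
    overall = sum(seasonal_means) / period
    cycle = [s - overall for s in seasonal_means]
    seasonal = [cycle[t % period] for t in range(len(series))]
    remainder = [None if tr is None else y - tr - s
                 for y, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder


# Hypothetical series: linear trend plus a 4-period seasonal pattern, no noise.
pattern = [2, -1, -2, 1]
series = [10 + t + pattern[t % 4] for t in range(16)]
trend, seasonal, remainder = decompose_additive(series, period=4)
```

With this noise-free input the remainder is zero wherever the trend is defined; a remainder that still shows a pattern would suggest a poorly chosen window or additional structure in the series.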

The STL method is an iterative process that recalculates the seasonal and trend components by a loess smoothing procedure that initially fits a low-order polynomial to the data. A robustness weighting is calculated for each time point between each iteration, and incorporated into the smoothing procedure in the next iteration, which also uses the trend component from the previous iteration (24).

Step 5: Fitting Autoregressive Models

Once the time-series has been explored using the methods above, we use the information gained from these analyses to select and fit an ARIMA model. For demonstration and due to the findings in these exploratory analyses, seasonal autoregressive models with an ARIMA structure were then fitted to the time-series. Autoregression is the relationship between values in a time-series and values in that same time-series measured previously in time (the lag). For example, an autoregressive model of lag 1 describes the relationship between observations and their value in the preceding time unit. The Auto Regressive (AR) terms refer to the number of lagged values in the model. In the non-seasonal part of the model, the order of lagged values is termed “p,” and in the seasonal part of the model the order of lagged values is termed “P.” Moving Average (MA) terms—not to be confused with the calculation of a moving average in series decomposition—refer to the number of lagged errors in the model. It is essentially the relationship between current and lagged errors in the time-series. In the non-seasonal part of the model, the order of lagged errors is termed “q,” and in the seasonal part of the model the order of lagged errors is termed “Q.” Integration (I) terms refer to the number of differences used to make the time-series stationary. In the non-seasonal part of the model, the order of differences is termed “d,” and in the seasonal part of the model the order of differences is termed “D.” The overall structure of the model can be written as (p, d, q) (P, D, Q) m, in which m refers to the number of time-series observations in a seasonal cycle.
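The (p, d, q) (P, D, Q) m notation can also be written compactly with the backshift operator B, defined by B y_t = y_{t-1}. The expression below is the standard textbook form rather than one given in this article, and the sign convention for the moving average terms differs between references and software.

```latex
\phi_p(B)\,\Phi_P(B^{m})\,(1 - B)^{d}\,(1 - B^{m})^{D}\,y_t
  = c + \theta_q(B)\,\Theta_Q(B^{m})\,\varepsilon_t,
\qquad
\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^{p},
\quad
\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^{q}
```

The seasonal polynomials \Phi_P and \Theta_Q have the same form but act on B^{m}, so they relate each observation to those one or more full seasonal cycles earlier.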

The time-series (weekly and monthly reported CPV events) were assessed for stationarity to determine the orders for d and D to use in the ARIMA model. Initially, an automated function in R was used to determine if differencing was required for both the non-seasonal components (d) and seasonal components (D) of the ARIMA model using a sequence of unit root tests (KPSS test as default, C16). Stationarity was then further assessed using visualization of time-series plots, autocorrelation function (ACF) plots, and statistical tests (C17–18). Statistical tests included the Ljung-Box test, the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. In the case of a non-stationary time-series, the time-series was first-differenced and assessed again for stationarity. The objective of applying this range of methods is to ensure that any need for differencing, either non-seasonal or seasonal, is identified. Some methods (particularly statistical tests) might not suggest the need for differencing in specific datasets, so a conservative approach is to apply several methods.
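Differencing itself is a simple operation, as the following Python sketch shows. The function name and the monthly series are hypothetical; the example applies one seasonal difference (D = 1 with m = 12) followed by one non-seasonal difference (d = 1).

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly series: a linear decline plus a fixed 12-month pattern.
pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 3, 4]
series = [100 - 2 * t + pattern[t % 12] for t in range(36)]

seasonal_diff = difference(series, lag=12)    # removes the repeating 12-month pattern
fully_diff = difference(seasonal_diff, lag=1)  # removes the remaining constant level
```

Each round of differencing shortens the series (by 12 and then by 1 observation here), which is one reason the length of the available time-series matters.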

ACF and PACF plots were also used to assess the moving average (MA; q, Q) and autoregressive (AR; p, P) non-seasonal and seasonal components of the weekly and monthly ARIMA models following differencing (C19). The ACF plot allows us to visualize the correlation between values in the series and values lagged at a certain number of time points previously, whereas the PACF plot shows the correlation between values in the series and those at a given lag after removing the effect of values at intervening lags. ACF plots can indicate the moving average order q to include in an ARIMA model i.e., the lag at which autocorrelation becomes statistically non-significant. Similarly, the PACF plot can inform on the autoregressive order p to include. These functions can also be used to inform on seasonal moving average and autoregressive orders, respectively. We give a practical demonstration of how to interpret ACF and PACF for the purposes of ARIMA model parameterization in section An Example of Time-Series Analysis Methods—Canine Parvovirus Reports, using the time-series of CPV events.
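The relationship between the ACF and the PACF can be illustrated with the Durbin-Levinson recursion, which converts autocorrelations into partial autocorrelations. The Python sketch below is an assumption-laden illustration (the function name is hypothetical, and it is not necessarily how the R functions used in C19 compute the PACF), applied to the theoretical ACF of an AR(1) process with coefficient 0.6.

```python
def pacf_from_acf(r):
    """Partial autocorrelations for lags 1..len(r)-1 via Durbin-Levinson.

    `r` holds autocorrelations by lag, with r[0] == 1.
    """
    pacf = []
    prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, len(r)):
        numer = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        denom = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = numer / denom
        # Update the earlier coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.6: r[k] = 0.6 ** k.
theoretical_acf = [0.6 ** k for k in range(5)]
pacf = pacf_from_acf(theoretical_acf)
```

The ACF decays gradually, whereas the PACF is 0.6 at lag 1 and essentially zero afterwards; a cut-off of this kind in the PACF is what suggests an autoregressive order of p = 1.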

Auto-fitting was used to select a starting model (C20−21). Further models were constructed that were simpler (lower parameter terms than the auto-fitted models) but still within the parameter terms for (p, d, q) (P, D, Q) that were estimated during exploratory analysis (C22). The models with the lowest Akaike Information Criterion (AIC) estimates were selected. Model fit was assessed by visualization of predicted time-series relative to observed time-series, and examination of residuals for stationarity (time-series plot, ACF plot, Ljung-Box test) and normality. Because of the auto-fitting algorithms used to identify candidate ARIMA models, it is important to also visualize model(s) selected to ensure these make logical sense and have a biological explanation. Once a final model has been selected, it can be used to predict events for a specified time period beyond the range of the time-series. A predictive model can form the basis of a forecasting system, in which timely anticipation of disease events allows response strategies to be implemented. There are examples of forecasting in veterinary science using time-series analysis (see section Literature Scan). We demonstrate the use and interpretation of these methods in the context of the CPV data below.
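Model selection by AIC amounts to a simple comparison once each candidate has been fitted. In the Python sketch below, the model labels, log-likelihoods and parameter counts are entirely hypothetical and are not results from this analysis; the formula AIC = 2k − 2 ln(L) is the standard definition.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fitted candidates: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "ARIMA(1,0,0)(1,0,0)[12]": (-210.4, 3),
    "ARIMA(2,0,1)(1,0,0)[12]": (-209.9, 5),
    "ARIMA(1,0,0)": (-215.0, 2),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best_model = min(scores, key=scores.get)
```

In this made-up comparison the larger model improves the log-likelihood only slightly, so its extra parameters are penalized and the simpler seasonal model has the lowest AIC. AIC values are comparable only between models fitted to the same data with the same order of differencing.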

Step 6: Multivariate Analysis

To illustrate multivariate time-series analysis methods, a corresponding time-series of rainfall was created. The center of the postcodes in NSW from which CPV was reported during the study period was identified. This was achieved by joining case and event data to a polygon shapefile of NSW postcodes (ArcGIS v. 10.5, ESRI). We then identified the central feature (Spatial Analyst, ESRI), postcode 2850. From this postcode, a Bureau of Meteorology weather recording station was identified [Mudgee (062021), 32.58°S, 149.58°E] and daily rainfall data during the period 1 January 2009 to 31 December 2015 were extracted¹. Any missing data in the time-series were supplemented by accessing data from the closest weather recording station [Mudgee Airport AWS (062101)]. The rainfall time-series was then aggregated to a monthly time unit to produce a time-series of total monthly rainfall. Depending on the data, other metrics might be more appropriate, such as monthly median daily temperature or total monthly degree-days.

Covariate time-series datasets are often derived secondarily to the primary time-series of interest (often disease data in veterinary science). Besides climate (including rainfall, temperature and humidity), time-series data might be available on economic indicators, landscape and environmental variables and demographics. For analysis, data need to have the same temporal scale and duration (including time lags) as the primary time-series of interest, and should also broadly match the spatial extent (i.e., when covariates are used, they should be derived from the same area as the outcome of interest, rather than from a larger or a different area).

The presence of substantial data gaps in the series (other than randomly distributed missing data as in our CPV–rainfall example) can render such series unusable if it is not possible to impute data.

The rainfall data were prepared, described and decomposed to assess temporal trends (C23). Quantitative assessments further investigated the trend, seasonality and need for differencing (C24). An automated function was used to fit a dynamic model (ARIMA with rainfall as a predictor) to the CPV and rainfall time-series (C25). Model fit was assessed by visualization of the predicted time-series relative to the observed time-series, and examination of residuals for stationarity (time-series plot, ACF plot, Ljung-Box test) and normality.
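The Ljung-Box statistic used when examining residuals can be computed directly from the residual autocorrelations. The Python sketch below uses the standard formula Q = n(n + 2) Σ r_k² / (n − k) summed over lags 1 to h; the function name and the residuals are hypothetical, and no p-value is calculated because that requires a chi-square distribution function outside the standard library.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum((residuals[t] - mean) * (residuals[t - k] - mean)
                  for t in range(k, n)) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Hypothetical residuals that alternate in sign, i.e., clearly not white noise.
residuals = [1, -1] * 10
q_stat = ljung_box_q(residuals, max_lag=1)
```

For this example Q is about 20.9, well above 3.84 (the 5% critical value of a chi-square distribution with one degree of freedom), indicating remaining autocorrelation. When residuals come from a fitted ARIMA model, the degrees of freedom are usually reduced by the number of estimated parameters.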

Finally, a vector autoregressive model was fitted to the CPV and rainfall time-series following examination of a cross-correlation plot between the CPV and rainfall time-series (C26–28). These models assume that a bi-directional relationship (“feedback”) between the variables is possible. Whilst this might be a useful premise in the context of time-series of disease in different populations (for example, “who infects whom?”), in the context of this dataset this is implausible (CPV events cannot cause rainfall). However, we include the code for demonstration purposes.
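A cross-correlation plot displays the correlation between one series and lagged values of another. The Python sketch below computes this for non-negative lags only, using one common estimator and deliberately artificial data (a single rainfall spike followed two months later by a spike in events); the function name and values are hypothetical and are not the CPV or rainfall series.

```python
def cross_correlation(x, y, lag):
    """Sample correlation between x[t - lag] and y[t], for lag >= 0 (x leads y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    scale_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    scale_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    numer = sum((x[t - lag] - mean_x) * (y[t] - mean_y) for t in range(lag, n))
    return numer / (scale_x * scale_y)


# Artificial monthly series: a rainfall spike in month 3, an event spike in month 5.
rainfall = [0, 0, 0, 10, 0, 0, 0, 0, 0, 0]
events = [0, 0, 0, 0, 0, 10, 0, 0, 0, 0]

ccf = {lag: cross_correlation(rainfall, events, lag) for lag in range(4)}
best_lag = max(ccf, key=ccf.get)
```

The cross-correlation peaks at a lag of two months, matching the delay built into the example. In real data, autocorrelation within each series can produce misleading cross-correlations, which is why the series are usually made stationary first.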

Results of Analyzing a Time-Series of Canine Parvovirus Reports

Step 1: Data Description

Between 2009 and 2015, a total of 24,602 cases and 19,048 events were reported in the Disease Watchdog system. Of these, 20,182 and 15,499 respectively were dog cases and events. During this time period, there were a total of 7,933 CPV cases and 5,837 CPV events reported.

Following application of selection criteria (diagnostic method, nil vaccination history), a total of 2,987 events (3,584 cases) remained for analysis (1.2 cases per event). The earliest and latest reporting dates were 6 October 2009 and 1 November 2015, respectively. The duration of the time-series dataset was 2,218 days, 315 complete weeks and 74 complete months. The median (range) number of cases reported per week was 9 (1–45), and the median (range) number of events reported per week was 8 (1–30).
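The summary figures above can be checked with a few lines of arithmetic. The Python sketch below uses only the dates and counts reported in this section; counting both end dates and counting calendar months touched by the series are assumptions made here to reproduce the reported values.

```python
from datetime import date

first_report = date(2009, 10, 6)
last_report = date(2015, 11, 1)

# Duration in days, counting both the first and the last reporting date.
days_inclusive = (last_report - first_report).days + 1

# Calendar months touched by the series, October 2009 to November 2015.
months_spanned = ((last_report.year - first_report.year) * 12
                  + (last_report.month - first_report.month) + 1)

# Average number of cases per event after applying the selection criteria.
cases_per_event = 3584 / 2987
```

These give 2,218 days, 74 months and approximately 1.2 cases per event, consistent with the values reported above.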