The Climate Data Tool: Enhancing Climate Services Across Africa

Dinku, Tufa; Faniriantsoa, Rija; Islam, Shammunul; Nsengiyumva, Gloriose; Grossi, Amanda

doi:10.3389/fclim.2021.787519

TECHNOLOGY AND CODE article

Front. Clim., 28 February 2022

Sec. Climate Services

Volume 3 - 2021 | https://doi.org/10.3389/fclim.2021.787519

This article is part of the Research Topic From Observations to Predictions and Projections: Opportunities and Challenges for Climate Risk Assessment and Management in Sub-Saharan Africa

Tufa Dinku¹*

Rija Faniriantsoa¹

Shammunul Islam²

Gloriose Nsengiyumva¹

Amanda Grossi¹

¹Columbia Climate School, International Research Institute for Climate and Society, Columbia University in the City of New York, New York, NY, United States
²Institute of Remote Sensing, Jahangirnagar University, Dhaka, Bangladesh

Climate services can contribute to alleviating a range of climate-sensitive development challenges, including those of agricultural production and food security. However, the use of climate data for research and applications in Africa has been scanty, mainly due to poor availability of and access to quality climate data. Weather stations are sparse, and their number has been declining. Access to existing climate data is a challenge mainly because of national data policies, low financial investment, lack of dissemination capacity and tools, and high access costs. The ENACTS (Enhancing National Climate Services) initiative led by the International Research Institute for Climate and Society (IRI) at Columbia University has been tackling this problem by working with National Meteorological Services (NMS) in Africa and in other developing countries. This initiative helps NMS to improve data availability and quality by combining quality-controlled data from national observation networks with satellite estimates for rainfall and climate model reanalysis products for temperature. This requires access to an easy-to-use and freely available tool for performing the tasks required to generate the data, as well as for using the generated data. Most NMS in Africa do not have access to such a tool. To meet this significant need, the IRI developed such a tool in-house. This is the Climate Data Tool (CDT), which is an open-source, R-based software with an easy-to-use graphical user interface (GUI). It can be used for data organization, quality control, combining station data with satellite and reanalysis data, evaluating merged and input datasets, performing an array of analyses, and visualization. The CDT software has been evolving over the last seven years with inputs from the NMS themselves. It has now become a powerful and user-friendly tool, and has been installed in over 20 countries in Africa alone.

Introduction

There has always been a close relationship between socio-economic development, human wellbeing, and a varying and changing climate. However, climate change is expected to impede and undo development gains by increasing the frequency and severity of extreme weather events, shifting suitability zones for crops and diseases, and endangering coastal areas with sea-level rise (IPCC, 2014). These changes threaten many essential sectors such as agriculture, forestry, water resources, tourism, transportation, energy, and health. However, because about 80% of the world's cultivated land is rain-fed (UNESCO, 2009), agriculture is widely regarded as the most climate-sensitive human activity and sector of all. Climate variability in agriculture not only affects the availability, access, and consumption of food, but also the income of smallholder farmers (Shumetie and Yismaw, 2018).

In the face of these challenges, “effective” use of climate information offers a way for agricultural practitioners to make better informed decisions at different levels, ultimately aiding them on their quest to make agriculture more resilient to increasingly erratic precipitation and temperature patterns. For example, it might inform which types of seed varieties, such as those that are drought or flood-resistant, an extension system promotes, or when a farmer chooses to plant.

No matter the decision at hand, the information should be useful, usable, and used to achieve the goals of climate risk management and adaptation. Climate services, defined by the Climate Services Partnership (2011) as “production, translation, transfer, and use of climate knowledge and information in climate-informed decision making and climate-smart policy and planning,” play a pivotal role in making this happen. However, even when researchers or meteorological agencies strive to produce information that users need, significant barriers may still remain that inhibit that potentially “useful” information from actually being “usable” (McNie, 2013; Vincent et al., 2018). Some of these barriers include the relevance of the information itself, which can be limited by problems with the underlying data: the spatial scale at which data is available, the quality of the data, or even whether any observations exist at all.

As a result, information and products that are sorely needed for anticipating, managing, and responding to agriculture or food security risks triggered by climate extremes, as well as adapting to longer-term risks associated with climate change, can simply be limited (De Leeuw et al., 2014; Hansen et al., 2014; WMO, 2014).

This is the case in many places around the world, including most African countries. It is here that collection of climate data has been seriously inadequate, and even when available, poorly accessible (Dinku, 2019). Where data does exist, it is often of poor or inconsistent quality, limiting decision-makers at all levels from taking appropriate adaptive actions in the face of a changing and varying climate.

The Enhancing National Climate Services (ENACTS) initiative of the International Research Institute for Climate and Society (IRI), Columbia University, has been helping countries to address such gaps in data quantity and quality, as well as access to and use of climate information products by working closely with National Meteorological Services (NMS) in Africa and beyond (Dinku et al., 2014, 2018). These NMS have the primary responsibility to provide observed and forecast weather information, climate information, and warnings of impending hydro-climatic threats to a variety of users, and any limitations in their data are thus felt widely.

To ensure such limitations are addressed and the data upon which information is based is robust, the ENACTS approach works directly with NMS in Africa and other developing countries on data quality and availability issues. One of the core ways this is done is by combining quality-controlled data from national observation networks with satellite estimates for rainfall and climate model reanalysis products for temperature (Dinku et al., 2013, 2018).

This data blending process involves the organization of station and proxy data, quality control and check of station and proxy data, combination of quality-controlled station data with proxies, evaluation of the combined data, and further analysis and visualization of station and combined data.

However, many NMS do not have access to an easy-to-use and freely available tool for performing these and other tasks. To meet this need, the Climate Data Tool (CDT) was developed in-house by the IRI, and is now used by 24 countries, primarily in Africa but also in Asia and Latin America. CDT is an open-source, R-based software package with an easy-to-use graphical user interface (GUI), which can be run under multiple operating systems, including Windows and Linux. The only system requirement is the installation of the latest version of R. After 5 years of evolution and thanks to the iterative feedback from NMS around Africa, CDT has now become a powerful, dynamic, intuitive, and user-friendly tool. The main functionalities of CDT include (see Figure 1):

• Organization of station and proxy data;

• Assessment of data availability;

• Assessment and correction of data quality;

• Merging station observation with proxies;

• Extraction of data from gridded products, including satellite, reanalysis and combined data products, at any point, for a selected box, and for any administrative boundary; and

• Analysis and visualization of station and gridded datasets.

FIGURE 1

Figure 1. CDT graphical user interface.

Though it has a graphical user interface (GUI), CDT can also be run at script level by advanced users who need more flexibility. As with any R package, there are manuals for the different modules.

Implementation of CDT at NMS includes installation of the tool, as well as training. In other words, beyond just a technology or software package, implementation of CDT also includes a standard training package. Technological and human resources are addressed in tandem. The training package generally comprises three steps:

• Theoretical background on the basics of climate data quality control, remote sensing of rainfall, climate reanalysis products, interpolation of climate data, and combination of climate data from different sources;

• Practical, hands-on training starting with installation of the system;

• Actual use of CDT for data quality control and generation of merged rainfall and temperature data for those who implement ENACTS.

The CDT has so far been installed in 15 ENACTS countries, nine non-ENACTS countries, and two Regional Climate Centers (RCCs) shown in Figure 2. Installation and training in the ENACTS countries and the two RCCs was done mostly by the IRI, while most of the installation and training in the non-ENACTS countries has been done by the RCCs.

FIGURE 2

Figure 2. Countries where CDT has been implemented (installation and training) either as part of ENACTS implementation or independent of ENACTS (implemented by Regional Climate Centers).

The next section, Methodology, provides summary descriptions of the different CDT functionalities, while Section Results presents some results. The paper concludes with discussion in Section Discussion.

Methodology

The CDT allows users, particularly NMS staff, to perform an array of tasks, from data preparation to extensive analyses and visualizations. This is accomplished through an interactive GUI as well as command line execution within the R environment. The main modules that are accessed through the GUI include the following (Figure 1):

• Data preparation

• Quality control of station observations

• Interpolation of station observations and merging data from different sources

• Validation of merged and other data

• Various analytical tools

• Visualization of stations and gridded data as well as analyses results

An overview of these different modules is presented below.

Data Preparation

The CDT offers a suite of operations dedicated to data preparation (Figure 3), which include organizing observations from meteorological stations, assessing availability of observations, and downloading and processing proxy data (satellite rainfall estimates and climate model reanalysis products). This menu can also be used for obtaining some ancillary data such as digital elevation models (DEM) and shape files for country administrative boundaries.

FIGURE 3

Figure 3. CDT data preparation menu.

Users can upload station data into CDT in text and comma-separated-values (CSV) formats. CDT can also use data from the WMO-supported climate data management systems (CDMS) used by NMS in Africa. It can directly access data from the CLIMSOFT database, while data from other databases such as CLISYS and CLIDAT have to be converted to text or CSV format first. These inputs are converted into a format used by CDT for further analyses. This is a text file format, and CDT allows users to convert data into this format. The converted data can then be aggregated to different temporal scales, including pentad (5-day), dekadal (10-day), monthly, and seasonal. CDT also lets the user explore availability of data in the NMS database (what is available and what is missing) using various formats, including tables, graphs of the average number of stations reporting each year, graphs showing the number of non-missing data per year for each station, and maps of the percentage of data available for all stations. These offer a full picture of the available data, which is important information for both the NMS and their users.
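The temporal aggregation step can be illustrated with a short sketch. The Python example below is not CDT code (CDT is written in R); it sums daily rainfall into dekadal totals. The function names, the dictionary input, and the `min_fraction` completeness rule are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date


def dekad_of(day: date) -> int:
    """Return 1, 2 or 3: days 1-10, days 11-20, and day 21 to month end."""
    return min((day.day - 1) // 10 + 1, 3)


def aggregate_to_dekads(daily, min_fraction=1.0):
    """Sum daily rainfall (mm) into dekadal totals.

    daily: dict mapping date -> value, with None marking a missing day.
    min_fraction: hypothetical completeness rule; a dekad gets a total only
    if at least this fraction of the days supplied for it is non-missing.
    """
    groups = defaultdict(list)
    for day, value in daily.items():
        groups[(day.year, day.month, dekad_of(day))].append(value)

    totals = {}
    for key, values in sorted(groups.items()):
        valid = [v for v in values if v is not None]
        complete_enough = len(valid) / len(values) >= min_fraction
        totals[key] = sum(valid) if complete_enough else None
    return totals


# January 2020 with 1 mm every day and one missing day in the second dekad.
daily = {date(2020, 1, d): 1.0 for d in range(1, 32)}
daily[date(2020, 1, 15)] = None
dekads = aggregate_to_dekads(daily, min_fraction=1.0)
```

With `min_fraction=1.0` the second dekad is returned as missing, while the third dekad covers 11 days because January has 31.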

Downloading and Processing Proxy and Ancillary Data

The CDT also allows users to download and process various proxy and ancillary data used in meteorological and climatological analysis. These include eight different satellite rainfall estimates:

• ARC (Africa Rainfall Climatology; Novella and Thiaw, 2013);

• CHIRP/S (Climate Hazards Group Infrared Precipitation, alone and combined with station data; Funk et al., 2015);

• CMORPH [Climate Prediction Center (CPC) Morphing Technique; Joyce et al., 2004];

• GPM (Global Precipitation Measurement Mission; Hou et al., 2014);

• PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks; Nguyen et al., 2018);

• RFE [Famine Early Warning System NETwork (FEWS NET) satellite rainfall estimate; Xie et al., 2017];

• TAMSAT (Tropical Applications of Meteorology using SATellite and ground-based observations; Grimes et al., 1999; Thorne et al., 2001); and

• TMPA (TRMM Multi-Satellite Precipitation Analysis; Huffman et al., 2010).

There are also three climate reanalysis products that can be downloaded through CDT for generating merged temperature data:

• Japanese 55-year Reanalysis (JRA55);

• Modern-Era Retrospective Analysis 2 (MERRA2); and

• European Center for Medium-Range Weather Forecasts (ECMWF-ERA5).

The main advantage of CDT here is that users can specify the spatial and temporal domain and time resolution of interest and download only the data they actually need. Most of these products can be downloaded either from the IRI Data Library (Blumenthal et al., 2014) or from the original data source for that particular data product. CDT can also be used to download ancillary data such as digital elevation models and country administrative boundaries.

Once downloaded, the satellite and reanalysis data can be processed further using some simple aggregation functions such as sum, average, minimum, maximum, count, etc. These datasets could also be aggregated to coarser spatial resolutions. CDT uses NetCDF as a native gridded data format, but can also export data to other formats such as Climate Predictability Tool (CPT), GeoTiff, and GrADS.

Quality Control

Quality of climate data is a serious challenge for many countries in Africa (e.g., Dinku, 2019). This includes poor accuracy or precision of observations, as well as missing data. These errors can stem from multiple sources that include instrument error, observer error, digitization, summarization, etc. (reference). These errors need to be identified and fixed. However, many NMS lack appropriate tools to perform quality control or the capacity to use existing tools.

The CDT offers a robust quality check process for identifying and, when possible, fixing erroneous observations. The tool enables checking for a multitude of error types and presents the outputs in different formats that enable the NMS staff to easily identify the errors and fix or remove the data from the database. These checks can be done at different temporal scales, including daily, dekadal, and monthly. The NMS staff have the options to replace suspicious values with a missing data code, replace the suspected values with correct observations, or leave the data as it is.

The quality check process implemented in CDT includes the following:

(i) checking for station coordinates;

(ii) checking for false zero values (for rainfall);

(iii) checking for suspicious observations (outliers); and

(iv) checking for homogeneity of climate time series.

Coordinate Check

It would be difficult to automatically check the accuracy of the coordinates of a given station. Thus, CDT simply checks whether a station is located within the country boundary. It also displays the location of the stations on Google Maps. The display enables the NMS staff to check whether the station is located where it is supposed to be. CDT can also check for duplicate and missing coordinates, which occur frequently.

False Zero Check

In many instances, observers may not enter zero values in the data register when there is no rainfall; instead, they may leave the specific dates on the register blank. The data register could also be blank for the days when observations were not taken (missing data). However, during data transfer to computers, data entry staff may enter both (zero observation and missing data) as zero, leading to false zeros. To check for false zeros, CDT compares the percentage of zero values for each month at the target station with the average of the neighboring stations. If the ratio of the target station's percentage of zeros to the neighbors' average for that month is greater than a user-defined threshold, that month is flagged as suspicious. The NMS staff can then choose to investigate or replace the suspicious month with missing data.
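A minimal sketch of this comparison is given below in Python (not CDT's R implementation). The function names, the default threshold of 1.8, and the handling of the case where neighbors report no zeros at all are assumptions for illustration.

```python
def zero_fraction(values):
    """Fraction of non-missing daily values that are zero (None = missing)."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(1 for v in valid if v == 0) / len(valid)


def flag_false_zeros(target_month, neighbor_months, max_ratio=1.8):
    """Flag a station-month whose share of zeros is much larger than the
    average share at neighboring stations.

    target_month: daily values of the target station for one month.
    neighbor_months: one list of daily values per neighboring station.
    max_ratio: hypothetical user-defined threshold on the ratio.
    """
    target = zero_fraction(target_month)
    fractions = [zero_fraction(m) for m in neighbor_months]
    fractions = [f for f in fractions if f is not None]
    if target is None or not fractions:
        return False  # nothing to compare against
    neighbor_mean = sum(fractions) / len(fractions)
    if neighbor_mean == 0:
        # Assumption: any zero at the target is suspicious if neighbors have none.
        return target > 0
    return target / neighbor_mean > max_ratio


neighbors = [[0] * 10 + [5] * 20, [0] * 12 + [3] * 18]
all_zero_month = flag_false_zeros([0] * 30, neighbors)
typical_month = flag_false_zeros([0] * 12 + [2] * 18, neighbors)
```

Here the all-zero month is flagged because its zero share is almost three times the neighbors' average, while the typical month is not.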

Checking for Outliers

There are different ways to check for outliers or suspicious observations. The methods implemented in CDT include limit check, internal consistency check, temporal check, and spatial check. Limit check involves comparing an observation to previously defined upper and lower limits of the specific element for the specific climate. For instance, no negative values are expected for maximum temperature over locations in Africa, and there could also be upper limits for daily rainfall amounts.

In internal consistency checks, an observation is compared with other parameter values to see if they are physically or climatologically consistent, either instantly or for time series according to adopted observation procedures. Normally, more than one meteorological parameter is measured at an observing station at the same time. Some of these parameters are physically related, and the internal consistency check tests whether values of related parameters are free of contradictions. The only internal consistency check implemented in CDT is making sure that minimum temperature is less than maximum temperature.
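The limit check and the minimum/maximum temperature consistency check are simple enough to sketch directly. The Python functions below are illustrative only; the names and the example limits (0 to 300 mm for daily rainfall) are assumptions, not values taken from CDT.

```python
def limit_check(values, lower, upper):
    """Return indices of non-missing values outside [lower, upper]."""
    return [
        i for i, v in enumerate(values)
        if v is not None and not (lower <= v <= upper)
    ]


def tmin_tmax_check(tmin, tmax):
    """Return indices where minimum temperature is not below the maximum.

    Pairs with a missing value (None) on either side are skipped.
    """
    return [
        i for i, (lo, hi) in enumerate(zip(tmin, tmax))
        if lo is not None and hi is not None and lo >= hi
    ]


rain = [0.0, 12.5, -1.0, 650.0, None]
suspect_rain = limit_check(rain, lower=0.0, upper=300.0)

tmin = [15.0, 22.0, None, 18.0]
tmax = [28.0, 21.0, 30.0, 18.0]
suspect_temp = tmin_tmax_check(tmin, tmax)
```

In practice the limits would be chosen per element and per climate regime, as the text describes.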

The temporal consistency check tests whether an observation of a given station for a given month is significantly different from the long time series for that particular month. This test is based on the fact that many climatological variables show significant serial correlation. In CDT, this comparison is done for each of the 12 months separately in order to make sure data from different climatological periods (e.g., cold months vs. warm months) are not compared. The outputs are presented both in tabular and graphical formats, which will be shown in the next section. These presentations allow easy inspection of suspected values, and one can either keep or change these outlier values just by making changes directly on the table.
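One way to picture a per-month temporal check is sketched below in Python. The source does not state which statistic CDT uses, so the robust score based on the median and the median absolute deviation (MAD), the threshold of 3.5, and all names are assumptions chosen for illustration.

```python
from statistics import median


def temporal_outliers(values_by_year, threshold=3.5):
    """Flag years whose value for one calendar month departs strongly from
    the long-term series of that same month.

    values_by_year: dict year -> value for a single calendar month
    (None = missing). Uses a modified z-score, 0.6745 * |x - median| / MAD,
    which is an assumed choice and not necessarily what CDT applies.
    """
    valid = {y: v for y, v in values_by_year.items() if v is not None}
    if len(valid) < 3:
        return []
    center = median(valid.values())
    mad = median(abs(v - center) for v in valid.values())
    if mad == 0:
        return []  # no spread, so no meaningful score
    return sorted(
        year for year, v in valid.items()
        if 0.6745 * abs(v - center) / mad > threshold
    )


# Mean January maximum temperature (deg C) for ten years, one clearly off.
january = dict(zip(
    range(1991, 2001),
    [24.1, 23.8, 24.5, 24.0, 23.9, 34.2, 24.3, 24.2, 23.7, 24.4],
))
outliers = temporal_outliers(january)
```

Running the check month by month, as described above, simply means calling the function once for each of the 12 calendar months.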

The spatial check compares the observation to be checked with the observations from nearby stations or with the expected value at the station, estimated using the observations from neighboring stations. This may be accomplished either by interpolation between observations, by checking against numeric prognostic values (on the basis of values from many different stations), or by comparing statistics. Those data for which there is a significant difference between the expected and actual observations are flagged as suspect. The following conditions are checked in CDT: isolated rainfall (rainfall observed at the station of interest while surrounding stations reported zero), isolated zero (the opposite of the previous one), and outlier (observation too high or too low compared to nearby observations). These suspected values can be viewed both in graphical and tabular formats. Values can be viewed on a map with the options of adding administrative boundaries, a digital elevation model, and gridded proxy data. This enables use of background data (e.g., DEM for temperature and satellite estimates for rainfall) that provide additional information to decide whether the observation is actually an outlier. For instance, nearby stations could report significantly different temperature observations because of the altitudes at which they are located. A DEM background could be used to check if this is the case.
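The three spatial conditions can be sketched as a small classifier. This Python example is not CDT's implementation; the wet-day threshold, the median/MAD outlier rule, the minimum number of neighbors, and the label strings are all assumptions for illustration.

```python
from statistics import median


def spatial_check(target, neighbors, wet=1.0, k=5.0,
                  min_spread=1.0, min_neighbors=3):
    """Classify one rainfall observation against same-day neighbor values.

    target: value at the station being checked (None = missing).
    neighbors: values at nearby stations (None = missing).
    All thresholds are hypothetical defaults.
    """
    valid = [v for v in neighbors if v is not None]
    if target is None or len(valid) < min_neighbors:
        return "not checked"
    if target >= wet and all(v == 0 for v in valid):
        return "isolated rainfall"
    if target == 0 and all(v >= wet for v in valid):
        return "isolated zero"
    center = median(valid)
    # Spread of the neighbors, floored so identical neighbors do not
    # make every small difference look extreme.
    spread = max(median(abs(v - center) for v in valid), min_spread)
    if abs(target - center) > k * spread:
        return "outlier"
    return "ok"


results = [
    spatial_check(25.0, [0, 0, 0, 0]),
    spatial_check(0, [12.0, 8.0, 15.5]),
    spatial_check(120.0, [10.0, 12.0, 9.0, 14.0]),
    spatial_check(11.0, [10.0, 12.0, 9.0, 14.0]),
    spatial_check(5.0, [1.0, None]),
]
```

A flag from such a rule is only a prompt for inspection; as noted above, background layers such as a DEM or satellite estimates help decide whether the value is really wrong.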

The user can correct the suspicious values in different ways. The easiest one would be replacing all suspected values with the missing data code. A better way would be consulting paper records to confirm whether the suspicious value is a wrong observation or just a wrong entry, and in the latter case replacing the suspicious value with the correct one. CDT offers both options. For the second option, the user just needs to enter the correct value in the table presenting the errors (Table 1), and then CDT will replace the value in the original data file. This is very convenient and saves NMS staff a lot of time.

TABLE 1

Table 1. Example table of errors in CDT.

Homogeneity Check

Homogeneity checks are used to determine if a climate time series is homogeneous over a period of time. Data inhomogeneity can affect the quality of climate studies, particularly analyses of climate trends, variability, and climate extremes. Inhomogeneity can stem from many factors such as changes in observational routine (Hansel et al., 2016), changes in instruments, observation methods, station relocation, etc. (Li-Juan and Zhong-Wei, 2012). Homogenization of climate data consists of two main parts: detecting breaks and adjusting the specific segment for inhomogeneity (Squintu et al., 2020). The CDT offers four approaches for detecting breaks in a climate time series. The first approach is the Pettitt test, which is a non-parametric rank-based method used for detecting a shift in the mean value of the distribution of the variable under study (Mallakpour and Villarini, 2016). The second approach adopted in CDT is the standard normal homogeneity test (SNHT) (Alexandersson and Moberg, 1997). The other two approaches use the cumulative sum (CUSUM) approach with and without trends (Gallagher et al., 2013). For adjusting inhomogeneities, CDT offers two methods: mean and quantile matching. The mean method compares the mean before and after the break, and quantile matching compares quantiles before and after the break (Squintu et al., 2020).
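As an illustration of break detection, the sketch below implements the standard textbook form of the Pettitt test in Python: the statistic U_t = Σ_{i≤t} Σ_{j>t} sgn(x_i − x_j), the test statistic K = max|U_t|, and the approximate significance p ≈ 2·exp(−6K²/(n³ + n²)). It is not CDT's code, and the function name and return convention are assumptions.

```python
import math


def pettitt_test(x):
    """Pettitt change-point test for a shift in the central tendency.

    Returns (t, K, p): the series is split into x[:t] and x[t:], K is the
    maximum absolute value of the rank-based statistic U_t, and p is the
    usual approximate two-sided significance probability.
    """
    n = len(x)
    best_k, best_t = 0, None
    for t in range(1, n):
        # U_t compares every value before the split with every value after it.
        u = sum(
            (x[i] > x[j]) - (x[i] < x[j])
            for i in range(t)
            for j in range(t, n)
        )
        if abs(u) > best_k:
            best_k, best_t = abs(u), t
    p = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return best_t, best_k, p


# Twenty annual means with an upward shift of about 5 units after year 10.
before = [20.1, 19.8, 20.4, 20.0, 19.9, 20.3, 20.2, 19.7, 20.5, 20.0]
after = [25.2, 24.9, 25.4, 25.0, 24.8, 25.3, 25.1, 24.7, 25.5, 25.0]
split, k_stat, p_value = pettitt_test(before + after)
```

This direct double loop costs O(n³) overall, which is acceptable for annual or monthly series of typical length; a rank-based formulation would be preferable for long daily series.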

Gridding

CDT's Gridding menu offers options for simple spatial interpolation using different methods, as well as combining station observations with gridded proxies such as satellite rainfall estimate and temperature reanalysis products. The latter option is a critical component of the ENACTS approach described earlier. The merging process can also be done in a cross-validation mode, which enables evaluation of the merged product.

Spatial Interpolation

CDT offers several methods for spatial interpolation, which include nearest neighbor (Shope and Maharjan, 2015), nearest neighbor with elevation, inverse distance weighted average, modified Shepard (Renka, 1988), spheremap (Kluver et al., 2016), ordinary kriging, and universal kriging (Bargaoui and Chebbi, 2009). The tool enables users to explore different methods as well as different parameters (grid size and resolution, interpolation radius, minimum/maximum number of neighbors, variogram type, etc.) for the different interpolation methods. Thus, users can choose the most suitable method and parameters for their specific needs. Interpolation can be done at multiple time steps (days, dekads, months, seasons, etc.). One can display and examine the outputs or can perform validation using independent datasets as described later in this section.
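Of the methods listed, the inverse distance weighted average is the simplest to write down. The Python sketch below shows it with two of the tunable parameters mentioned above (interpolation radius and maximum number of neighbors). It uses planar distances for brevity, whereas real station coordinates would need great-circle distances; the function and parameter names are assumptions.

```python
import math


def idw(stations, x, y, power=2.0, radius=None, max_neighbors=None):
    """Inverse distance weighted estimate at point (x, y).

    stations: list of (xs, ys, value) tuples in planar coordinates.
    radius: optional search radius; stations farther away are ignored.
    max_neighbors: optional cap on the number of nearest stations used.
    Returns None when no station falls inside the search radius.
    """
    candidates = []
    for xs, ys, value in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0:
            return value  # the point coincides with a station
        if radius is None or d <= radius:
            candidates.append((d, value))
    if not candidates:
        return None
    candidates.sort()  # nearest stations first
    if max_neighbors is not None:
        candidates = candidates[:max_neighbors]
    weights = [d ** -power for d, _ in candidates]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, candidates))
    return weighted_sum / sum(weights)


stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint = idw(stations, 1.0, 0.0)
at_station = idw(stations, 0.0, 0.0)
too_far = idw(stations, 1.0, 0.0, radius=0.5)
```

Varying `power`, `radius`, and `max_neighbors` and comparing the resulting fields is a small-scale version of the parameter exploration described above.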

Merging Station Observations and Proxy Data

Combining observations from meteorological stations and proxies such as satellite rainfall estimates or climate model reanalysis products can help alleviate challenges with availability of data owing to sparse distributions of meteorological observations. In CDT, satellite rainfall estimates are combined with rain gauge measurements, while reanalysis products are used for minimum and maximum temperature. CDT uses the NetCDF data format for both input and output gridded data. The approach adopted in CDT involves the following steps:

• Downscale proxy data (only for reanalysis for temperature);

• Use historical station data to calculate climatological adjustment factors;

• Apply the adjustment factors to all proxy data;

• Merge the output from the previous step with contemporaneous observations for each time step (day, pentad, dekad, month, etc.).

The reanalysis data need to be downscaled from their coarse resolution to a higher resolution (4 km for ENACTS data). The lapse rate for each month can be used to downscale the reanalysis data. This involves using a digital elevation model (DEM) and station temperature observations to compute downscaling coefficients, which are then applied to the time series of reanalysis data.
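The idea of a lapse-rate downscaling can be sketched as follows. The source does not give the exact form of CDT's downscaling coefficients, so this Python example makes a simple assumption: a linear regression of station temperature on elevation for one calendar month gives a slope (the lapse rate), which is then used to shift each coarse reanalysis value by the elevation difference between the fine DEM cell and the coarse reanalysis cell. All names are hypothetical.

```python
def fit_lapse_rate(elevations, temperatures):
    """Least-squares fit of temperature = intercept + slope * elevation.

    Inputs are station elevations (m) and station temperatures (deg C)
    for one calendar month. Returns (intercept, slope).
    """
    n = len(elevations)
    mean_z = sum(elevations) / n
    mean_t = sum(temperatures) / n
    sxx = sum((z - mean_z) ** 2 for z in elevations)
    sxy = sum((z - mean_z) * (t - mean_t)
              for z, t in zip(elevations, temperatures))
    slope = sxy / sxx
    return mean_t - slope * mean_z, slope


def downscale(t_coarse, z_coarse, z_fine, slope):
    """Adjust a coarse-cell temperature to the elevation of a fine DEM cell."""
    return t_coarse + slope * (z_fine - z_coarse)


# Stations that follow a lapse rate of 6.5 deg C per km exactly.
elevations = [500.0, 1000.0, 1500.0, 2000.0]
temperatures = [26.75, 23.5, 20.25, 17.0]
intercept, slope = fit_lapse_rate(elevations, temperatures)

# Reanalysis cell at 1200 m mean elevation, fine DEM cell at 1800 m.
t_fine = downscale(22.0, 1200.0, 1800.0, slope)
```

Fitting the regression separately for each calendar month, as the text indicates, lets the lapse rate vary through the year.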

Bias correction aims to remove the bias from proxy data using station data. CDT offers four approaches for mean bias correction: multiplicative bias with variable time step, multiplicative bias for each month, quantile mapping with fitted distribution, and quantile mapping with empirical distribution. First, the correction factors are computed using any one of the four methods. This creates a file with bias coefficients associated with the method selected by the user. The next step is to apply the corrections to the time series of the satellite and reanalysis data.
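Two of these ideas, a multiplicative bias factor and empirical quantile mapping, are sketched below in Python for a single location. This is a simplified illustration rather than CDT's procedure: the function names are assumptions, the factor is computed over whatever period is passed in, and the quantile mapping interpolates linearly between sorted historical values.

```python
from bisect import bisect_right


def multiplicative_factor(station, proxy):
    """Ratio of station total to proxy total over paired historical values.

    Pairs with a missing value (None) are skipped. Returns 1.0 when the
    proxy total is zero, an assumed fallback meaning "no correction".
    """
    pairs = [(s, p) for s, p in zip(station, proxy)
             if s is not None and p is not None]
    proxy_total = sum(p for _, p in pairs)
    if proxy_total == 0:
        return 1.0
    return sum(s for s, _ in pairs) / proxy_total


def quantile_map(value, proxy_hist, station_hist):
    """Map a proxy value to the station value at the same empirical quantile.

    Both histories need at least two values; values outside the proxy
    history are mapped to the station minimum or maximum.
    """
    ps, ss = sorted(proxy_hist), sorted(station_hist)
    if value <= ps[0]:
        q = 0.0
    elif value >= ps[-1]:
        q = 1.0
    else:
        i = bisect_right(ps, value) - 1  # ps[i] <= value < ps[i + 1]
        frac = (value - ps[i]) / (ps[i + 1] - ps[i])
        q = (i + frac) / (len(ps) - 1)
    pos = q * (len(ss) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ss) - 1)
    return ss[lo] + (pos - lo) * (ss[hi] - ss[lo])


factor = multiplicative_factor([10.0, 20.0, 30.0], [8.0, 16.0, 26.0])
corrected = 12.0 * factor  # apply the factor to a new proxy value
mapped = quantile_map(5.0, [0, 2, 4, 6, 8], [0, 3, 6, 9, 12])
```

A per-month version would simply compute one factor, or one pair of histories, for each calendar month.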

Next, the bias-corrected time series of proxy data is merged with contemporaneous station observations. CDT offers four different methods for merging proxy data with station data: simple bias adjustment, Cressman scheme (Mateus et al., 2016), Barnes scheme (Rozante and Demerval, 2010), and regression kriging (Hengl et al., 2007).
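To give a feel for how a merging step works, the sketch below applies a single pass of a Cressman-type correction in Python: the differences between station observations and the proxy at the station locations are spread onto the grid with the classical Cressman weight w = (R² − d²)/(R² + d²) and added to the proxy field. This is a generic illustration under stated assumptions (planar distances, one pass, a fixed influence radius), not the scheme as implemented in CDT.

```python
def cressman_merge(grid_points, proxy_values, stations, radius):
    """Add Cressman-weighted station residuals to a proxy field.

    grid_points: list of (x, y) grid coordinates.
    proxy_values: bias-corrected proxy value at each grid point.
    stations: list of (x, y, observed, proxy_at_station) tuples.
    radius: influence radius R; stations at distance >= R get no weight.
    Grid points with no station in range keep the proxy value.
    """
    r2 = radius ** 2
    merged = []
    for (gx, gy), background in zip(grid_points, proxy_values):
        num = den = 0.0
        for sx, sy, observed, proxy_at_station in stations:
            d2 = (gx - sx) ** 2 + (gy - sy) ** 2
            if d2 < r2:
                weight = (r2 - d2) / (r2 + d2)
                num += weight * (observed - proxy_at_station)
                den += weight
        merged.append(background + num / den if den > 0 else background)
    return merged


grid = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
proxy = [10.0, 10.0, 10.0]
stations = [(0.0, 0.0, 14.0, 10.0), (2.0, 0.0, 8.0, 10.0)]
merged = cressman_merge(grid, proxy, stations, radius=2.0)
```

For rainfall, a practical version would also guard against negative merged values; that step is omitted here to keep the function usable for temperature as well.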

Validation

The CDT tool offers a robust validation scheme to assess both the input proxy data and the merged output data. There are several statistics that can be used for doing this comparison, and CDT offers an array of validation statistics, which include correlation coefficient, mean error, Nash-Sutcliffe efficiency coefficient, percent bias, probability of detection, and false alarm ratio. The merged data can be validated either using independent data that was not used in the merging or using cross validation. Calculated validation statistics can be displayed as tables or visualized on maps.
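The statistics named above have standard definitions, which the Python sketch below computes for paired observed and estimated values. The function name, the 1 mm wet-day threshold used for the categorical scores, and the sign convention for percent bias (positive when the estimate exceeds the observation) are assumptions.

```python
import math


def validation_stats(obs, est, wet=1.0):
    """Continuous and categorical validation statistics for paired series."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    ss_e = sum((e - mean_e) ** 2 for e in est)
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    sq_err = sum((e - o) ** 2 for o, e in zip(obs, est))

    # Contingency counts for rain / no-rain detection.
    hits = sum(1 for o, e in zip(obs, est) if o >= wet and e >= wet)
    misses = sum(1 for o, e in zip(obs, est) if o >= wet and e < wet)
    false_alarms = sum(1 for o, e in zip(obs, est) if o < wet and e >= wet)

    def ratio(num, den):
        return num / den if den else None  # None when undefined

    return {
        "correlation": ratio(cov, math.sqrt(ss_o * ss_e)),
        "mean_error": mean_e - mean_o,
        "nse": ratio(ss_o - sq_err, ss_o),  # equals 1 - sq_err / ss_o
        "percent_bias": ratio(100.0 * (sum(est) - sum(obs)), sum(obs)),
        "pod": ratio(hits, hits + misses),
        "far": ratio(false_alarms, hits + false_alarms),
    }


obs = [0.0, 5.0, 10.0, 0.0, 20.0]
est = [2.0, 4.0, 12.0, 0.0, 18.0]
stats = validation_stats(obs, est)
```

In a cross-validation setting, `est` would hold the merged values obtained at each station when that station was withheld from the merging.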

Analysis and Visualization

Analysis

The CDT tool can perform a number of analyses that are pertinent to climate data. The “Analysis” Menu bar offers different functionalities that include computing summary statistics (minimum, maximum, mean, median, number of missing values, 1st quartile, 3rd quartile, and standard deviation), calculating some derived climate variables (e.g., calculating potential evapotranspiration), computing climatologies and anomalies, daily rainfall analysis (rainfall intensity, number of dry days, number of wet days, number of dry spells, and number of wet spells), rainy season characteristics (rainy season onset, rainy season cessation, and rainy season length), climate extremes, and various spatial analyses. The results are presented in maps and different graphs such as line chart, bar plot, probability of exceedance for El Niño, La Niña, or neutral years, anomaly bar plot, etc. The climate extremes indices are similar to those offered by CLIMDEX, available at https://www.climdex.org/learn/indices/. Drought indices such as the standardized precipitation index (SPI) and the standardized precipitation evapotranspiration index can also be calculated using CDT. Figure 4 shows the CDT Analysis menu.

FIGURE 4

Figure 4. CDT analysis menu.
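As a small illustration of the daily rainfall analysis listed under the Analysis menu, the Python sketch below counts wet and dry days, dry spells, and mean wet-day intensity for one season. The definitions used here (a wet day has at least 1 mm; a dry spell is a run of at least 5 consecutive dry days) and the function name are assumptions and may differ from the options CDT exposes.

```python
def daily_rainfall_summary(rain, wet_threshold=1.0, min_spell=5):
    """Summarize a sequence of daily rainfall totals (mm, no missing values)."""
    wet = [r for r in rain if r >= wet_threshold]

    # Collect lengths of runs of consecutive dry days.
    dry_runs, run = [], 0
    for r in rain:
        if r < wet_threshold:
            run += 1
        else:
            if run:
                dry_runs.append(run)
            run = 0
    if run:
        dry_runs.append(run)  # series ended during a dry run

    spells = [length for length in dry_runs if length >= min_spell]
    return {
        "wet_days": len(wet),
        "dry_days": len(rain) - len(wet),
        "dry_spells": len(spells),
        "longest_dry_spell": max(dry_runs, default=0),
        "intensity": sum(wet) / len(wet) if wet else None,
    }


season = [0, 0, 0, 0, 0, 0, 12, 3, 0, 0, 8, 0, 0, 0, 0, 0, 20]
summary = daily_rainfall_summary(season)
```

Wet spells can be counted the same way by reversing the threshold comparison.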

Visualization

The CDT tool offers extensive visualization features for presenting input data and the outputs of the different analyses performed. Many of these visualization tools are part of the specific analysis menu and include tables, graphs, and maps. However, CDT also has a separate Menu bar for visualizing different data types such as CDT station data, CDT gridded data, and NetCDF data. One can plot a single NetCDF file, a time sequence of NetCDF files (such as a time series), NetCDF files with multiple variables, and even a combination of different data types such as station data and NetCDF data. The latter allows users to compare data from different sources (e.g., station, satellite, and merged) as required.

Results

This section will illustrate the varied usage and multiple applications of the CDT tool presented above with examples. As the functionality of CDT is very extensive, only the salient features are covered here. The examples presented in this section are actual outputs from the relevant activities at the different NMS as part of the implementation of ENACTS.

Assessing Data Availability

This functionality offers NMS different options to look at the rainfall and temperature data available in their climate database. This may sound trivial, but many NMS in Africa, and elsewhere, do not have an easy-to-use tool that enables them to clearly see what data are available, where and when these data are available, and what is missing. CDT presents data availability in tabular (Table 2), graph (Figures 5, 6), and map (Figure 7) formats. Table 2 summarizes the number of reported data for each year and month for a given station. Figure 5 shows the average number of stations that reported each year (in this case for rainfall), while Figure 6 compares how many stations were active and how many of the active stations were reporting for each year. The map in Figure 7 presents the percentage of data (relative to what is expected) available for each station over Zambia.

TABLE 2

Table 2. Number of reported data for each year and month for a given station.

FIGURE 5

Figure 5. Average number of stations reporting observations for each year for precipitation. The red bar represents the minimum and maximum number of days with observations for each year.

FIGURE 6

Figure 6. Active stations and reported data from the same stations.

FIGURE 7

Figure 7. Percentage of available daily rainfall data from each station over Zambia during the period 1981–2020. This shows the completeness of the data for each station.
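The availability summaries in this subsection reduce to simple counting. The Python sketch below shows, for a single station, a year-by-month count of reported values (the kind of content summarized in Table 2) and the percentage of expected daily values that are present (the quantity mapped in Figure 7). It is an illustration only; the function names and the dictionary input are assumptions.

```python
from collections import Counter
from datetime import date


def monthly_counts(daily):
    """Number of non-missing daily values per (year, month).

    daily: dict mapping date -> value, with None marking a missing day.
    """
    return Counter(
        (day.year, day.month)
        for day, value in daily.items()
        if value is not None
    )


def percent_available(daily, start, end):
    """Share of calendar days in [start, end] with a non-missing value."""
    expected = (end - start).days + 1
    reported = sum(
        1 for day, value in daily.items()
        if value is not None and start <= day <= end
    )
    return 100.0 * reported / expected


# January 2020 with two missing days; only ten days reported in February.
daily = {date(2020, 1, d): 0.0 for d in range(1, 32)}
daily[date(2020, 1, 3)] = None
daily[date(2020, 1, 4)] = None
daily.update({date(2020, 2, d): 1.5 for d in range(1, 11)})

counts = monthly_counts(daily)
coverage = percent_available(daily, date(2020, 1, 1), date(2020, 2, 29))
```

Days that are absent from the record and days stored with a missing code are treated alike, since both reduce the count relative to the number of calendar days expected.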