Reproducible code for the World Bank Policy Research Working Paper No. 9412 - Predicting Food Crises

Afghanistan, Burkina Faso, Chad...and 18 more, 2020

Reference ID

WLD_2020_PFC_v01_RR

Producer(s)

Bo Pieter Johannes Andree

Created on

Oct 25, 2021

Last modified

Oct 25, 2021

Overview

Abstract

This package contains code and data for a statistical forecasting approach to predict the outbreak of food crises.

Reproducibility Package

Scripts

Link: https://nada-demo.ihsn.org/index.php/catalog/54/download/132/readme.txt

Readme file

File name

readme.txt

Title

Readme file

Authors

Name
Bo P.J. Andree

Date

2020-09

Format

txt (ASCII)

Description

Readme file on Introduction, Requirements, Installation, Configuration, Execution, Troubleshooting, Maintainers, sessionInfo

Predicting food crises (main code)

File name

predicting_food_crises.R

Title

Predicting food crises (main code)

Authors

Name
Bo P.J. Andree

Date

2020-09

Format

R script

Software

R version 3.5.1 (2018-07-02)

Description

The main R code used to process the data, train and cross-validate models for 1- to 12-month-ahead forecasts, and generate plots and results

Predicting food crises (dependencies)

File name

predicting_food_crises_dependencies.R

Title

Predicting food crises (dependencies)

Authors

Name
Bo P.J. Andree

Date

2020-09

Format

R script

Software

R version 3.5.1 (2018-07-02)

Description

Dependencies required to run the main R code contained in predicting_food_crises.R

Predicting food crises (balanced learners)

File name

predicting_food_crises_balanced_learners.R

Title

Predicting food crises (balanced learners)

Authors

Name
Bo P.J. Andree

Date

2020-09

Format

R script

Software

R version 3.5.1 (2018-07-02)

Description

Popular classifiers with cross-validated probability balancing as described in the paper

Instructions

The learners can be run with caret by supplying their object names as the `method` argument of caret's train() function. They can also be used for other prediction problems.
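As a rough illustration of what supplying an object as `method` looks like, the sketch below defines a toy user-defined learner in the list format that caret accepts and passes it to train(). The object `toy_logit` and the simulated data are assumptions made for this example. It is not one of the balanced learners from predicting_food_crises_balanced_learners.R, whose object names are not listed on this page.

```r
# A minimal user-defined caret learner: logistic regression wrapped in the
# list structure that caret accepts as `method`. The name `toy_logit` is
# hypothetical; it is NOT one of the balanced learners shipped in the package.
toy_logit <- list(
  label = "Toy logistic regression",
  library = NULL,
  type = "Classification",
  parameters = data.frame(parameter = "parameter",
                          class = "character",
                          label = "parameter"),
  # No tuning parameters, so the grid is a single placeholder row.
  grid = function(x, y, len = NULL, search = "grid") {
    data.frame(parameter = "none")
  },
  fit = function(x, y, wts, param, lev, last, classProbs, ...) {
    dat <- as.data.frame(x)
    dat$.outcome <- y
    glm(.outcome ~ ., data = dat, family = binomial())
  },
  # Class predictions: threshold the probability of the second level at 0.5.
  predict = function(modelFit, newdata, submodels = NULL) {
    p <- predict(modelFit, as.data.frame(newdata), type = "response")
    lev <- modelFit$obsLevels
    factor(ifelse(p > 0.5, lev[2], lev[1]), levels = lev)
  },
  # Class probabilities, one column per outcome level.
  prob = function(modelFit, newdata, submodels = NULL) {
    p <- predict(modelFit, as.data.frame(newdata), type = "response")
    out <- data.frame(1 - p, p)
    colnames(out) <- modelFit$obsLevels
    out
  },
  sort = function(x) x
)

# Simulated data, for illustration only.
set.seed(1)
x <- data.frame(a = rnorm(200), b = rnorm(200))
y <- factor(ifelse(x$a + rnorm(200) > 0, "crisis", "no_crisis"),
            levels = c("no_crisis", "crisis"))

# Pass the object itself (not a string) as `method`; skipped if caret is absent.
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- caret::trainControl(method = "cv", number = 5, classProbs = TRUE)
  fit <- caret::train(x = x, y = y, method = toy_logit, trControl = ctrl)
  print(fit$results)
}
```

With the package files sourced, the same call pattern should apply with one of the package's learner objects in place of `toy_logit`.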

Software

Name	Version
R	3.5.1 (2018-07-02)

Libraries

Reproducibility

Technology requirements

The code was developed and last run in Microsoft R Open 3.5.1 on Ubuntu 16.04.5 LTS and has not been tested on other operating systems. It should run in standard R 3.5.1, but it benefits from multithreaded BLAS/LAPACK and contains a call that automatically sets the number of MKL threads. That call may throw an error in standard R 3.5.1 but should not break the remainder of the code.
- The results presented in the paper were generated on a D64s_v3 virtual machine with 64 CPUs and 256 GiB RAM. Producing the full set of results consumed around 12,000 core hours. Some simplifications have been made to make the final code more usable; these are noted in comments in the main R file.
- Viewing plots: RStudio Server is recommended.
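One way to keep the MKL thread call from raising an error on a standard R installation is to guard it, as in the sketch below. It assumes the call in question is `setMKLthreads()` from the RevoUtilsMath package bundled with Microsoft R Open. The wrapper name `set_mkl_threads_safely` and the thread count are hypothetical and not taken from the package.

```r
# Set the number of MKL threads only when RevoUtilsMath (bundled with
# Microsoft R Open) is available; otherwise leave BLAS/LAPACK threading alone.
# `set_mkl_threads_safely` is a hypothetical helper, not part of the package.
set_mkl_threads_safely <- function(n_threads) {
  if (!requireNamespace("RevoUtilsMath", quietly = TRUE)) {
    message("RevoUtilsMath not found; keeping the default BLAS/LAPACK threading.")
    return(invisible(FALSE))
  }
  ok <- tryCatch({
    RevoUtilsMath::setMKLthreads(n_threads)
    TRUE
  }, error = function(e) {
    message("Could not set MKL threads: ", conditionMessage(e))
    FALSE
  })
  invisible(ok)
}

# Returns TRUE if the threads were set and FALSE otherwise, without stopping.
result <- set_mkl_threads_safely(4)
```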

Reproduction instructions

>> INSTALLATION

The user will need to follow the standard installation instructions for R.
* To avoid unexpected issues, it is recommended to run this code on a similar R installation and OS, i.e. Microsoft R Open 3.5.1 on Ubuntu 16.04.5 with r-studio-server 1.2.5001.

Install the required R packages (lines 5-34 in predicting_food_crises_dependencies.R).
* Note that many R packages require the user to install system-level dependencies on Ubuntu itself.
* Packages must be installed manually, since there is currently no good way to automate this. This is due to the large number of direct and indirect dependencies in and outside R.
  - At the end of this readme file, a printout of sessionInfo() is provided so that the versions of all packages can be viewed.
* Note that the main R code (predicting_food_crises.R) sources the dependencies and the balanced learners, and reads the data.
  - The user needs to specify the folder that contains these files in line 8. The default value is '/home/predicting _food_crisis_package/', which assumes this package is unzipped in the home folder of Ubuntu.
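Before running the main script, it can help to confirm that the folder set in line 8 actually contains the files the script sources and reads. The sketch below is one possible check. The helper `find_missing_files` and the variable `package_dir` are hypothetical names, while the file names are those listed for this package.

```r
# Report which of the expected package files are absent from a folder.
# `find_missing_files` and `package_dir` are hypothetical names.
find_missing_files <- function(dir, files) {
  files[!file.exists(file.path(dir, files))]
}

required_files <- c(
  "predicting_food_crises_dependencies.R",
  "predicting_food_crises_balanced_learners.R",
  "predicting_food_crises_data.csv"
)

# Default folder quoted in the readme; change it to where the package was unzipped.
package_dir <- "/home/predicting _food_crisis_package/"

missing_files <- find_missing_files(package_dir, required_files)
if (length(missing_files) > 0) {
  message("Not found in ", package_dir, ": ", paste(missing_files, collapse = ", "))
} else {
  message("All expected files are present.")
}
```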

The code can be run in a terminal, in which case the plots will not be visible to the user.
* One solution is to run the code on RStudio Server. When set up correctly, one can access the RStudio IDE from anywhere via a web browser and use the plot functionality. The code was developed on r-studio-server 1.2.5001, which can be installed by following standard installation procedures.

>> CONFIGURATION

There are a number of choices that the user can make to control the behavior of the main program:

* Lines 15-26 are options to control the definition of the dependent variable and the treatment of independent variables.
  The default settings run a model on all countries, use IPC 3 and above as the positive class, use only exogenous covariates as predictors, add synthetic cases to the training data, calculate additional features, and restrict linear correlation to .75. These are the settings that correspond to the paper.
* Lines 31-32 control the type of learner used. The default settings correspond to a simplified random forest (RF) algorithm that delivers good results (nearly identical to the paper) but runs much faster.
  See also the comments in the code.
* Lines 35-37 control an imputation strategy in case a missing value is encountered. These settings should not matter when the supplied data is used.
* Lines 40-43 control the cross-validation. Note that repetitions have been reduced to make the runtime and RAM requirements more manageable.
* Lines 46-55 control the compute environment.
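To give a sense of what restricting linear correlation to .75 can mean in practice, the sketch below applies a simple greedy filter that keeps a predictor only if its absolute correlation with every predictor already kept is at most the cutoff. This is one common approach and an assumption of this example; the readme does not state how the package implements the restriction. The function name `drop_correlated` and the toy data are hypothetical.

```r
# Greedy correlation filter: walk through the columns in order and keep a
# column only if its absolute correlation with all kept columns is <= cutoff.
# Illustrative only; not necessarily the procedure used in the package.
drop_correlated <- function(x, cutoff = 0.75) {
  cm <- abs(cor(x, use = "pairwise.complete.obs"))
  keep <- character(0)
  for (nm in colnames(cm)) {
    if (length(keep) == 0 || all(cm[nm, keep] <= cutoff, na.rm = TRUE)) {
      keep <- c(keep, nm)
    }
  }
  x[, keep, drop = FALSE]
}

# Toy data: `b` is nearly a copy of `a`, while `c` is independent noise.
set.seed(42)
a <- rnorm(500)
toy <- data.frame(a = a, b = a + rnorm(500, sd = 0.1), c = rnorm(500))

filtered <- drop_correlated(toy, cutoff = 0.75)
print(colnames(filtered))
```

The result of a greedy filter depends on column order, so placing the predictors considered most important first decides which member of a correlated pair survives.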

Default settings:
* Note that parallel processing works differently on Ubuntu than on other operating systems, but it generally involves generating copies of dependencies or compute environments, so memory requirements can be extremely high even when the initial data set seems manageable. For this reason the following simplifications have been made to the default settings:
  - The number of validation samples has been reduced from 50 to 10.
  - The tuning parameters of the default RF model have been fixed at recommended values. To run full tuning or use one of the alternative balanced classifiers, change MODEL_METHOD to one of the classifiers from predicting_food_crises_balanced_learners.R.
  - When an alternative model is used, the length of the tuning grid is 5; the paper uses 10.
* These settings produce results similar to those presented in the main paper, but the runtime and RAM requirements are drastically reduced (depending, of course, on the number of CPUs available).
  - The final code at the (recommended) default settings was last run on a D32s_v3 VM with 32 CPUs and 128 GiB RAM, reaching 100% CPU utilization and approximately 60% RAM utilization, and took just under 2.5 hours to complete.
  - By default, the code runs on the entire data set that is provided. Note that the paper only trains and cross-validates on data up to February 2019. With the current settings, it is thus straightforward to update the data set and make real forecasts.
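To restrict training to the period used in the paper, the data can be split at the February 2019 cutoff before modelling. The sketch below shows the idea on a toy data frame. The column names `date` and `value` are assumptions, since the layout of predicting_food_crises_data.csv is not described here.

```r
# Toy monthly series standing in for the real data; the column names
# `date` and `value` are hypothetical.
toy <- data.frame(
  date  = seq(as.Date("2018-10-01"), by = "month", length.out = 8),
  value = 1:8
)

# The paper trains and cross-validates on data up to February 2019.
cutoff <- as.Date("2019-02-28")

train_set    <- toy[toy$date <= cutoff, ]  # observations available to the paper
forecast_set <- toy[toy$date >  cutoff, ]  # later observations

print(range(train_set$date))
print(nrow(forecast_set))
```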

>> EXECUTION

Running code:
* After installation, simply unpack the folder, point the code (line 8) to the correct folder, and run predicting_food_crises.R.
* The code is currently not set up to write results to disk. As always, complex R objects can be saved for re-use with saveRDS() and tabular results can be written with write.csv().
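A minimal sketch of persisting results with saveRDS() and write.csv() is shown below. The linear model fitted on R's built-in `cars` data is only a stand-in for a trained model object, and the output folder and file names are hypothetical.

```r
# Hypothetical output folder; a temporary directory is used for illustration.
out_dir <- file.path(tempdir(), "pfc_results")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Stand-in for a trained model and its predictions (built-in `cars` data).
fit   <- lm(dist ~ speed, data = cars)
preds <- data.frame(speed = cars$speed, predicted = unname(predict(fit)))

# Complex R objects: serialize with saveRDS() and restore with readRDS().
saveRDS(fit, file.path(out_dir, "model.rds"))
restored <- readRDS(file.path(out_dir, "model.rds"))

# Tabular results: write a plain CSV that other tools can read.
write.csv(preds, file.path(out_dir, "predictions.csv"), row.names = FALSE)
reloaded <- read.csv(file.path(out_dir, "predictions.csv"))
```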

>> TROUBLESHOOTING

Dependencies:
* Make sure all OS dependencies are installed so that all libraries can be installed. Then make sure that all R libraries and their dependencies are installed.
* See the sessionInfo() readout at the end of this file.
* Make sure the predicting_food_crises_dependencies.R and predicting_food_crises_balanced_learners.R files are correctly sourced.

Unexpected crash with different compute settings:
* If a different VM is used, or if changes are made to the settings, and the program crashes halfway, keep an eye on the RAM usage. On Ubuntu this can be monitored with the htop command.
  If RAM usage is too high, reduce the number of cores used in lines 46-55.
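When choosing how many cores to use, a rough rule of thumb is to cap the worker count by both the available cores and the memory each worker is observed to need. The sketch below expresses that heuristic. It is an assumption of this example, not a rule from the readme, and the function name `choose_workers` and the per-worker memory figure are hypothetical (the latter would come from watching htop during a short test run).

```r
# Pick a worker count limited by both core count and RAM per worker.
# Heuristic sketch only; `choose_workers` is a hypothetical helper.
choose_workers <- function(ram_gb, gb_per_worker,
                           max_cores = parallel::detectCores()) {
  by_ram <- floor(ram_gb / gb_per_worker)
  # detectCores() can return NA on some systems, hence na.rm = TRUE.
  max(1L, as.integer(min(by_ram, max_cores, na.rm = TRUE)))
}

# Example: 128 GiB of RAM, roughly 8 GiB observed per worker, 32 cores.
n_workers <- choose_workers(ram_gb = 128, gb_per_worker = 8, max_cores = 32)
print(n_workers)
```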

NA values in validation metrics:
* A common issue with caret is that validation metrics are returned as NA. This is likely the result of a missing dependency in the worker environment, which may occur because different operating systems handle parallelization differently. Check whether the issue persists when MODEL_METHOD is set to another value, for example 'multinom'.

Data

Datasets

predicting_food_crises_data.csv

Name

predicting_food_crises_data.csv

Dataset ID

WLD_2020_PFC_v01_M

Note

Data set used to produce results of the paper

Access policy

Open

Data URL

http://nada-demo.ihsn.org/index.php/catalog/study/WLD_2020_PFC_v01_M

Confidentiality

The published materials do not contain any confidential information.

Citation requirements

The citation of this work is Andree, Bo Pieter Johannes; Chamorro, Andres; Kraay, Aart; Spencer, Phoebe; Wang, Dieter. 2020. Predicting Food Crises. Policy Research Working Paper; No. 9412. World Bank, Washington, DC.

Description

Output

Predicting Food Crises

Type

Working paper

Title

Predicting Food Crises

Authors

Bo Pieter Johannes Andrée, Andres Chamorro, Aart Kraay, Phoebe Spencer, Dieter Wang

Description

World Bank Policy Research Working Paper No 9412. This paper is a product of the Fragility, Conflict and Violence Global Theme and the Development Economics Vice Presidency. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world.

Abstract

Globally, more than 130 million people are estimated to be in food crisis. These humanitarian disasters are associated with severe impacts on livelihoods that can reverse years of development gains. The existing outlooks of crisis-affected populations rely on expert assessment of evidence and are limited in their temporal frequency and ability to look beyond several months. This paper presents a statistical forecasting approach to predict the outbreak of food crises with sufficient lead time for preventive action. Different use cases are explored related to possible alternative targeting policies and the levels at which finance is typically unlocked. The results indicate that, particularly at longer forecasting horizons, the statistical predictions compare favorably to expert-based outlooks. The paper concludes that statistical models demonstrate good ability to detect future outbreaks of food crises and that using statistical forecasting approaches may help increase lead time for action.

URL

http://hdl.handle.net/10986/34510

Authoring entity

Agency Name	Affiliation
Bo Pieter Johannes Andree	World Bank

Acknowledgment statement

This work was prepared as background for the Famine Action Mechanism (FAM).
The authors would like to thank Nadia Piffaretti, Zacharey Carmichael, Harun Dogo, Arif Hussain, Luca Russo, Jose Lopez, Colin Bruce, Nick Haan, Frank Davenport, Dan Maxwell, Joanna Macrae, Soomin Park, Marco Zambotti, Sardar Azari, Therese Norman-Monroe, Jacob LaRiviere, and the IPC, WFP mVAM, and FAO teams for invaluable contributions in the initial phase of this work.
In particular, we would like to thank the participants of the FAM Workshop in Geneva in February 2018 hosted by ICRC, the Artemis Working Days in Rome in April 2018 hosted by WFP, the FAM Data and Analytics meetings with global tech partners in Rome and New York in September 2018, and the participants in the Predictive Analytics workshop hosted by UN OCHA at the Center for Humanitarian Data in The Hague in April 2019.

Date of production

2020-10-07

Scope and coverage

Geographic locations

Location	Code
Afghanistan	AFG
Burkina Faso	BFA
Chad	TCD
Congo, Dem. Rep.	COD
Ethiopia	ETH
Guatemala	GTM
Haiti	HTI
Kenya	KEN
Malawi	MWI
Mali	MLI
Mauritania	MRT
Mozambique	MOZ
Niger	NER
Nigeria	NGA
Somalia	SOM
South Sudan	SSD
Sudan	SDN
Uganda	UGA
Yemen, Rep.	YEM
Zambia	ZMB
Zimbabwe	ZWE

Keywords

crisis, malnutrition, food price, food security, food insecurity, extreme events, unbalanced data, costsensitive learning, cost-sensitive learning, fragility, famine, famine action mechanism, FAM

Themes

Name
Health
Nutrition

Topics

ID	Topic	Vocabulary
C01	Econometrics	Journal of Economic Literature (JEL)
C14	Semiparametric and Nonparametric Methods: General	Journal of Economic Literature (JEL)
C25	Discrete Regression and Qualitative Choice Models - Discrete Regressors - Proportions - Probabilities	Journal of Economic Literature (JEL)
C53	Forecasting and Prediction Methods - Simulation Methods	Journal of Economic Literature (JEL)
O10	Economic Development - General	Journal of Economic Literature (JEL)

Disclaimer

These results and the related working paper reflect the views of the authors, and do not reflect the official views of the World Bank, its Executive Directors, or the countries they represent.

Version statement

Version

1.0

Version Date

September 2020

Access and rights

License

Name	URI
CCA 4.0	https://creativecommons.org/licenses/by/4.0

Contacts

Name	Affiliation
Bo P.J. Andree	World Bank
Andres Chamorro	World Bank
Nadia Piffaretti	World Bank
