Demo NADA Catalog

Reproducible code for the World Bank Policy Research Working Paper No 9412 - Predicting Food Crises

Afghanistan, Burkina Faso, Chad...and 18 more, 2020
Reference ID: WLD_2020_PFC_v01_RR
Producer(s): Bo Pieter Johannes Andree
Created on: Oct 25, 2021
Last modified: Oct 25, 2021

    Abstract

    This package contains code and data for a statistical forecasting approach to predict the outbreak of food crises.

    Reproducibility Package

    Scripts

    Readme file
    File name: readme.txt
    Link: https://nada-demo.ihsn.org/index.php/catalog/54/download/132/readme.txt
    Authors: Bo P.J. Andree
    Date: 2020-09
    Format: txt (ASCII)
    Description: Readme file covering Introduction, Requirements, Installation, Configuration, Execution, Troubleshooting, Maintainers, and sessionInfo.

    Predicting food crises (main code)
    File name: predicting_food_crises.R
    Authors: Bo P.J. Andree
    Date: 2020-09
    Format: R script
    Software: R version 3.5.1 (2018-07-02)
    Description: The main R code used to process the data, train and cross-validate models for 1- to 12-month-ahead forecasts, and generate plots and results.

    Predicting food crises (dependencies)
    File name: predicting_food_crises_dependencies.R
    Authors: Bo P.J. Andree
    Date: 2020-09
    Format: R script
    Software: R version 3.5.1 (2018-07-02)
    Description: Dependencies required to run the main R code contained in predicting_food_crises.R

    Predicting food crises (balanced learners)
    File name: predicting_food_crises_balanced_learners.R
    Authors: Bo P.J. Andree
    Date: 2020-09
    Format: R script
    Software: R version 3.5.1 (2018-07-02)
    Description: Popular classifiers with cross-validated probability balancing, as described in the paper.
    Instructions: The learners can be used with caret by passing the objects defined in this file as the `method` argument of caret's `train()` function. They can also be applied to other prediction problems.

    Software
    Name: R
    Version: 3.5.1 (2018-07-02)
    Libraries: TTR

    Reproducibility

    Technology requirements
    • The code was developed and last run in Microsoft R Open 3.5.1 on Ubuntu 16.04.5 LTS and has not been tested on other OSs. It should also run in standard R 3.5.1, but the code benefits from multithreaded BLAS/LAPACK and contains a call that automatically sets the number of MKL threads. In standard R 3.5.1 this call may throw an error, but it should not break the remainder of the code.
    • The results presented in the paper were generated on a D64s_v3 virtual machine with 64 CPUs and 256 GiB RAM; producing the full set of results consumed around 12,000 core hours. Some simplifications have been made to make the final code more usable; these are described in comments in the main R file.
    • Viewing plots: RStudio Server is recommended.
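    To put these compute figures in perspective, the sketch below converts core hours into wall-clock time. It is written in Python (it is not part of the R package), the function name is hypothetical, and it assumes every core is fully utilized for the whole run, so real wall-clock times would be somewhat longer. The second calculation uses the default-settings run described under "Default settings" (just under 2.5 hours on 32 CPUs at 100% CPU utilization).

```python
def wall_clock_hours(core_hours: float, cores: int) -> float:
    """Wall-clock hours for a job, assuming all cores are busy for the whole run."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return core_hours / cores


# Paper: around 12,000 core hours on a 64-CPU D64s_v3 VM.
full_run = wall_clock_hours(12_000, 64)
print(f"{full_run:.1f} h, about {full_run / 24:.1f} days")  # 187.5 h, about 7.8 days

# Default settings: just under 2.5 h on 32 CPUs at 100% CPU, i.e. at most about 80 core hours.
default_core_hours = 2.5 * 32
print(default_core_hours)                  # 80.0
print(round(12_000 / default_core_hours))  # 150
```

    By this estimate, the full set of results in the paper corresponds to roughly a week of continuous computation on the 64-CPU machine. The default run uses about 150 times fewer core hours.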
    Reproduction instructions

    >> INSTALLATION

    The user will need to follow the standard installation instructions for R.
    * To avoid unexpected issues, it is recommended to run this code on a similar R installation and OS, i.e. Microsoft R Open 3.5.1 on Ubuntu 16.04.5 and RStudio Server 1.2.5001.

    Install the required R packages (lines 5-34 in predicting_food_crises_dependencies.R).
    * Note that many R packages require the user to install system dependencies on Ubuntu itself.
    * The user will need to install packages manually. There is currently no good way to automate this because of the large number of direct and indirect dependencies inside and outside R.
      - A printout of sessionInfo() is provided at the end of the readme file (readme.txt) so that the versions of all packages can be checked.
    * Note that the main R code (predicting_food_crises.R) sources the dependencies and the balanced learners, and reads the data.
      - The user needs to specify the folder that contains these files on line 8. The default value is '/home/predicting _food_crisis_package/', which assumes this package is unzipped in the home folder of Ubuntu.

    The code can be run in a terminal, in which case plots will not be visible to the user.
    * One solution is to run the code on RStudio Server. When set up correctly, it gives access to the RStudio IDE, including its plot functionality, from anywhere via a web browser. The code was developed on RStudio Server 1.2.5001, which can be installed by following the standard installation procedures.
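    Because predicting_food_crises.R sources the other two scripts and reads the data from the folder set on line 8, a quick pre-flight check can catch a wrong path before a long run starts. The Python helper below is a hypothetical sketch and is not part of the package. It assumes that all four files listed in this catalog entry (the three R scripts and predicting_food_crises_data.csv) sit directly in that folder.

```python
from pathlib import Path

# File names as listed in this catalog entry; placing them all in one folder is an assumption.
REQUIRED_FILES = (
    "predicting_food_crises.R",
    "predicting_food_crises_dependencies.R",
    "predicting_food_crises_balanced_learners.R",
    "predicting_food_crises_data.csv",
)


def missing_package_files(folder: str) -> list:
    """Return the required file names that are not present directly inside `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

    For example, `missing_package_files('/home/predicting _food_crisis_package/')` returns an empty list when everything is in place, and otherwise lists the files the R code would fail to find.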
    
    >> CONFIGURATION

    Several options control the behavior of the main program:

    * Lines 15-26 control the definition of the dependent variable and the treatment of the independent variables.
      The default settings run a model on all countries, use IPC Phase 3 and above as the positive class, use only exogenous covariates as predictors, add synthetic cases to the training data, calculate additional features, and restrict linear correlation to 0.75. These are the settings that correspond to the paper.
    * Lines 31-32 control the type of learner used. The default is a simplified random forest (RF) algorithm that delivers good results (nearly identical to those in the paper) but runs much faster. See also the comments in the code.
    * Lines 35-37 control the imputation strategy used when a missing value is encountered; these settings should not matter when the supplied data are used.
    * Lines 40-43 control the cross-validation. Note that the number of repetitions has been reduced to make the runtime and RAM requirements more manageable.
    * Lines 46-55 control the compute environment.
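    Two of the default choices above shape the training data directly: a binary target that uses IPC Phase 3 and above as the positive class, and a 0.75 cap on linear correlation. The sketch below illustrates both in Python rather than the package's R. The correlation step is a simple greedy filter that keeps a feature only if its absolute Pearson correlation with every feature already kept is at most the cap. It may differ from the exact rule implemented in predicting_food_crises.R. The feature names and values are made up for illustration.

```python
import math


def to_binary_target(ipc_phases, threshold=3):
    """Return 1 if the IPC phase is at or above `threshold` (default: Phase 3+), else 0."""
    return [1 if phase >= threshold else 0 for phase in ipc_phases]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def greedy_correlation_filter(features, cutoff=0.75):
    """Keep features in insertion order, skipping any whose |r| with a kept feature exceeds cutoff."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical example data (not taken from predicting_food_crises_data.csv).
print(to_binary_target([1, 2, 3, 4, 5]))  # [0, 0, 1, 1, 1]
features = {
    "rainfall": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ndvi": [1.1, 2.1, 2.9, 4.2, 5.0],  # nearly collinear with rainfall, so it is dropped
    "food_price": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(greedy_correlation_filter(features))  # ['rainfall', 'food_price']
```

    A greedy, order-dependent filter was chosen here only because it is short. Other rules, such as dropping the feature with the highest average correlation, can keep a different subset.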
    
    Default settings:
    * Parallel processing works differently on Ubuntu than on other OSs, but it generally involves creating copies of the dependencies or compute environment for each worker, so memory requirements can be extremely high even when the initial data set seems manageable. For this reason, the following simplifications have been made to the default settings:
      - The number of validation samples has been reduced from 50 to 10.
      - The tuning parameters of the default RF model have been fixed at recommended values. To run full tuning or use one of the alternative balanced classifiers, change MODEL_METHOD to one of the classifiers in predicting_food_crises_balanced_learners.R.
      - When an alternative model is used, the length of the tuning grid has been reduced to 5 (the paper uses 10).
    * These settings produce results similar to those presented in the paper, but the runtime and RAM requirements have been drastically reduced (depending, of course, on the number of CPUs available).
      - At the (recommended) default settings, the final code was last run on a D32s_v3 VM with 32 CPUs and 128 GiB RAM, reaching 100% CPU utilization and approximately 60% RAM utilization, and took just under 2.5 hours to complete.
      - By default, the code runs on the entire data set provided. Note that the paper only trains and cross-validates on data up to February 2019. With the current settings, it is therefore straightforward to update the data set and make real forecasts.
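    A back-of-the-envelope count shows why reducing the validation samples and the tuning grid matters so much. The sketch below makes three assumptions. Each tuning-grid entry is one candidate configuration, fitted once per validation sample. There is one model per forecast horizon (1 to 12 months ahead). The final refit and other per-model overhead are ignored. Under those assumptions the figures are relative, not exact, and the function name is hypothetical.

```python
def model_fits(validation_samples: int, grid_length: int, horizons: int = 12) -> int:
    """Candidate fits across horizons, assuming one fit per grid entry per validation sample."""
    return validation_samples * grid_length * horizons


paper = model_fits(validation_samples=50, grid_length=10)    # settings described for the paper
reduced = model_fits(validation_samples=10, grid_length=5)   # reduced defaults for alternative models
fixed_rf = model_fits(validation_samples=10, grid_length=1)  # default RF: tuning parameters fixed
print(paper, reduced, paper // reduced)  # 6000 600 10
print(fixed_rf)                          # 120
```

    Under these assumptions, the reduced defaults cut the number of candidate fits by a factor of 10. Fixing the RF tuning parameters, which amounts to a grid of one, cuts the count further, to 120.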
    
    >> EXECUTION

    Running code:
    * After installation, unpack the folder, point the code (line 8) to the correct folder, and run predicting_food_crises.R.
    * The code is currently not set up to write results to disk. Complex R objects can be saved for re-use with saveRDS(), and tabular results can be written with write.csv().
    
    >> TROUBLESHOOTING

    Dependencies:
    * Make sure all OS dependencies are installed so that all R libraries can be installed. Then make sure that all R libraries, and their own dependencies, are installed.
    * See the sessionInfo() readout at the end of the readme file (readme.txt).
    * Make sure the predicting_food_crises_dependencies.R and predicting_food_crises_balanced_learners.R files are correctly sourced.

    Unexpected crash with different compute settings:
    * If a different VM is used, or the settings are changed, and the program crashes partway through, keep an eye on RAM usage. On Ubuntu this can be monitored with htop.
      If RAM usage is too high, reduce the number of cores used in lines 46-55.
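    One rough way to choose a core count is to assume that memory grows linearly with the number of parallel workers. Under that assumption, the default run described in this readme (32 CPUs, 128 GiB RAM, approximately 60% RAM utilization) implies about 2.4 GiB per worker, ignoring any fixed overhead of the main R session. The helpers below are a hypothetical Python sketch. The 0.8 headroom factor is an arbitrary safety margin, and actual usage depends on the learner, the settings, and the OS.

```python
import math


def per_worker_gib(total_gib: float, utilization: float, workers: int) -> float:
    """Estimate memory per worker from an observed run, assuming linear scaling."""
    return total_gib * utilization / workers


def max_workers(available_gib: float, gib_per_worker: float, headroom: float = 0.8) -> int:
    """Largest worker count whose estimated memory fits within `headroom` of available RAM."""
    return max(1, math.floor(available_gib * headroom / gib_per_worker))


# Observed default run: 32 CPUs, 128 GiB RAM, about 60% RAM used.
est = per_worker_gib(128, 0.60, 32)
print(round(est, 2))         # 2.4
print(max_workers(64, est))  # 21 (hypothetical 64 GiB machine)
```

    Parallel workers rarely scale perfectly linearly, so treat the result as a starting point and confirm it by watching RAM usage in htop.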
    
    NA values in validation metrics:
    * A common issue with caret is that validation metrics are returned as NA. This is likely the result of a missing dependency in a worker environment, which may occur because different OSs handle parallelization differently. Check whether the issue persists when MODEL_METHOD is set to another value, for example 'multinom'.

    Data

    Datasets
    predicting_food_crises_data.csv
    Dataset ID: WLD_2020_PFC_v01_M
    Note: Data set used to produce the results of the paper
    Access policy: Open
    Data URL: http://nada-demo.ihsn.org/index.php/catalog/study/WLD_2020_PFC_v01_M
    Confidentiality

    The published materials do not contain any confidential information.

    Citation requirements

    Cite this work as: Andree, Bo Pieter Johannes; Chamorro, Andres; Kraay, Aart; Spencer, Phoebe; Wang, Dieter. 2020. Predicting Food Crises. Policy Research Working Paper; No. 9412. World Bank, Washington, DC.

    Description

    Output
    Predicting Food Crises
    Type: Working paper
    Authors: Bo Pieter Johannes Andrée, Andres Chamorro, Aart Kraay, Phoebe Spencer, Dieter Wang
    Description: World Bank Policy Research Working Paper No 9412. This paper is a product of the Fragility, Conflict and Violence Global Theme and the Development Economics Vice Presidency. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world.
    Abstract: Globally, more than 130 million people are estimated to be in food crisis. These humanitarian disasters are associated with severe impacts on livelihoods that can reverse years of development gains. The existing outlooks of crisis-affected populations rely on expert assessment of evidence and are limited in their temporal frequency and ability to look beyond several months. This paper presents a statistical forecasting approach to predict the outbreak of food crises with sufficient lead time for preventive action. Different use cases are explored related to possible alternative targeting policies and the levels at which finance is typically unlocked. The results indicate that, particularly at longer forecasting horizons, the statistical predictions compare favorably to expert-based outlooks. The paper concludes that statistical models demonstrate good ability to detect future outbreaks of food crises and that using statistical forecasting approaches may help increase lead time for action.
    URL: http://hdl.handle.net/10986/34510
    Authoring entity
    Bo Pieter Johannes Andree (World Bank)
    Acknowledgment statement

    This work was prepared as background for the Famine Action Mechanism (FAM).
    The authors would like to thank Nadia Piffaretti, Zacharey Carmichael, Harun Dogo, Arif Hussain, Luca Russo, Jose Lopez, Colin Bruce, Nick Haan, Frank Davenport, Dan Maxwell, Joanna Macrae, Soomin Park, Marco Zambotti, Sardar Azari, Therese Norman-Monroe, Jacob LaRiviere, and the IPC, WFP mVAM, and FAO teams for invaluable contributions in the initial phase of this work.
    In particular, we'd like to thank the participants of the FAM Workshop in Geneva in February 2018 hosted by ICRC, the Artemis Working Days in Rome in April 2018 hosted by WFP, and the FAM Data and Analytics meetings with global tech partners in Rome and New York in September 2018, as well as the participants in the Predictive Analytics workshop hosted by UN OCHA at the Center for Humanitarian Data in The Hague in April 2019.

    Date of production

    2020-10-07

    Scope and coverage

    Geographic locations
    Location            Code
    Afghanistan         AFG
    Burkina Faso        BFA
    Chad                TCD
    Congo, Dem. Rep.    COD
    Ethiopia            ETH
    Guatemala           GTM
    Haiti               HTI
    Kenya               KEN
    Malawi              MWI
    Mali                MLI
    Mauritania          MRT
    Mozambique          MOZ
    Niger               NER
    Nigeria             NGA
    Somalia             SOM
    South Sudan         SSD
    Sudan               SDN
    Uganda              UGA
    Yemen, Rep.         YEM
    Zambia              ZMB
    Zimbabwe            ZWE
    Keywords
    crisis, malnutrition, food price, food security, food insecurity, extreme events, unbalanced data, costsensitive learning, cost-sensitive learning, fragility, famine, famine action mechanism, FAM
    Themes
    Health
    Nutrition
    Topics (vocabulary: Journal of Economic Literature (JEL))
    C01  Econometrics
    C14  Semiparametric and Nonparametric Methods: General
    C25  Discrete Regression and Qualitative Choice Models - Discrete Regressors - Proportions - Probabilities
    C53  Forecasting and Prediction Methods - Simulation Methods
    O10  Economic Development - General

    Disclaimer

    These results and the related working paper reflect the views of the authors, and do not reflect the official views of the World Bank, its Executive Directors, or the countries they represent.

    Version statement

    Version

    1.0

    Version Date

    September 2020

    Access and rights

    License
    CC BY 4.0 (Creative Commons Attribution 4.0 International): https://creativecommons.org/licenses/by/4.0

    Contacts

    Name                Affiliation
    Bo P.J. Andree      World Bank
    Andres Chamorro     World Bank
    Nadia Piffaretti    World Bank