Microplots and food security: encouraging replication studies of policy relevant research

Replication research is a valuable, yet often misunderstood, tool for increasing our understanding of promising research findings. In this short paper the authors discuss their principles for conducting replication research, explain how they chose a candidate study for replication, describe their replication analysis robustness checks, and give an overview of their approach to interpreting replication results. They also describe some of their lessons learned after working in replication research for over six years. (Published in Special Issue The practice of replication) JEL Q18 O13


Introduction: Replication in practice
Replication studies enhance the credibility of scientific research. Research is increasingly used for policymaking, which increases the importance of conducting replication studies. However, no common, cross-disciplinary understanding of the concepts and methodologies exists to guide replication researchers. Moreover, weak incentive structures generally discourage replication. We developed our replication philosophy based on our replication-related work with the International Initiative for Impact Evaluation (3ie). 3ie's Replication Program aims to improve the quality of evidence used for policymaking through the replication of influential and innovative research. In line with the goal of this special issue, in this paper we present our principles for conducting replication research and provide an example of how to conduct an internal replication based on those principles. We advocate for transparent internal replication studies, where researchers reproduce published results using existing data. For us, these studies start by drafting a replication plan. These plans outline the researchers' intended analysis, allow them to pre-specify their replication robustness checks, and force them to justify any additional analyses not originally anticipated. In our model, once a replication plan is drafted, the researchers then contact the original authors requesting the statistical code, data, and any replication instructions.
Following 3ie's standard replication process, we suggest researchers conduct internal replication studies in four stages: i) a push button replication (PBR), ii) pure replication, iii) measurement and estimation analysis (MEA), and iv) theory of change analysis (TCA). In the first two stages, researchers verify the published results using the original research methods. In the last two stages, researchers assess the robustness of the published results to pre-specified checks. In the rest of this section we detail our replication process for the paper we selected for replication.
Our replication study will begin with a PBR in which we run the original authors' statistical code on their data. We will then compare our PBR results to the original results, with pre-specified decision rules indicating how we classify any differences. Given the sensitivity around replication, we will avoid words like "error" or "mistake" during the replication process.
When analyzing our PBR results, we will compare our pre-specified key results to the published findings. In the PBR stage, where we are more confident assessing the replicability of the original publication, we will follow 3ie's PBR protocol to classify the results as comparable, minor differences, major differences, etc. (Wood et al. 2017). We believe classifying these results, while still contingent on thresholds of difference, is less controversial than other forms of replication in that PBR removes much of the interpretation from the process.
We will differentiate minor and major differences through the imperfect lens of p-values. We will classify changes in a p-value of 0.1 or more as major differences, and changes of less than 0.1 but at least 0.05 as minor differences. Finally, following 3ie's PBR protocol, we will not consider as a difference any p-value that changes by less than 0.05. In addition, we later suggest how to classify differences in parameter estimates.
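The p-value decision rule above can be sketched in a few lines of Python. The function name and the labels it returns are illustrative assumptions, not part of 3ie's PBR protocol. The rounding step is only there to keep floating-point noise from pushing a change that sits exactly on a threshold into the wrong class.

```python
def classify_p_value_change(original_p: float, replicated_p: float) -> str:
    """Classify the absolute change in a p-value.

    Thresholds follow the rule stated in the text:
    change >= 0.10         -> "major"
    0.05 <= change < 0.10  -> "minor"
    change < 0.05          -> "none" (not counted as a difference)
    """
    # Round to absorb binary floating-point noise (0.15 - 0.05 is not exactly 0.1).
    change = round(abs(replicated_p - original_p), 6)
    if change >= 0.10:
        return "major"
    if change >= 0.05:
        return "minor"
    return "none"


# Hypothetical p-values, for illustration only.
print(classify_p_value_change(0.03, 0.20))
print(classify_p_value_change(0.03, 0.09))
print(classify_p_value_change(0.03, 0.04))
```

Because the rule looks only at the size of the change, a result can cross a conventional significance level and still be classified as "none"; this is one reason we describe p-values as an imperfect lens.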
We then assess the reproducibility of the paper by recoding the original results in our pure replication stage. We will use the data provided by the authors to conduct our analysis without using their original code. Instead, we will code the analysis based on the paper's description of the analysis, any additional information from the working paper, and any descriptions of the data analysis in any instructions associated with the data.
Finally, we will conduct the measurement and estimation analysis and the theory of change analysis we describe in our replication plan below. We apply Brown and Wood's (2018) replication diagnostic to determine the most appropriate robustness checks for our replication study. We conduct these robustness exercises to assess whether we, as independent researchers, are able to produce results similar to those of the original authors when using alternative analysis strategies.
There are a few stages where we will consult the original authors. During the PBR and pure replication stages, we will contact them if we find that the original data or code is incomplete, or major discrepancies appear, to confirm that we have the right code and data set. Then, at the end of our study, we will share the report with the original authors for their optional comment.
By conducting our replication study following a replication plan and certain guidelines, we are attempting to move beyond noting the need for more replication research in the social sciences. While numerous calls have highlighted a desire for more replication studies, researchers continually describe how an incentive compatibility problem and a lack of clarity discourage replication studies (Duvendack and Palmer-Jones (2013) provide an example of the replication incentives discussion). We hope this study, along with all the papers in this special issue, will encourage researchers to conduct more replication research.

Motivating the selection of our "candidate" paper
Food security and land access are two major development issues. For our internal replication study we selected Santos et al.'s (2014) paper "Can government-allocated land contribute to food security? Intrahousehold analysis of West Bengal's microplot allocation program." The authors assess the impact of the Nijo Griha, Nijo Bhumi (NGNB) program, which provides small land plots to landless poor households. NGNB titles most microplots in the name of female household members. The researchers expected the program to immediately affect a set of intermediate outcomes including: tenure security, agricultural investments, use of credit for agriculture, and women's participation in household decisions. The program aims for long-term impacts of reducing hunger vulnerability, increasing protein consumption, promoting equitable intra-household food distribution, and improving dietary diversity. The authors conduct their analysis using baseline and endline data from 1035 households: 671 program recipients and 364 non-recipients. Given endline attrition problems and balance issues across the treatment and control groups, the authors use an inverse propensity score-weighted regression model to estimate the effect of the intervention. Most of the analysis is conducted at the household level, while the tenure security outcomes are calculated at the plot level.
The researchers find statistically significant program effects on four intermediate outcomes of interest. Among the most relevant results, they report NGNB households were 12 percent more likely to obtain a loan from a formal bank. These households also typically invested more in agriculture. For example, treatment households were 11 percent more likely to use fertilizer or pesticides. Women in NGNB households were more likely to participate in decisions regarding the household's land and 9 percent more likely to participate in food purchase and consumption decisions. Finally, within the plot level analysis, the original authors find women to be 17 percent more likely to report that they expected to maintain control over the NGNB plot in five years.
We selected the article for internal replication because of its contribution to the understanding of an intervention at the intersection of food insecurity and land tenure. For 2010-2012, the Food and Agriculture Organization (FAO) estimates a 12.9 percent prevalence of undernourishment in developing countries (FAO 2015). The development community has long recognized ensuring access to land and security of tenure as important components of food security policies (for example, HLPE 2013). Considering the link between smallholder productivity and household food security, providing microplot access may significantly improve household welfare. Given the scarcity of land in many countries, we believe assessing the effectiveness of distributing microplots is more policy relevant than many other similar interventions on this topic.
Santos et al. make a number of contributions to the literature. Although land transfer programs are policy relevant, the literature contains little empirical research on them. Researchers have suggested that providing access to small plots, even fractions of an acre, may create a household safety net and contribute to improved nutrition and income outcomes. In peri-urban and urban areas, several evaluations have found positive associations between small agriculture interventions and food security, with the caveat that more rigorous evidence is needed (Warren et al. 2015; Poulsen et al. 2015). To the best of our knowledge, this paper uniquely analyzes the effects of distributing microplots of land on food security. Also, the few papers that evaluate the effects of land transfer programs focus on income, not nutrition (Keswell and Carter 2014; Benjamin, Brandt, McCaig, and Le Hoa 2017). In that sense, Santos et al. contribute to the literature on securing land rights and agriculture investment (Deininger 2006; Goldstein and Udry 2008; Besley 1995) and homestead agriculture for improving nutrition (Talukder et al. 2010). The paper specifically contributes to the literature on the effects of land rights/ownership and women's bargaining power over consumption and investment decisions (Brule 2010; Wiig 2013; Mishra and Sam 2016) and the effect of women's empowerment through land security on children's health or education outcomes (Menon, Rodgers and Nguyen 2014; Ghebru and Holden 2013; Allendorf 2007).
Finally, the study can have direct policy implications for designing programs that expand land access, especially in India. These programs already exist in West Bengal, Karnataka, Andhra Pradesh, and Odisha. The paper can further inform the debate in India regarding the bill that entitles landless rural households to access plots of 0.1 acre (Government of India 2013), which has been on hold since 2013 (Draboo 2015). As various levels of the Government of India consider scaling up this program, testing the robustness of the results to replication would help policymakers better evaluate the effectiveness of this intervention.

Replication plan
Our four-stage replication plan includes steps that range in complexity from simply reproducing the results with the original code and data to analyzing the theory of change of the program. The first two stages examine whether the results are replicable based on the paper's data, methods, and statistical code. While our replication study will reanalyze all of the published results, we will focus on reproducing the key results we describe below. In the final stages of our replication study, we will analyze the intermediate results and the program's theory of change.
The original analysis reported promising results. In our study we will determine if these results are sensitive to a series of pre-specified robustness checks. We use Brown and Wood's (2018) replication diagnostic to focus our robustness checks on four aspects of the original publication: the validity of assumptions, data transformations, estimation methods, and heterogeneous impacts. Our checks will reexamine the influence of providing a microplot on a range of household and individual level results. We focus on the land tenure security results, as the original authors find statistically significant increases in program participants that corresponded to increases in plot sizes. Plot size requirements are highly relevant to the research question, both because of cost and land distribution constraints. We will undertake an exploratory analysis to determine if a plot size threshold exists, under which microplots are too small to influence household outcomes.
We plan to conduct additional tests not reported by the original authors. We will examine if the intervention affected household wealth, which we find plausible given the intermediate results. Furthermore, we will independently assess the results for households headed by women.
We will follow 3ie's PBR protocol to quantify any differences from our PBR findings. In our analysis, we will assess any changes in the coefficient size, direction of coefficient, and statistical significance of the results in comparison to the original publication. Based on descriptive statistics, we set major and minor parameter estimate thresholds for our key results.

Push button replication
As explained in the first section, the initial step in our replication paper is to use the original code and data to replicate the paper's results. To do so, we will use the original authors' software, data, and statistical code. The paper identifies positive impacts on intermediate outcomes related to i) perceived land tenure security, ii) likelihood of accessing credit for agriculture, iii) use of improved inputs, and iv) women's likelihood to be involved in important food and agriculture decisions. In each area the authors present several outcomes, each of which provides valuable information on the effects of microplots on nutrition. From each of the four outcome groups, we selected the one key result we considered most relevant to the intervention's theory of change. Our key results are: i) Female respondent reports that her household will have same or more access and control over the plot in five years, ii) Household has taken out a loan from a bank since 2009, iii) Share of household land over which female respondent decides "How to use the plot", iv) Household used seedlings, seeds, or grafted stems in last year. In the case of "tenure security" we focused on the women's belief around having access to the plot in the upcoming years. For "women's participation in decision making" we highlighted the woman's ability to make planting decisions. Regarding the "use of credit for agriculture production" we chose an indicator of household level access to agriculture credit. For "investments in agriculture production" we chose our measure based on the original researchers' footnote 14, where they explain this indicator quantifies the likelihood of households "to undertake new plantings and/or annual crops, rather than only caring for already existing trees and perennials" (p. 871).
The original authors present these results in Table 3 of their paper. For each of these key results we will assess major and minor differences in p-values. We will consider parameter differences as major or minor based on 15 percent and 30 percent changes as indicated in Table 1.
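A hedged sketch of the parameter-estimate rule follows. It assumes one reading of the thresholds: a relative change of at least 30 percent is major, a change of at least 15 percent but below 30 percent is minor, and anything smaller is comparable. The function name, the labels, and the example coefficients are hypothetical. Thresholds set separately for each key result can be passed in through the `minor` and `major` arguments.

```python
def classify_estimate_change(original: float, replicated: float,
                             minor: float = 0.15, major: float = 0.30) -> str:
    """Classify the relative change in a coefficient against the original.

    Assumed reading of the rule: >= `major` is "major", >= `minor` is
    "minor", otherwise "comparable". A sign flip always exceeds 100 percent
    and is therefore classified as "major".
    """
    if original == 0:
        raise ValueError("relative change is undefined for a zero original estimate")
    # Round to absorb floating-point noise near the thresholds.
    relative_change = round(abs(replicated - original) / abs(original), 6)
    if relative_change >= major:
        return "major"
    if relative_change >= minor:
        return "minor"
    return "comparable"


# Hypothetical coefficients, for illustration only.
print(classify_estimate_change(0.10, 0.11))   # 10 percent change
print(classify_estimate_change(0.10, 0.12))   # 20 percent change
print(classify_estimate_change(0.10, -0.10))  # sign flip
```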

Pure replication
In the second stage of our replication study we conduct a pure replication analysis. Following Wood and Dong (2018), we will recode the analysis to reproduce the original paper using the original data and following the methodology presented in the paper. We then analyze any differences between the original code and the recreated code. We will focus on identifying differences in the management of outliers and data imputation. We will closely examine the inverse propensity score-weighted regression technique used in the original analysis. We plan to follow King and Zeng (2006) to test the common support assumption and Austin (2011) to examine the balance between the treatment and control households.
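One common way to examine balance is the standardized mean difference of each baseline covariate, computed with and without the propensity score weights. The sketch below is illustrative only. It assumes ATE-style inverse propensity weights (1/p for treatment households, 1/(1-p) for control households), which may differ from the weighting the original authors used, and it uses a simple weighted variance without a small-sample correction. All function names are hypothetical.

```python
import numpy as np


def ipw_weights(propensity, treated):
    """ATE-style inverse propensity weights (an assumption, see lead-in)."""
    p = np.asarray(propensity, dtype=float)
    t = np.asarray(treated, dtype=bool)
    return np.where(t, 1.0 / p, 1.0 / (1.0 - p))


def _weighted_mean_var(x, w):
    # Weighted mean and (uncorrected) weighted variance.
    mean = np.average(x, weights=w)
    var = np.average((x - mean) ** 2, weights=w)
    return mean, var


def standardized_difference(x, treated, weights=None):
    """Difference in group means divided by the pooled standard deviation."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(treated, dtype=bool)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mean_t, var_t = _weighted_mean_var(x[t], w[t])
    mean_c, var_c = _weighted_mean_var(x[~t], w[~t])
    return (mean_t - mean_c) / np.sqrt((var_t + var_c) / 2.0)


# Toy data, for illustration only.
covariate = [1.0, 2.0, 3.0, 4.0]
treated = [1, 1, 0, 0]
print(standardized_difference(covariate, treated))
```

Comparing the unweighted and weighted values covariate by covariate shows whether the weights move the two groups closer together. Any cut-off for an acceptable difference should be taken from the balance literature and stated in the replication plan.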

Measurement and Estimation Analysis (MEA)
We will focus our MEA on the validity of the original research assumptions, data transformations, and estimation methods. Within research assumptions, we will include in the analysis attritor households that did not have females in the household at the time of follow-up. When looking at data transformations, we will examine different hunger vulnerability measurements and convert children into adult equivalency units. When considering estimation methods, we will develop an intervention implementation timeline, conduct an annual analysis of the intervention, and conduct a difference-in-difference and a treatment on the treated analysis.

Research assumptions: attrition
The original authors report a fairly large 25 percent attrition rate between the baseline and follow-up survey. If information was collected on households without women present at endline, we plan to re-examine the original authors' attrition analysis. We will test the robustness of the intermediate and impact level results to the inclusion of these attriters.

Data transformations: outliers, data imputation and variable construction
In our experience, published articles provide little documentation around outlier identification and missing data imputation. We will examine the robustness of the results to alternative data transformation decisions. If we identify missing values or outliers in the original analysis, we will explore the inclusion of dropped observations and alternative imputation strategies. As this paper focuses on household-level hunger, we plan to test the robustness of the food security impacts to an alternative measurement of hunger vulnerability. The original researchers use a binary proxy indicator, assuming hungry households experienced times within the last three months when they did not have food and/or money to purchase food. Due to the length of the recall period and the stark contrast between the two options, we will use categorical variables to capture greater variation within household-level hunger vulnerability.
In addition to general household hunger vulnerability, the original authors explore heterogeneous hunger impacts based on gender and age. Santos et al. place people in three categories: adults who are 12 years and older, children between the ages of 4 and 11, and infants aged 0-3. Without a clear explanation for creating the adult threshold at age 12, we will use standard adult equivalency units to convert younger people into adults and test the robustness of the overall hunger results. We plan to look at alternative ways to account for age using adult equivalency units (Swindale and Ohri-Vachaspati 1999). We will use adult equivalency units similar to how Frongillo and Nanama (2006) measure food insecurity.

Estimation methods: type and level of analysis
We will develop an intervention and survey timeline. We plan to clarify with the original authors when the treatment households knew they had access to the land and when the baseline data were collected. The original authors note that land distribution and the actual moving to these lands were oftentimes delayed. The original publication also mentions two rounds of baseline data collection. We want to account for both of these differences in our analyses.
Our annual analysis will assess the intervention effects for households that had longer periods of land access. This analysis will account for the variation in the baseline data collection and the amount of time households lived on the land. Although the researchers partially account for whether the households lived on the land in Table 4, their analysis is binary. We propose accounting for both the different time periods of the baseline and the amount of time households lived on the land by a) creating a dummy variable for the 2010 baseline and b) including a continuous variable to measure the length of time households lived on the land.
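The specification described above can be sketched as a weighted least squares regression with a treatment indicator, a 2010-baseline dummy, and a continuous measure of time on the land. This is a minimal illustration, not the original authors' model: the variable names are hypothetical, the remaining covariates are omitted, control households are assumed to have zero years on the land, and a linear model is assumed even for binary outcomes.

```python
import numpy as np


def annual_analysis_wls(y, treated, baseline_2010, years_on_land, weights=None):
    """Weighted least squares of y on the three regressors plus a constant.

    Returns the point estimates only; standard errors are out of scope here.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                          # constant
        np.asarray(treated, dtype=float),         # treatment indicator
        np.asarray(baseline_2010, dtype=float),   # dummy: surveyed in 2010 baseline
        np.asarray(years_on_land, dtype=float),   # continuous exposure measure
    ])
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    sqrt_w = np.sqrt(w)
    # Scaling rows by sqrt(weight) turns WLS into an ordinary least squares problem.
    coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    names = ["const", "treated", "baseline_2010", "years_on_land"]
    return dict(zip(names, coef))


# Synthetic data, for illustration only.
treated = np.array([1, 1, 1, 0, 0, 0, 1, 0])
baseline_2010 = np.array([1, 0, 1, 0, 1, 0, 0, 1])
years = np.array([2.0, 1.0, 3.0, 0.0, 0.0, 0.0, 1.5, 0.0])
y = 1.0 + 2.0 * treated + 0.5 * baseline_2010 + 0.3 * years
print(annual_analysis_wls(y, treated, baseline_2010, years))
```

The `weights` argument is where inverse propensity score weights would enter if the annual analysis keeps the original weighting scheme.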
We further plan to test the robustness of the intermediate results to a difference-in-difference analysis. The original authors appear to use the baseline data to conduct their matching exercise, and then compare the differences between the treatment and control households in the follow-up period. Given the existence of baseline data, assuming that similar questions were asked in the baseline survey instrument, we will check for shifts in the outcomes of interest based on household inclusion in the treatment group.
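In its simplest two-period form, the difference-in-difference estimate is the mean change in the outcome among treatment households minus the mean change among control households. The sketch below assumes a balanced panel in which the same outcome is observed for every household at baseline and follow-up. The function name and the toy data are hypothetical, and the optional weights are included only to show where propensity score weights could enter.

```python
import numpy as np


def did_estimate(y_baseline, y_followup, treated, weights=None):
    """Two-period difference-in-difference on a balanced household panel."""
    y0 = np.asarray(y_baseline, dtype=float)
    y1 = np.asarray(y_followup, dtype=float)
    t = np.asarray(treated, dtype=bool)
    w = np.ones_like(y0) if weights is None else np.asarray(weights, dtype=float)
    change = y1 - y0  # within-household change removes fixed household traits
    change_t = np.average(change[t], weights=w[t])
    change_c = np.average(change[~t], weights=w[~t])
    return change_t - change_c


# Toy data, for illustration only: a binary outcome for four households.
baseline = [0, 0, 0, 0]
followup = [1, 1, 0, 1]
treated = [1, 1, 0, 0]
print(did_estimate(baseline, followup, treated))
```

The estimate is only interpretable as a program effect under the usual parallel trends assumption, which the replication would need to discuss given the balance issues noted in the original study.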
Santos et al. find a number of their results statistically insignificant at the intent-to-treat level. As only around 25 percent of the treatment households had actually relocated to their new microplot at the time of follow-up, and it is unclear how many other treatment households actually received land, we will calculate treatment on the treated (TOT) estimates. The original authors conduct these estimates for the food security outcomes. We will extend their analyses to the intermediate outcomes.
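One simple way to move from an intent-to-treat estimate to a TOT estimate is to rescale by the difference in take-up between the two groups. The sketch below illustrates that rescaling under the assumptions that control households received no land and that assignment affects outcomes only through take-up. It is not necessarily the estimator the original authors used. The function name and the intent-to-treat value are hypothetical; the 25 percent take-up figure is the relocation share mentioned above.

```python
def tot_from_itt(itt: float, takeup_treatment: float,
                 takeup_control: float = 0.0) -> float:
    """Rescale an intent-to-treat estimate by the difference in take-up rates."""
    takeup_gap = takeup_treatment - takeup_control
    if takeup_gap <= 0:
        raise ValueError("take-up must be higher in the treatment group")
    return itt / takeup_gap


# Hypothetical intent-to-treat effect of 0.03 with 25 percent take-up.
print(tot_from_itt(0.03, 0.25))
```

With low take-up, a small intent-to-treat effect implies a much larger effect on the households that were actually treated, but the TOT estimate is correspondingly less precise.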

Theory of Change Analysis
In our theory of change analysis we propose to look at three topics that fall within the heterogeneous impacts category (Brown and Wood, 2018). First, we will examine alternative outcomes of interest, specifically with respect to wealth outcomes. Next, we plan to test the effectiveness of the intervention within the subsample of single, divorced, or widowed women. Finally, focusing more on the policy relevance of these results, we will explore intermediate outcomes above and below 5 and 10 decimals of land.

Heterogeneous impacts: outcomes of interest and sub-samples
The original publication tests a number of wealth measures. The researchers note that they expect to see increases in land investment, possibly through access to credit. Increased agriculture-related income from land ownership represents another potential channel of increased investment. The authors state that the data collection instrument includes information about income generation activities, expenditures, and debt. To test for another channel to increase agricultural investments, we will use data on income generation and expenditures. We will also check savings or housing conditions to test for wealth increases due to the intervention.
We will then test the alternative hypothesis that the intervention mainly influences female-headed households. Based on the summary statistics, 17 percent of the treatment households are composed of single, divorced, or widowed women. If sample limitations prevent us from conducting a full heterogeneous impact analysis, we will look for suggestive correlations.
Finally, minimum land size is a fundamental question for this type of intervention. In the middle of the project the Government of West Bengal capped land distribution at 5 decimals per household. The original researchers report a correlation between larger microplot allocations and improved food security outcomes. We will add to their analysis by focusing specifically on households that received 5 or fewer decimals of land and 10 or fewer decimals of land.

Discussion: interpreting replication's results
Providing landless households with microplots is touted as an economically feasible and politically appealing global intervention. In Bangladesh, Hillenbrand and Waid (2014) discuss the importance of microplots in increasing micronutrient levels and general household food security. More generally, the United Nations Development Programme (UNDP) recommends microplots as a method to reduce inequality in the developing world (UNDP, 2013).
To interpret the findings of our replication study, we focus on the robustness of the intermediate results to our MEA. The original authors use these results to demonstrate the policy relevance of providing microplots to landless households. On one hand, the paper's results would be weakened if we show that they are not robust to our difference-in-difference analysis or we demonstrate through our microplot threshold analysis that the intervention is economically infeasible. On the other hand, the results would be strengthened if our annual analysis shows stronger effects for households that accessed the microplots for a longer time or if our TOT analysis demonstrates that the intermediate results are stronger for households that received land. The case for microplots would also be strengthened if our MEA determines that the program improved food security by examining minimum land thresholds or including alternative measurements of hunger vulnerability. In terms of quantifying the concepts of strengthened and weakened results, we will generally compare the publication to our replication results, looking for similar coefficient sizes, direction of coefficients, and statistical significance. We will take into account the potential for lower levels of power when discussing our sub-sample analysis.
We think of replication studies as tools for broader discussions of influential and innovative research. Therefore, our interpretation of the results of the replication process does not focus on determining a "success" or "failure" to replicate the original study, but instead aims to deepen the research dialogue. We will detail our attempt to independently verify these policy relevant results, and we will then invite the development community as a whole to review, assess, and comment on the original publication and the replication report. We hope our approach contributes to the use of replication research as a tool to encourage conversations about which development programs work and which interventions look promising for scaling.