NOTE: The views expressed here belong to the individual contributors and not to Princeton University or the Woodrow Wilson School of Public and International Affairs.

Friday, November 11, 2011

Vying with Velker: RCTs reconsidered

Shawn Powers, MPA ’11

In “Randomized Controlled Trials on Trial,” Jake Velker proposes several reasons to be skeptical of randomized controlled trials (RCTs) as a method of program evaluation. While Jake makes some good points, as a “randomista” I think the picture he paints of the RCT movement is far too pessimistic. I will consider each of his arguments in turn.

1. External validity
Jake first takes up the critique, championed by Princeton's Angus Deaton, that RCTs suffer from external validity problems—in other words, the results of an evaluation may not generalize well to other contexts. While the criticism is frequently leveled at RCTs, the question of external validity applies to all empirical work. In general, I find these discussions about the differences between Deaton and RCT proponents a bit overblown. Perhaps this is because everyone loves a good spat between high-profile intellectuals (see also: Sachs and Easterly). In reality, according to the profile of Esther Duflo in The New Yorker that Jake references, Deaton has described his criticisms "as more in the form of an amicus brief than an attack." The arguments raised by Deaton and others are having an impact, as Jake acknowledges, and RCTs increasingly are testing rich behavioral hypotheses. The incentives of academic publication are also moving RCTs toward greater theoretical sophistication. Gone are the days when a randomized design was a novel enough identification strategy that it could propel a study to publication in a top economics journal.

Considerations of theory aside, whether or not a particular result generalizes is itself a testable, empirical question. We cannot—and should not—test everything everywhere, but if a particular approach proves effective in multiple contexts, our confidence in its “generalizability” should increase accordingly. Replication studies can also test variations in the length or intensity of treatment, disentangle the impact of different components of a program, or test how a small-scale intervention performs as it is scaled up.

If the goal is to achieve certainty that intervention X will achieve result Y in context Z, we will never achieve it, with RCTs or any other method. However, considering evidence from even one rigorous evaluation is a big improvement over flying blind. As we consider multiple evaluations, together with insights from theory and other empirical work, the picture becomes that much clearer.

2. Institutional constraints
Jake’s main criticism is that economists conducting RCTs “have been accused of ignoring the institutional constraints against which their interventions would inevitably contend if scaled up.” He cites corrupt bureaucracies, weak institutions, a lack of (or perverse) performance incentives, and budgetary problems as barriers to successful replication of programs found to be effective. The underlying message seems to be that if the RCT movement wants to influence policy successfully, it cannot just publish research findings and hope for the best.

I could not agree more with this last point, but the RCT community is much further along on this front than Jake suggests. Both J-PAL and our sister organization, Innovations for Poverty Action (IPA), have policy staff dedicated to bringing research findings to bear on the often-messy world of policymaking. While our academic affiliates are involved with policy outreach, non-academic policy staff help extend the reach of their research findings. This process is never easy and not always successful, for all the reasons Jake mentions, but we have found that it is possible to improve policy even in very constrained environments.

As an aside, Jake also suggests that governance in developing countries is not amenable to quantitative study. I would have thought the same before I started with J-PAL, but in fact, J-PAL affiliates currently have at least 42 completed or ongoing evaluations in political economy and governance, including many that address precisely the issue he raises of the incentives of government officials and service providers.

3. Do we already know what works?
Finally, Jake entertains the idea that perhaps we already know what works, since "many of the most celebrated finds of the RCT movement are relative 'no-brainers'." I find this argument troubling for two reasons. First, we have been following our intuition about what works in development for decades, with not a lot to show for it. What we have seen is a succession of fads, with decidedly mixed results in terms of reducing poverty. There was a time when infrastructure was the "no-brainer"; later it was basic needs; still later the focus turned to sustainable development; and today infrastructure seems back in vogue. To suggest that we already know what to do invites just this kind of intellectual drift.

Second, while it may be true that many RCTs report seemingly obvious findings, some of them surprise us—and we never know in advance which those will be. For example, a number of NGOs and opinion leaders have championed the idea that distributing sanitary products to adolescent girls will remove a barrier to female education. The underlying common-sense assumption is that menstruation causes many missed days of school. However, a randomized evaluation of a program that distributed an easy-to-use sanitary product in Nepal found no significant effect on school attendance (although the girls used, and liked, the product). As always, we should avoid over-generalizing from one study, but at minimum these findings suggest that proponents of this approach should adjust their expectations about what it can deliver. In other cases, RCTs have contributed clear evidence to debates where both camps have common-sense arguments on their side, such as the vexed issue of whether, and how much, to charge poor people for basic health and education products and services. Finally, even if the qualitative findings of an RCT seem to confirm common sense, policymakers may still want to know how an intervention stacks up quantitatively against other interventions with the same goal, in terms of both raw impact and cost-effectiveness.

RCTs are no more a panacea for development than anything that came before them. But as long as there is more ideology and wishful thinking in development policymaking than evidence, I believe that the continuing growth of the RCT movement is a welcome trend.

Shawn Powers is a Policy Manager at the Abdul Latif Jameel Poverty Action Lab (J-PAL). The opinions expressed here are his own.
