Doing by Learning
Q&A with Dean Karlan on evidence, technology, and the RCT movement’s next frontiers
In economics and policy, agency is a fairly unorthodox idea. Standard economic models – the kind that policy graduates cut their teeth on – say nothing about it. Economics students are still taught frameworks established in the 1950s, in which humans hold godlike computational capabilities yet have very basic and unchanging goals. Behavioral economics in the 1970s and 80s began to challenge these ideas, but focused too much on how human psychology leads us to make mistakes. Agency, on the other hand, is about cognitive emancipation: it asks how psychological science can be leveraged to help people flourish. And it is essentially absent in policy schools.
Why, then, is The Agency Fund so hopeful about the agency agenda? The answer is simple: evidence. The past decade has seen a cognitive revolution with the integration of ideas from social psychology, cultural anthropology, and cognitive science, moving beyond the rational-actor model and its critiques to create a radically new paradigm. We are still in the early stages of this revolution, but its momentum is predicated on a growing and compelling evidence base. And not just any evidence, but one type in particular: the randomized controlled trial (RCT).
RCTs are powerful because they combine gold-standard rigor with story-like parsimony. While RCTs have been used in the social sciences for about a century, they only recently became common in economic development. But their findings across a range of sectors have transformed the field’s impact and influence, converting good intentions into good outcomes for people living in contexts of severe disadvantage. A notable example is the cash transfer: a common myth is that money given to poor people will be wasted on things like alcohol and gambling, but RCTs have shown this isn’t true. Such nuggets of evidence can transform our assumptions and policy approaches – and recent RCTs have started shedding light on the potential of agency-based approaches in development.
The power of RCTs lies not in their statistical sophistication. In fact, it’s the opposite: well-designed RCTs enable researchers to avoid complicated statistics and generate straightforward evidence. What they do require is a different capability: advanced research design. Few organizations in the world are better at this than Innovations for Poverty Action (IPA), a global research and policy nonprofit that has implemented hundreds of RCTs.
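To make that simplicity concrete, here is a minimal sketch that simulates a hypothetical cash-transfer trial and estimates its effect with nothing more than a difference in means. Every number and variable name is an illustrative assumption, not drawn from any real study.

```python
# A hypothetical simulation: with random assignment, the average treatment
# effect can be estimated with a simple difference in means between the
# treatment and control groups -- no elaborate model needed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated outcome data: monthly income (USD) for 1,000 households,
# half randomly assigned to receive a cash transfer.
n = 1000
treated = rng.permutation(np.repeat([1, 0], n // 2))
baseline = rng.normal(120, 30, n)   # underlying income plus noise
true_effect = 15                    # assumed effect, for the simulation only
income = baseline + true_effect * treated + rng.normal(0, 20, n)

# Estimate: difference in means, with a standard error and a t-test.
diff = income[treated == 1].mean() - income[treated == 0].mean()
se = np.sqrt(income[treated == 1].var(ddof=1) / (n // 2) +
             income[treated == 0].var(ddof=1) / (n // 2))
t_stat, p_value = stats.ttest_ind(income[treated == 1], income[treated == 0])

print(f"Estimated effect: {diff:.1f} USD (SE {se:.1f}), p = {p_value:.3f}")
```

Because assignment is random, the two groups differ only by chance and by the treatment itself, which is what lets such a simple comparison carry causal weight.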
That’s why we were so delighted to speak to IPA’s founder, Dean Karlan – Professor of Economics and Finance at Northwestern University, co-Director of the Global Poverty Research Lab (GPRL), Chief Economist at USAID until recently, and one of the development RCT movement’s early pioneers. In a recent interview, Karlan reflected on the movement’s legacy, the current landscape for evidence-based policymaking, the place for technologies like AI, and what we still need to learn about human agency’s potential in the field.
~~~
How did you become interested in rigorous evidence?
After college, while working for a microcredit organization in El Salvador, I was struck by the plethora of design decisions baked into their credit and savings products. But whenever I asked, “Why this way? Why not that way?” the answers were never based on evidence or data. I saw that as a big, obvious gap – one that extended even to their broadest question of all: is this working?
A few years later, as an economics PhD student at MIT, I reconnected with the microlender about exploring some of those questions – so I talked with Michael Kremer, who was teaching at MIT then, about what an RCT could look like in El Salvador. When I tried to change the topic to my dissertation, Michael said, “Wait, isn’t that what we’ve been talking about?” I thought my dissertation needed fancier econometrics than an RCT. But he said, “No, you use the right tool for the right question. These are good questions, and that is the right tool.”
Those were very meaningful words to me then: I was getting a PhD because I wanted to gain a deeper understanding of the causes of poverty and pathways out of it, in order to inform policy – so what I choose to work on should be dictated by where those knowledge gaps are and how I can best address them, not by whether something is a technical breakthrough.
How did your early experiences inspire IPA?
Not long after that, I spent time in India helping two of my advisors, Esther Duflo and Abhijit Banerjee, on their first RCT. But then I moved on to do my own research and they carried on with another grad student, then other grad students after that. That process was putting a lot at risk: we were losing continuity, and typical PhD students are not necessarily the optimal choice – indeed, far from it – for managing people, projects, and partnerships. And there was a scale problem if we relied solely on graduate students to manage field projects.
Setting up an institution to balance all those incentives seemed really important. So, in the summer of 2002, right when I was graduating from my PhD and becoming a professor, I drafted a two-page vision statement and pitched it to Esther, Abhijit, and Sendhil Mullainathan – who all agreed to be on the Founding Board and help me get this started. I initially called it Development Innovations, but when the Poverty Action Lab started at MIT in 2003, we changed the name to Innovations for Poverty Action, in order to signal our intention for the two institutions to collaborate deeply, as sister organizations.
What factors do you think have been important in making the RCT movement a success?
First of all, there was just a gaping need: we were tackling questions that a lot of people were asking. RCTs also have a certain clarity, simplicity, and accessibility – and they resonate with things we're already used to. Long ago, we all accepted the fact that we don’t want to take drugs that haven’t been tested. So there was a comfort in applying that level of scientific rigor to a lot of the questions people were asking. Not all questions, of course; a lot of questions are outside the space of what these approaches can tackle.
This type of research is also just a lot more possible today – the internet really lowered the transaction costs, and that’s a big part of the RCT movement’s success. To do this type of nitty-gritty, detailed data work 30 years ago, you basically had to live in the country for a year and collect your own data. You couldn't just fly in for a week or two, then follow up on Zoom with a team of research assistants in the field.
RCTs are sometimes criticized for taking too long to be published, especially compared to approaches like A/B testing. Do you think that’s a fair criticism?
We’ve seen tremendous innovations in this space, including more rapid-fire experimentation using data sources that are more prevalent now, like social media and administrative data. But there's one thing we haven't innovated yet: nobody has invented a time machine.
If you want to know the 5- or 10-year impacts of something, you still have to wait 5 or 10 years. So I don’t see RCTs as slowing down the learning process; even if you're doing a quasi-experimental study, you have to wait. Obviously, sometimes research can look backwards and learn a tremendous amount – indeed, even today, with the RCT movement flourishing, more research is retrospective than prospective. Esther Duflo’s paper on the long-run impacts of a school construction program in Indonesia is one example. But if you want to do anything prospective, you either need to be patient or get a time machine.
We do need more work on the short-run proxies that might act as “surrogates.” But you need patience for that, too: to validate the surrogate, you need to see whether it predicts the long-run impact of a program. The even bigger challenge, I think, is that to validate a surrogate, you need the program to work better for some people than for others, and you need to be able to detect this in the data. That way you can see whether the people whose surrogate outcome improves in the short run are also the people who experience the biggest impact in the long run. We often search for this variation, to understand who something works better or worse for, but we just don't find it. And the world's a complicated place, so there’s never just one thing that drives such heterogeneity. Competing forces make this a messy venture.
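(Editors’ note: the sketch below is a simulated illustration of the surrogate-validation logic Karlan describes – checking whether subgroups with larger short-run surrogate gains also show larger long-run impacts. All values and variable names are hypothetical.)

```python
# Simulated per-subgroup effects: validating a surrogate requires heterogeneity,
# i.e. the program works better for some subgroups than others, and the
# short-run surrogate effect should track the long-run effect across subgroups.
import numpy as np

rng = np.random.default_rng(1)

n_subgroups = 20
# Assumed long-run treatment effects that vary across subgroups.
long_run_effect = rng.normal(0.3, 0.2, n_subgroups)
# A useful surrogate's short-run effect moves with the long-run effect (plus noise).
short_run_surrogate_effect = 0.8 * long_run_effect + rng.normal(0, 0.05, n_subgroups)

# If effects were homogeneous, this relationship would be undetectable in practice.
corr = np.corrcoef(short_run_surrogate_effect, long_run_effect)[0, 1]
print(f"Correlation between short-run surrogate and long-run effects: {corr:.2f}")
```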
What about the long path from publication to policy adoption, which can span decades – how can rigorous evidence better inform policy?
We need research to speak more tightly to the actual decisions that policymakers are making, but there are a couple of tensions. First, in general we focus too much on individual papers. There’s a sense that if we get a study with a positive effect, we should all of a sudden scale that study. We need a system with more incentives for replication, iteration, and meta-analysis – more second- and third-generation research building on other papers and teasing out issues. Cash transfers are the poster child for this type of approach: with my colleagues at Northwestern’s Global Poverty Research Lab and IPA, we recently posted a Bayesian meta-analysis of 114 papers reporting RCTs of 72 unconditional cash transfer programs. Some of those individual papers are highly influential, but I think the collection should really be what’s used to move policy.
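(Editors’ note: to illustrate the general idea of pooling many study-level estimates, here is a minimal sketch using a simple frequentist random-effects calculation, the DerSimonian-Laird method, with made-up numbers. The actual GPRL/IPA analysis is Bayesian and far more detailed; this only shows the principle of precision-weighting while allowing for between-study variation.)

```python
# Pooling hypothetical study-level effect estimates with a random-effects model.
import numpy as np

# Made-up standardized effect estimates and standard errors from seven studies.
effects = np.array([0.12, 0.30, 0.05, 0.22, 0.18, 0.40, 0.10])
ses = np.array([0.08, 0.10, 0.06, 0.09, 0.07, 0.15, 0.05])

# Fixed-effect weights and the Q statistic for between-study heterogeneity.
w = 1 / ses**2
fixed = np.sum(w * effects) / np.sum(w)
q = np.sum(w * (effects - fixed) ** 2)
k = len(effects)
tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects weights incorporate the estimated between-study variance.
w_re = 1 / (ses**2 + tau2)
pooled = np.sum(w_re * effects) / np.sum(w_re)
pooled_se = np.sqrt(1 / np.sum(w_re))
print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f}), tau^2 = {tau2:.4f}")
```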
Another tension is that policymakers often respond to evidence by saying “But it's different here.” That’s actually the right claim to make, but it needs an extension: “It’s different here because X, Y, and Z.” Every place is unique, but how is it different and why? Do those differences matter for whether something works, and how do they matter? Because we’ll never have a perfect understanding of the world.
Every issue is a bit like a Picasso portrait: there’s the structure of a face, but the eyes are funky and the nose is off in the corner. It's clearer than Jackson Pollock, but it’s not a photograph. Evidence provides you that level of guidance – to help you start filling in the holes until you’re comfortable that you're on the right path.
Of course, not all policymakers will actually sit down and read meta-analyses. So we need to tee up the information at the right time and in a helpful format, aligned with existing decision-making processes. We need to follow the nudge philosophy, which Richard Thaler said boils down to one phrase: “make it easy.” But how do you strike that balance? Rather than saying, “Here's our research, it should be replicated” – which isn’t helpful and lacks humility about the ability of any single piece of research to inform policy – I think we need to say things like: “Here's what we've learned, and if you want to apply it, here’s a list of things you need to figure out in your context.”
In other words, a good piece of quantitative research won’t tell you everything you need to know. But if it's done well, it helps you understand the questions you need to ask. You need other sources of information, too – like qualitative research, descriptive work, needs assessments, and institutional assessments. RCT evidence is an important piece of information, but it’s not sufficient for forming policy.
Finally, if there’s no evidence out there to guide a policy decision in a given country, that means there’s a knowledge gap. We had a slogan at the Office of the Chief Economist at USAID: “use it or produce it.” The “it” referred to cost-effectiveness evidence. Because if you're asking the question, others are too – and you’ll definitely be asking the same question in five years if you don't go find out what works and at what cost.
How do you see the advent of frontier technologies like AI changing the way that research and learning happens in development?
Well, one of the ways to facilitate more replication and meta-analysis is to make it easier for people to do it. One example GPRL has been working on, in collaboration with the World Bank, AidGrade, and CEGA at Berkeley, is a database of RCT results – not just RCTs, but their actual point estimates, standard errors, sample sizes, and the like. The goal is to get enough details to be able to do some quantitative analysis, though not enough that you can just hit go and write a paper. I think that's a bit dangerous: the few variables in the database can’t capture everything, and certain factors can’t be easily quantified. So you still need to read the papers, understand their nuances and subtleties, and understand how they differ.
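(Editors’ note: a hypothetical sketch of the kind of record such a database might store – not the actual schema used by GPRL and its partners – showing how a few structured fields can support quantitative synthesis while still pointing readers back to the paper for nuance.)

```python
# An illustrative, assumed record structure for an RCT-results database.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RCTResult:
    paper_id: str              # citation key or DOI
    country: str
    intervention: str          # e.g. "unconditional cash transfer"
    outcome: str               # e.g. "monthly consumption"
    point_estimate: float      # treatment effect in the paper's units
    standard_error: float
    sample_size: int
    followup_months: Optional[int] = None  # time between treatment and measurement
    notes: str = ""            # context the structured fields can't capture
```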
We might use AI to help identify papers or maybe even scrape them, but it will likely be a hybrid process – we’ll have humans check and validate that it pulled the right variables. More broadly, it makes me nervous to imagine policymakers or donors thinking they can just type into Google “What is the most cost-effective way to fight poverty?” and then start moving money towards that answer. I would love to be wrong, but I'm highly skeptical that we're anywhere near the point at which an AI system can do the necessary synthesis of evidence, given all the nuance. And I’m a little scared that those systems will get built and have some allure for people, even if they aren’t being validated.
How has your work engaged with the concept of “agency,” and what knowledge gaps do you think are most pressing in this space?
My thoughts on this go back to heterogeneity. We know that a lot of development programs focus on giving people opportunities – by providing things like money, information, and training – but we also know that there's tremendous heterogeneity in what people do with these opportunities. Do they even seize or deploy the opportunity at all? If they do, what do they get out of it?
But I see two challenging questions. The first is ultimately about measurement: what do we mean by agency, and how can we properly measure it? If we get those measures, we can begin to understand heterogeneity – then we're onto something. It wouldn’t give us policy prescriptions, but it would help us begin to identify policies that enhance people’s agency. Then there’s the second question: how do we change agency? If lack of agency is why people aren’t taking those resources or information and building a more sustainable livelihood for themselves, how do you shift their agency so they can seize those opportunities?
Right now, I think we’re only at the tip of these questions. We have some exciting findings and some interventions that seem to increase people’s ability to seize these types of opportunities. But we really do not have the sort of comprehensive answers we’d need to set policy at scale – and unfortunately, the lessons are unlikely to be simple or straightforward. By its very construction, agency will manifest differently in different places, for different people, for different contexts. It will need constant tailoring.
Yet in a way, that's a luxurious problem to have. It means we have enough success to be asking how to scale, how to tailor.
Introduction by James Walsh. Interview and editing by Greg Larson.
This blog is part of a series on Leveraging Personal Agency for International Development, a convening co-organized by The Agency Fund and the SEE Change Initiative at Johns Hopkins University, where Karlan was a speaker.