By Robert On & Linus Wong
A/B testing is everywhere in the private sector. It’s how tech companies build and refine great products, and what digital marketers do every day: show version A to some users, version B to others, and see what performs better. It happens constantly on websites, apps, and social media. But in the social sector – among nonprofits, NGOs, and government programs – it remains rare.
This is odd, because a major recent trend in the social sector has been the rise of randomized controlled trials (RCTs) – the gold standard for measuring impact in social programs – which are based on the same statistical methods as A/B tests. But RCTs are generally geared toward answering a point-in-time question of whether something works, and they often take years to complete. The impact evaluation machine is not set up to tell you how to improve a program, much less how to improve things tomorrow or next quarter.
A/B testing, by contrast, helps organizations rapidly test ideas to figure out what works, enabling continuous learning and improved impacts over time. Yet many organizations assume A/B testing is out of reach – that it requires expensive tools or a full data team. In fact, any organization can do it. The key is realizing that A/B testing exists on a spectrum, from quick-and-dirty spreadsheets to fully automated testing platforms. Often, it’s simply the next step beyond an organization’s existing evaluation and learning efforts, like tracking metrics and key performance indicators (KPIs) or running qualitative research in the field or lab.
A key priority for The Agency Fund is to help high-impact social sector organizations scale by promoting learning and experimentation. This blog focuses on one aspect of that work – our support for A/B testing, with a spotlight on our partnership with Rocket Learning.
Rocket Learning’s Testing Journey
Rocket Learning is a tech-forward nonprofit social venture in India with a mission to improve early childhood development for millions of preschool children, working at scale through a network of “Anganwadi” centers (the largest daycare program on earth) alongside direct parental engagement. Primarily operating through WhatsApp, they have launched more than 100,000 “digital classrooms” since 2020. These small groups of parents and Anganwadi workers receive simple, accessible, bite-sized educational content, learning activities, feedback, and encouragement – helping around 4 million low-income Indian children access quality early childhood education.
Rocket Learning had already run several A/B tests by the time we started working with them in 2021, a practice grounded in the team’s strong culture and commitment to improving metrics. They had tested everything from the timing of WhatsApp message delivery to the persona delivering messages and the avatar displayed in multimedia content. Their five co-founders all came from professional backgrounds that prized evidence-based programming and learning, and they already had a CTO among them. Their initial A/B tests were relatively small: about 200 users per treatment arm, comparing simple message variations and measuring any changes in user engagement. (In the Rocket Learning context, engagement involves users posting photos showing that they’ve completed a worksheet or activity – more meaningful actions than simply liking a post.) These tests weren’t sophisticated, but they got the team into the habit of testing and learning from the very beginning.
Through a fellowship with The Agency Fund, Rocket Learning deepened the kinds of questions they asked using A/B tests. Broadening the scope of their tests beyond WhatsApp engagement, they began to explore how different message framing might shape specific behaviors. Noam Angrist, Executive Director of the Botswana nonprofit Youth Impact, supported this evolution by encouraging the team to run larger-scale experiments that could detect smaller effects. One such experiment tested a variety of message types to see which message designs helped parents spend more time cognitively engaged with their children.
Though each trial may have boosted user engagement by just a few percentage points, the gains have compounded over time. This year, Rocket Learning’s user engagement has reached approximately 40% monthly active users – meaning that 40% of parents onboarded to their product interacted at least once over WhatsApp in the past month – and continues to grow.
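Why does scale matter so much for what a test can detect? A rough power calculation makes it concrete. The Python sketch below (using an assumed 30% baseline engagement rate, chosen purely for illustration) estimates the smallest lift a two-arm test can reliably detect: with roughly 200 users per arm, only double-digit swings show up as statistically meaningful, while picking up changes of a few percentage points requires samples in the thousands or tens of thousands.

```python
# Rough illustration: minimum detectable effect (MDE) for a two-arm A/B test
# comparing engagement rates, using the standard normal-approximation formula.
# The 30% baseline rate and the sample sizes below are assumptions for illustration.
from scipy.stats import norm

def minimum_detectable_effect(n_per_arm, baseline_rate, alpha=0.05, power=0.8):
    """Smallest absolute lift in a proportion detectable at the given power."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_power = norm.ppf(power)           # power requirement
    se = (2 * baseline_rate * (1 - baseline_rate) / n_per_arm) ** 0.5
    return (z_alpha + z_power) * se

for n in [200, 2000, 20000]:
    mde = minimum_detectable_effect(n, baseline_rate=0.30)
    print(f"{n:>6} users per arm -> detectable lift of about {mde:.1%}")
```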
As the team’s learning ambitions have grown, so has their technical infrastructure. They have moved from manually assembling data and writing long database queries – which can increase errors – to now building tools that let product managers automatically define A/B tests, select outcomes, and analyze results with a few clicks. Some parts of the process can't be automated – like creating new features to be tested. But the idea is to free engineers and analysts from grunt work, so they can focus on better design and deeper insights.
This is where The Agency Fund has played a role: our engineers and product team partnered with Rocket Learning to design an A/B testing platform to help automate the experimentation process and scale with their needs. With Rocket Learning's support, we've been working to generalize it for use by other nonprofits. This system – which we call Evidential, and which is currently in beta testing – works with the data and outcomes organizations already collect about their users. When designing an experiment, managers set the criteria for targeting the right users, the user metric they wish to affect, and the product variations they want to evaluate. Evidential then pulls the data from their database and executes the randomized assignment and statistical analyses. It can also serve as a registry for hypotheses and results.
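To make that workflow concrete, here is a minimal sketch of the steps a system like this automates: pulling the users an organization already tracks, filtering to the targeting criteria, assigning each user to an arm, and comparing the chosen outcome metric once the experiment has run. It is written in Python with hypothetical table and column names, and it is a simplified illustration rather than Evidential's actual interface.

```python
# Minimal sketch of an automated A/B workflow: eligibility filtering, randomized
# assignment, and a simple outcome comparison. The data source, column names, and
# targeting criteria are hypothetical; this is not Evidential's API.
import hashlib

import pandas as pd
from scipy.stats import chi2_contingency

# 1. Pull the users the organization already tracks (hypothetical table/columns).
users = pd.read_csv("users.csv")                      # user_id, district, engaged_last_month, ...
eligible = users[users["district"] == "Pune"].copy()  # targeting criteria set by the manager

# 2. Deterministic randomization: hash each user ID into one of two arms.
def assign_arm(user_id, experiment="msg_framing_v1"):
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

eligible["arm"] = eligible["user_id"].map(assign_arm)

# 3. After the experiment runs, compare the chosen outcome metric (0/1) across arms.
counts = pd.crosstab(eligible["arm"], eligible["engaged_last_month"])
chi2, p_value, _, _ = chi2_contingency(counts)
print(eligible.groupby("arm")["engaged_last_month"].mean())
print(f"p-value for difference in engagement: {p_value:.3f}")
```

Hash-based assignment is one common design choice here because it is deterministic: re-running the script reproduces the same split, and newly onboarded users can be assigned on the fly without storing a lookup table.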
The payoff for organizations could be big: faster cycles of experimentation, greater learning, and more effective personalization for end-users. Rocket Learning already runs more than 20 A/B tests per year – but they hope to double that rate with automation. Now they are looking to the next phase in this journey: moving beyond increased user engagement and testing for changes in conventionally “offline” metrics, like children’s educational outcomes.
Designing A/B tests that try to move children’s learning directly, at scale, is challenging – largely due to the lack of cost-effective and standardized measurement tools. Exploring this challenge has also been part of our partnership. A recent study by Rocket Learning's Bilal Afroz with The Agency Fund's James Walsh and coauthors Aarti Malik and Ronak Jain found that a phone-based caregiver survey – adapted from the World Bank’s Anchor Items for the Measurement of Early Childhood Development (AIM-ECD) framework – provides a reliable and low-cost alternative for measuring child development outcomes. Shifting to a phone-based survey introduces minimal bias from parent and teacher respondents, and can be validated against direct in-person assessments of child learning.
Getting Started on the Testing Journey
Rocket Learning’s story is inspiring, but many nonprofits don’t have a CTO, a data science team, or any experience with A/B testing.
The good news is that every organization can find its place on the spectrum: it doesn’t take a sophisticated tech stack or a dedicated engineering team to get started. Some organizations might find their pathway into A/B testing through qualitative work, exploratory data analysis, or operational dashboards. These activities often lead to formulating hypotheses that can be A/B tested.
Many nonprofits can get started with simple questions and basic data, without having every step automated. A few of The Agency Fund's other partners illustrate this well:
Noora Health – a nonprofit that partners with government healthcare systems in South Asia to deliver evidence-based education to patients and caregivers – is ramping up its use of experimentation to improve its WhatsApp-based information service. This process has clarified its user funnel and engagement metrics, strengthened its existing data systems, and led to increased sign-up rates.
Shamiri Institute – a Kenya-based nonprofit that provides scalable, evidence-based mental health interventions for high school youth – also started with manual testing. One recent A/B test compared two strategies for referring youth to one-on-one counseling. Another tested whether increasing group-counseling size from 10 to 15 students would maintain impact. Shamiri’s programs are delivered in person, rather than digitally, but they used data they already had and randomized using general-purpose analysis tools, not any specialized experimentation software. The result: cost-effective improvements and a stronger internal learning culture.
Youth Impact – a Botswana-based nonprofit that delivers low-cost, evidence-based education and health programs globally – began with a large-scale RCT that took years to yield results. To drive impact and scale faster at low cost, they evolved their Monitoring & Evaluation efforts to support ongoing, iterative A/B tests. Field teams surface some of the most promising ideas, which go on to be rigorously tested. This rapid learning capability allowed them to adapt an existing classroom intervention into ConnectEd, a low-tech remote instruction program shown to increase numeracy by 90%. Today, Youth Impact regularly works to enhance cost-effectiveness by running an A/B test every school term for each of its programs.
Kabakoo Academies – a Mali-based organization blending high-tech innovation with local knowledge – has made experimentation central to its culture, even introducing new staff to A/B testing theory during onboarding. Ahead of their flagship event Bamako.ai, for example, the team tested four message variants as part of a large-scale WhatsApp campaign, using feedback from the tests to pivot strategies mid-campaign and ultimately boost attendance at the event.
Saajha – a Delhi-based nonprofit that connects parents with level-appropriate content to engage their children in learning – is committed to measuring educational outcomes cost-effectively at scale, and has experimented with many methods to reduce the cost and improve the accuracy of these educational assessments. In collaboration with academic partners, they’ve run over 10 small and large-scale A/B tests with about 15,000 parents to iteratively improve their operational efficiency.
Several other organizations we work with have similar success stories or are starting their own A/B testing journeys. The examples show that A/B testing isn’t just for tech nonprofits – it’s for anyone willing to ask, “What’s a better way to do this?”
Bringing a Culture of Experimentation to Any Organization
Despite its potential, A/B testing remains underutilized in the social sector. But with the right mindset and support, any organization can use experimentation to drive smarter, faster improvements.
Every organization, regardless of size or technical capacity, can run A/B tests. You don’t need a data warehouse or software engineers to start. Ultimately, it starts with a mindset: A/B testing is about treating every program as an opportunity to test, learn, and improve – on a continuous basis and often in small ways, as opposed to conducting one major evaluation every few years. Often, A/B testing is the natural next step from your current learning efforts.
Start small, with simple data and free tools. You don’t need expensive tools or fancy infrastructure to run a trial – though you do need a reasonably large sample of users for testing, typically at least 1,000 people but potentially as few as 100 to 200. Start by randomizing users to version A or version B, which can be done with a simple spreadsheet. Then test different message scripts or delivery times, tracking who reacts or responds. If your program is in person, track attendance; if it’s digital, you can track clicks and visit counts – or whatever output data you already collect. With just a bit of structure and intention, even these small tests can lead to useful insights.
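For a concrete starting point, the sketch below does nothing a spreadsheet with a RAND() column and two averages couldn’t: it randomly splits an existing contact list in half, each half receives a different message script (outside the code), and response rates are compared afterward. The contact list and results here are invented purely for illustration.

```python
# A deliberately simple first A/B test: random assignment plus a side-by-side
# comparison of response rates. Contact names and outcomes are made up.
import random

contacts = [f"parent_{i}" for i in range(400)]   # e.g. phone numbers you already have
random.seed(42)                                  # so the split can be reproduced
random.shuffle(contacts)

group_a = contacts[: len(contacts) // 2]         # receives message script A
group_b = contacts[len(contacts) // 2 :]         # receives message script B

# After the campaign, record who responded (placeholder results for illustration).
responded = set(random.sample(group_a, 60) + random.sample(group_b, 75))

rate_a = sum(c in responded for c in group_a) / len(group_a)
rate_b = sum(c in responded for c in group_b) / len(group_b)
print(f"Script A response rate: {rate_a:.0%}")
print(f"Script B response rate: {rate_b:.0%}")
```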
Understand that A/B testing exists on a spectrum, and developing a learning culture is a long-term journey. For some organizations, all they ever need are simple, manual processes and basic comparisons. Others will build full-stack experimentation platforms as they scale, with automated targeting, delivery, and analysis. Wherever you fall on this spectrum, there is a viable and valuable way to begin testing and learning. And don’t forget to incorporate qualitative research into your A/B testing practice – for example, by interviewing users after an experiment to learn why they preferred one version of the product over another.
Use A/B tests continuously to measure the impact of your best ideas and refine them. Routine testing of new ideas helps you get better at program and product management. With every experiment you run, you deepen your intuition and understanding of users. You start to develop the muscle of iterative brainstorming, testing, and results interpretation. Over time, you get to see how your ideas improve the product.
Continuous learning supports long-term effectiveness, reduces risk, and is often attractive to funders. A/B tests help organizations avoid the risk of implementing new or modified interventions that could reduce effectiveness over time – or unintentionally cause harm. By allowing organizations to trial changes on a small scale and generate clear evidence before rolling them out at scale, A/B testing ensures that programs evolve based on what actually works – not just what seems like a good idea. These are features that many funders look for in prospective grantees and partners.
Ultimately, A/B testing is less about technology and more about cultivating a culture of inquiry and continuous improvement. It’s about formulating clear hypotheses, using data to evaluate them, and making informed decisions based on evidence. Whether you’re delivering services through a digital platform or implementing in-person programs, A/B testing offers a practical and scalable way to experiment and learn what works – and to become measurably more impactful with each passing day.
Edited by Greg Larson