Agency Fund Accelerator: Year One Update
Highlights and lessons learned from the first cohort
Generative AI is quickly becoming part of social programs, enabling nonprofits and governments to improve outcomes at lower cost. A key question is how to deploy these tools responsibly and rigorously. How do we ensure AI strengthens systems, rather than duplicating or fragmenting them, especially in resource-constrained settings?
In January 2025, we partnered with OpenAI and the Center for Global Development (CGD) to launch the Agency Fund Accelerator: a year-long cohort program for nonprofits building AI tools in public service delivery systems. We selected eight organizations across health, agriculture, and education to design, test, and refine AI-enabled tools in live programs. They worked with embedded technical teams from The Agency Fund, met for quarterly sprints, and used a growing suite of AI tools for the social sector, including the open-source experiment engine Evidential and the Playbook for AI Evaluation.
Earlier this year, the eight organizations demoed their products at a Demo Day held alongside the India AI Impact Summit in Delhi. Their work points to a broader lesson: NGOs are critical for advancing the jagged frontier of AI, exposing where the technology still requires development and building novel tools to solve specific challenges.
Updates from the 2025 cohort
Here are some moments from Demo Day that really updated our priors:
Rocket Learning (India) deployed Shiksha Saathi, a WhatsApp-based chatbot supporting Anganwadi workers – frontline early childhood educators in India’s public daycare system. It offers micro-training, activity planning, and real-time guidance to busy teachers. The bot can customize content to the classroom’s age range, student needs, and local cultural norms. Pilots are demonstrating strong engagement and opt-in rates. Teachers can also send in voice notes with feedback on individual students, which triggers the creation of updates for parents on their children’s development. The team has partnered with OpenAI, the Maharashtra government, and other government partners to reach more than 400,000 Anganwadi workers.
Digital Green (Kenya, India, Ethiopia) has developed FarmerChat, a multilingual AI assistant that allows farmers to ask questions using voice, text, or photos and receive localized, practical farming guidance in their own languages. Designed for inclusiveness, FarmerChat supports farmers in multiple low-resource languages and accommodates a range of digital literacy and data access levels. The platform has surpassed 1.1 million installs globally, with 70% of active users reporting that they adopted a recommended practice. Digital Green shared that users often test FarmerChat by asking questions they already know the answer to; they fact-check before trusting the system. It’s a subtle behavior, but a powerful reminder that trust in AI is built gradually.
Youth Impact (Botswana & India) embedded AI within ConnectEd, its low-tech phone tutoring program for primary school youth. They added voice-based AI assessments of students’ math ability to save teacher time and enable more students to benefit from the program. The tool can also provide feedback to teachers on their tutoring practices. The team has found that an ASR-LLM-TTS pipeline (combining automatic speech recognition, a large language model, and text-to-speech) is more cost-effective and higher-performing than real-time voice interaction. However, speech recognition models still perform poorly with children’s speech, which has prompted Youth Impact to explore fine-tuning. The tool is currently reaching more than 75,000 learners in the Indian state of Karnataka, with plans to scale beyond 1 million this year.
Reach Digital Health (South Africa) built a conversational onboarding flow and survey service for MomConnect, the national maternal and child health messaging platform in South Africa. Launched as part of a randomized evaluation, Reach used AI to personalise survey questions. Interestingly, the AI-powered surveys decreased moms’ response rates, compared with a static questionnaire delivered via SMS. At the same time, the conversational format did increase inbound messages from mothers by up to 42%. Conversational surveying seems to introduce friction, but Reach continues to experiment. At Demo Day they showcased a new AI voice platform designed to answer mothers’ questions in real time. After the live demo, a man in the audience asked, “I couldn’t help but notice that you, a man, were talking to the AI and told it you were pregnant, and it didn’t correct you. How are you going to fix this?” Milton from Reach responded, “Does that matter?” – a reminder that beyond biases in models, we must also guard against our inherent human biases.
Jacaranda Health (Kenya) built an AI-powered conversational service that can answer questions from pregnant women and new mothers in Swahili and Sheng (a dialect). Operating via low-tech phones, the voice service enables low-literacy Kenyan women to call an AI help desk and ask any maternal or newborn health question. They receive an automated callback within minutes, offering accurate, contextual, and timely responses to their specific queries. The team fine-tuned a Swahili ASR and TTS stack, reducing word error rates from 89% to 15%. The system is now integrated into PROMPTS, an SMS messaging service for women, with response times averaging about two minutes. Notably, the AI model is not the main driver of latency; a major contributor to delays is the telecom operator. The team is now optimizing to reduce latency.
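For readers unfamiliar with the metric, word error rate (WER) is the number of word-level substitutions, deletions, and insertions needed to turn the ASR transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not Jacaranda Health's evaluation code; the function name and the example sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); real evaluations usually
    normalize casing and punctuation before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substituted word and one dropped word
# against a four-word reference.
example_wer = word_error_rate("when should baby feed", "when could baby")
```

Because insertions count as errors, WER can exceed 100% when a transcript contains many spurious words.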
Precision Development (India) built PaddyAI, an AI assistant that helps program teams generate and deliver customized coffee farming advice at scale. It uses retrieval-augmented generation (RAG), drawing on expert knowledge bases and farmer data. The assistant operates in the local language and delivers messages via voice. The prototype is live with smallholder coffee farmers in the state of Karnataka. A/B tests currently underway compare LLM-generated messages with human-generated content, asking whether AI campaigns or expert broadcasts produce stronger farmer engagement, comprehension, and field management. Next steps for PaddyAI include expanding crops, languages, and geographies.
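One common way to read an A/B test on a binary outcome such as "farmer listened to the full message" is a two-proportion z-test. The sketch below is a generic illustration under that assumption; it is not Precision Development's analysis plan, the function name is hypothetical, and the counts are placeholders rather than results from the pilot.

```python
import math


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for rate A vs rate B.

    Uses the pooled-proportion standard error, which assumes the two
    arms are independent and reasonably large.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; test is undefined")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts (NOT real data): engaged farmers out of farmers
# reached, for an AI-generated arm and an expert-written arm.
z, p = two_proportion_z_test(120, 1000, 100, 1000)
```

A small p-value suggests the difference in engagement is unlikely to be chance alone; outcomes like comprehension or field management would need their own measures and tests.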
Udhyam (India) built Mentor Buddy, an AI chatbot for supporting students and teachers through their Entrepreneurial Mindsets Curriculum (EMC) in public high schools. The AI mentor offers 24x7 guidance as well as assessment and feedback on student milestone submissions – complementing rather than displacing the teacher’s role as an advocate. The AI mentor also supports teachers in classroom delivery of the curriculum, managing and tracking student teams’ progress, and supporting them through their project journey. The team has developed a model “playground” to test performance, cost, and latency trade-offs, and has onboarded to Evidential for A/B testing. The next phase focuses on enabling proactive intervention with students, assessing students’ sense of agency as they engage with the bot, and building a pipeline for weekly evaluations.
Noora Health (India, Bangladesh, Indonesia) built an AI copilot to help nurses answer questions from family caregivers about maternal health, TB, and noncommunicable diseases. Questions are received via WhatsApp and displayed in a call center dashboard. The tool summarizes past conversations, flags high-risk cases, filters non-medical messages, and drafts responses grounded in a vetted knowledge base. Clinicians review and send copilot responses to low-risk queries, giving them more time to focus on more complex cases. Early results show faster nurse response times and users returning with more questions, suggesting growing trust in the platform. However, the quality of responses is very sensitive to the quality of the underlying RAG knowledge base – something that has been observed in other healthcare contexts.
Lessons for any nonprofit adopting AI
The accelerator’s first year surfaced several lessons that extend beyond the cohort.
Identify a clear AI use case. These projects targeted busy frontline workers and isolated households with well-defined bottlenecks – like heavy inbound loads, limited mobile literacy, and onboarding churn – where AI can deliver measurable gains in efficiency or reach.
Evaluate from the start. Teams that paired AI solution deployment with LLM performance scoring and live metrics dashboards moved faster from experimentation to credible evidence.
Prioritize cost and latency early. Organizations that tested multiple pipelines – rather than assuming the most advanced model was optimal – were better positioned to scale sustainably.
Work across silos. Projects succeeded where domain experts (e.g. service delivery staff), product and engineering teams, and evaluation staff worked closely together.
The accelerator’s first year also revealed a set of shared challenges across the cohort.
Evaluation in low-resource contexts remains difficult. In low-resource languages like Swahili or Hausa, LLM performance evaluation often relies on small-sample human review or internal test sets. This limits reproducibility – highlighting the need for shared evaluation datasets, open benchmarks, and reusable intent classification modules.
Data collection is expensive and siloed. Collecting tens or hundreds of hours of labeled speech or domain-specific text is demanding, and incentives for data sharing remain weak. This leads to duplicated effort, slow field-wide progress, and a clear need for collaborative data pooling.
AI tools are evolving rapidly. Pricing models change, new releases alter performance trade-offs, and software tooling keeps advancing. But many nonprofits lack the capacity to track these changes, which increases the value of cohort-based learning and shared infrastructure.
Next steps
As OpenAI’s Michael Brown said in his opening remarks at the event in Delhi: “The methodology this Accelerator is championing – ship, test, refine, improve, repeat – is deeply iterative. From little things, big things grow.”
We will continue to support this cohort as they iterate, learn, and scale. We will also contribute to the ecosystem’s tooling (stay tuned!). Look out for our 2026 accelerator cohort, and please reach out to connect if you would like to contribute.


