Building Data Systems for Social Impact
Q&A with Noora Health’s Anubhav Arora & The Agency Fund’s Jake Hughey
Behind most data-driven organizations lies a powerful infrastructure to store and analyze data. This infrastructure, which includes data warehouses and extract-load-transform (ELT) pipelines, is essential for learning and operating at scale. In the social sector, however, data infrastructure remains underdeveloped.
That’s starting to change. Many social sector organizations are beginning to realize that strong data systems are within reach – and partners like The Agency Fund are helping connect the dots.
The conversation below highlights a standout example. Noora Health, a nonprofit dedicated to empowering family caregivers, recently partnered with The Agency Fund to strengthen its data infrastructure. Anubhav Arora, Noora's Co-Executive Director – Platforms and Programs, and Jake Hughey, a data scientist and data engineer with The Agency Fund, explore the catalytic role these systems and tools can play for social-sector efforts at scale.
Before we dive into Noora’s data infrastructure, let’s clarify some technical terms. Jake, what are “data warehouses” and “ELT pipelines” – and why are they important?
Jake Hughey: A data warehouse is just a collection of tables about everything you think is important about your program and how users or beneficiaries interact with it. Ideally, it includes all the program data being collected routinely. For an organization like Noora, working in the health space, there might be a table with information about nurses. Each row is a nurse, and the columns have details like each nurse’s hospital and district. Another table might track training sessions: this nurse did a postnatal care session on this date, another did an antenatal care session on that date.
Ultimately, the goal is to have a dashboard or other report to tell you what's happening in your program right now. How well is a recent change working? Where are problems occurring? Where are users getting stuck? People in the organization should be able to look at a dashboard and get a fast, comprehensible view of how well the program is working – all powered by the data warehouse.
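To make the nurse and training-session tables described above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical illustrations, not Noora Health's actual schema or data. The query at the end is the kind of summary a dashboard might display.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not Noora Health's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nurses (
    nurse_id INTEGER PRIMARY KEY,
    hospital TEXT NOT NULL,
    district TEXT NOT NULL
);
CREATE TABLE training_sessions (
    session_id   INTEGER PRIMARY KEY,
    nurse_id     INTEGER NOT NULL REFERENCES nurses(nurse_id),
    session_type TEXT NOT NULL,
    session_date TEXT NOT NULL
);
""")

# Made-up rows, for illustration only. Each row of `nurses` is one nurse.
conn.executemany("INSERT INTO nurses VALUES (?, ?, ?)", [
    (1, "Hospital A", "District X"),
    (2, "Hospital B", "District X"),
    (3, "Hospital C", "District Y"),
])
# Each row of `training_sessions` is one session run by one nurse.
conn.executemany("INSERT INTO training_sessions VALUES (?, ?, ?, ?)", [
    (1, 1, "postnatal care", "2024-01-10"),
    (2, 1, "antenatal care", "2024-01-17"),
    (3, 2, "postnatal care", "2024-01-12"),
])

def sessions_per_district(connection):
    """Count sessions per district; districts with no sessions report 0."""
    rows = connection.execute("""
        SELECT n.district, COUNT(s.session_id) AS sessions
        FROM nurses AS n
        LEFT JOIN training_sessions AS s ON s.nurse_id = n.nurse_id
        GROUP BY n.district
        ORDER BY n.district
    """).fetchall()
    return dict(rows)

summary = sessions_per_district(conn)
print(summary)  # {'District X': 3, 'District Y': 0}
```

The LEFT JOIN keeps districts whose nurses have logged no sessions, so gaps appear as zeros in the summary instead of silently disappearing.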
Most social sector organizations already maintain detailed program records as part of monitoring and evaluation. How are live dashboards different?
Hughey: One key difference is in the word “live.” Traditionally, data collected as part of M&E might be used for one-off analyses (run on one person’s laptop), then might show up in a slide deck. With data pipelines, which often culminate in dashboards, we’re really just talking about a way to make the process as automated, reproducible, and scalable as possible. The driving questions are largely the same: What’s happening? Is our intervention working? How do we know? It’s about getting timely, accurate, and meaningful information to the right people.
Do organizations need to reach a certain size or scale before investing in data systems – and why do these systems often feel so daunting to build?
Hughey: When the intervention is at a small scale or is still highly in flux, you probably don’t need a sophisticated data warehouse. You're just trying to figure out what’s going on with the intervention; you can get fast qualitative feedback just by calling someone or doing a visit. But as the program stabilizes and grows, that informal approach will likely start to break down, and you may miss opportunities to learn from quantitative indicators and a data system. You won’t know where there are problems or where you should target your efforts. Conversely, you won’t know where things are working well – so you can’t learn from and replicate those successes. At that point, something like a data warehouse can give you a more comprehensive view.
Data systems can seem daunting to build because to do it well, automation is important. Things should happen automatically, without depending on a human clicking a mouse: I should wake up, and my dashboard should automatically show the latest and most accurate data. In terms of technical requirements, that’s not trivial. If you haven't worked with large- or medium-scale data, you might not have encountered the typical tools for building a dashboard or data warehouse. But there's also nothing magical about it.
That’s a good segue. Anubhav, how did technology become so central to Noora Health’s operations – what drove that shift?
Anubhav Arora: Tech wasn’t a big part of Noora’s work from day one. A core part of our program is implementing high-touch, in-person training sessions for family caregivers – and for the first six or seven years, technology did not play a central role. But early on, we noticed that participants wanted to share what they learned in training sessions with their extended family. This was encouraging from a behavior change perspective, and we needed a way to engage with families beyond that single touchpoint. Technology became a natural tool for that, and the shift accelerated during COVID.
Today, a big part of our work is remotely supporting caregivers to adopt healthier caregiving practices via our digital messaging platform. We send timely and personalized behavioral nudges, and we answer individual users’ questions through our team of clinical nurses.
Tech also helps us reach caregivers who aren’t accessing the public healthcare system. In three states in India, for example, we receive data from government portals about all the registered pregnant women in each state. We can reach out directly to these mothers, sending them recommendations on healthy practices.
How did the partnership between Noora and The Agency Fund unfold?
Hughey: Noora was already operating at considerable scale when we started working together. They were starting to build a data warehouse and data pipelines, and they already had some dashboards, although isolated ones for different regions and products.
We first offered guidance on how to design the system. What sort of tools could be useful, given how the program works? We also co-designed and helped establish a process to prioritize effort, in terms of aspects of the program and data sources. For example, if you have a million spreadsheets, which ones need to go into the data warehouse, and which ones can wait?
What key data challenges did Noora Health face as it scaled?
Arora: One need was handling all the different kinds of data we collect – from multiple state governments, in-person sessions, digital interventions like WhatsApp, and observational surveys. We get a lot of data, but we couldn’t correlate it or track a user’s journey between the physical and digital worlds. Our big motivation for building a more sophisticated data system was to have a single source of truth – to see all our data in one place, so we could use data to guide program design, make program changes, and focus on our most effective program components.
Hughey: These are common challenges – several of the organizations we partner with have in-person interventions alongside digital components. Making sure those data sources can be smoothly integrated with each other can be tricky, but it is invaluable. For example, it can highlight where users are dropping off, either in person or digitally. The goal is to give program managers a holistic view of what's happening.
Arora: On the in-person side, we need to know which hospitals are conducting sessions consistently and maintaining high quality. On the digital side, we need to understand how users are engaging with the content. We need to understand the overall user funnel – how many users attended sessions? Of those, how many signed up on the app, how many engaged, how many asked questions? What kinds of questions are they asking?
With our upgraded systems, we can now view all these metrics consistently and in real time through dashboards. This allows program managers to focus on underperforming hospitals and identify where the program is breaking down. These systems are now also enabling us to run multiple A/B experiments consistently and reliably, helping us better understand the impact of different levers on our target metrics.
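The user funnel Arora describes (attended a session, signed up, engaged, asked a question) can be sketched as a simple stage-by-stage count. This is only an illustration under stated assumptions: the `User` fields, the stage names, and the sample users are hypothetical, and a real pipeline would derive these flags from joined in-person and digital records.

```python
from dataclasses import dataclass

@dataclass
class User:
    # Hypothetical flags; in practice these would come from warehouse tables.
    attended_session: bool
    signed_up: bool
    engaged: bool
    asked_question: bool

# Assumed funnel order, following the stages named in the interview.
STAGES = ["attended_session", "signed_up", "engaged", "asked_question"]

def funnel_counts(users):
    """Count users remaining at each stage; a stage requires all earlier stages."""
    counts = {}
    remaining = list(users)
    for stage in STAGES:
        remaining = [u for u in remaining if getattr(u, stage)]
        counts[stage] = len(remaining)
    return counts

def conversion_rates(counts):
    """Share of users retained from each stage to the next (0.0 if none entered)."""
    stages = list(counts)
    rates = {}
    for prev, nxt in zip(stages, stages[1:]):
        rates[f"{prev} -> {nxt}"] = counts[nxt] / counts[prev] if counts[prev] else 0.0
    return rates

# Made-up users, for illustration only.
users = [
    User(True, True, True, True),
    User(True, True, True, False),
    User(True, True, False, False),
    User(True, False, False, False),
    User(False, False, False, False),
]

counts = funnel_counts(users)
rates = conversion_rates(counts)
print(counts)
print(rates)
```

Computing each stage from the users who passed the previous one is what makes the numbers a funnel: the largest drop between adjacent stages points to where the program is losing people.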
What are the essential tools and resources organizations need to build effective data systems?
Hughey: Commitment is crucial. As with other infrastructure, the ROI may not show up immediately, and there will always be seemingly more urgent issues begging for attention.
Arora: Start by anchoring yourself around a list of key user metrics. Who is the user? What is the problem? What metric are you optimizing for? Start with that list of prioritized user metrics, then go backwards from there – what sort of data sources and pipelines exist within the system?
Hughey: Once an organization has begun to scale, it can be valuable to have someone with a data engineering skill set. That type of person can help identify the key system features. Given the desired metrics, where does the data need to come from? What are the cleaning steps, transformations, or other data processing needed to end up in a dashboard? What checks should be implemented to catch potentially invalid data? Data engineers can also help automate the pipelines, so the dashboards stay up-to-date with minimal intervention.
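As one illustration of the checks mentioned above, the sketch below screens incoming session records before they reach a dashboard. The field names, the allowed session types, and the specific rules are assumptions chosen for the example, not a description of any organization's actual pipeline.

```python
from datetime import date

# Hypothetical list of accepted session types.
KNOWN_SESSION_TYPES = {"postnatal care", "antenatal care"}

def validate_session(record, known_nurse_ids, today):
    """Return a list of problems found in one session record (empty if valid)."""
    problems = []
    if record.get("nurse_id") not in known_nurse_ids:
        problems.append("unknown nurse_id")
    if record.get("session_type") not in KNOWN_SESSION_TYPES:
        problems.append("unknown session_type")
    try:
        session_date = date.fromisoformat(record.get("session_date") or "")
    except (TypeError, ValueError):
        problems.append("unparseable session_date")
    else:
        if session_date > today:
            problems.append("session_date in the future")
    return problems

def split_valid_invalid(records, known_nurse_ids, today):
    """Separate clean records from ones that need review, keeping the reasons."""
    valid, invalid = [], []
    for record in records:
        problems = validate_session(record, known_nurse_ids, today)
        if problems:
            invalid.append((record, problems))
        else:
            valid.append(record)
    return valid, invalid

# Made-up records, for illustration only.
records = [
    {"nurse_id": 1, "session_type": "postnatal care", "session_date": "2024-01-10"},
    {"nurse_id": 99, "session_type": "postnatal care", "session_date": "2024-01-11"},
    {"nurse_id": 2, "session_type": "antenatal care", "session_date": "10/01/2024"},
    {"nurse_id": 2, "session_type": "antenatal care", "session_date": "2030-01-01"},
]

valid, invalid = split_valid_invalid(records, {1, 2}, today=date(2024, 2, 1))
for record, problems in invalid:
    print(record["nurse_id"], problems)
```

Setting invalid records aside with their reasons, instead of dropping them, gives the program team something concrete to review, which supports the feedback loop between engineers and program staff.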
Ideally, these ideas should be integrated with the broader program. Who’s actually going to look at the dashboard? How will they know if the program’s working? Are we collecting the right type of data? In the process of building these data flows and making the warehouse, you discover a lot about your data and indicators – in terms of what's important, what can be calculated, and what’s accurate. That feedback and iteration between the engineers, the data scientists, and the program team is invaluable for program monitoring and data quality.
How has collaboration with The Agency Fund supported Noora’s work beyond the nuts and bolts of building data tools?
Arora: We've been working with The Agency Fund for over a year, and there’s a strong foundation of trust. The ecosystem that they have fostered is close-knit, there’s a lot of transparency, and the focus is always on promoting users’ interests and having shared impact.
We are currently part of an AI accelerator hosted by The Agency Fund that has supported us in two key ways. First, it provided much-needed technical mentorship to help our small engineering team build our AI pipeline and system. This has enabled us to stay aligned with best practices and learn from experts on how to evaluate our AI applications. Second, it offered valuable opportunities to learn from our peers, whose use cases – whether in data infrastructure or AI systems – are often quite similar to ours, even if they are being deployed in different domains.
How is Noora integrating AI into its programming, and how do those efforts connect to the broader issue of data management at scale?
Arora: As our programs have scaled, we’ve realized that AI can help our clinical teams respond to the increased volume of caregiver questions, while also personalizing content for each user. Before AI, our behavioral campaigns were very episodic and fixed, with personalization limited to language or condition area. What we have realized is that caregiving is a very complex phenomenon: at any given time, a caregiver might be supporting several family members. Our hypothesis with AI is that more personalized and relevant content strengthens engagement, which leads to better practice adoption and healthier outcomes.
We are working to build a scalable AI helpdesk that can maintain quality and consistency. Having a human in the loop is a core design principle; currently, every output is reviewed by in-house clinical and non-clinical teams. But as we gain more confidence with AI outputs, we’re developing a system to triage outputs into low- and high-risk queries, with only high-risk ones going through human review. A broader challenge is data privacy. Our users share a lot of personal information and data with us, so we’re sensitive to how we and our partners approach data security and privacy.
What would it look like if strong data flows and practices became the norm across the social sector?
Hughey: Based on our experience so far, I think we would see more people at more organizations who have reliable quantitative evidence to help them make decisions and ultimately improve their program's impact. There are so many pressing challenges in this sector, and we believe data infrastructure can accelerate and strengthen how organizations learn as they scale.
Interviews by Niamh Ní Mhaoileoin. Writing and editing by Greg Larson. Film by Michael Clarke.