Data Infrastructure Part II: How to build a bridge for information transport | Institute for Public Health

Paul Sorenson, MSW, Director, St. Louis Regional Data Alliance (RDA); Interim Co-Director, UMSL Community Innovation and Action Center

The Institute for Public Health is highlighting the people and community partners working in collaboration with our Public Health Data & Training Center to create and implement a shared data infrastructure in the St. Louis Region. This infrastructure connects information between hospitals and other health entities about major public health issues such as COVID-19 and sexually transmitted infections. As explained by our most recent interviewee below, this shared information is essential for equitable distribution of resources across St. Louis.

This is the second in a series of interviews with key data-infrastructure creators and practitioners about the genesis and sustainability of this unprecedented infrastructure development.

Tell us more about the Regional Data Alliance’s mission.

The RDA’s mission is to build shared data infrastructure and support strong data actors who use quality data to improve people’s lives. Our membership has grown to more than 315 people across local governments, universities, nonprofits, and technology companies. Our focus on data infrastructure has also grown substantially over the past year as our community has grappled with COVID-19 across institutions that did not previously have shared systems or practices — that’s what we’re in the process of exploring now.

How do you define “data infrastructure”?

We can compare data infrastructure to more recognizable components of infrastructure like roads or bridges. Say that you need to cross a river but there isn’t a bridge. What you would probably do is hire a boat to take you across. The boat can solve your problem in the short term (you crossed the river!), but you have to hire another boat to get back across, and yet another boat to cross again. That’s where we are right now with data infrastructure: We need data to travel from data system to data system (shore to shore), often across different organizations, but haven’t built the necessary bridges (infrastructure) to facilitate exchange.

It’s easy to get stuck in the “rent a boat” paradigm; most data projects are shorter-term, originating from universities or community-based organizations that likely only have the scope and the budget to rent a boat (think one-time data exports, a new survey tool, etc.). Then something like COVID-19 comes along and puts a huge amount of pressure on sharing critical data back and forth quickly across organizations, and now it’s clear that “boats” are a woefully insufficient way to cross data divides. So how do we go about building a bridge?

What specific area or issue related to regional data infrastructure planning do you work on?

To extend this bridge metaphor just a bit further: the RDA is trying to pull together the people and organizations that can collectively scope and budget for building the bridge. What should it be designed to carry? How accessible or public should it be? What are the rules that govern the safety and privacy of individuals and entities that cross the bridge? We have a window of opportunity to define the value of building this data infrastructure, as well as to pay for the construction of the “bridge,” given that it’s more expensive than renting a boat up front but dramatically reduces costs and increases impact over time.

To bring this back to the real world, we’re working with partners from Washington University, St. Louis University, health systems, local governments, funders, and community organizations to understand what public health data infrastructure for the St. Louis Region should look like so that it helps us collectively respond to ongoing challenges like COVID-19, STIs, gun violence, and health disparities. There are still many important questions around governance, legal parameters, interaction with state systems, etc. that need to be answered, but the time to explore is now.

What’s an example of a past or current project that illustrates the importance of this work?

Over the winter, the RDA worked with our partners at Daugherty Business Solutions to help ingest, transform, and integrate COVID-19 data from major health systems for St. Louis County and use it for contact tracing. Before this work, it was hard to manage a constant flow of these data for public health interventions — and while this solution was limited to the county, it helped us understand what components are needed for broader infrastructure projects. These include data mapping across major health providers and standards like HL7, record matching algorithms, and protocols for pulling data from or pushing data to different systems. This knowledge helps us develop a blueprint for regional infrastructure that can be expanded, modified, and redeployed depending on the relevant issue or sector. We’re also engaged in similar conversations in education and housing; these data sources and partners can be quite different, but the underlying process can and should be replicated whenever possible.
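
One way to picture the data mapping and record matching steps mentioned above is the following minimal Python sketch. It is an illustration only, not the approach the RDA or its partners used: the source names (`system_a`, `system_b`), the field names, and the sample records are all hypothetical, and production systems typically rely on probabilistic matching and standards such as HL7 rather than the simple exact-match key shown here.

```python
import re
import unicodedata

# Hypothetical field mappings: each source system names the same concepts
# differently, so every record is first translated into one common schema.
FIELD_MAPS = {
    "system_a": {"first": "first_name", "last": "last_name", "dob": "birth_date"},
    "system_b": {"fname": "first_name", "lname": "last_name", "birthdate": "birth_date"},
}


def to_common_schema(record, source):
    """Rename a source record's fields to the shared field names."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def normalize_name(name):
    """Strip accents, lowercase, and drop anything that is not a letter."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z]", "", ascii_only.lower())


def match_key(record):
    """Build a deterministic key from normalized names and date of birth."""
    return (
        normalize_name(record["first_name"]),
        normalize_name(record["last_name"]),
        record["birth_date"],
    )


def match_records(records_a, records_b):
    """Return pairs of records from the two lists that share a match key."""
    index = {}
    for rec in records_a:
        index.setdefault(match_key(rec), []).append(rec)
    pairs = []
    for rec in records_b:
        for candidate in index.get(match_key(rec), []):
            pairs.append((candidate, rec))
    return pairs


# Made-up sample records (no real patient data).
records_a = [
    to_common_schema({"first": "María", "last": "O'Neil", "dob": "1980-02-03"}, "system_a"),
]
records_b = [
    to_common_schema({"fname": "MARIA", "lname": "ONeil", "birthdate": "1980-02-03"}, "system_b"),
    to_common_schema({"fname": "John", "lname": "Smith", "birthdate": "1975-11-30"}, "system_b"),
]

pairs = match_records(records_a, records_b)
print(len(pairs))  # 1: the two "Maria O'Neil" records are treated as the same person
```

Mapping every source into a common schema before matching is what makes this kind of step reusable: adding a new data provider would mean adding one more entry to the mapping rather than rewriting the matching logic.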

How can data infrastructure improve public health?

The goal of having shared data infrastructure is to dramatically reduce the time between when a public health challenge is identified and when it can be addressed, as well as to expand the depth and effectiveness of public health interventions. Data plays a critical role in identifying priorities, targeting disparities, individualizing interventions, etc. While data is never the only factor at play (funding parameters, community factors, and organizational capacity also need to be addressed), it is often one of the few that can be better systematized up front to respond to emerging challenges as other context comes into focus. Data should be an enabler, not a barrier, but without infrastructure it often becomes a time-consuming and expensive hurdle.

It’s worth imagining what our region’s COVID-19 response could have looked like if we already had shared data infrastructure, whether at the state or local level. Local and state public health departments could have easily coordinated with each other to clearly and quickly understand how the virus was spreading and where to target interventions. Vaccinations, from registration onward, could have been much better coordinated to avoid the current fragmentation of data, which makes it much harder for any entity to know who has been vaccinated, who’s still waiting, and where to target outreach efforts.

What seems to be clicking for people is that we need shared data infrastructure to manage the long tail of COVID-19 and prepare for future pandemics, and that the current state of fragmentation also affects every other critical public health issue in a way that is equally unacceptable.

What is one of the biggest challenges facing the development of shared data infrastructure for the region?

The biggest challenge is trust and governance across a diversity of organizations, leaders, and communities that do not always share priorities or interests. I don’t think that these priorities are often actively at odds with each other, but aligning values across stakeholders is a delicate process that will take some time to perfect. That may be where the bridge metaphor falls short, or where it crosses into political considerations of spending bills and environmental impact studies. The human and organizational lift here is substantial, and it is where we need to spend most of our time and energy.

The good news is that we have a technical roadmap, and that technology to support data infrastructure has never been more accessible, especially given recent progress in the private sector. We have a blueprint. So we need to focus on the why and how (the human pieces) to orient the what (the technical components), so that data infrastructure can be used as an essential tool for community health improvement.
