
Data Infrastructure Part I: The unseen fabric that connects information


“Water, water, every where, nor any drop to drink.” The sailor in Samuel Taylor Coleridge’s “The Rime of the Ancient Mariner” says this as he gazes out upon the salty ocean from the bow of his ship. This may feel like an apt metaphor to many of our region’s decision-makers, who tried to track testing and hospitalizations in real time during the coronavirus pandemic. Every day over the past year, hundreds of people were tested, while hundreds more who should have been tested were not; some became sick and were admitted to the hospital. Each hospital and clinic could see what was going on within its own walls, but no one had the full bird’s-eye view.

The bird’s-eye view is important not only for tracking individuals but also for monitoring entire populations. This all-encompassing view of information is also required for equitable distribution of resources. For example, a bird’s-eye view lets us see neighborhoods where the percentage of people being tested is lower than it should be, so that targeted interventions to increase testing capacity can be planned, deployed and evaluated.

Many people continue to work tirelessly to wrangle such data across the region. One of them is Ben Cooper, Health & Community Data Scientist with the St. Louis Regional Data Alliance. Ben describes himself as “an intensely curious person, passionate about using data to answer questions and improve his community.”

Ben Cooper

We caught up with Ben after an extraordinary year of collaboration between the Institute for Public Health Data & Training Center, local and regional health entities and the St. Louis Regional Data Alliance.

How would you define “data infrastructure”?

I typically describe it as the critical, yet often unseen, fabric that connects multiple entities and can transform data into something meaningful. Take local healthcare as an example: many people assume, incorrectly, that public health data already flows smoothly between health departments, providers and hospitals. In fact, it does not. The pandemic has highlighted this clearly, with devastating results.

What specific area or issue related to regional data infrastructure planning do you work on?

I typically focus on the public health, healthcare and nonprofit sectors.

Do you have an example of a past or current project that shows the importance of this work?  

I recently worked on a project with a local health department to manage COVID-19 data coming in daily from several large healthcare systems. The final workflow organized and harmonized the data so that individual patients could be identified across multiple data streams within seconds. This work was valuable in supporting contact tracing and testing efforts.
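
The interview does not describe how this workflow was built. Purely as an illustration, the sketch below shows one simple way records from several feeds could be harmonized and grouped by patient: normalize the name and date of birth into a match key, then bucket records by that key. The field names, date formats, sample records and exact-match rule are all assumptions made for this example. Real patient matching usually needs fuzzier logic and strict privacy safeguards.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical date formats the feeds might use (an assumption for this sketch).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_dob(raw):
    """Return the date of birth as YYYY-MM-DD, or None if it cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def match_key(record):
    """Build a simple exact-match key from the normalized name and date of birth."""
    dob = normalize_dob(record["dob"])
    if dob is None:
        return None
    # Lowercase the name and collapse repeated whitespace.
    name = " ".join(record["name"].lower().split())
    return (name, dob)


def link_records(feeds):
    """Group records from all feeds by match key; records without a key are skipped."""
    patients = defaultdict(list)
    for source, records in feeds.items():
        for record in records:
            key = match_key(record)
            if key is not None:
                patients[key].append({"source": source, **record})
    return dict(patients)


# Made-up sample data, not real patient records.
feeds = {
    "hospital_a": [{"name": "Jane  Doe", "dob": "1980-02-14", "result": "positive"}],
    "clinic_b": [
        {"name": "jane doe", "dob": "02/14/1980", "result": "positive"},
        {"name": "John Roe", "dob": "1975-07-01", "result": "negative"},
    ],
}
linked = link_records(feeds)
```

Here the two "Jane Doe" records, which arrive with different spacing, capitalization and date formats, end up under a single key. In a production setting, records that cannot be keyed would more likely be routed to manual review than silently skipped.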

How can data infrastructure improve public health? 

Quality data infrastructure is essential for making data actionable. To be actionable, data must be accurate, complete and timely. Data that are two years old or missing key pieces quickly lose their utility. Asking the community to wait two years to find out whether an intervention worked, or whether a new disease is spreading rapidly, is simply unacceptable.
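
As a rough sketch of how two of these criteria might be checked in practice, the example below flags records that are incomplete or stale. The required fields, the seven-day freshness window and the sample records are assumptions chosen for illustration, not details from the interview. Accuracy is left out because it generally requires comparison against a trusted reference source.

```python
from datetime import date

# Hypothetical required fields and freshness window (assumptions for this sketch).
REQUIRED_FIELDS = ("patient_id", "test_date", "result")
MAX_AGE_DAYS = 7


def is_complete(record):
    """A record is complete when every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def is_timely(record, today):
    """A record is timely when its test date is at most MAX_AGE_DAYS old."""
    age = (today - record["test_date"]).days
    return 0 <= age <= MAX_AGE_DAYS


def summarize(records, today):
    """Return the share of records that are complete, and both complete and timely."""
    complete = [r for r in records if is_complete(r)]
    timely = [r for r in complete if is_timely(r, today)]
    total = len(records)
    return {
        "complete": len(complete) / total if total else 0.0,
        "complete_and_timely": len(timely) / total if total else 0.0,
    }


# Made-up sample data.
records = [
    {"patient_id": "p1", "test_date": date(2021, 3, 1), "result": "negative"},
    {"patient_id": "p2", "test_date": date(2019, 3, 1), "result": "positive"},
    {"patient_id": "p3", "test_date": date(2021, 3, 2), "result": ""},
    {"patient_id": "p4", "test_date": None, "result": "negative"},
]
report = summarize(records, today=date(2021, 3, 3))
```

In this sample, only the first record is both complete and recent; the second is two years old, and the last two are missing key pieces.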

What is one of the biggest challenges facing the development of shared data infrastructure for the region?

Lack of trust. All the servers, hard drives and infrastructure in the world can’t make up for a lack of trust between key partners who either possess or need data. Successful data sharing must come from a collective trust among data owners, researchers and the community itself.