Transparency About Challenges Strengthens Our Data Community

While we often share our highlight reel with you, sometimes we also share the challenges of accomplishing a specific task or the lessons we learned while working on a project. For example, last year we tackled a big project that brought together data from different nonprofit organizations, and when we released the final report, we also published six takeaways highlighting some of the things we learned that could make the next attempt even better. Sharing these “fail forward” moments is central to our mission of building community knowledge: it helps others know that they aren’t alone in the challenges they face with data, and it helps us identify potential creative solutions even if we aren’t the ones to implement them.

We are very open in the office about these challenges and failures, but I was quite surprised to notice that our introductory D3 presentation isn’t really transparent about any of them. Last month, I introduced D3 to a Global Ties cohort from El Salvador. If you’re not familiar with Global Ties, picture study abroad for professionals: the organization coordinates meetings between international visitors and a variety of local public, private, and nonprofit organizations to learn and exchange ideas across sectors like education, public health, and civic engagement.

When delegations come to Detroit interested in data, D3 gets a call to come give a presentation. The idea of a data intermediary, and much of the work we do, is in many ways unique to the United States. So after a presentation showing off some of the projects we’ve worked on, like Forgotten Harvest’s food insecurity index, State of the Detroit Child, and the Opportunity Youth research project, I opened the floor to questions.

International delegations are usually very interested in how we obtain and verify data, and the El Salvador delegation was no different. Many of the difficulties the delegates described are situations D3 is also familiar with: unreliable data systems, hesitancy to share data openly, and gaps in the skills needed to understand the analyses. We talked about two key concepts: building trust and relationships with data providers, and how to treat data that relies on other people’s record keeping.

First, when we think about data sharing, especially in the context of Detroit, it’s important to remember that our current situation, with the City’s open data portal, the state opening many data sources to the public, and so on, is a very recent development. For the last twelve years, D3 has existed to provide access to reliable data, but that has required building a strong reputation for ethical data use, as well as trust with data providers that we will use their data responsibly in ways that don’t put confidentiality at risk. For example, we ensure that data we publish meets certain suppression criteria to prevent individuals’ identities from being discovered.
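To make “suppression criteria” concrete, here is a minimal sketch of one widely used approach, small-cell suppression, in which counts below a set threshold are blanked before publication. The threshold value, function name, and data below are illustrative assumptions for the example, not D3’s actual rules or code.

```python
import pandas as pd

# Illustrative threshold: counts below this are blanked before publication.
# A value like 10 is a common convention in public data releases; it is an
# assumption here, not D3's actual criterion.
SUPPRESSION_THRESHOLD = 10

def suppress_small_cells(df: pd.DataFrame, count_col: str) -> pd.DataFrame:
    """Blank out counts below the threshold so small groups can't be re-identified."""
    out = df.copy()
    out[count_col] = out[count_col].astype("Int64")  # nullable, so cells can be blanked
    out.loc[out[count_col] < SUPPRESSION_THRESHOLD, count_col] = pd.NA
    return out

# Hypothetical neighborhood-level counts
counts = pd.DataFrame({
    "neighborhood": ["Neighborhood A", "Neighborhood B", "Neighborhood C"],
    "events": [120, 4, 37],
})
print(suppress_small_cells(counts, "events"))  # the count of 4 is published as <NA>
```

The point of the sketch is the design choice: the small count is removed entirely rather than rounded or approximated, so a reader can never work backward to a handful of identifiable individuals.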

Our introductory presentation definitely talks about how D3 accesses and processes data, but we never actually mention the challenges of data access. There are several places in the presentation where I could have talked about how hard it is to get reliable data, or about how we built relationships over the last decade to be able to access the data for a given project. Every single project we talk about had some sort of data access challenge, and adding a few examples of how we overcame those would reassure attendees that data acquisition really is hard, and that they aren’t alone in struggling with it.

The second key concept we discussed is how messy data can be. How do we, as D3, navigate data that we can’t independently verify, especially knowing that data entry is often a messy process? The truth is, we can only work with what we have. In some cases, that means not using a dataset at all if it’s too messy or doesn’t have appropriate documentation. In other cases, it means including detailed descriptive footnotes in our reports explaining the limitations of the data.
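As an illustration of what that kind of triage might look like, here is a hypothetical sketch of a check that rejects a dataset lacking documentation or with heavy missingness, and otherwise collects limitation notes to publish as footnotes. The specific cutoffs, names, and messages are assumptions for the example, not D3’s actual review process.

```python
import pandas as pd

def triage_dataset(df: pd.DataFrame, has_documentation: bool,
                   max_missing_share: float = 0.25) -> tuple[bool, list[str]]:
    """Decide whether a dataset is usable and collect limitation footnotes.

    The documentation requirement and the 25% missingness cutoff are
    illustrative assumptions, not D3's actual criteria.
    """
    if not has_documentation:
        return False, ["Rejected: no data dictionary or documentation provided."]
    notes = []
    for col in df.columns:
        share = df[col].isna().mean()
        if share > max_missing_share:
            return False, [f"Rejected: column '{col}' is {share:.0%} missing."]
        if share > 0:
            notes.append(f"Footnote: '{col}' is {share:.0%} missing; interpret with caution.")
    return True, notes

# Hypothetical intake: a dataset with one partially complete column
records = pd.DataFrame({"parcel_id": [1, 2, 3, 4], "owner": ["x", None, "y", "z"]})
usable, footnotes = triage_dataset(records, has_documentation=True)
print(usable, footnotes)  # True, with a footnote that 'owner' is 25% missing
```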

A common phrase in D3’s office is “reported and recorded,” which means that we only have data for events that were (1) reported by an individual and (2) recorded as an event in the dataset. A good example of this is crime data. Many factors determine whether someone dials 9-1-1 to report a crime in the first place; even when a crime is reported, we have to rely on the responding police officers to record it exactly right; and data entry systems have complications of their own. We are now thinking through how to acknowledge, in the presentation, that these amazing projects are still based on data we can’t always independently verify.

It’s easy to talk about our “highlight reel,” but part of building a shared knowledge community is being transparent about the facets of our work that aren’t easy, and demonstrating the steps we take to mitigate those challenges. We already do that in many parts of our work, but this meeting highlighted a need to add more context to our presentation about the difficulties we have with data access and reliability.