When we think about data collection, many people want to jump right into the process of asking questions. But it’s important to step back and think about the whole picture. As part of developing Data University courses, we’ve created workshops, handouts, and blog posts about different aspects of the research process, which we’ve assembled on our Data Literacy page. Here we’re going to review the overall strategy in one place.
The most important takeaway is this: reflect on your research question. Your research question is the foundation of the entire process, and it is worth the time and effort to make sure each step you take is getting you closer to an answer that is reliable and ethical. This means talking to your stakeholders, reviewing your questions with community members, and sharing your research plan with coworkers and other experts in the field. Doing this helps ensure that the data you collect actually answers your questions with meaningful information.
This is why the first step in our research methods is “Think.” There are a lot of different ways to collect data or ask questions, and the best place to start is finding out if someone else has already had your idea. We created a tool called Report Detroit, which helps the audience visualize where Detroit reports have been focused. The overlapping areas and subjects shown in the tool reveal the redundancy that can happen when people don’t do a quick search to see whether a similar report or study has already been completed.
Next, ask yourself: is this a problem that really needs to be solved? The best practice is to do some initial research (beyond checking whether it has already been done) to decide who this project impacts or targets. Once you’re sure the problem really needs to be solved, ask: is this something that is even important to the community I’m trying to serve? To answer this, you can create an advisory board or do some outreach in the community to ensure that the project is worth pursuing. For our Turning the Corner project, which identified areas in Detroit that could be vulnerable to displacement of current residents, we built an advisory board and completed qualitative interviews to provide local context. That context told us that the problem did need to be solved, who was impacted by the problem, and that it was important for us to contribute data to the solution.
Lastly, before you collect more of your own data, start to think about what gaps exist in your current data. Are there outside sources where you could obtain this data? If so, explore them before diving into thinking about your own data collection. Even if the outside data isn’t useful for answering your specific question, you can still learn about standardized ways of asking certain questions and gain insights into your topic. As you narrow down the questions you need to ask, start to think about who you’re going to survey and what details you need to collect about their demographics or households. If you’re studying perceptions of grade schools, for example, knowing how many students live in each household would be much more relevant than it would be if you were surveying about amenities for senior citizens.
Now that you’ve put some time into thinking about the context of your data and research question, it’s time to consider how to design the data collection process and data set. One key reflection point here is to consider at the outset what limitations your research might have as you proceed through the data analysis process.
When we think about data, there are five key characteristics we should think about: scope, geography, availability, scale, and source/methods. We’ve written about them in depth before when looking at vacancy rates. This is the stage of your process when the methodology of your data collection and analysis should start to take shape as you consider how many people you’re going to survey, what data you can acquire from other sources, the types of data analysis that can answer your question, and more.
As you consider the design process, also consider ways to protect your data. In some cases, it could be appropriate or even legally mandated to suppress (not report) small samples of data. For example, we have a data agreement with the state of Michigan to receive blood lead level data. In order for them to share the data with us, we’ve signed a DUA (data use agreement), which requires us to aggregate (or summarize) the data up from an individual level to a group level like census tracts. If those aggregation levels still provide a level of detail where someone could identify a research subject, we then suppress the data. This is important to ensure that, in areas with a low number of individuals with elevated blood lead levels, those individuals cannot be identified. Also, consider how the data you’re collecting could be inaccurate or misinterpreted. This means reflecting on whether your survey questions could be biased or leading, and on the nuances of other people’s data collection processes.
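To make the aggregate-then-suppress idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the procedure any particular agreement requires: the threshold of 6, the `tract` field name, and the sample records are all made up, and the real cutoff should come from your own data agreement.

```python
from collections import Counter

# Hypothetical threshold: counts below this are suppressed. The real value
# comes from your data agreement, not from this sketch.
SUPPRESSION_THRESHOLD = 6


def aggregate_and_suppress(records, group_key, threshold=SUPPRESSION_THRESHOLD):
    """Count records per group and replace small counts with None (suppressed)."""
    counts = Counter(record[group_key] for record in records)
    return {
        group: (count if count >= threshold else None)
        for group, count in counts.items()
    }


# Made-up individual-level records; the tract labels are placeholders.
records = (
    [{"tract": "A"} for _ in range(12)]
    + [{"tract": "B"} for _ in range(3)]
)

summary = aggregate_and_suppress(records, "tract")
print(summary)  # {'A': 12, 'B': None}
```

Reporting `None` (or a symbol such as an asterisk) instead of the small count keeps the group visible in the table without revealing how few people it contains.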
Depending on the methodology you’ve settled on, you might need to start data collection by implementing a survey or downloading necessary data sources. A big part of data collection is continuing to reflect on the properties of your collected data, its limitations, and nuances. For example, we know that in certain communities crime is underreported, so relying on administrative data from the police department about where crimes are most prevalent might be misleading. By doing interviews with residents or business owners in the community, we can provide important context for understanding the reported and recorded crime data.
When collecting data, it’s very important to have a plan for keeping personally identifiable information secure. Consider whether you truly need to store that information or whether you can recode the data to remove any traces of identifiable information. Also consider who owns the data, especially when thinking about community data, because they may be the gatekeepers for who has access to it and how it is used in the present day and in the future. Make a plan for how you can ensure that the community itself has appropriate access to the information being collected about it.
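One way to recode data so it no longer carries direct identifiers is to drop those fields and keep only a keyed-hash ID that lets you link a respondent’s records together. The sketch below assumes hypothetical field names (`name`, `email`, `address`) and a secret key that you would store separately from the data. Note that this is pseudonymization rather than full anonymization: the remaining fields can still identify someone in a small group.

```python
import hashlib
import hmac

# Hypothetical field names; adjust to match your own survey export.
IDENTIFYING_FIELDS = {"name", "email", "address"}


def deidentify(record, secret_key, id_field="email"):
    """Drop identifying fields and add a keyed-hash respondent ID instead."""
    digest = hmac.new(secret_key, record[id_field].encode("utf-8"), hashlib.sha256)
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    # Shortened hex digest used as a stable, non-reversible respondent ID.
    cleaned["respondent_id"] = digest.hexdigest()[:12]
    return cleaned


# Made-up survey response for illustration.
response = {
    "name": "Jane Doe",
    "email": "jane@example.org",
    "address": "123 Main St",
    "household_size": 4,
}

safe = deidentify(response, secret_key=b"store-this-key-separately")
print(sorted(safe))  # ['household_size', 'respondent_id']
```

Using a keyed hash rather than a plain hash means someone without the key cannot rebuild the ID from a known email address.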
During the analysis process, it’s again important to consider the limitations and nuances of your particular data. This will be important to note for your audiences in the reporting phase. Ensure that you’re starting your analysis with an open mind, and be mindful of any assumptions or prejudices you’re bringing to the table.
We also highly recommend quality-checking your data more than once and even having someone else do an additional quality check if possible. For example, when we combine two data sets, we make sure that the number of records is still the same and that we didn’t accidentally lose any data in the process.
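The record-count check described above can be sketched in a few lines of Python. This is a simplified illustration, not our production workflow: the `left_join` helper, the `parcel_id` key, and the sample rows are hypothetical. It shows how a duplicate key in one data set quietly changes the number of records after a join.

```python
def left_join(left_rows, right_rows, key):
    """Naive left join: one output row per matching pair, like a SQL LEFT JOIN."""
    joined = []
    for left in left_rows:
        matches = [right for right in right_rows if right[key] == left[key]]
        if not matches:
            joined.append(dict(left))  # keep left rows that have no match
        for right in matches:
            joined.append({**left, **right})
    return joined


def check_record_count(before, after):
    """Raise an error if the join gained or lost records."""
    if len(after) != len(before):
        raise ValueError(
            f"record count changed: {len(before)} before, {len(after)} after"
        )


# Made-up data: the survey file has two rows for parcel 1.
parcels = [
    {"parcel_id": 1, "owner_type": "private"},
    {"parcel_id": 2, "owner_type": "public"},
]
surveys = [
    {"parcel_id": 1, "condition": "good"},
    {"parcel_id": 1, "condition": "fair"},
]

combined = left_join(parcels, surveys, "parcel_id")
print(len(parcels), len(combined))  # 2 3

try:
    check_record_count(parcels, combined)
except ValueError as error:
    print(f"Quality check failed: {error}")
```

When the check fails, the next step is to look for duplicate or missing keys before trusting any totals from the combined data.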
Before diving into complicated analysis, it’s important to understand the dataset and how key pieces of data appear. Using basic statistics like distributions and means is important because it grounds our analysis in the basics and makes sure we understand the context for each piece of the analysis. A helpful place to start is creating basic charts or other visualizations, or doing basic comparisons between groups of interest.
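As one possible starting point, the sketch below computes a few basic statistics for each group of interest using only Python’s standard library. The `neighborhood` and `household_size` fields and the responses are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_group(rows, group_field, value_field):
    """Return count, mean, median, min, and max of a value for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return {
        group: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for group, values in groups.items()
    }


# Made-up survey responses.
responses = [
    {"neighborhood": "North", "household_size": 2},
    {"neighborhood": "North", "household_size": 4},
    {"neighborhood": "South", "household_size": 1},
    {"neighborhood": "South", "household_size": 3},
    {"neighborhood": "South", "household_size": 8},
]

summary = summarize_by_group(responses, "neighborhood", "household_size")
for group, stats in summary.items():
    print(group, stats)
```

In this toy data, the South group’s mean (4) sits above its median (3) because of one large household, which is the kind of pattern worth noticing before running anything more complicated.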
Another consideration in the analysis phase is how to include members of the community in the synthesis of the data about themselves. This helps provide the context and nuances that quantitative data might not capture, and it helps ensure that the interventions being designed have a positive impact on the community.
In order for the data to be meaningful, it has to make its way in the world. This is when you decide how to tell the data’s stories. The most basic steps for reporting are to:
- Choose a meaningful data point
- Choose an appropriate story type
- Create a visualization and/or write the story
- Reflect on the research question
In order to choose the most appropriate data point, identify your audience and how familiar they are with the data. Consider what context and nuance they might need to know about the data in order to understand the story.
When considering data visualizations, some key characteristics are:
- Feels practical
Unhelpful visualizations tend to attempt to fit too much information into one space and usually appear fancy rather than practical. Oftentimes people create proportions that don’t make sense or select a chart style that isn’t appropriate for the kind of data. For example, a pie chart should only ever be used when showing parts of a whole that add up to 100% of the whole. Most visualizations need multiple data points to visualize something unique, while a single data point usually lends itself better to a narrative.
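The pie chart rule is easy to check before you build the chart. The sketch below is a hypothetical helper, not part of any charting library; the half-point tolerance for rounded percentages is an assumption you can adjust.

```python
import math


def can_be_pie_chart(shares, total=100.0, tolerance=0.5):
    """Return True when the shares are non-negative and sum to the whole."""
    return all(share >= 0 for share in shares) and math.isclose(
        sum(shares), total, abs_tol=tolerance
    )


# Single-choice question: every respondent picked exactly one answer.
print(can_be_pie_chart([45, 30, 25]))  # True

# "Select all that apply" question: percentages overlap and exceed 100.
print(can_be_pie_chart([60, 55, 40]))  # False
```

When the check fails, as with a “select all that apply” question, a bar chart is usually the safer choice.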
The last part of responsible data analysis is to find a way to connect your story to meaningful action. Consider the goals you have for the data to address the problem your research question identified. What existing or new opportunities can you link your audience to after they have engaged with your data story? Ensure that you have future data collection steps that can measure progress toward these goals.
Remember, a strong research question is the key to successful data collection, and is the first step before you think, design, collect, analyze, report, and act. Below is an image you can save to return to as you progress through your data collection process.
Dive deeper into all of our content about Data Literacy.