Since the first COVID-19 case was identified in Michigan, the State has worked to build out a data infrastructure to help Michigan residents understand the situation in the State. From the first few cases, reported as a short list with vague demographic information, to the current seven-tab profile, the data has evolved quickly. This presents unique challenges for researchers: because the pandemic has been a fast-moving event without an established data infrastructure, the evolution isn’t always captured and data changes aren’t always recorded.
Here we’ll highlight a few ways that the data has changed over time so that it is documented for reference. Comparing data from the beginning of the pandemic to current data can be tricky, as there have been a substantial number of changes in data collection processes and reporting. While this can be frustrating from a research perspective, it demonstrates the adaptability of the State’s health department and its focus on improvement. The data collection and reporting methods being developed will help ensure that our systems are better prepared for the next public health crisis.
Changes in Testing Reporting
Given that the world had to develop diagnostic testing from scratch to identify coronavirus cases, it makes sense that the systems for reporting testing data also developed over time. Michigan’s testing data went through a significant evolution from basic positive/negative reporting to the current dashboard with ample information about diagnostic and serology testing by day. Here is an example of the test reporting early on in the pandemic:
It’s important to note that in the beginning of the pandemic, only testing conducted by the Michigan Department of Health and Human Services’ Bureau of Laboratories was counted. This means any testing done by hospitals or other commercial labs was not counted in early reports. For a little over a week, all testing data disappeared from the State’s reporting.
On March 27th, the State began reporting much broader testing data, including commercial (LabCorp only), hospital, and public health labs. However, the reports included notes explaining that the State reports specimen-level testing, not individual-level testing, which made it impossible to compare the number of positive tests to the number of cases. These notes gave researchers guidelines for understanding the data.
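To illustrate why specimen-level counts cannot be compared directly to case counts, here is a minimal Python sketch. The records and person identifiers below are invented for illustration; the State's public data does not include person-level identifiers, which is exactly why the comparison cannot be made.

```python
# Hypothetical specimen-level records: (person_id, result).
# One person can contribute several specimens, so specimen counts
# and person counts diverge.
specimens = [
    ("A", "positive"),
    ("A", "positive"),  # repeat test on the same person
    ("B", "negative"),
    ("C", "positive"),
    ("C", "negative"),  # later follow-up test on the same person
]


def count_positive_specimens(records):
    """Count every positive specimen, as specimen-level reporting does."""
    return sum(1 for _, result in records if result == "positive")


def count_positive_people(records):
    """Count distinct people with at least one positive specimen."""
    return len({person for person, result in records if result == "positive"})


positive_specimens = count_positive_specimens(specimens)
positive_people = count_positive_people(specimens)
print(positive_specimens, positive_people)
```

In this made-up example there are more positive specimens than people who tested positive, so a positive-test total would overstate the number of cases.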
Since April, lab testing has been reported with graphs breaking down cumulative lab tests and percent positives statewide and by region. The percent positive rate is one of the governor’s Safe Start metrics. Until the end of May, however, the State calculated that rate by including positive antibody tests. The State now separates diagnostic and antibody testing on its data portal, which provides a clearer picture of the situation.
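As a rough sketch of why mixing the two test types matters, the Python example below computes a percent positive rate with and without antibody tests. All counts are hypothetical, not State figures, and the sketch assumes antibody tests were added to both the numerator and the denominator; the State's exact formula is not documented here.

```python
def percent_positive(positive, total):
    """Return the share of tests that were positive, as a percentage."""
    if total <= 0:
        raise ValueError("total must be greater than zero")
    return 100.0 * positive / total


# Hypothetical daily counts, chosen only to make the arithmetic easy to follow.
diagnostic = {"positive": 400, "total": 10000}
antibody = {"positive": 150, "total": 1000}

# Rate when antibody tests are mixed in with diagnostic tests.
combined_rate = percent_positive(
    diagnostic["positive"] + antibody["positive"],
    diagnostic["total"] + antibody["total"],
)

# Rate from diagnostic tests alone.
diagnostic_rate = percent_positive(diagnostic["positive"], diagnostic["total"])

print(f"combined: {combined_rate:.1f}%, diagnostic only: {diagnostic_rate:.1f}%")
```

With these invented numbers, the combined rate is a full percentage point higher than the diagnostic-only rate, which shows how the two approaches can lead to different readings of the same metric.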
Currently, there is no information about which labs are reporting testing data on the State’s website. At last mention, the State reported data from all public and hospital labs and one commercial lab, LabCorp. This means some specimens tested by Quest Diagnostics or other commercial labs are potentially not reported in the State’s data. It’s also unclear if rapid tests, such as those conducted regularly for Detroit’s professional athletes and team staff, are included in the State’s data. As the pandemic continues, it might be possible for academics or the government to retroactively collect additional testing reports from other labs and provide more specific documentation to help paint the most accurate picture of the pandemic in Michigan.
Changes in Cases
Cases were originally reported with a similarly barebones approach in press releases like this one, where the individual’s county, age, and suspected exposure method were listed. In less than a week, the State abandoned the individual-level case reporting and moved to a static table like this one:
Screenshot 3/17/2020
The table showed how many new cases were confirmed daily and then linked to a cumulative, or overall, table by county. Some of these early tables included gender. The cumulative data provided percentage breakdowns of sex, age, and hospitalizations.
A very clear example of how the State’s data reporting improved over time is the reported race/ethnicity data. When the State started reporting these demographic breakdowns in early April, 30% or more of cases and deaths were assigned to “unknown” race. By the end of April, when we wrote a blog post on this data quality issue, the percentage of unknown race had dropped to 27%. As of August 28th, only 14.8% of cases and 3.3% of deaths are of unknown race. Hispanic/Latino ethnicity is still a challenging demographic, but the rate of “unknown” cases has been cut roughly in half (50% to 26%) and “unknown” deaths dropped from 40% to 14%. These data are not currently integrated into the portal, but can be found below the dashboard.
The case data has always been provisional. In the early days of the pandemic, attempts were made to note county-level changes, such as this note from March 19th: “Isabella County case removed. Test results were indeterminate.” As the caseload grew, these notes were dropped from the reporting process. This means that there are some case rate discrepancies between some of the COVID-19 data maps, because it’s generally impossible to know which day a case was removed from a specific county.
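For analysts who save daily copies of the cumulative county table, one partial way to spot undocumented removals is to compare consecutive snapshots and flag any county whose cumulative count went down. The Python sketch below assumes two hypothetical snapshots stored as dictionaries; the county names and counts are placeholders, not State data.

```python
def find_removed_cases(earlier, later):
    """Return {county: decrease} for counties whose cumulative count fell.

    Counties missing from the later snapshot are treated as having zero cases.
    """
    removed = {}
    for county, earlier_count in earlier.items():
        later_count = later.get(county, 0)
        if later_count < earlier_count:
            removed[county] = earlier_count - later_count
    return removed


# Hypothetical cumulative counts from two consecutive daily snapshots.
snapshot_day_1 = {"County A": 1, "County B": 23, "County C": 13}
snapshot_day_2 = {"County A": 0, "County B": 30, "County C": 13}

removed = find_removed_cases(snapshot_day_1, snapshot_day_2)
print(removed)
```

This approach only catches a removal when it is not offset by new cases in the same county on the same day, so it cannot recover every correction. That limitation is part of why the discrepancies described above are so hard to resolve.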
When the pandemic is over, there will be a final data set that we can use to understand the trends. In the meantime, it is important to consider that the State is in the process of creating that dataset and so updates, corrections, and more will be part of the day-to-day reporting. By taking into account the provisional nature and considering the data in broader contexts, we can react more clearly to the changes.
Other Data at the State-Level
Since adding the race/ethnicity breakdown, the State has continuously added data that is being collected and updated how it is reported. The State’s dashboard is an interactive way for Michigan residents to access all sorts of data about cases and testing broken down by region or time. It is a huge improvement over the old static spreadsheets, and for data analysts like us, it still includes public use datasets for download.
Additional topics that the State has added over time include: “Data About Places,” which includes reports from hospitals on bed capacity, ventilator usage, morgue availability, and various personal protective equipment supplies like gloves, N95 masks, and gowns; long-term care facility case and death counts; coronavirus symptoms in ER visits; recovery data; and current outbreak settings (i.e., whether outbreaks happened in an office, a healthcare facility, daycare, school, etc.).
What’s next?
There have been substantial changes over time in different data structures on the State’s data portal. Some data changes, like adding additional data sources for diagnostic testing, make direct comparisons between data from early in the pandemic and today less useful without thoughtful contextualization. Being aware of these types of updates can ensure we are consuming and publishing analysis that can have the most impact on our current situation.
Other data changes just make more data accessible, which is great. Taken together, the updates also demonstrate a significant improvement in the State’s data processes. Infrastructure development is time-consuming and costly, so it’s reasonable that the evolution has taken time. There is a significant amount of data to explore on the State’s dashboard so, in our next blog post, we will highlight the portal and walk our readers through the data that’s available.