What separates good and bad data
Data is increasingly important for companies to make informed decisions — but how do you know if that data is flawed?
As more businesses launch AI initiatives, unlocking the value held within their vast data stores is an increasing priority. And while we’ve all heard that data can provide company insight, guide future decisions, and reveal information about competitors, what many people neglect to mention is that these benefits only apply to good data.
Arguably, bad data is more damaging to a company than having no data at all. A recent survey of Chief Digital Officers (CDOs) by AWS found that nearly half (46%) say poor data quality is their top challenge to realizing the full potential of generative AI. Furthermore, bad data can be costly—as much as $12.9 million on average, according to Gartner.
Good data is:
- Accurate: Data reflects real-world conditions and does not contain errors.
- Complete: All necessary data points are included to get a holistic view.
- Consistent: Data is uniform across sources and time periods.
- Timely: Data is recent enough to remain relevant to current conditions.
- Relevant: Data aligns with the specific goals or questions being addressed.
If leveraging data for informed decision-making, enhanced efficiency, and AI models is on your organization’s agenda, it’s time to focus on data quality and to understand what can render data unusable. Here are some key signs that your data is problematic:
Inconsistent results across reports
In many cases, data problems start with the way data is organized—or rather, disorganized. If data is siloed across departments and platforms, the same data can be reported differently, resulting in errors, redundancies, and inconsistencies. For example, consider all of the places where customer information is stored—updating a customer’s address in one platform but not another can lead to confusion and wasted resources. Data inconsistency can even lead to legal and compliance issues.
If your organization constantly has to reconcile conflicting values for the same entries, that is a clear sign you need to improve your data quality.
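As an illustration, a minimal sketch (with hypothetical field names and records) of how you might flag the same customer stored differently across two systems:

```python
# Hypothetical customer records from two separate systems.
crm_records = {
    "cust-001": {"name": "Ana Silva", "address": "12 Oak St"},
    "cust-002": {"name": "Ben Cho", "address": "9 Elm Ave"},
}
billing_records = {
    "cust-001": {"name": "Ana Silva", "address": "48 Pine Rd"},  # stale address
    "cust-002": {"name": "Ben Cho", "address": "9 Elm Ave"},
}

def find_conflicts(a, b):
    """Return customer IDs (and fields) that disagree between the two systems."""
    conflicts = {}
    for cust_id in a.keys() & b.keys():
        diffs = {f for f in a[cust_id] if a[cust_id][f] != b[cust_id].get(f)}
        if diffs:
            conflicts[cust_id] = diffs
    return conflicts

print(find_conflicts(crm_records, billing_records))  # {'cust-001': {'address'}}
```

Surfacing the conflicting fields is the easy part; deciding which system holds the record of truth is the governance question behind it.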
Missing or incomplete data
Are there large gaps in your data? Data that lacks temporal consistency will not be historically accurate, which can negatively impact decision-making. For example, if you want to analyze a company's financial history for trends but only have 2 of the last 10 years of records, that data can paint a very inaccurate picture of economic performance.
However, even if you have missing data, you can still build a case for further investigation using other research methods, such as qualitative surveys, to try to paint a more accurate picture. But until you conduct a data gap analysis, you won’t fully understand where your blind spots are.
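The first pass of a gap analysis can be automated. The sketch below (hypothetical revenue-by-year figures) lists the years missing from a reporting window:

```python
# Hypothetical annual revenue figures with gaps in the historical record.
revenue_by_year = {2015: 4.2, 2016: 4.8, 2019: 5.1, 2023: 6.0}

def missing_years(data, start, end):
    """Return the years in [start, end] that have no data point."""
    return sorted(set(range(start, end + 1)) - set(data))

print(missing_years(revenue_by_year, 2015, 2024))
# [2017, 2018, 2020, 2021, 2022, 2024]
```

Knowing exactly which periods are empty tells you where supplementary research methods need to fill in.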
Obsolete data
Information may not come with an expiration date, but it can certainly go stale, leading to misguided decisions. If data is not constantly updated, it can quickly become outdated. For example, a patient may have developed a new allergy since their last doctor’s visit, and without an updated record, their safety could be at risk.
Scheduling periodic reviews of your data is good governance, and it protects your organization from unnecessary storage costs, security risks, and compliance problems. One important question to ask throughout your data validation process: Is the data we are collecting aligned with the needs and goals of our organization?
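One way to operationalize those periodic reviews is a freshness check. A minimal sketch, assuming a hypothetical record layout and a one-year threshold, that flags records not reviewed within the chosen window:

```python
from datetime import date, timedelta

# Hypothetical records carrying a last-reviewed date.
records = [
    {"id": "p-01", "last_reviewed": date(2024, 1, 10)},
    {"id": "p-02", "last_reviewed": date(2021, 6, 2)},
]

def stale_records(records, today, max_age_days=365):
    """Return IDs of records whose last review is older than the threshold."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["id"] for r in records if r["last_reviewed"] < cutoff]

print(stale_records(records, today=date(2024, 9, 1)))  # ['p-02']
```

The right threshold depends on the domain: patient allergies and mailing addresses go stale on very different clocks.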
Biased data sources
All data collection methods are subject to bias, so it’s impractical to attempt to eliminate it completely. However, understanding how biases emerge can help reduce them.
For example, sampling bias can be harmful when data is collected in ways that underrepresent certain demographic groups. As a result, machine learning outputs are skewed and don’t reflect the full population. Confirmation bias can occur when people interpret data to support their existing point of view.
Examples of damaging algorithmic bias have been widely publicized, such as Amazon’s resume-screening model that favored male applicants, or racist stereotypes making their way into AI-generated healthcare advice. Combating bias in data sets requires intentional oversight such as third-party data quality and fairness checks, continuous monitoring, and safety mechanisms.
Human error
Finally, we need to address possibly the largest contributor to bad data issues—human error. You can do everything else right, from your sources to your planning, but one mis-entered number in your spreadsheets could spoil everything.
If your data conflicts with known facts or expected benchmarks, you may have inaccuracies scattered throughout. To give you an idea of how widespread bad data can be, professional data scientists spend as much as 80% of their time cleaning data before diving into any insights. Taking the time to be thorough with data verification and validation can make all the difference on the road to data quality.
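Simple range checks catch many entry mistakes before they spread. A minimal sketch (hypothetical fields and bounds) validating rows against expected benchmarks:

```python
# Hypothetical rows containing a mis-entered value (age 290 instead of 29).
rows = [
    {"id": 1, "age": 34, "order_total": 120.50},
    {"id": 2, "age": 290, "order_total": 89.99},
]

# Expected (min, max) benchmarks per numeric field.
bounds = {"age": (0, 120), "order_total": (0.0, 10_000.0)}

def out_of_range(rows, bounds):
    """Return (row id, field) pairs whose values fall outside expected bounds."""
    return [
        (row["id"], field)
        for row in rows
        for field, (lo, hi) in bounds.items()
        if not lo <= row[field] <= hi
    ]

print(out_of_range(rows, bounds))  # [(2, 'age')]
```

Checks like this won’t catch a plausible-but-wrong value, but they cost almost nothing and stop the most obvious fat-finger errors at the door.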
Good data in, good data out
This sounds like a lot of work, doesn’t it? In many cases, you need an expert guide to correct the systemic issues that lead to bad data. Reach out to the Modus team for more analysis, insight, and support today.