How to Get Proactive About Data Quality

May 5, 2025 - 12:00
Matt Harrison Clough

When it comes to dealing with data quality, teams and companies fall into one of three modes: unmanaged, organized cleanup, or proactive prevention. Most organizations get stuck in one of the first two. The work of addressing data issues is demanding, messy, and time-consuming. Poor-quality data can cripple decision-making and doom generative AI projects, since bad data fed to AI models turns into untrustworthy results.   

The real data quality breakthrough happens when companies transition to the third mode, where errors are prevented at the source. But this shift requires a major change in mindset, in which every employee recognizes that they are both a data creator and a data customer and starts acting like it. 

How can companies reach this third mode of data quality? In our experience, the change often starts with a provocateur, such as a manager with a nagging business problem, and gains momentum when leaders at many levels start working together to improve data quality within their own spans of influence. Let’s explore lessons on how to get started and how this journey to proactive data quality improvement has worked at organizations like meal-kit company HelloFresh.

A Common Trap

It is easy to see how companies get caught in the unmanaged and organized cleanup modes. Look at the data flow in any company and you’ll observe a daisy chain: People in a department use data to do their jobs, in turn creating new data, which goes to the next group in line. People generally work within their silos, seeing themselves only as salespeople, vendor managers, market researchers, and so forth. When someone (say, a salesperson) sees an error, it’s only natural that they want to correct it. But correcting errors is difficult, time-consuming work, and plenty of quality issues go undetected, causing further damage downstream. This is Mode 1, unmanaged data.

Sooner or later, someone recognizes the business impact of a constant stream of data errors. The company then adopts a more formalized and centralized approach in which a data cleanup team implements a tool to address errors better, faster, and more cheaply. The gains are generally small. Finding errors is easy, but fixing them without understanding the business context is not. This is Mode 2, organized cleanup. While this beats pure chaos, it’s still an endless cycle of fixing errors.  

Getting to Mode 3 requires a creative leap: Rather than making mistakes in one part of the organization and cleaning them up in another, people must create data correctly in the first place. This means that in organizations that have traditionally had separate groups of data creators and data customers, people must come to see themselves as both creators and customers. They must share responsibility and work together, taking the following steps:

  • Data customers clarify and communicate their data requirements to data creators.
  • Together, creators and customers measure the data against those requirements.
  • Creators conduct improvement projects to close the gaps and implement controls to stop errors in their tracks.  
  • Customers and creators work together to evolve requirements and controls as business processes and goals change.
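
The division of labor in these steps can be made concrete with a small sketch. The Python below is purely illustrative: it assumes a hypothetical order record with `order_id`, `quantity`, and `country` fields, and the requirement names and checks are invented for the example rather than taken from any company’s actual rules. Customers state requirements as named checks, creators and customers measure a batch against them, and a creator-side control rejects failing records before they travel downstream.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical record: one customer order as a data creator might enter it.
Record = dict[str, object]


@dataclass(frozen=True)
class Requirement:
    """A single data customer requirement: a name plus a pass/fail check."""
    name: str
    check: Callable[[Record], bool]


# Step 1: the data customer states what "correct" means (illustrative rules only).
REQUIREMENTS = [
    Requirement("order_id present", lambda r: bool(r.get("order_id"))),
    Requirement(
        "quantity is a positive int",
        lambda r: isinstance(r.get("quantity"), int) and r["quantity"] > 0,
    ),
    Requirement(
        "country code has 2 letters",
        lambda r: isinstance(r.get("country"), str)
        and len(r["country"]) == 2
        and r["country"].isalpha(),
    ),
]


def failed_requirements(record: Record) -> list[str]:
    """Names of the requirements this record does not meet."""
    return [req.name for req in REQUIREMENTS if not req.check(record)]


# Step 2: creators and customers measure a batch against the requirements.
def percent_meeting_requirements(records: list[Record]) -> float:
    """Share of records (0-100) that meet every requirement; an empty batch counts as 100."""
    if not records:
        return 100.0
    good = sum(1 for r in records if not failed_requirements(r))
    return 100.0 * good / len(records)


# Step 3: a control at the point of creation stops bad records
# before they reach downstream customers.
def create_record(record: Record, store: list[Record]) -> None:
    problems = failed_requirements(record)
    if problems:
        raise ValueError("record rejected: " + ", ".join(problems))
    store.append(record)


batch = [
    {"order_id": "A1", "quantity": 2, "country": "DE"},
    {"order_id": "", "quantity": 0, "country": "Germany"},
]
print(percent_meeting_requirements(batch))  # prints 50.0
```

Keeping the requirements in one shared list is a deliberate choice in this sketch: when business processes change (the fourth step), customers and creators revise that list together, and both the measurement and the point-of-creation control pick up the change.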

For this work to happen at scale, data customers and creators need encouragement, training, and support. Thus, the organization needs a core team responsible for defining and coordinating the overall program, training data creators and data customers, helping them connect, maintaining a dashboard, keeping senior management informed, and advancing the effort as business needs dictate. Organizations that make it to Mode 3 also make wise use of connectors — people who sit between the core team and customers/creators to assist them in the day-in and day-out work of Mode 3. Such connectors may go by titles like embedded data manager, data product manager, data ambassador, responsible party, and quality champion. 

Eliminating data problems at the root beats constantly chasing errors, as HelloFresh learned.

HelloFresh Seeks Out Better Data

The journey to better-quality data at HelloFresh (HF), a Berlin-based meal-kit company founded in 2011, is instructive for both business and data leaders. (Note: Article coauthor Kinda served as HF’s data quality analyst and data governance architect and, later, as its data governance program lead, from 2019 to 2023. This discussion is based on her experience, conversations with former colleagues, and publicly available information.) Like all startups, HF initially had a lot on its plate, and, as at most startups, its data quality program remained in unmanaged mode far longer than it should have.

As HF grew, teams created and used more and more data to run the store — tracking inventory, processing customer orders, assembling meal kits, managing logistics for timely deliveries, coordinating with vendors for ingredient sourcing, and providing responsive customer service.

Marketing, operations and fulfillment, and other teams also depended on this data, though in very different ways. Data quality issues arose all the time and were dealt with by the people who were affected. When quality issues became increasingly visible, HF transitioned to the second mode: organized cleanup. Its central engineering team, residing under the technology organization, was charged with preparing better-quality data and making it available to downstream data customers. This team did the best it could — but unfortunately, it did not have a deep enough understanding of how people were using the data. 

Data customers grew increasingly frustrated as they continued to spend large fractions of their workdays dealing with data that did not meet their needs. By 2019, multiple data quality efforts had emerged to help the central team make improvements. Still, those efforts did not satisfy the international marketing analytics team, which needed high-quality data to understand customer sentiment, refine customer profiles, and develop personalized advertising campaigns. So it started its own manual data validation and monitoring efforts.

Further, the International Operations Business Intelligence (OPSBI) team, responsible for producing a weekly global operating report for the C-suite, faced even greater challenges in consolidating data from the company’s many geographies. Multiple data sources meant more late data deliveries, mistakes, and inconsistencies as errors cascaded downstream.

Frustrated, the OPSBI team transitioned to the third, proactive mode. It developed a data quality framework that featured data quality service level agreements, which clarified data requirements in the form of “data contracts.” This solidified OPSBI’s role as data customer, whereas previously, the team had only been a creator. Similarly, local business intelligence teams took on the role of data creators, whereas previously they had been only customers. Now that both teams were playing both creator and customer, leadership established communication channels between the two. The local BI teams came to understand who their customers were, how data was used, and the impacts of failing to meet customer requirements. Further, the BI teams could remediate issues much more quickly and begin to sort out their root causes. Quality improved quickly — but remained confined within the OPSBI team.
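
A data contract of the kind described here can be sketched in a few lines. The Python below is an illustration built on assumptions, not HF’s actual framework: the dataset name, column names, one-day freshness window, and 95% completeness threshold are all hypothetical. What it tries to show is structural: the contract names both the creator and the customer, states requirements and service levels explicitly, and routes any breach straight back to the creator who can fix it at the source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataContract:
    """Customer requirements plus service levels for one dataset (illustrative)."""
    dataset: str
    creator: str                      # team that produces the data
    customer: str                     # team that relies on it
    required_columns: frozenset[str]
    max_age: timedelta                # freshness service level
    min_completeness: float           # share of rows with all required fields filled


@dataclass
class Delivery:
    """One batch of data handed from creator to customer."""
    columns: set[str]
    rows: list[dict]
    produced_at: datetime


def check_delivery(contract: DataContract, delivery: Delivery, now: datetime) -> list[str]:
    """Return the contract breaches; an empty list means the agreement was met."""
    breaches = []
    missing = contract.required_columns - delivery.columns
    if missing:
        breaches.append(f"missing columns: {sorted(missing)}")
    if now - delivery.produced_at > contract.max_age:
        breaches.append("freshness SLA breached")
    # A row counts as complete only if every required field is non-empty.
    complete_rows = sum(
        all(row.get(col) not in (None, "") for col in contract.required_columns)
        for row in delivery.rows
    )
    completeness = complete_rows / len(delivery.rows) if delivery.rows else 0.0
    if completeness < contract.min_completeness:
        breaches.append(
            f"completeness {completeness:.0%} below {contract.min_completeness:.0%}"
        )
    return breaches


def notify_creator(contract: DataContract, breaches: list[str]) -> str:
    """Format a message addressed to the data creator, not to a cleanup team."""
    if not breaches:
        return f"{contract.dataset}: contract met"
    return (f"to {contract.creator} (from {contract.customer}), "
            f"{contract.dataset}: " + "; ".join(breaches))


# Hypothetical weekly report feed from a local BI team to OPSBI.
contract = DataContract(
    dataset="weekly_operating_report",
    creator="local BI team",
    customer="OPSBI",
    required_columns=frozenset({"country", "week", "deliveries"}),
    max_age=timedelta(days=1),
    min_completeness=0.95,
)
delivery = Delivery(
    columns={"country", "week", "deliveries"},
    rows=[
        {"country": "DE", "week": "2020-W10", "deliveries": 120},
        {"country": "NL", "week": "2020-W10", "deliveries": None},
    ],
    produced_at=datetime(2020, 3, 9, 6, 0),
)
breaches = check_delivery(contract, delivery, now=datetime(2020, 3, 9, 12, 0))
print(notify_creator(contract, breaches))
# prints: to local BI team (from OPSBI), weekly_operating_report: completeness 50% below 95%
```

Returning breaches as plain messages, rather than silently patching the data, mirrors the shift described above: the creator learns who was affected and why, and can work on the root cause.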

In 2020, the OPSBI team and Kinda conducted a companywide survey that revealed the high costs of managing data quality — results that were broadly consistent with industry cost reports. The survey also highlighted inconsistent reporting, low trust in data, duplicated data-cleaning efforts, and fragmented and siloed data. Among the key findings:

  • Sixty-one percent of respondents reported unreliable data, citing contradictory or incorrect information across systems.
  • Fifty-three percent reported that data was outright incorrect.
  • Nearly 84% of respondents acknowledged relying on workarounds to address these issues.

A Turning Point to Proactive Prevention

The survey results set HelloFresh on the path to adopting proactive prevention throughout the company. The CEO tasked a senior executive, Peter Caron, with transforming and integrating the data organization.

First, a new companywide data quality working group was created to build on the work of the OPSBI team. This group defined the roles and responsibilities of data creators and data customers, established standard data quality dimensions, and set up a standard data quality process designed to establish direct communication channels between data creators and data customers.

Second, a data literacy program helped to foster the data culture that HF needed and to train the data creators and customers. Third, HF transitioned to a data mesh architecture and formalized the data quality responsibilities of data creators. Top-down support from the C-suite helped create a sense of urgency. Finally, HF hired David Castro-Gavino as its first vice president of data, and he focused his team on guidance, literacy, transparent reporting across teams, and a culture of collaboration. In parallel, more business domains added embedded data product owners and data engineers to coordinate with Castro-Gavino’s team and drive quality improvements.

Although significant progress has been made, there is still work to be done, Castro-Gavino said. “Over the last four and a half years, HelloFresh made remarkable progress in transforming how we manage and utilize data,” he said. “Our self-service data platform has fulfilled the promises of data mesh, bridging the gap between data producers and data consumers and enabling us to ingest data in minutes rather than months.” 

Quality, governance, and observability are part of this data platform, which means that the data team is able to certify data assets and build trust and confidence among colleagues. The data team has also modernized the majority of HF’s core data assets, so the KPIs driving business decisions are based on qualified data, Castro-Gavino said.

Organizationally, data pros now sit within the business units. “The maturity of our data product teams has reached a point where we are decentralizing them into domains, embedding them closer to business functions,” Castro-Gavino said.

Looking ahead, “we are now exploring innovative ways to leverage AI and LLMs to implement concepts like self-healing data,” he noted. “This will enable the platform not only to observe and report on data quality dimensions but also to automatically address them.”

The HelloFresh journey illustrates a common path through the three data quality modes: starting with unmanaged data, moving into organized cleanup mode, and ultimately reaching the third mode, proactive prevention. As HF and other companies have found, simply implementing tools or assigning a team to fix data issues is not enough. The breakthrough came when HF underwent an organizational transformation that focused on data creators and data customers, embedded data product owners, and established a core team to oversee the data quality program, set standards, train the data creators, and drive progress. Accountability for data quality clearly resides “in the business.”

Get Started on Your Path to Data Quality

Bad data plagues most companies and, as noted above, does enormous damage. Today’s organizational structures get accountabilities wrong and make it difficult for managers to see the potential in Mode 3. Once leaders see that potential, most pick up the work quickly: Working in proactive mode is both easier and more satisfying than correcting errors.  

The difficulty involves learning about Mode 3 and having the courage to try it out. As we have seen during many consulting engagements on data quality, such initiatives often start with a provocateur who refuses to accept the status quo.

While provocateurs can come from all organizational functions and levels, one common trait they share is the motivation to solve business problems, not improve data quality for its own sake. Many provocateurs wish to run the store better, reduce cost, and/or manage risk. HF’s leaders, for example, didn’t trust the data that was available when they needed to make critical decisions. One company we worked with made data quality measurement part of an executive training session and, embarrassed by the results, saw the need for change. Another company’s motivation was user dissatisfaction with a new computer system that made data quality errors more visible. As we saw at energy company Chevron, at some organizations a provocateur sees the benefits of the proactive approach early and helps the organization leapfrog right from Mode 1 to Mode 3.

Many provocateurs see themselves as data customers, though we have also seen this role played by data creators hoping to improve their competitive position. The bottom line is that a provocateur may be an individual contributor or a business unit leader: No manager, at any level, needs to wait around to improve data quality within their span of influence. 

Leaders who decide to organize better for data quality should follow these five steps.

1. Look for opportunities. Dig into your team’s work to determine how it would benefit from improved data. 

2. Start the conversation. Talk up Mode 3, and find allies who share your frustration and can support you. Then start small, focusing on a single problem.

3. Collaborate to build the right capabilities. To unlock Mode 3, work with those allies to get creators, customers, and embedded data managers collaborating on the work to address that initial problem, as HelloFresh did. 

4. Scale up. Expand Mode 3, working on additional problems one by one to refine your processes and embed better data habits and a culture of accountability across teams. At this point, one manager can take the data quality effort only so far. As HF’s experience illustrates, scaling up and out will require more senior leaders to buy into the benefits of Mode 3 and get involved. 

5. Scale out. At this stage, senior leaders take the effort to the next team, department, line of business, and so on.

The above advice also holds profound implications for chief data officers, some of whom do not currently focus primarily on data quality but instead prioritize what they believe to be sexier programs involving digital transformation or AI. We see and understand the appeal. But ignoring data quality imperils those efforts while also missing big, close-to-home business opportunities.

Clean and reliable data solves business problems and unlocks efficiencies and strategic options. Yet, as HF learned, achieving this is not just a technical challenge: It’s a culture and people-organization challenge. True and lasting success stems from a change in mindset and a culture where employees recognize themselves as both data customers and data creators. Objectively, many employees are already in both of these roles — they just don’t recognize it. Most enjoy taking ownership of data quality, collaborating, and contributing to the transformation that results.  

Fostering a culture of data quality ownership doesn’t happen overnight. It requires deliberate effort, clear communication, and visible support from leaders. No matter your level in the organization, don’t wait for the perfect conditions: Start where you are and address the challenges within your reach. Those early success stories will multiply and build momentum. As better-quality data enhances decision-making, leaders and team members will realize the value of investing in a strong data culture and making the leap to proactive mode on data quality.