Data Quality through Visualizing

March 18, 2011

An interesting addendum to my last post about a tag cloud:

I knew I was on the right path when a dramatic shift happened. I had a data set rendering on screen with a few different parameters and layouts, and I was iterating quietly.

When I finally got the data hooked up to the triangle layout that Goldie had envisioned, a trend was immediately clear. The sample data was all clumping into a small number of regions instead of using the full space. I looked back at the data itself and it was true! I had mindlessly generated data with subconscious trends in it! It didn’t take long to get the data cleaned up. Because I had the tool working, I could also get instantaneous feedback that the cleanup was working.
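For anyone who wants a number to go with the eyeball test, here is a rough sketch of the same check in Python. It is not the code behind the tag cloud: the function name `region_coverage`, the unit-square coordinates, and the 4x4 grid are all assumptions made for illustration (the real layout was a triangle). It simply counts how many grid cells the sample points land in.

```python
from collections import Counter


def region_coverage(points, bins=4):
    """Return (fraction of grid cells occupied, Counter of cell -> count).

    Hypothetical helper: assumes each point is an (x, y) pair with both
    coordinates in [0, 1], laid over a bins x bins grid.
    """
    counts = Counter()
    for x, y in points:
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        col = min(int(x * bins), bins - 1)
        row = min(int(y * bins), bins - 1)
        counts[(row, col)] += 1
    return len(counts) / (bins * bins), counts


# Made-up sample: three points huddle in one corner, one sits in the opposite corner.
clumped = [(0.1, 0.1), (0.12, 0.15), (0.11, 0.2), (0.9, 0.9)]
coverage, counts = region_coverage(clumped)
print(f"{coverage:.1%} of cells used; busiest cells: {counts.most_common(2)}")
```

A low coverage fraction, or one cell holding most of the points, is the numeric version of the clumping that jumped out on screen. Rerunning it after cleaning the data gives the same kind of quick feedback.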

When it comes to any kind of data, measuring its quality and tracking outliers is so much easier when you have the right tools in place. The scientific hypothesis process can also be applied to data quality assurance, such as when querying bug databases to find underlying root causes and trouble spots. In turn, the ability to immediately grok trends is a measure of the quality of the data visualization tool itself.

One Comment
  1. Goldie

    We’re gonna get to see a before and after, right? 🙂
