Re: data quality vocabulary - dimensions

Hi Annette,

On 5/29/15 8:49 AM, Annette Greiner wrote:
> I think that the accuracy/consistency/relevance metric needs to be broken up into separate items. A dataset could rank well in one of those three and also rank poorly in another; they are pretty orthogonal. In particular, I think relevance can't be a ranking but rather a statement of who the data is relevant for. In that respect, it may not really be a measure of quality.

Thanks for the comment!

The grouping came from the minutes of the Share-PSI workshop [1], where it was said:
Q: (Lorenzo) consistency could be inside accuracy
A: (Makx) this might depend on context, there is some overlap.
a cluster including accuracy, consistency and relevance

Relevance could also be defined as a metric. For example, I expect that coverage of a list of topics or places in a dataset could be expressed as the percentage of a reference list that the dataset covers.
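
To make that concrete, here is a minimal sketch in Python of what such a coverage measure might look like. The function name `coverage` and the sample place names are purely illustrative assumptions of mine; nothing here comes from the minutes or from the draft vocabulary.

```python
def coverage(dataset_values, reference_values):
    """Return the percentage of reference entries found in the dataset.

    Hypothetical helper for illustration only: duplicates in either
    list are ignored, and matching is exact (case-sensitive).
    """
    reference = set(reference_values)
    if not reference:
        raise ValueError("reference list must not be empty")
    # Entries of the reference list that occur at least once in the dataset.
    covered = reference & set(dataset_values)
    return 100.0 * len(covered) / len(reference)


# Made-up example data: a reference list of places and the places
# actually mentioned in a dataset.
reference_places = ["Paris", "Lyon", "Marseille", "Toulouse"]
dataset_places = ["Paris", "Lyon", "Paris", "Nice"]

print(coverage(dataset_places, reference_places))  # 50.0
```

A real measure would probably need to normalise names or match on identifiers rather than raw strings, and the result only means something relative to the chosen reference list, which fits Annette's point that relevance depends on who the data is for.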

Anyway, I agree that lumping them together seems a bit premature, so I've split them.

> I see consistency as crucial. That is really what I want as a user of data. What I want to know is, can I use it readily in an analysis tool? Can I open the dataset in R and do some statistical manipulations? Can I open it in Tableau and make a visualization without doing a lot of cleaning?

These questions go in the draft, as I don't want to lose them :)

The rest will go in another email...




Received on Tuesday, 2 June 2015 22:12:13 UTC