Re: data quality vocabulary - scope

I've always thought of the DQV as existing to deal with the issues around quality that the BP doc can't address. That is, it would be unhelpful to state in a best practice that data should be of high quality. There are times when publishing lower-quality data is done intentionally, for an audience that is prepared for it. (I'm thinking here of preliminary data releases, where getting *something* out is more important than getting high-quality data out.) Moreover, one person's low quality can be another person's high quality. The whole reason I think the DQV idea is interesting is that it can begin to address the more difficult-to-measure aspects of a dataset that help a potential user determine whether the dataset is likely to meet their needs. I think the BP doc should define its own measures for the things it can cover--the things we already have vocabulary to discuss there.
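
For concreteness, here is a rough Turtle sketch of the kind of publisher-supplied quality information I have in mind. The class and property names come from the current DQV draft (they may well change), and the dataset, metric, and value are all made up:

  @prefix dqv:  <http://www.w3.org/ns/dqv#> .
  @prefix dcat: <http://www.w3.org/ns/dcat#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
  @prefix :     <http://example.org/> .

  :preliminaryRelease a dcat:Dataset ;
      dqv:hasQualityMeasurement :completenessCheck .

  # The publisher states up front that this is a preliminary release:
  # only about 60% of the expected records are present yet.
  :completenessCheck a dqv:QualityMeasurement ;
      dqv:isMeasurementOf :recordCompleteness ;   # hypothetical metric
      dqv:value "0.60"^^xsd:decimal .

A consumer who needs complete data can pass on this dataset, while someone who just needs *something* now knows it will do.
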
-Annette

On Jun 2, 2015, at 3:24 PM, Antoine Isaac <aisaac@few.vu.nl> wrote:

> Hi Annette, everyone,
> 
> Trying to respond to the broader scope questions in some of Annette's comments...
> 
> 
>> (Credibility [could be a statement], as the decision of who is authoritative is subjective, and I wouldn't want to rate a good dataset poorly because it was put out by a small business.)
>> 
> 
> 
> Some may disagree with this, which leads to the following point.
> 
> 
>> I think we also need to think about how this vocabulary is expected to be used. If a data publisher provides quality information upon publication (which is what I've been thinking of as the main use of the vocabulary), then some items won't really make sense to include. Information like why the data was removed from the web will not be available when it is first published.
> 
> 
> DQV is meant to be used by the data publisher, but also by anyone who has a say about quality after publication.
> If no one strongly objects to this scope, and the DQV draft is not clear about it, then we should make it clearer.
> 
> Note that this is the main reason there's potential overlap with DUV (i.e., for feedback), and also why we're keen to have provenance for quality metadata (so that someone can judge the credibility of a credibility statement).
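> 
> For concreteness, a rough Turtle sketch of what I mean, using names from the DQV draft plus PROV-O and the Open Annotation vocabulary (all the resources are invented, and the draft's names may still change):
> 
>   @prefix dqv:  <http://www.w3.org/ns/dqv#> .
>   @prefix oa:   <http://www.w3.org/ns/oa#> .
>   @prefix prov: <http://www.w3.org/ns/prov#> .
>   @prefix :     <http://example.org/> .
> 
>   # Post-publication feedback: a third party makes a credibility statement
>   # about someone else's dataset.
>   :credibilityNote a dqv:QualityAnnotation ;
>       oa:hasTarget :someDataset ;
>       oa:hasBody :noteText ;              # e.g. "I don't trust the source of these figures"
>       prov:wasAttributedTo :reviewerX .   # provenance of the quality metadata
> 
>   # A consumer can now judge the credibility of the credibility statement
>   # by looking at who :reviewerX is.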
> 
> 
> 
>> I worry, too, that we are defining some stuff that really isn't about data quality so much as the best practices that we have in the BP doc. I'm thinking here of the mentions of machine readability and metadata.
>> We may need to do some scoping to be sure we are targeting quality information. I would suggest that we avoid repeating what is in the BP doc.
> 
> 
> (I'm not sure I fully understand your worry here, but I'll try my interpretation...)
> 
> I have repeatedly asked for such a scoping discussion, ever since we gathered all the preparatory wiki pages for the last F2F.
> In particular: whether DQV should include (or even focus on) quality criteria that would be defined by the group as part of the BPs.
> 
> I think there was never a clear answer. It seemed to me that at one point people liked the idea of using DQV to measure how well datasets comply with our BPs. But maybe this is my personal interpretation of discussions I was not physically in.
> 
> Personally I'm a bit lukewarm on the idea, even though I must admit that defining the "star system" (which has been mentioned a couple of times before) using DQV looks like an interesting exercise in eating one's own dog food.
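> 
> If we did go down that path, I imagine something like the following rough sketch (all names invented, just to show what the dog food might taste like):
> 
>   @prefix dqv: <http://www.w3.org/ns/dqv#> .
>   @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
>   @prefix :    <http://example.org/> .
> 
>   # A hypothetical metric for compliance with our BPs, expressed as a star rating
>   :bpStarRating a dqv:Metric ;
>       dqv:expectedDataType xsd:integer ;
>       dqv:inDimension :bpCompliance .    # invented dimension
> 
>   :someDataset dqv:hasQualityMeasurement [
>       a dqv:QualityMeasurement ;
>       dqv:isMeasurementOf :bpStarRating ;
>       dqv:value "4"^^xsd:integer         # 4 out of 5 stars
>   ] .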
> 
> What do you think?
> 
> Antoine
> 

Received on Wednesday, 3 June 2015 01:20:19 UTC