Re: DQV ISSUE-187 - cardinality of the link between Metric and Dimension from Annette Greiner on 2015-09-14 (public-dwbp-wg@w3.org from September 2015)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Mon, 14 Sep 2015 14:11:49 -0700
To: Antoine Isaac <aisaac@few.vu.nl>
Cc: "Debattista, Jeremy" <Jeremy.Debattista@iais.fraunhofer.de>, Public DWBP WG <public-dwbp-wg@w3.org>
Message-Id: <11FE0042-FB2C-4196-BA9E-721B00F84E68@lbl.gov>
My own opinion on whether to include specific dimensions is that it would be helpful for publishers and consumers to do so. I think it’s an important goal to make it relatively easy to use the vocabulary to indicate usage, and offering multiple options that are defined elsewhere makes that more difficult. If I’m posting some data on the web, and wanting to indicate information about that usage, I am much more likely to use the vocabulary if there is a straightforward way of doing that, and if it’s explained in one place. That means that all the information necessary for me to mark up my usage of the data would need to be in the doc. I think the DUV doc should provide a primer on how to use the vocabulary, with examples. As for publishers, the more specific our recommendation is, the more easily they will be able to find instances of their own data being used.
It just occurs to me that it might be good to have a chat with search engine devs about how some of this vocabulary might be surfaced by a search engine.
-Annette
--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
510-495-2935

On Sep 13, 2015, at 10:12 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:

> Hi Jeremy,
> 
> Thanks a lot for the quick reaction. This is really helpful!
> 
> I understand the will to rationalize the expression of quality metadata. So whatever happens, I'm really keen to take your suggestion on board, about giving strong guidelines to publishers of quality metadata. We could recommend that publishers of quality metadata (and the designers of quality metadata frameworks instantiating our DQV) SHOULD aim at creating the least ambiguous descriptions, and thus seek to assign only one dimension to a metric.
> 
> However I am extremely reluctant to go more formal about it, because:
> 
> 1. I am not certain we can rule out the possibility of frameworks that are made of non-disjoint dimensions but are nonetheless very useful. For example if a SPARQL endpoint has terrible uptime (say, only 50%), I'd be tempted to say that this is both a performance and an availability issue.
> Nandana said he had thought of examples, I hope he can provide some.
> 
> 2. Our DQV doesn't define specific metrics and dimensions. So it's likely that several frameworks would emerge, with dimensions that could be 'overlapping'. Say, where one defines 'performance',  'availability' and 'licensing' as three dimensions, another defines only a vaguer dimension of 'access'. Still the one framework could have metrics that are useful from the perspective of the other. If we allow a metric to be assigned different dimensions so that they can serve different frameworks, it would greatly enhance interoperability, especially the re-use of quality assessments across frameworks.
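The two points above can be illustrated with a minimal sketch. It assumes two hypothetical frameworks built from the dimension names in the example ('performance', 'availability', 'licensing' versus 'access'); the metric names and the helper `ambiguous_metrics` are invented for illustration and are not part of DQV. The idea is that a metric may carry several dimensions so it can serve more than one framework, while a guideline-style check flags only the cases where a metric has more than one dimension within the same framework.

```python
from collections import defaultdict

# Hypothetical mapping from dimension name to the framework that defines it.
FRAMEWORK_OF = {
    "performance": "framework-1",
    "availability": "framework-1",
    "licensing": "framework-1",
    "access": "framework-2",
}

# Hypothetical metrics; each may be linked to several dimensions.
metric_dimensions = {
    "endpoint-uptime": {"availability", "access"},
    "response-time": {"performance", "availability", "access"},
}


def ambiguous_metrics(metric_dimensions, framework_of):
    """Return metrics that have more than one dimension within one framework."""
    flagged = {}
    for metric, dims in metric_dimensions.items():
        # Group this metric's dimensions by the framework that defines them.
        per_framework = defaultdict(set)
        for dim in dims:
            per_framework[framework_of[dim]].add(dim)
        clashes = {fw: sorted(ds) for fw, ds in per_framework.items() if len(ds) > 1}
        if clashes:
            flagged[metric] = clashes
    return flagged


print(ambiguous_metrics(metric_dimensions, FRAMEWORK_OF))
```

In this sketch 'endpoint-uptime' is not flagged, because its two dimensions belong to different frameworks, whereas 'response-time' is flagged for sitting in two dimensions of the same framework. A check like this reports rather than rejects, which matches a SHOULD-level guideline more closely than a formal cardinality restriction.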
> 
> I'm curious to hear what everyone's opinion is about it!
> 
> Cheers,
> 
> Antoine
> 
> On 9/11/15 9:16 PM, Debattista, Jeremy wrote:
>> Hi Antoine,
>> 
>> Whilst I agree that a metric can be in one or more dimensions (similarly for dimensions), I’m inclined to favour otherwise. In a paper we are submitting about daQ (which I can share later), I wrote the following wrt the restrictions:
>> 
>> ———
>> 
>> Whilst we acknowledge that a metric might be perceived in one or more categories by different parties, the main goal of the daQ model is to create a generic schema that allows the semantification of quality measures in the abstract three level hierarchy. When defining these quality measures, a common understanding is required (the unified view presented by the survey in [1] is a good example for this), such that these measures can be used by any framework without any prejudice. If such restrictions were not in place, /doubt/ and /ambiguity/ might be created between the consumers and the quality assessors (who could be a data enthusiast or the publisher himself). One fundamental aspect of assessing a dataset for its quality is to start eliminating the doubts consumers may have about the data. Therefore, if the same metric has multiple definitions and categorisations, incertitude is created for the consumer. On the other hand, the quality assessor ends up in an ambiguous situation to try to identify the right quality measure structure.
>> 
>> [1] http://www.semantic-web-journal.net/content/quality-assessment-linked-data-survey
>> 
>> ———
>> 
>> As I said, this is just my opinion on this subject. Therefore, if you think that this restriction is too much for data publishers, it should be offered as a guideline. The issue is that in my opinion data quality will impact more data consumers than the publishers themselves. For publishers data quality will act as an indicator of the quality of their dataset, whilst for consumers data quality will help them choose one dataset over the other.
>> Imagine the scenario that a Metric X could be categorised in dimension A or B. If one dataset is using X that is in dimension A whilst another dataset is using X in dimension B, it will make it hard to compare the two datasets based on the same metric X.
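The comparison problem in this scenario can be sketched as follows. This is only an illustration under stated assumptions: the report layout (metric, dimension, value), the values, and the helper names `by_dimension` and `comparable_keys` are hypothetical and do not come from daQ or DQV. It assumes a consumer who looks up measurements through the dimension they are filed under.

```python
# Hypothetical quality reports: each entry is (metric, dimension, value).
dataset_1 = [("X", "A", 0.9)]
dataset_2 = [("X", "B", 0.7)]


def by_dimension(report):
    """Index a report's values by (dimension, metric)."""
    return {(dim, metric): value for metric, dim, value in report}


def comparable_keys(report_a, report_b):
    """Return the (dimension, metric) pairs present in both reports."""
    return sorted(by_dimension(report_a).keys() & by_dimension(report_b).keys())


# Metric X is filed under A in one report and B in the other: nothing lines up.
print(comparable_keys(dataset_1, dataset_2))

# If both reports file X under dimension A, the pair becomes comparable.
dataset_2_aligned = [("X", "A", 0.7)]
print(comparable_keys(dataset_1, dataset_2_aligned))
```

A consumer who instead matches on the metric identifier alone would still find X in both reports, so how much the differing categorisation hurts depends on how the comparison is keyed.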
>> 
>> In this paper we formalise these restrictions in DL, but they are not formally defined in the schema.
>> 
>> I hope my opinion helps a bit :)
>> 
>> Cheers,
>> Jer
>> 
>> On 11 Sep 2015, at 18:01, Antoine Isaac <aisaac@few.vu.nl> wrote:
>> 
>>> Hi Jeremy,
>>> 
>>> Today we had a call during which we discussed ISSUE-187:
>>> https://www.w3.org/2013/dwbp/track/issues/187
>>> I've tried to capture the gist of the discussion in the notes attached.
>>> It would be great to have your opinion about it!
>>> 
>>> Cheers,
>>> Antoine
Received on Monday, 14 September 2015 21:12:57 UTC