Re: Brainstorming data quality needs within the Dataset Usage Vocabulary. from Eric Stephan on 2015-04-17 (public-dwbp-wg@w3.org from April 2015)

From: Eric Stephan <ericphb@gmail.com>
Date: Fri, 17 Apr 2015 05:03:38 -0700
To: "Debattista, Jeremy" <Jeremy.Debattista@iais.fraunhofer.de>
Cc: Antoine Isaac <aisaac@few.vu.nl>, Public DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CAMFz4jh8aKnU_+gF5D8jHGvO2F7fN6RMWUrmcY_HgiAq+rx1eQ@mail.gmail.com>
Jeremy,

Thanks for your thoughts and the very interesting links.  From the
perspective of the Data Activity, I believe DCAT:Dataset was always intended
to be used to some degree.  From a data modeling perspective, the
DCAT:Dataset definition was attractive because it gave us what we needed to
declare a logical container for collections of data represented in a variety
of forms.

It wasn't so much a matter of excluding the other definitions as picking one
that would be useful to the model and is in line with the Data Activity.
Having said all of this, I'm wondering what you know about the equivalence
of the dcat:Dataset definition to the others, such as void:Dataset.  Are
these other Dataset concepts all the same thing?  If so, it would be nice to
provide in some way an equivalency mapping to these existing Dataset
concepts.  If the Dataset concepts are named the same but defined
differently, I think we have a different argument.
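
To make the idea of an equivalency mapping concrete, here is a minimal
sketch of how candidate mappings might be recorded.  It is only an
illustration: the namespace URIs are the published ones for DCAT and VoID,
but the CANDIDATE_MAPPINGS table, its "status" field, and the mappings_for
helper are hypothetical names, and the single entry is marked "unverified"
because whether the two definitions really match is exactly the open
question.

```python
# Namespace URIs for the two vocabularies mentioned above.
DCAT = "http://www.w3.org/ns/dcat#"
VOID = "http://rdfs.org/ns/void#"

# Hypothetical candidate mappings. "status" stays "unverified" until someone
# confirms that the two definitions really describe the same concept.
CANDIDATE_MAPPINGS = [
    {"source": DCAT + "Dataset", "target": VOID + "Dataset", "status": "unverified"},
]


def mappings_for(class_uri, mappings=CANDIDATE_MAPPINGS):
    """Return every candidate mapping that mentions class_uri on either side."""
    return [m for m in mappings if class_uri in (m["source"], m["target"])]


# List the candidates that involve dcat:Dataset.
for m in mappings_for(DCAT + "Dataset"):
    print(m["source"], "->", m["target"], "(" + m["status"] + ")")
```

If the definitions turn out to differ, an entry like this could carry a
weaker status rather than being dropped, which keeps the disagreement
visible.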

What do you think?

Cheers,

Eric

On Thu, Apr 16, 2015 at 11:48 PM, Debattista, Jeremy <
Jeremy.Debattista@iais.fraunhofer.de> wrote:

>  Hi Eric, Antoine,
>
>
>  I am just wondering, why only "dcat:Dataset"? I think that in that way a
> (large) number of existing datasets that are not represented by
> dcat will be "excluded", am I right? For example, in [1] they say that
> around 14% of the crawled datasets use void (where there is an equivalent
> void:Dataset [2]) whilst only around 6% use dcat.
>
>
>  Apologies if I got this one wrong, but this is really confusing for me.
>
>
>  Cheers,
>
> Jeremy
>
>
>  [1] http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/#toc6
>
> [2] http://www.w3.org/TR/void/#dataset
>  ------------------------------
> *From:* Eric Stephan <ericphb@gmail.com>
> *Sent:* Friday, April 17, 2015 12:14 AM
> *To:* Antoine Isaac
> *Cc:* Public DWBP WG
> *Subject:* Re: Brainstorming data quality needs within the Dataset Usage
> Vocabulary.
>
>  Hi Antoine,
>
>  I was thinking that the data quality vocabulary would support objective
> metrics, subjective metrics, and qualified opinions on the dataset as well.
>
>  Proposed example objective metrics on dataset:
> * Consumer DCAT:Dataset usage should be tracked by metrics such as a
> counter.
>
>  Proposed example subjective metrics on dataset:
> * Metrics should be used to rate consumer acceptability for a DCAT:Dataset.
>
>  Proposed example qualified opinions:
> * Metrics should be used to rate the qualifications of consumers providing
> opinions about a DCAT:Dataset.
>
>  I just want to make sure I'm not doing anything that will conflict or be
> redundant with the data quality efforts.
>
>  Thanks,
>
> Eric S
>
>
>
> On Thu, Apr 16, 2015 at 1:56 PM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>
>> Hi Eric,
>>
>> This is quite an interesting discussion...
>>
>> My two cents would be that the following would be in scope for the
>> Quality vocabulary
>> "Consumers should be able to provide feedback on overall DCAT:Dataset
>> quality"
>>
>> The rest would be out-of-scope. Maybe from the perspective of the data
>> usage vocabulary it makes sense to further qualify the 'raters', in case
>> they are also data users.
>>
>> From the perspective of quality voc that's just a no-go. We're fighting
>> to get concrete requirements for a framework for representing quality of
>> datasets, we shouldn't embark on getting a framework to represent the
>> quality of people.
>>
>> Cheers,
>>
>> Antoine
>>
>>
>> On 4/16/15 10:47 PM, Eric Stephan wrote:
>>
>>> Question:  Does the following help clarify/confuse the quality needs
>>> from the Dataset Usage Vocabulary perspective?
>>>
>>>
>>> I'm not sure if anyone from the Data Quality Vocabulary was on the call
>>> on the second day of F2F3 when we discussed the Data Usage Vocabulary topic.
>>>
>>>
>>> I think there were some important points that were made:
>>>
>>> 1) The "Data Usage Vocabulary" had been changed to the "Dataset Usage
>>> Vocabulary".
>>>
>>>
>>> 2) This name change was done with the intention to focus our efforts on
>>> providing a vocabulary at the DCAT:Dataset level only.
>>>
>>>
>>> 3) By focusing efforts at the DCAT:Dataset level it allowed us to avoid
>>> the seemingly endless "are we talking data or dataset" discussions.  Most
>>> importantly it allowed us to talk about DCAT:Dataset as being a logical
>>> container for a "set of data".
>>>
>>>
>>>
>>> My hope is that this might simplify how we build bridges between the
>>> Dataset Usage Vocabulary and the Data Quality Vocabulary.
>>>
>>>
>>> Below are listed some tangible minimal quality-related requirements for
>>> DCAT:Dataset Usage:
>>>
>>>
>>> * Consumer DCAT:Dataset usage should be tracked by metrics such as a
>>> counter.
>>>
>>> * Consumers should be able to provide feedback on overall DCAT:Dataset
>>> quality
>>>
>>> * Consumers should be rated for their qualifications when commenting on
>>> DCAT:Dataset.
>>>
>>> * Metrics should be used to rate consumer acceptability for a
>>> DCAT:Dataset.
>>>
>>>     - Boolean metrics should be used to indicate overall
>>>       approval/disapproval.
>>>
>>>     - Integer scale metrics should be used to indicate levels
>>>       of acceptability.
>>>
>>> * Metrics should be used to rate the qualifications of consumers providing
>>> opinions about a DCAT:Dataset.
>>>
>>>
>>> I welcome more ideas for requirements, but these requirements do
>>> illustrate the data quality needs of the Dataset Usage Vocabulary.
>>>
>>>
>>> I look forward to your thoughts and ideas,
>>>
>>>
>>> Eric S
>>>
>>>
>>>
>>>
>>
>
Received on Friday, 17 April 2015 12:04:18 UTC