Re: ISSUE-80: We need a definition of "dataset" from Laufer on 2014-11-16 (public-dwbp-wg@w3.org from November 2014)

From: Laufer <laufer@globo.com>
Date: Sun, 16 Nov 2014 14:52:30 -0200
To: Eric Stephan <ericphb@gmail.com>
Cc: DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CA+pXJiiM6YwSFGu2zY41B7mfdHFOq5T0ojYv4P6cYjuasWAUTA@mail.gmail.com>

Ok, Eric.

Laufer

2014-11-15 13:04 GMT-02:00 Eric Stephan <ericphb@gmail.com>:

> Laufer,
>
> >> I think that this issue is divided in 3 issues:
> >>1 - the DWBP WG definition of dataset;
> >>2 - the DCAT definition of dataset;
> >>3 - the mapping of other data models to DCAT´s data model.
>
> Thank you for taking the time to outline your thoughts.
>
> I'm wondering, would it help resolve this issue if we just
> addressed items 1 and 2 of your concerns?  In other words, how do we as a
> DWBP working group define a dataset (item 1), and is reuse of the DCAT
> definition of a dataset sufficient (item 2)?
>
> Once we come up with the DWBP working group definition of a dataset, then
> I think it is appropriate to discuss how the DWBP dataset needs to map
> to the data models identified in the UCR (item 3).
>
> If we separate the definition of dataset and data model mapping as two
> separate issues it might help us move forward.
>
> Eric S
>
>
>
> On Fri, Nov 14, 2014 at 7:00 AM, Laufer <laufer@globo.com> wrote:
>
>> Makx,
>>
>> I agree with you that DCAT´s definition is good. The problem I see is
>> whether, with this definition, DCAT could express (map) all other
>> definitions using the current DCAT data model, including the DCAT
>> definition of distribution (we must also define this term), and whether
>> our group should care if DCAT could do these mappings. As you also pointed
>> out, and I agree, the issue of inheritance is also very broad and has
>> different interpretations in different groups, and it would be impossible
>> to define the "best" inheritance schema.
>>
>> When, for example, a user uses a CKAN platform to publish data, the DCAT
>> description instance is invisible to her. The CKAN platform will be
>> responsible for generating a DCAT instance that corresponds to the datasets
>> and distributions published by the user. The same goes for other
>> publishing/distribution platforms. Could CKAN map its data model to
>> DCAT´s data model?
>>
>> I think that this issue is divided in 3 issues:
>> 1 - the DWBP WG definition of dataset;
>> 2 - the DCAT definition of dataset;
>> 3 - the mapping of other data models to DCAT´s data model.
>>
>> I agree that for our WG the best course would be to stay out of this
>> discussion, assume DCAT´s definition, and not care about the other issues.
>> But I don't know if we can leave this matter without stating in our
>> documents all these issues of the data on the web ecosystem. The fact, for
>> me, is that in this ecosystem we have different definitions of dataset with
>> different implementations related to these definitions.
>>
>> I think that our suggestions/recommendations of best practices should
>> influence the publishing/distribution platforms in a way that could, in
>> some sense, create a common definition of dataset/distribution, maybe the
>> DCAT one, or an extended version.
>>
>> Best Regards,
>> Laufer
>>
>> 2014-11-14 11:18 GMT-02:00 Makx Dekkers <mail@makxdekkers.com>:
>>
>>> Ed,
>>>
>>> In my mind, there is nothing that would prevent people from using DCAT
>>> for a collection of unrelated data, and I don't think we want to tell
>>> them they can't. Also, it would depend on someone's perspective on what
>>> constitutes 'related'.
>>>
>>> Again, my position is that the definition of dataset in DCAT is good
>>> enough, and that we should not spend time trying to make it better.
>>> (http://www.brainyquote.com/quotes/quotes/v/voltaire109643.html)
>>>
>>> Makx.
>>>
>>>
>>>
>>> > -----Original Message-----
>>> > From: Ed Staub [mailto:ed.staub@semanterra.org] On Behalf Of Ed Staub
>>> > Sent: Thursday, November 13, 2014 5:11 AM
>>> > To: public-dwbp-wg@w3.org
>>> > Subject: Re: ISSUE-80: We need a definition of "dataset"
>>> >
>>> > Note that the RDF Data Cube vocabulary has a different definition of
>>> > "dataset" than DCAT:
>>> >
>>> > "Represents a collection of observations, possibly organized into
>>> > various slices, conforming to some common dimensional structure."
>>> >
>>> > Assuming the DCAT definition is used, I think it useful to make clear
>>> > that a "common dimensional structure" is not implied.  FWIW, my prior
>>> > experience led me to assume the "common dimensional structure" meaning
>>> > for DCAT until I dug into the DCAT spec.
>>> >
>>> >
>>> > On the "too-broad" side, there probably are collections of data
>>> > published or curated by a single agent that are larger than is
>>> > intended by this definition.  In particular, I agree with Bernadette
>>> > Lóscio in thinking that the collection's content should be related -
>>> > not "a random assortment of data".  As an extreme example, imagine the
>>> > entire content of datahub.io described as a single dataset!
>>> >
>>> >
>>> > So... I'd suggest adding the word "related":
>>> >
>>> > "A related collection of data, published or curated by a single agent,
>>> >    ^^^^^^^
>>> > and available for access or download in one or more formats."
>>> >
>>> > The addition of "related" deals with both concerns at once; it would
>>> > be strange and tautological to require all the data in a single cube
>>> > to be "related".
>>> >
>>> >
>>> > -Ed Staub
>>> >
>>> >
>>>
>>>
>>>
>>>
>>
>>
>> --
>> .  .  .  .. .  .
>> .        .   . ..
>> .     ..       .
>>
>
>


-- 
.  .  .  .. .  .
.        .   . ..
.     ..       .
Received on Sunday, 16 November 2014 16:52:59 UTC