Re: ISSUE-80: We need a definition of "dataset" from Eric Stephan on 2014-11-15 (public-dwbp-wg@w3.org from November 2014)

From: Eric Stephan <ericphb@gmail.com>
Date: Sat, 15 Nov 2014 07:04:13 -0800
To: Laufer <laufer@globo.com>
Cc: Makx Dekkers <mail@makxdekkers.com>, Ed Staub <estaub2@comcast.net>, DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CAMFz4jjm2CiYSBXq0HJNxHDGh5jg7DiMoZdOkHRSXRwqhUD8Rg@mail.gmail.com>
Laufer,

>> I think that this issue is divided in 3 issues:
>> 1 - the DWBP WG definition of dataset;
>> 2 - the DCAT definition of dataset;
>> 3 - the mapping of other data models to DCAT's data model.

Thank you for taking the time to outline your thoughts.

I'm wondering, would it help resolve this issue if we just
addressed items 1 and 2 of your concerns?  In other words: how do we, as a
DWBP working group, define a dataset (item 1), and is reuse of the DCAT
definition of a dataset sufficient (item 2)?

Once we come up with the DWBP working group definition of a dataset, then I
think it is appropriate to discuss how the DWBP dataset needs to map to
the data models identified in the UCR (item 3).

If we treat the definition of dataset and the data model mapping as two
separate issues, it might help us move forward.

Eric S



On Fri, Nov 14, 2014 at 7:00 AM, Laufer <laufer@globo.com> wrote:

> Makx,
>
> I agree with you that DCAT's definition is good. The problem I see is
> whether, with this definition, DCAT could express (map) all other
> definitions using the current DCAT data model, including the DCAT
> definition of distribution (we must also define this term), and whether
> our group should care if DCAT can do these mappings. As you also pointed
> out, and I agree, the issue of inheritance is also very broad and has
> different interpretations in different groups, and it would be impossible
> to define the "best" inheritance schema.
>
> When, for example, a user uses a CKAN platform to publish data, the DCAT
> description instance is invisible to her. The CKAN platform is
> responsible for generating a DCAT instance that corresponds to the datasets
> and distributions published by the user. The same goes for other
> publishing/distribution platforms. Could CKAN map its data model to
> DCAT's data model?
>
> I think that this issue is divided in 3 issues:
> 1 - the DWBP WG definition of dataset;
> 2 - the DCAT definition of dataset;
> 3 - the mapping of other data models to DCAT's data model.
>
> I agree that the best course for our WG would be to stay out of this
> discussion, adopt DCAT's definition, and not care about the other issues.
> But I don't know if we can leave this without stating in our documents all
> these issues of the data on the web ecosystem. The fact, for me, is that
> in this ecosystem we have different definitions of dataset, with different
> implementations related to these definitions.
>
> I think that our suggestions/recommendations of best practices should
> influence the publishing/distribution platforms, in a way that, in some
> sense, could create a common definition of dataset/distribution, maybe the
> DCAT one, or an extended version.
>
> Best Regards,
> Laufer
>
> 2014-11-14 11:18 GMT-02:00 Makx Dekkers <mail@makxdekkers.com>:
>
>> Ed,
>>
>> In my mind, there is nothing that would prevent people from using DCAT
>> for a collection of unrelated data, and I don't think we want to tell
>> them they can't. Also, it would depend on someone's perspective on what
>> constitutes 'related'.
>>
>> Again, my position is that the definition of dataset in DCAT is good
>> enough, and that we should not spend time trying to make it better.
>> (http://www.brainyquote.com/quotes/quotes/v/voltaire109643.html)
>>
>> Makx.
>>
>>
>>
>> > -----Original Message-----
>> > From: Ed Staub [mailto:ed.staub@semanterra.org] On Behalf Of Ed Staub
>> > Sent: Thursday, November 13, 2014 5:11 AM
>> > To: public-dwbp-wg@w3.org
>> > Subject: Re: ISSUE-80: We need a definition of "dataset"
>> >
>> > Note that the RDF Data Cube vocabulary has a different definition of
>> > "dataset" than DCAT:
>> >
>> > "Represents a collection of observations, possibly organized into various
>> > slices, conforming to some common dimensional structure."
>> >
>> > Assuming the DCAT definition is used, I think it useful to make clear that a
>> > "common dimensional structure" is not implied.  FWIW, my prior experience
>> > led me to assume the "common dimensional structure" meaning for DCAT until I
>> > dug into the DCAT spec.
>> >
>> >
>> > On the "too-broad" side, there probably are collections of data published or
>> > curated by a single agent that are larger than is intended by this
>> > definition.  In particular, I agree with Bernadette Lóscio in thinking that
>> > the collection's content should be related - not "a random assortment of
>> > data".  As an extreme example, imagine the entire content of datahub.io
>> > described as a single dataset!
>> >
>> >
>> > So... I'd suggest adding the word "related":
>> >
>> > "A related collection of data, published or curated by a single agent,
>> >    ^^^^^^^
>> > and available for access or download in one or more formats."
>> >
>> > The addition of "related" deals with both concerns at once; it would be
>> > strange and tautological to require all the data in a single cube to be
>> > "related".
>> >
>> >
>> > -Ed Staub
>> >
>> >
>>
>>
>>
>>
>
>
> --
> .  .  .  .. .  .
> .        .   . ..
> .     ..       .
>
Received on Saturday, 15 November 2014 15:04:45 UTC