Re: ISSUE-80: We need a definition of "dataset" from Laufer on 2014-11-14 (public-dwbp-wg@w3.org from November 2014)

From: Laufer <laufer@globo.com>
Date: Fri, 14 Nov 2014 13:00:53 -0200
To: Makx Dekkers <mail@makxdekkers.com>
Cc: Ed Staub <estaub2@comcast.net>, DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <CA+pXJij0SsKjfRupxVNFrQOHijpLRDRJBmHUL-_nT0fYbxp4tw@mail.gmail.com>
Makx,

I agree with you that DCAT's definition is good. The problem I see is
whether, with this definition, DCAT can express (map) all the other
definitions using the current DCAT data model, including the DCAT
definition of distribution (we must also define this term), and whether
our group should care if DCAT can do these mappings. As you also pointed
out, and I agree, the issue of inheritance is also very broad and has
different interpretations in different groups, and it would be impossible
to define the "best" inheritance schema.

When, for example, a user publishes data on a CKAN platform, the DCAT
description instance is invisible to her. The CKAN platform is
responsible for generating a DCAT instance that corresponds to the
datasets and distributions published by the user. The same holds for
other publishing/distribution platforms. Could CKAN map its data model to
DCAT's data model?

I think this issue can be divided into three issues:
1 - the DWBP WG definition of dataset;
2 - the DCAT definition of dataset;
3 - the mapping of other data models to DCAT´s data model.

I agree that the best course for our WG would be to stay out of this
discussion, adopt DCAT's definition, and not worry about the other issues.
But I am not sure we can leave it at that without describing, in our
documents, all these issues of the data on the web ecosystem. The fact,
for me, is that in this ecosystem we have different definitions of
dataset, with different implementations based on these definitions.

I think our best-practice suggestions/recommendations should influence
the publishing/distribution platforms in a way that could, in some sense,
create a common definition of dataset/distribution, maybe the DCAT one or
an extended version.

Best Regards,
Laufer

2014-11-14 11:18 GMT-02:00 Makx Dekkers <mail@makxdekkers.com>:

> Ed,
>
> In my mind, there is nothing that would prevent people from using DCAT
> for a collection of unrelated data, and I don't think we want to tell
> them they can't. Also, it would depend on someone's perspective on what
> constitutes 'related'.
>
> Again, my position is that the definition of dataset in DCAT is good
> enough, and that we should not spend time trying to make it better.
> (http://www.brainyquote.com/quotes/quotes/v/voltaire109643.html)
>
> Makx.
>
>
>
> > -----Original Message-----
> > From: Ed Staub [mailto:ed.staub@semanterra.org] On Behalf Of Ed Staub
> > Sent: Thursday, November 13, 2014 5:11 AM
> > To: public-dwbp-wg@w3.org
> > Subject: Re: ISSUE-80: We need a definition of "dataset"
> >
> > Note that the RDF Data Cube vocabulary has a different definition of
> > "dataset" than DCAT:
> >
> > "Represents a collection of observations, possibly organized into
> > various slices, conforming to some common dimensional structure."
> >
> > Assuming the DCAT definition is used, I think it useful to make clear
> > that a "common dimensional structure" is not implied. FWIW, my prior
> > experience led me to assume the "common dimensional structure" meaning
> > for DCAT until I dug into the DCAT spec.
> >
> >
> > On the "too-broad" side, there probably are collections of data
> > published or curated by a single agent that are larger than is
> > intended by this definition. In particular, I agree with Bernadette
> > Lóscio in thinking that the collection's content should be related -
> > not "a random assortment of data". As an extreme example, imagine the
> > entire content of datahub.io described as a single dataset!
> >
> >
> > So... I'd suggest adding the word "related":
> >
> > "A related collection of data, published or curated by a single agent,
> >    ^^^^^^^
> > and available for access or download in one or more formats."
> >
> > The addition of "related" deals with both concerns at once; it would
> > be strange and tautological to require all the data in a single cube
> > to be "related".
> >
> >
> > -Ed Staub


Received on Friday, 14 November 2014 15:01:22 UTC