Re: ISSUE-80: We need a definition of "dataset"

From: Steven Adler <adler1@us.ibm.com>
Date: Mon, 17 Nov 2014 14:23:43 -0500
To: Laufer <laufer@globo.com>
Cc: Eric Stephan <ericphb@gmail.com>, DWBP WG <public-dwbp-wg@w3.org>
Message-ID: <OFEA98707D.5CCE0AC3-ON85257D93.006A87AC-85257D93.006A8AD4@us.ibm.com>

+1 :)


Best Regards,

Steve

Motto: "Do First, Think, Do it Again"


From: Laufer <laufer@globo.com>
To: Eric Stephan <ericphb@gmail.com>
Cc: DWBP WG <public-dwbp-wg@w3.org>
Date: 11/16/2014 11:53 AM
Subject: Re: ISSUE-80: We need a definition of "dataset"

Ok, Eric.

Laufer

2014-11-15 13:04 GMT-02:00 Eric Stephan <ericphb@gmail.com>:
  Laufer,

  >> I think that this issue is divided in 3 issues:
  >>1 - the DWBP WG definition of dataset;
  >>2 - the DCAT definition of dataset;
  >>3 - the mapping of other data models to DCAT's data model.

  Thank you for taking the time to outline your thoughts.

  I'm wondering, would it help resolve this issue if we just addressed
  items 1 and 2 of your concerns?  In other words: how do we, as a DWBP
  working group, define a dataset (item 1), and is reusing the DCAT
  definition of a dataset sufficient (item 2)?

  Once we come up with the DWBP working group definition of a dataset, then
  I think it is appropriate to discuss how the DWBP dataset needs to
  map to the data models identified in the UCR (item 3).

  If we treat the definition of dataset and the data model mapping as two
  separate issues, it might help us move forward.

  Eric S



  On Fri, Nov 14, 2014 at 7:00 AM, Laufer <laufer@globo.com> wrote:
   Makx,

   I agree with you that DCAT's definition is good. The problem I see is
   whether, with this definition, DCAT could express (map) all other
   definitions using the current DCAT data model, including the DCAT
   definition of distribution (we must also define this term), and whether
   our group should care if DCAT can do these mappings. As you also pointed
   out, and I agree, the issue of inheritance is also very broad and has
   different interpretations in different groups, and it would be
   impossible to define the "best" inheritance schema.

   When, for example, a user uses a CKAN platform to publish data, the DCAT
   description instance is invisible to her. The CKAN platform is
   responsible for generating a DCAT instance that corresponds to the
   datasets and distributions published by the user. The same goes for
   other publishing/distribution platforms. Could CKAN map its data model
   to DCAT's data model?

   I think that this issue is divided in 3 issues:
   1 - the DWBP WG definition of dataset;
   2 - the DCAT definition of dataset;
   3 - the mapping of other data models to DCAT's data model.

   I agree that for our WG it would be better not to enter this
   discussion, to assume DCAT's definition, and not to care about other
   issues. But I don't know if we can leave this without stating in our
   documents all these issues of the data on the web ecosystem. The fact,
   for me, is that in this ecosystem we have different definitions of
   dataset, with different implementations related to these definitions.

   I think that our suggestions/recommendations of best practices should
   influence the publishing/distribution platforms, in a way that, in some
   sense, could create a common definition of dataset/distribution, maybe
   the DCAT one, or an extended version.

   Best Regards,
   Laufer

   2014-11-14 11:18 GMT-02:00 Makx Dekkers <mail@makxdekkers.com>:

     Ed,

     In my mind, there is nothing that would prevent people from using DCAT
     for a collection of unrelated data, and I don't think we want to tell
     them they can't. Also, it would depend on someone's perspective on what
     constitutes 'related'.

     Again, my position is that the definition of dataset in DCAT is good
     enough, and that we should not spend time trying to make it better.
     (http://www.brainyquote.com/quotes/quotes/v/voltaire109643.html)

     Makx.



     > -----Original Message-----
     > From: Ed Staub [mailto:ed.staub@semanterra.org] On Behalf Of Ed Staub
     > Sent: Thursday, November 13, 2014 5:11 AM
     > To: public-dwbp-wg@w3.org
     > Subject: Re: ISSUE-80: We need a definition of "dataset"
     >
     > Note that the RDF Data Cube vocabulary has a different definition of
     > "dataset" than DCAT:
     >
     > "Represents a collection of observations, possibly organized into
     > various slices, conforming to some common dimensional structure."
     >
     > Assuming the DCAT definition is used, I think it useful to make clear
     > that a "common dimensional structure" is not implied.  FWIW, my prior
     > experience led me to assume the "common dimensional structure" meaning
     > for DCAT until I dug into the DCAT spec.
     >
     >
     > On the "too-broad" side, there probably are collections of data
     > published or curated by a single agent that are larger than is
     > intended by this definition.  In particular, I agree with Bernadette
     > Lóscio in thinking that the collection's content should be related -
     > not "a random assortment of data".  As an extreme example, imagine the
     > entire content of datahub.io described as a single dataset!
     >
     >
     > So... I'd suggest adding the word "related":
     >
     > "A related collection of data, published or curated by a single agent,
     >    ^^^^^^^
     > and available for access or download in one or more formats."
     >
     > The addition of "related" deals with both concerns at once; it would
     > be strange and tautological to require all the data in a single cube
     > to be "related".
     >
     >
     > -Ed Staub
     >
     >






   --
   .  .  .  .. .  .
   .        .   . ..
   .     ..       .




--
.  .  .  .. .  .
.        .   . ..
.     ..       .





Received on Monday, 17 November 2014 19:24:29 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 20:24:18 UTC