- From: Steven Adler <adler1@us.ibm.com>
- Date: Mon, 17 Nov 2014 14:23:43 -0500
- To: Laufer <laufer@globo.com>
- Cc: Eric Stephan <ericphb@gmail.com>, DWBP WG <public-dwbp-wg@w3.org>
- Message-ID: <OFEA98707D.5CCE0AC3-ON85257D93.006A87AC-85257D93.006A8AD4@us.ibm.com>
+1 :)
Best Regards,
Steve
Motto: "Do First, Think, Do it Again"
From: Laufer <laufer@globo.com>
To: Eric Stephan <ericphb@gmail.com>
Cc: DWBP WG <public-dwbp-wg@w3.org>
Date: 11/16/2014 11:53 AM
Subject: Re: ISSUE-80: We need a definition of "dataset"
Ok, Eric.
Laufer
2014-11-15 13:04 GMT-02:00 Eric Stephan <ericphb@gmail.com>:
Laufer,
>> I think that this issue is divided into 3 issues:
>>1 - the DWBP WG definition of dataset;
>>2 - the DCAT definition of dataset;
>>3 - the mapping of other data models to DCAT's data model.
Thank you for taking the time to outline your thoughts.
I'm wondering, would it be helpful to resolve this issue if we just
addressed items 1 and 2 of your concerns? In other words, how do we as
a DWBP working group define a dataset (item 1), and is the reuse of the
DCAT definition of a dataset sufficient (item 2)?
Once we come up with the DWBP working group definition of a dataset, then
I think it is appropriate to discuss how the DWBP dataset needs to
map to the data models identified in the UCR (item 3).
If we separate the definition of dataset and data model mapping as two
separate issues it might help us move forward.
Eric S
On Fri, Nov 14, 2014 at 7:00 AM, Laufer <laufer@globo.com> wrote:
Makx,
I agree with you that DCAT's definition is good. The problem I see is
whether, with this definition, DCAT could express (map) all other
definitions using the current DCAT data model, including the DCAT
definition of distribution (we must also define this term), and whether
our group should care if DCAT can do these mappings. As you also pointed
out, and I agree, the issue of inheritance is also very broad and has
different interpretations in different groups, and it would be
impossible to define the "best" inheritance schema.
When, for example, a user uses a CKAN platform to publish data, the DCAT
description instance is invisible to her. The CKAN platform will be
responsible for generating a DCAT instance that corresponds to the
datasets and distributions published by the user. The same goes for
other publishing/distribution platforms. Could CKAN map its data model
to DCAT's data model?
I think that this issue is divided into 3 issues:
1 - the DWBP WG definition of dataset;
2 - the DCAT definition of dataset;
3 - the mapping of other data models to DCAT's data model.
I agree that for our WG it would be better not to enter into this
discussion, to assume DCAT's definition, and not to care about other
issues. But I don't know if we can leave this without stating in our
documents all these issues of the data on the web ecosystem. The fact,
for me, is that in this ecosystem we have different definitions of
dataset with different implementations related to these definitions.
I think that our suggestions/recommendations of best practices should
influence the publishing/distribution platforms, in a way that, in some
sense, could create a common definition of dataset/distribution, maybe
the DCAT one, or an extended version.
Best Regards,
Laufer
2014-11-14 11:18 GMT-02:00 Makx Dekkers <mail@makxdekkers.com>:
Ed,
In my mind, there is nothing that would prevent people from using DCAT
for a collection of unrelated data, and I don't think we want to tell
them they can't. Also, it would depend on someone's perspective on what
constitutes 'related'.
Again, my position is that the definition of dataset in DCAT is good
enough, and that we should not spend time in trying to make it better.
(http://www.brainyquote.com/quotes/quotes/v/voltaire109643.html)
Makx.
> -----Original Message-----
> From: Ed Staub [mailto:ed.staub@semanterra.org] On Behalf Of Ed Staub
> Sent: Thursday, November 13, 2014 5:11 AM
> To: public-dwbp-wg@w3.org
> Subject: Re: ISSUE-80: We need a definition of "dataset"
>
> Note that the RDF Data Cube vocabulary has a different definition of
> "dataset" than DCAT:
>
> "Represents a collection of observations, possibly organized into
> various slices, conforming to some common dimensional structure."
>
> Assuming the DCAT definition is used, I think it useful to make clear
> that a "common dimensional structure" is not implied. FWIW, my prior
> experience led me to assume the "common dimensional structure" meaning
> for DCAT until I dug into the DCAT spec.
>
>
> On the "too-broad" side, there probably are collections of data
> published or curated by a single agent that are larger than is intended
> by this definition. In particular, I agree with Bernadette Lóscio in
> thinking that the collection's content should be related - not "a
> random assortment of data". As an extreme example, imagine the entire
> content of datahub.io described as a single dataset!
>
>
> So... I'd suggest adding the word "related":
>
> "A related collection of data, published or curated by a single agent,
>    ^^^^^^^
> and available for access or download in one or more formats."
>
> The addition of "related" deals with both concerns at once; it would
> be strange and tautological to require all the data in a single cube
> to be "related".
>
>
> -Ed Staub
>
>
Received on Monday, 17 November 2014 19:24:29 UTC