
Re: ISSUE-80: We need a definition of "dataset"

From: Ed Staub <estaub2@comcast.net>
Date: Wed, 12 Nov 2014 23:11:06 -0500
To: <public-dwbp-wg@w3.org>
Message-ID: <002a01cffef7$d943a520$8bcaef60$@net>
Note that the RDF Data Cube vocabulary has a different definition of
"dataset" than DCAT:

"Represents a collection of observations, possibly organized into various
slices, conforming to some common dimensional structure."

Assuming the DCAT definition is used, I think it would be useful to make
clear that a "common dimensional structure" is not implied.  FWIW, my prior
experience led me to assume the "common dimensional structure" meaning for
DCAT until I dug into the DCAT spec.

On the "too-broad" side, there are probably collections of data published or
curated by a single agent that are larger than this definition intends.  In
particular, I agree with Bernadette Lóscio that
the collection's content should be related - not "a random assortment of
data".  As an extreme example, imagine the entire content of datahub.io
described as a single dataset!

So... I'd suggest adding the word "related":

"A related collection of data, published or curated by a single agent,
and available for access or download in one or more formats."

The addition of "related" deals with both concerns at once; it would be
strange and tautological to require all the data in a single cube to be

-Ed Staub
Received on Friday, 14 November 2014 08:40:06 UTC
