Re: ISSUE-80: We need a definition of "dataset" from Ed Staub on 2014-11-13 (public-dwbp-wg@w3.org from November 2014)

From: Ed Staub <estaub2@comcast.net>
Date: Wed, 12 Nov 2014 23:11:06 -0500
To: <public-dwbp-wg@w3.org>
Message-ID: <002a01cffef7$d943a520$8bcaef60$@net>

Note that the RDF Data Cube vocabulary has a different definition of
"dataset" than DCAT:

"Represents a collection of observations, possibly organized into various
slices, conforming to some common dimensional structure."

Assuming the DCAT definition is used, I think it would be useful to make clear that a
"common dimensional structure" is not implied.  FWIW, my prior experience
led me to assume the "common dimensional structure" meaning for DCAT until I
dug into the DCAT spec.


On the "too-broad" side, there probably are collections of data published or
curated by a single agent that are larger than is intended by this
definition.  In particular, I agree with Bernadette Lóscio in thinking that
the collection's content should be related - not "a random assortment of
data".  As an extreme example, imagine the entire content of datahub.io
described as a single dataset!


So... I'd suggest adding the word "related":

"A related collection of data, published or curated by a single agent, 
   ^^^^^^^
and available for access or download in one or more formats."

The addition of "related" deals with both concerns at once; it would be
strange and tautological to require all the data in a single cube to be
"related".


-Ed Staub

Received on Friday, 14 November 2014 08:40:06 UTC