- From: John Erickson <olyerickson@gmail.com>
- Date: Tue, 26 Mar 2013 00:19:28 +0100
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: Makx Dekkers <makx@makxdekkers.com>, Public GLD WG <public-gld-wg@w3.org>
I'm with Richard on this one; I think DCAT needs to be viewed in a similar light as Dublin Core, which clarifies much ambiguity but does not solve all problems for all use cases. And DCAT is definitely NOT just about linked data; indeed, while the publication of data will increasingly follow the linked data model, DCAT is useful for a wide variety of data that does not. No minimum number of TBL stars...

Being fresh off the Research Data Alliance kick-off in Göteborg last week, I can tell you that even the scientific data community is all over the map regarding what a "dataset" is. For one introduction to this, see Allen Renear's oft-cited "Definitions of Dataset in the Scientific and Technical Literature" <http://bit.ly/ZQ0SEh>. They don't agree on precisely what they are, but they DO agree they need to be persistently and unambiguously identified, and mostly agree that they should be unambiguously typed. More about THAT later ;)

Frankly I think CKAN's working definition of "dataset" is useful here --- an aggregation of arbitrary data resources --- although I would probably prefer that level of abstraction to be an "object" or "data object" (following the Kahn/Wilensky notion of the digital object).

On Mon, Mar 25, 2013 at 6:51 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> On 25 Mar 2013, at 16:03, "Makx Dekkers" <makx@makxdekkers.com> wrote:
>> DCAT defines Dataset as "A collection of data, published or curated by a single source, and available for access or download in one or more formats".
>>
>> This definition does not give a clear indication of characteristics that distinguish a Dataset from a more general rdfs:Resource. Would it be possible to at least provide some examples of existing resources that fall within this definition, and (even more importantly) some examples that do not?
>>
>> In a conversation on the public mailing list (http://lists.w3.org/Archives/Public/public-gld-wg/2012Sep/0062.html), it was mentioned that “Any file stored on disk is a data set”. This implies that any machine-readable information (including PDF files!) can be considered a dcat:Dataset. That doesn’t sound right to me.
>
> We've had that discussion many times. The best definition of “dataset” I've heard is still: “A set of data.”
>
> I don't see why a PDF file containing a big table shouldn't be considered a dataset. That's not the most useful form for re-use, of course.
>
> You seem to be suggesting that datasets must have some minimum number of TimBL stars [1] in order to be described with DCAT. I don't think such a restriction helps anybody.
>
> Best,
> Richard
>
> [1] http://5stardata.info

--
John S. Erickson, Ph.D.
Director, Web Science Operations
Tetherless World Constellation (RPI)
<http://tw.rpi.edu> <olyerickson@gmail.com>
Twitter & Skype: olyerickson
Received on Monday, 25 March 2013 23:20:00 UTC