RE: [dxwg] DCAT: Proposal for an updated definition for the concept “dataset” (#1195)

I agree with Claus about the potentially, perhaps inevitably, abstract nature of a dcat:Dataset: one can start filling in the description even when the dcat:Dataset (or dcat:DataService, for that matter) is only a notion or a plan, in the same way that we create a relational database schema, and even the tables, without storing a single record.

-----Original Message-----
From: Claus Stadler via GitHub <sysbot+gh@w3.org> 
Sent: 24 January 2020 01:16
To: public-dxwg-wg@w3.org
Subject: Re: [dxwg] DCAT: Proposal for an updated definition for the concept “dataset” (#1195)

@rob-metalinkage Yes, we agree that owl:sameAs does not work. However, I am not sure that your statement 'datasets are records about actual data, from the point of view of a catalog' is correct. DCAT distinguishes between the concepts dcat:Dataset and dcat:CatalogRecord, and this distinction makes sense.

As I see it, a dcat:Dataset relates more closely to the concept of '(a unit of) content that was published by a (single) authority'. The nature of the content may be as abstract as 'the sequence of images that makes up the Lord of the Rings movie'. There is freedom here, but when formal data models are involved, this can be made much more concrete. If this is what the dataset is about, then its distributions should describe the concrete technical aspects of this idea, most prominently its structure and access mechanisms, such as files with varying image quality. The dcat:CatalogRecord then holds information about when the dataset was made available in a catalog.
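
To make this split concrete, here is a minimal Python sketch (standard library only) that emits Turtle for one abstract dcat:Dataset with two distributions of different image quality, plus a separate dcat:CatalogRecord. The DCAT, Dublin Core and FOAF terms used (dcat:distribution, dcat:mediaType, dcat:downloadURL, foaf:primaryTopic, dct:issued) are the standard ones; the helper functions, the ex: URIs, titles, file URLs and the listing date are all hypothetical and only illustrate the idea.

```python
# Minimal sketch: serialise a DCAT description as Turtle using plain strings.
# All ex: identifiers, titles, URLs and dates below are hypothetical.

PREFIXES = "\n".join([
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
    "@prefix dct:  <http://purl.org/dc/terms/> .",
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
    "@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .",
    "@prefix ex:   <http://example.org/> .",
])

IANA = "https://www.iana.org/assignments/media-types/"


def describe_dataset(dataset, publisher, distributions):
    """Turtle for one abstract dataset and its concrete distributions."""
    refs = ", ".join(f"ex:{d['id']}" for d in distributions)
    lines = [
        f"ex:{dataset} a dcat:Dataset ;",
        f"    dct:publisher ex:{publisher} ;",
        f"    dcat:distribution {refs} .",
    ]
    for d in distributions:
        # Each distribution carries the technical aspects: format and access.
        lines += [
            "",
            f"ex:{d['id']} a dcat:Distribution ;",
            f"    dct:title \"{d['title']}\" ;",
            f"    dcat:mediaType <{IANA}{d['media_type']}> ;",
            f"    dcat:downloadURL <{d['url']}> .",
        ]
    return "\n".join(lines)


def describe_record(record, dataset, listed_on):
    """Turtle for a catalog record: when the dataset was listed in a catalog."""
    return "\n".join([
        f"ex:{record} a dcat:CatalogRecord ;",
        f"    foaf:primaryTopic ex:{dataset} ;",
        f"    dct:issued \"{listed_on}\"^^xsd:date .",
    ])


if __name__ == "__main__":
    frames = [
        {"id": "frames-480p", "title": "Frames, 480p",
         "media_type": "application/zip",
         "url": "http://example.org/frames-480p.zip"},
        {"id": "frames-1080p", "title": "Frames, 1080p",
         "media_type": "application/zip",
         "url": "http://example.org/frames-1080p.zip"},
    ]
    print(PREFIXES + "\n")
    print(describe_dataset("lotr-frames", "studio", frames) + "\n")
    print(describe_record("lotr-frames-record", "lotr-frames", "2020-01-24"))
```

The dataset node says nothing about files; everything about format and access lives on the distributions, and the listing date lives on the record.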

I considered whether owl:sameAs could be applied at the distribution level, but I tend to think that the identity of a distribution is tied to technical aspects and to how the content is structured. For example, I would not consider the distribution of a file via a torrent to be equivalent to its distribution via an HTTP URL or via a Git URL.
I'd rather say that the **content** (in the abstract sense, not in the sense of syntactic representation or access mechanism) is equal in such cases. (So perhaps a generalization of DCAT would be C(ontent)CAT.)
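
The difference between distribution identity and content equality can be sketched in Python (standard library only). The Distribution class, its fields and the mechanism labels are hypothetical. The sketch also makes a loud simplifying assumption: it uses a SHA-256 digest of the bytes as a stand-in for "same content", which only detects byte-identical payloads. Two files of different image quality would count as different here, even though in the abstract sense above they carry the same content.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Distribution:
    """Hypothetical distribution: identity includes how the bytes are accessed."""
    access_mechanism: str  # illustrative labels such as "http", "torrent", "git"
    access_url: str
    payload: bytes

    def content_digest(self) -> str:
        # Byte-level stand-in for content equality (an assumption, see above).
        return hashlib.sha256(self.payload).hexdigest()


def same_distribution(a: Distribution, b: Distribution) -> bool:
    # Identity: mechanism, location and bytes must all match (dataclass __eq__).
    return a == b


def same_content(a: Distribution, b: Distribution) -> bool:
    # Content equality: only the bytes are compared; access details are ignored.
    return a.content_digest() == b.content_digest()


if __name__ == "__main__":
    data = b"frame-0001 frame-0002"
    via_http = Distribution("http", "http://example.org/frames.zip", data)
    via_torrent = Distribution("torrent", "magnet:?xt=urn:btih:example", data)
    print(same_distribution(via_http, via_torrent))  # False: different access
    print(same_content(via_http, via_torrent))       # True: identical bytes
```

An owl:sameAs between the two distributions would correspond to the first check, which fails; the relationship that actually holds is the weaker second one.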

As for the examples of 'what is not a dataset', I also tend to disagree: every electronic resource is ultimately a sequence of bytes and thus data. That is why HTTP has the Content-Type header, which tells a client how to interpret the bytes; in the worst case this really is application/octet-stream.


--
GitHub Notification of comment by Aklakan
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1195#issuecomment-577951816 using your GitHub account

Received on Friday, 24 January 2020 07:14:39 UTC