Re: [dxwg] DCAT: Proposal for an updated definition for the concept “dataset” (#1195)

@rob-metalinkage Yes, we agree that owl:sameAs does not work. However, I am not sure that your statement 'datasets are records about actual data, from the point of view of a catalog' is really correct. DCAT distinguishes the concepts of dcat:Dataset and dcat:CatalogRecord - and this distinction makes sense.

So as I see it, a dcat:Dataset relates more to the concept of '(a unit of) content that was published by a (single) authority'. The nature of the content may be as abstract as 'the sequence of images that makes up the Lord of the Rings movie'. There is freedom here, but when formal data models are involved, this can be made much more concrete. So if this is what the dataset is about, then the different distributions should describe the concrete technical aspects of this idea, most prominently its structure and access mechanisms, such as files with varying image quality. The CatalogRecord then holds the information about when a dataset was made available in a catalog.
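
To illustrate the split described above, here is a minimal Python sketch. The class names mirror the DCAT terms, but the fields, the publisher name and the example.org URLs are assumptions made up for this illustration; this is not how DCAT itself is serialized.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Distribution:
    # Concrete technical form of the content: structure plus access mechanism.
    access_url: str
    media_type: str


@dataclass
class Dataset:
    # Abstract unit of content published by a single authority.
    title: str
    publisher: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class CatalogRecord:
    # The catalog's own entry: when the dataset was listed, not what it contains.
    primary_topic: Dataset
    issued: date


# Hypothetical example data: one abstract dataset, two technical renditions.
movie = Dataset(
    title="The Lord of the Rings (image sequence)",
    publisher="Example Studio",
    distributions=[
        Distribution("https://example.org/lotr-1080p.mp4", "video/mp4"),
        Distribution("https://example.org/lotr-480p.mp4", "video/mp4"),
    ],
)
record = CatalogRecord(primary_topic=movie, issued=date(2020, 1, 24))
```

The point of the sketch is only that the record points at the dataset and carries catalog-side facts, while the distributions carry the technical ones.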

I considered whether owl:sameAs could be applied at the distribution level, but I tend to think that the identity of a distribution is tied to the technical aspects and structuring of the content. For example, I would not consider the distribution of a file via a torrent to be equivalent to a distribution via an HTTP URL or via a Git URL.
I'd rather say that the **content** (in the abstract sense - not in the sense of syntactic representation or access mechanism) is equal in such cases. (So maybe a generalization of DCAT would be C(ontent)CAT.)
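
A rough sketch of that distinction, under the assumption that each distribution could carry some identifier for the abstract content it conveys. The `content_id` field, the `same_content` helper and all URLs are hypothetical and not part of DCAT:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Distribution:
    access_url: str  # how the content is reached
    content_id: str  # hypothetical identifier of the abstract content


def same_content(a: Distribution, b: Distribution) -> bool:
    # Weaker than identity: only the conveyed content has to match.
    return a.content_id == b.content_id


via_http = Distribution("https://example.org/data.csv", "content:42")
via_torrent = Distribution("magnet:?xt=urn:btih:examplehash", "content:42")
via_git = Distribution("git://example.org/repo.git", "content:42")

# The distributions are distinct resources, yet they convey the same content.
print(via_http == via_torrent)              # False
print(same_content(via_http, via_torrent))  # True
```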

As for the examples of 'what is not a dataset', I also tend to disagree - every electronic resource is ultimately a sequence of bytes and thus data. That's why HTTP has the Content-Type header, which tells a client how the bytes are to be interpreted - in the worst case this really is application/octet-stream.
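
The fallback can be sketched with Python's standard `mimetypes` module. The helper name `content_type_for` and the file names are made up for this example, and guessing from a file name is only a stand-in for whatever a real server does to pick the header value:

```python
import mimetypes


def content_type_for(filename: str) -> str:
    # guess_type returns (type, encoding); type is None when nothing matches.
    guessed, _encoding = mimetypes.guess_type(filename)
    # With no better information, the bytes are just opaque data.
    return guessed or "application/octet-stream"


print(content_type_for("photo.png"))
print(content_type_for("blob-without-extension"))
```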


-- 
GitHub Notification of comment by Aklakan
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1195#issuecomment-577951816 using your GitHub account

Received on Friday, 24 January 2020 01:15:42 UTC