Re: [dxwg] DCAT: Proposal for an updated definition for the concept “dataset” (#1195) from Claus Stadler via GitHub on 2020-01-25 (public-dxwg-wg@w3.org from January 2020)

From: Claus Stadler via GitHub <sysbot+gh@w3.org>
Date: Sat, 25 Jan 2020 17:06:11 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issue_comment.created-578423462-1579971969-sysbot+gh@w3.org>

> what about if there is no model, e.g. for unstructured data or raw data?
Then it's still `text/plain` or `application/octet-stream`, no? So if a text or binary document is identical to the one described by a distribution of a dcat:Dataset, then it is highly likely, but not certain, that we are talking about the same dataset. A distribution is technical: in the simple case it points to a document with a concrete syntax.

But if you consider an RDFa file, although it is XML, it can be interpreted in many ways: is it text? XML? (X)HTML? RDF? The meaning has to be specified at the dataset level: if the dataset is about text, then distributions with content types `application/pdf`, `text/plain`, `application/msword` and `application/xhtml+xml` are reasonable choices.
If the dataset is about triples, then mixing distributions of `application/xhtml+xml` with `text/turtle` is certainly valid, as the former should then be assumed to contain RDFa annotations. However, `application/pdf` would not make sense (unless there were a standard for encoding triples in PDF).
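
To make that rule concrete, here is a rough sketch in Python. It is only an illustration, not anything DCAT defines: the labels "text" and "triples", the table `COMPATIBLE_MEDIA_TYPES` and the function `check_distributions` are all made up for this example, and the table only lists the media types mentioned above.

```python
# Hypothetical mapping from what a dataset is "about" to the media types
# that can plausibly carry that content. This table is an assumption made
# for illustration; DCAT itself does not define it.
COMPATIBLE_MEDIA_TYPES = {
    "text": {
        "application/pdf",
        "text/plain",
        "application/msword",
        "application/xhtml+xml",
    },
    # XHTML is accepted here on the assumption that it carries RDFa annotations.
    "triples": {
        "text/turtle",
        "application/xhtml+xml",
    },
}


def check_distributions(about, media_types):
    """Return the media types that do not fit a dataset about `about`."""
    allowed = COMPATIBLE_MEDIA_TYPES[about]
    return [mt for mt in media_types if mt not in allowed]


# XHTML (with RDFa) next to Turtle is fine for a dataset about triples.
print(check_distributions("triples", ["application/xhtml+xml", "text/turtle"]))  # []

# A PDF distribution does not fit a dataset about triples.
print(check_distributions("triples", ["text/turtle", "application/pdf"]))  # ['application/pdf']
```

The point is that the same `application/xhtml+xml` distribution passes in both cases; only the dataset-level statement of what the data is about decides how it should be read.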

-- 
GitHub Notification of comment by Aklakan
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1195#issuecomment-578423462 using your GitHub account

Received on Saturday, 25 January 2020 17:06:15 UTC