
[dxwg] Relation among Dataset, Distribution and Data Service (#1126)

From: Jakub Klímek via GitHub <sysbot+gh@w3.org>
Date: Tue, 15 Oct 2019 12:41:27 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issues.opened-507218690-1571143285-sysbot+gh@w3.org>
jakubklimek has just created a new issue for https://github.com/w3c/dxwg:

== Relation among Dataset, Distribution and Data Service ==
I am working on an implementation of DCAT2 (and DCAT-AP 2.0.0) in [LinkedPipes DCAT-AP Forms](https://github.com/linkedpipes/dcat-ap-forms), [LinkedPipes DCAT-AP Viewer](https://github.com/linkedpipes/dcat-ap-viewer) and the [Czech National Open Data catalog](https://data.gov.cz). I am currently wondering about the relationship among Dataset, Distribution and Data Service, and I am seeking additional insights from the WG.

Let me illustrate with an example of what seems clear.

Let us have a Dataset with one RDF TriG Distribution:
:dataset a dcat:Dataset ;
    dcat:distribution :distribution  .

:distribution a dcat:Distribution ;
    dcat:accessURL <https://data.cssz.cz/dataset/ciselnik-datovych-typu> ;
    dcat:downloadURL <https://data.cssz.cz/dump/ciselnik-datovych-typu.trig> ;
    dcterms:format <http://publications.europa.eu/resource/authority/file-type/RDF_TRIG> ;
    dcat:mediaType <http://www.iana.org/assignments/media-types/application/trig> .

Now, I want to use DCAT2 to express that there is a SPARQL endpoint at `https://data.cssz.cz/sparql` serving this dataset. It is clear that I need an instance of `dcat:DataService`:
:dataservice a dcat:DataService ;
    dcat:endpointURL <https://data.cssz.cz/sparql> ;
    dcat:endpointDescription <https://data.cssz.cz/sparql> .

The questions start to appear when I think about how to interconnect those.
At first, I was thinking along the lines of the diagram:
:dataservice dcat:servesDataset :dataset .
:distribution dcat:accessService :dataservice .
However, I got confused here. 

`:distribution` describes a downloadable TriG file. At the moment, there is no way of getting a TriG file out of a SPARQL endpoint. Therefore, 
:distribution dcat:accessService :dataservice
suddenly does not make sense. But how should `dcat:accessService` be used from a distribution, then?

Should the Dataset actually have two distributions, like this:
:distribution1 a dcat:Distribution ;
    dcat:accessURL <https://data.cssz.cz/dataset/ciselnik-datovych-typu> ;
    dcat:downloadURL <https://data.cssz.cz/dump/ciselnik-datovych-typu.trig> ;
    dcterms:format <http://publications.europa.eu/resource/authority/file-type/RDF_TRIG> ;
    dcat:mediaType <http://www.iana.org/assignments/media-types/application/trig> .
:distribution2 a dcat:Distribution ;
   dcat:accessURL <https://data.cssz.cz/sparql> ;
   dcat:accessService :dataservice .
i.e. a distribution pointing not to a file, but to a service?
If so, what about properties that are defined on both Distribution and DataService? They can be specified in both places, and the resulting meaning is unclear when the values disagree, e.g. conflicting licenses or accessRights on the Distribution and the DataService.
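
A minimal sketch of such a conflict, to make the concern concrete. The `:` namespace and both license IRIs are assumptions chosen purely for illustration; they do not describe the actual licensing of this data.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix :        <https://example.org/> .   # hypothetical base namespace

# Distribution pointing to the service; license assumed for illustration.
:distribution2 a dcat:Distribution ;
    dcat:accessURL <https://data.cssz.cz/sparql> ;
    dcat:accessService :dataservice ;
    dcterms:license <https://creativecommons.org/licenses/by/4.0/> .

# The service itself carries a different (also assumed) license.
:dataservice a dcat:DataService ;
    dcat:endpointURL <https://data.cssz.cz/sparql> ;
    dcterms:license <https://creativecommons.org/publicdomain/zero/1.0/> .
```

A consumer reading both descriptions cannot tell which license governs the data obtained through the endpoint.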

Then again, in my opinion, a DataService is really just a means of accessing representations of Datasets. I therefore see it on the same level as a Distribution rather than on the level of Datasets. However, the fact that both Datasets and DataServices inherit from dcat:Resource, while dcat:Distributions do not, would suggest otherwise.
Does the WG envision that open data portals will start cataloguing Data Services similarly to Datasets, i.e. as first-class citizens of Catalogs, instead of using them at the level of Distributions, i.e. as entities dependent on Datasets?
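
For illustration, the "first-class citizen" reading might look like the sketch below, where the catalog lists the Data Service directly next to the Dataset. The `:` namespace and the `:catalog` resource are hypothetical; `dcat:dataset` and `dcat:service` are the DCAT2 properties linking a catalog to its datasets and services.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix :     <https://example.org/> .   # hypothetical base namespace

# Hypothetical catalog listing the dataset and the service side by side.
:catalog a dcat:Catalog ;
    dcat:dataset :dataset ;
    dcat:service :dataservice .

:dataset a dcat:Dataset ;
    dcat:distribution :distribution .

# The service is described on its own and points at the dataset it serves.
:dataservice a dcat:DataService ;
    dcat:endpointURL <https://data.cssz.cz/sparql> ;
    dcat:servesDataset :dataset .
```

In this reading the service is discoverable from the catalog without going through any Distribution.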

Finally, it seems a bit confusing that a Data Service serves Datasets, yet it is Distributions of Datasets that are accessed using a Data Service. This may, however, be connected to the point above.

The only guidance I found in the document is at the end of 6.7:

> Links between a dcat:Distribution and services or Web addresses where it can be accessed are expressed using dcat:accessURL, dcat:accessService, dcat:downloadURL, as shown in Figure 1 and described in the definitions below.

which, btw, seems like a weird sentence.

Any thoughts on this? Am I missing something or overthinking this?

Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1126 using your GitHub account
Received on Tuesday, 15 October 2019 12:41:31 UTC
