Re: [dxwg] Relation among Dataset, Distribution and Data Service (#1126)

From: Riccardo Albertoni via GitHub <sysbot+gh@w3.org>
Date: Fri, 18 Oct 2019 17:34:16 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issue_comment.created-543852383-1571420055-sysbot+gh@w3.org>
Hi @jakubklimek, thanks for your comments. Please see my replies below.

> 
> Should it be that the Dataset actually has 2 distributions like this:
> 
> ```turtle
> :distribution1 a dcat:Distribution ;
>     dcat:accessURL <https://data.cssz.cz/dataset/ciselnik-datovych-typu> ;
>     dcat:downloadURL <https://data.cssz.cz/dump/ciselnik-datovych-typu.trig> ;
>     dcterms:format <http://publications.europa.eu/resource/authority/file-type/RDF_TRIG> ;
>     dcat:mediaType <http://www.iana.org/assignments/media-types/application/trig> .
> :distribution2 a dcat:Distribution ;
>    dcat:accessURL <https://data.cssz.cz/sparql> ;
>    dcat:accessService :dataservice .
> ```
> 
> e.g. a distribution pointing not to a file, but to a service?

Yes, I would connect the SPARQL endpoint to a different distribution than the TriG file. Something similar is illustrated in [EXAMPLE 45](https://www.w3.org/TR/vocab-dcat-2/#ex-elaborated-bag).

> If so, what about properties, which are defined both on the level of Distribution and DataService? We can specify them at both places, and the resulting meaning is not clear, e.g. conflicting licenses or accessRights on the Distribution and DataService.

Distribution and DataService are two distinct things, and there might be cases where you need to specify a license for both. For example, the same SPARQL endpoint might serve distributions related to different datasets. I think this is quite a classic case for SPARQL endpoints, and it is one reason why the two have been modelled separately.
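
A minimal Turtle sketch of that case, under assumptions: the `https://example.org/` IRIs, the resource names, and the license choices are hypothetical placeholders, not taken from this thread or from the DCAT document. One endpoint acts as the access service for distributions of two different datasets. Each distribution carries its own license and the service carries a separate one.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix :        <https://example.org/> .

# Hypothetical shared SPARQL endpoint, licensed on its own terms.
:sparql-service a dcat:DataService ;
    dcat:endpointURL <https://example.org/sparql> ;
    dcat:servesDataset :dataset-a , :dataset-b ;
    dcterms:license <https://example.org/licenses/service-terms> .

:dataset-a a dcat:Dataset ;
    dcat:distribution :dist-a .

:dataset-b a dcat:Dataset ;
    dcat:distribution :dist-b .

# Two distributions of different datasets, both reached through the same service.
:dist-a a dcat:Distribution ;
    dcat:accessURL <https://example.org/sparql> ;
    dcat:accessService :sparql-service ;
    dcterms:license <https://creativecommons.org/licenses/by/4.0/> .

:dist-b a dcat:Distribution ;
    dcat:accessURL <https://example.org/sparql> ;
    dcat:accessService :sparql-service ;
    dcterms:license <https://creativecommons.org/publicdomain/zero/1.0/> .
```

Here the license on each distribution describes the data of one dataset, while the license on the service describes the endpoint itself, which is why a single statement at one level could not cover both.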

In DCAT 2, we have been quite liberal about what must be used in which circumstances; in particular, I think that inherited properties need to be used only where they are needed. Different rules of thumb might be adopted, e.g., "when the service is connected to a distribution, put the license at the distribution level only". This seems an easy way to avoid inconsistencies, but different catalogues/communities might want to use other rules, e.g., "put a license on both distributions and related services, but ensure that they are compatible".
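
To make the two rules of thumb concrete, here is a hedged sketch of how each might look in a catalogue. All `https://example.org/` IRIs and resource names are hypothetical, the license is picked only for illustration, and neither variant is prescribed by DCAT 2.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix :        <https://example.org/> .

# Rule 1 (assumed community convention): license at the distribution level only.
:dist-rule1 a dcat:Distribution ;
    dcat:accessURL <https://example.org/rule1/sparql> ;
    dcat:accessService :service-rule1 ;
    dcterms:license <https://creativecommons.org/licenses/by/4.0/> .

:service-rule1 a dcat:DataService ;
    dcat:endpointURL <https://example.org/rule1/sparql> .   # no dcterms:license here

# Rule 2 (assumed community convention): license on both, kept compatible.
:dist-rule2 a dcat:Distribution ;
    dcat:accessURL <https://example.org/rule2/sparql> ;
    dcat:accessService :service-rule2 ;
    dcterms:license <https://creativecommons.org/licenses/by/4.0/> .

:service-rule2 a dcat:DataService ;
    dcat:endpointURL <https://example.org/rule2/sparql> ;
    dcterms:license <https://creativecommons.org/licenses/by/4.0/> .
```

Under the second rule, checking that the two licenses are compatible is left to the profile or the catalogue's own validation; nothing in the vocabulary enforces it.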

I think the modelling here was done with quite a broad set of cases in mind. We tried to be as general as possible, as DCAT 2 is expected to be specialized by a wide range of communities. This comes at a cost: communities might need to decide how to profile DCAT and which guidance best fits them in order to maintain consistency.


> Then again, in my opinion, a DataService is really just a means of accessing representations of Datasets, therefore, I see it more on the same level with a Distribution rather than on the level of Datasets. However, the fact that both Datasets and DataServices inherit from dcat:Resource, but dcat:Distributions do not, would suggest otherwise.
> Does the WG envision here that e.g. open data portals start cataloguing Data Services similarly to Datasets, e.g. as first-class citizens of Catalogs, instead of using them at the level of Distributions, i.e. entities dependent on Datasets?

Yes, services are promoted to first-class citizens. Though the focus is primarily on services that provide access to data, DataServices also include data processing functions. There are also some examples in the DCAT document of DataServices that are not connected to Distributions ([EXAMPLE 48](https://www.w3.org/TR/vocab-dcat-2/#ex-service-eea) shows a discovery service for a catalogue). In those cases, we might want to specify licenses for the services independently of whether they are directly connected to Distributions or Datasets.
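
As a rough illustration of a service catalogued in its own right (this is not a copy of EXAMPLE 48; the `https://example.org/` IRIs, the title, and the license IRI are hypothetical), a catalogue might list a discovery service directly, with its own license and no Distribution involved:

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix :        <https://example.org/> .

# The catalogue lists the service directly, next to its datasets.
:catalog a dcat:Catalog ;
    dcat:dataset :some-dataset ;
    dcat:service :discovery-service .

:some-dataset a dcat:Dataset .

# Hypothetical discovery service over the catalogue itself.
# No dcat:Distribution points to it via dcat:accessService.
:discovery-service a dcat:DataService ;
    dcterms:title "Catalogue discovery service (hypothetical)"@en ;
    dcat:endpointURL <https://example.org/catalogue/search> ;
    dcat:servesDataset :catalog ;
    dcterms:license <https://example.org/licenses/service-terms> .
```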

> 
> Finally, it seems a bit confusing that Data Service serves datasets, but it is Distributions of datasets, which are accessed using a Data Service. This may, however, be connected to the point above.

I am not sure I understand this point. Anyway, it might help to note [Example 49](https://www.w3.org/TR/vocab-dcat-2/#ex-service-gsa), which shows some DataServices that are not connected to Distributions of the Dataset they serve.
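
A small sketch of how the two links can be used independently (hypothetical `https://example.org/` IRIs and names, not the content of Example 49): the service declares which dataset it serves, while the dataset's only distribution is a plain file download that does not reference the service.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix :     <https://example.org/> .

:dataset a dcat:Dataset ;
    dcat:distribution :file-dist .

# File download only: no dcat:accessService on this distribution.
:file-dist a dcat:Distribution ;
    dcat:downloadURL <https://example.org/dump/dataset.trig> .

# Dataset-level link: the service states which dataset it serves.
:api-service a dcat:DataService ;
    dcat:endpointURL <https://example.org/api> ;
    dcat:servesDataset :dataset .
```

So `dcat:servesDataset` relates a service to a dataset, while `dcat:accessService` relates one particular distribution to a service; a description can use either link, or both.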

> 
> The only guidance I found in the document is at the end of 6.7:
> 
> > Links between a dcat:Distribution and services or Web addresses where it can be accessed are expressed using dcat:accessURL, dcat:accessService, dcat:downloadURL, as shown in Figure 1 and described in the definitions below.
> 
> which, btw, seems like a weird sentence.

It just points out that guidance is provided in the subsections describing dcat:accessURL, dcat:accessService, and dcat:downloadURL. Further explanations are provided in the examples I have mentioned above.

> 
> Any thoughts on this? Am I missing something or overthinking this?

Do my comments and the examples I have pointed to help in putting the pieces together?




-- 
GitHub Notification of comment by riccardoAlbertoni
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1126#issuecomment-543852383 using your GitHub account
Received on Friday, 18 October 2019 17:34:21 UTC
