RE: reviewing the BP doc from Makx Dekkers on 2015-06-29 (public-dwbp-wg@w3.org from June 2015)

From: Makx Dekkers <mail@makxdekkers.com>
Date: Mon, 29 Jun 2015 12:58:51 +0200
To: "'Data on the Web Best Practices Working Group'" <public-dwbp-wg@w3.org>
Message-ID: <000001d0b25a$98e13080$caa39180$@makxdekkers.com>

 

*  Concerning Makx's question, it is also not clear to me whether distributions contain the same data points. Initially, I thought that DCAT would allow distributions that are just similar in nature but have different data points. However, a dataset has the properties dct:temporal and dct:spatial, which may restrict the data. 

 

This was an issue that came up in the discussion: if DCAT had had the intention to allow ‘similar’ data, it would have put dct:temporal and dct:spatial on Distribution rather than on Dataset. The counter-argument was that DCAT does not explicitly prohibit the use of those two properties on Distribution either.

 

*  IMO there is something missing between a dataset and its distributions. In the DWBP document, I used the notion of version; however, after the discussions with the group, I think version is not the right concept. Maybe something like "dataset instance" is more suitable. If we consider a dataset to be an abstract concept (I think it should be), then instances of a dataset may be created according to different spatial and temporal granularities. In the budget example, there will be a dataset, called annual budget, and then there will be one instance of the dataset for each year. When necessary, an instance may have a current version (the instance itself) and one or more previous versions, where a version represents the state of the instance at a given moment. In this case, an instance will have one or more distributions that should differ only in format or access method/endpoint. 

 

As I said on the call, my opinion is that if there is a need to change the basic model of DCAT – and introducing a new class ‘between dataset and its distributions’ is a fundamental change – this can only be done in a formal DCAT redesign project. One of the consequences to be considered in such a project is that these kinds of fundamental changes will break existing implementations. 

 

Having said that, I do not think the requirement to allow grouping mechanisms – and implementers do ask for that – necessarily leads to the approach you suggest, putting a new class between dataset and distribution. In my view, it could be done through a class that is *above* the Dataset. A possible candidate for such a higher-level class could be prov:Collection (http://www.w3.org/TR/2013/REC-prov-o-20130430/#Collection). The advantage would be that it does not involve rearranging the current DCAT classes but only adds an optional mechanism to DCAT that does not affect existing implementations.
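
To make the idea of a class above Dataset concrete, here is a minimal sketch in plain Python (no RDF library), using the annual budget example from the discussion. The example.org URIs and the helper name group_datasets are hypothetical. prov:hadMember is the PROV-O property that links a prov:Collection to its members; its use for grouping DCAT datasets is an assumption for illustration, not something DCAT specifies.

```python
# Namespaces of real vocabularies; the example.org URIs below are hypothetical.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/budget/"


def group_datasets(collection, datasets):
    """Return triples that place existing datasets under a prov:Collection.

    The datasets and their distributions are not modified; the collection
    is an optional layer added on top of them.
    """
    triples = [(collection, RDF + "type", PROV + "Collection")]
    for dataset in datasets:
        triples.append((collection, PROV + "hadMember", dataset))
    return triples


# Existing DCAT data: one dataset per year, each with its own distribution.
existing = [
    (EX + "2014", RDF + "type", DCAT + "Dataset"),
    (EX + "2014", DCAT + "distribution", EX + "2014.csv"),
    (EX + "2015", RDF + "type", DCAT + "Dataset"),
    (EX + "2015", DCAT + "distribution", EX + "2015.csv"),
]

grouping = group_datasets(EX + "annual", [EX + "2014", EX + "2015"])

# Print everything as N-Triples; the existing triples appear unchanged.
for s, p, o in existing + grouping:
    print(f"<{s}> <{p}> <{o}> .")
```

A consumer that does not know about the collection can ignore the three added triples and still read the dataset and distribution triples exactly as before.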

 

But there are also people who consider the existing grouping mechanism – the Dataset as a grouping mechanism for Distributions – sufficient for most relationships between data files. Those people may not see the need for a Dataset grouping mechanism. The advantage of that position is that there is no need at all to change the model of DCAT; it just requires some additional properties to be added to Distribution.
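
As a rough sketch of that position, again in plain Python: a single Dataset groups all files, and a property on each Distribution tells them apart. Which properties would be added is not stated here; dct:temporal on Distribution is assumed only because it came up earlier in this thread. The example.org URIs and the function name distributions_by_period are hypothetical.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/budget/"

# One dataset; each distribution carries its own dct:temporal (assumed usage).
triples = [
    (EX + "annual", DCAT + "distribution", EX + "2014.csv"),
    (EX + "annual", DCAT + "distribution", EX + "2014.json"),
    (EX + "annual", DCAT + "distribution", EX + "2015.csv"),
    (EX + "2014.csv", DCT + "temporal", EX + "period/2014"),
    (EX + "2014.json", DCT + "temporal", EX + "period/2014"),
    (EX + "2015.csv", DCT + "temporal", EX + "period/2015"),
]


def distributions_by_period(triples, dataset):
    """Group the distributions of one dataset by their dct:temporal value."""
    members = {
        o for s, p, o in triples
        if s == dataset and p == DCAT + "distribution"
    }
    groups = {}
    for s, p, o in triples:
        if p == DCT + "temporal" and s in members:
            groups.setdefault(o, []).append(s)
    # Sort each group so the result does not depend on triple order.
    return {period: sorted(items) for period, items in groups.items()}


by_period = distributions_by_period(triples, EX + "annual")
for period, items in sorted(by_period.items()):
    print(period, items)
```

Under this reading, distributions that share a period differ only in format, while distributions with different periods hold different data, and the class structure of DCAT stays as it is.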

 

Makx.

Received on Monday, 29 June 2015 10:59:31 UTC