RE: W3C Data Catalog Vocabulary (DCAT) - Comments from Simon.Cox@csiro.au on 2019-02-03 (public-dxwg-comments@w3.org from February 2019)

From: <Simon.Cox@csiro.au>
Date: Sun, 3 Feb 2019 06:51:53 +0000
To: <daniel.pop@e-uvt.ro>, <public-dxwg-comments@w3.org>
Message-ID: <20ee91415dea4f67abc1bb6d9dc40b51@exch1-mel.nexus.csiro.au>
Hello Daniel – thanks for the comments.

On item 1.

I certainly understand your use-case. The periodicity in specific representations of a dataset might be different from the underlying periodicity in the native dataset. The definition of ‘accrualPeriodicity’ at DCMI [1] is “The frequency with which items are added to a collection.” This reflects the common English usage of the term ‘accrual’ and thus could refer to the periodicity of items in the underlying dataset.

I suspect the real problem is twofold:

(i) In the DCAT document, the definition of ‘accrualPeriodicity’ is “The frequency at which dataset is published.” [2], which is a rather different idea. For example, time-series data might be sampled at 15 s spacing but only published daily.

(ii) No way is provided to describe the statistics of a dataset distribution (i.e. representation), including periodicity, where these differ from those of the underlying dataset.

My hunch is that:

(i) the definition of ‘accrualPeriodicity’, as used in DCAT, should be clarified

(ii) the addition of one or more complementary properties on data distributions should be considered

I’ll add issues to our stack for each of these.
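
As a rough illustration of the gap described in (ii), the sketch below models a dataset and its distributions as plain (subject, predicate, object) tuples in Python. Only dct:accrualPeriodicity and dcat:distribution are existing properties; every ex: name, including the distribution-level property ex:distributionPeriodicity and the frequency values, is a hypothetical placeholder and not part of DCAT.

```python
# Triples as (subject, predicate, object) tuples.
# All ex: names are hypothetical placeholders used for illustration only.
triples = [
    ("ex:rates", "rdf:type", "dcat:Dataset"),
    ("ex:rates", "dct:accrualPeriodicity", "ex:daily"),
    ("ex:rates", "dcat:distribution", "ex:rates-csv-daily"),
    ("ex:rates", "dcat:distribution", "ex:rates-csv-weekly"),
    ("ex:rates", "dcat:distribution", "ex:rates-csv-monthly"),
    # ex:distributionPeriodicity is an assumed property, not defined by DCAT.
    ("ex:rates-csv-weekly", "ex:distributionPeriodicity", "ex:weekly"),
    ("ex:rates-csv-monthly", "ex:distributionPeriodicity", "ex:monthly"),
]


def periodicity_of(distribution, triples):
    """Return the distribution's own periodicity if one is stated,
    otherwise fall back to the periodicity of the dataset it belongs to."""
    for s, p, o in triples:
        if s == distribution and p == "ex:distributionPeriodicity":
            return o
    for s, p, o in triples:
        if p == "dcat:distribution" and o == distribution:
            for s2, p2, o2 in triples:
                if s2 == s and p2 == "dct:accrualPeriodicity":
                    return o2
    return None
```

With this data, the weekly and monthly CSV distributions report their own periodicity, while the daily CSV distribution, which states none, inherits the dataset-level value.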

Note that the scope of ‘distributions’ of datasets is currently under discussion in the DCAT revision team. In particular, there has been a general understanding that all distributions should be ‘informationally equivalent’. However, if different distributions had different sampling rates (periodicity), ‘informationally equivalent’ would have to be understood in a particularly nuanced way.

On item 2.

The matter of inverse properties is managed differently in different communities and applications.
In many contexts it is recommended to only encode one direction, to avoid potential conflicts.

On derivation/provenance chains I generally find it helpful to think about the ‘direction of knowledge’.
While a resource will usually know about its predecessors, it is unlikely to know about all (or even any) of its successors.
So ‘wasDerivedFrom’ is the more reliable relationship, and this is why provenance traces generally record this rather than the inverse.
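
Where the forward direction is needed, it can be computed from the recorded ‘wasDerivedFrom’ statements instead of being encoded separately. A minimal Python sketch, assuming the statements have already been collected as (derived, source) pairs; the ex: dataset names are hypothetical:

```python
from collections import defaultdict

# Each pair stands for: derived prov:wasDerivedFrom source.
# The dataset names are hypothetical.
was_derived_from = [
    ("ex:daily-summary", "ex:raw-15s"),
    ("ex:monthly-report", "ex:daily-summary"),
    ("ex:anomalies", "ex:raw-15s"),
]


def successors_index(pairs):
    """Invert the recorded direction: map each source to the
    resources that declare themselves as derived from it."""
    index = defaultdict(list)
    for derived, source in pairs:
        index[source].append(derived)
    return dict(index)


successors = successors_index(was_derived_from)
```

Such an index only covers successors whose descriptions are in the collection being queried, which is consistent with the point that a resource cannot be expected to know all of its successors.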

On item 3.

Following some recent discussion about the DataService taxonomy, it was decided to drop the DiscoveryService class.
The function is mentioned in a usage note for the Data Distribution Service class in the current Editor’s Draft [3].

Regards

Simon Cox


[1] http://dublincore.org/documents/dcmi-terms/#terms-accrualPeriodicity

[2] https://w3c.github.io/dxwg/dcat/#Property:dataset_frequency
[3] https://w3c.github.io/dxwg/dcat/#Class:Data_Distribution_Service


From: Daniel Pop [mailto:daniel.pop@e-uvt.ro]
Sent: Monday, 28 January, 2019 22:38
To: public-dxwg-comments@w3.org
Subject: W3C Data Catalog Vocabulary (DCAT) - Comments

In response to the request for comments on [1], please find my comments below.

1) The accrualPeriodicity property is set at Dataset level; can't we have different distributions with different periodicity? For example, a REST API service that provides a real-time data stream, and CSV distributions issued daily/weekly/monthly? Should accrualPeriodicity therefore be a property of the distribution rather than of the dataset? Or is it better to have multiple datasets for this purpose? I am thinking of exchange rates as an example, but the question is not limited to them.

2) wasDerivedFrom allows one to move backwards; what about moving forward? I mean a property such as isDerivationBaseFor, so that one can retrieve derived/following datasets as well.

3) DiscoveryService - what is the distinctive feature of a DiscoveryService compared to a DataDistributionService? From the diagram in Figure 1 one can tell that it must serve at least one Dataset (the servesDataset property is 1..*). But the example in section 5.9 does not show such a property. Is this just a small discrepancy that should be addressed in the example? Should DiscoveryService be a DataDistributionService? Should the multiplicity of the 'serves' relationship between DataService and Dataset be optional (i.e. 0..*)?

Best regards,
Daniel POP
West University of Timisoara, Romania

[1] https://www.w3.org/TR/vocab-dcat-2/
Received on Sunday, 3 February 2019 06:52:24 UTC