Re: [dxwg] Model Series of Data as Distributions of a single Dataset (#1429)

In Sweden we have identified a problem similar to the one @sabinem described. In short, we need to make it easy for people to add more data to an existing dataset.

We have solved the problem in a different way in the Swedish profile: we allow dcat:downloadURL to be repeated, like this:

```
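@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .   # placeholder namespace

ex:dataset1 a dcat:Dataset ;
     dcat:distribution ex:distribution1, ex:distribution2 .

# One distribution pointing at several downloadable files
ex:distribution1 a dcat:Distribution ;
     dcterms:title "Access via CSV files" ;
     dcat:downloadURL ex:file1, ex:file2 .

# Optional per-file metadata
ex:file1 dcterms:title "Budget 2019" .
ex:file2 dcterms:title "Budget 2020" .

ex:distribution2 a dcat:Distribution ;
     dcterms:title "Access via a JSON based API" ;
     dcat:accessURL ex:API .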
```

This approach has the following merits:
1. The distributions are comparable: they contain the same data.
2. The amount of duplicated metadata is minimal.
3. It is relatively easy to explain to data providers what to do.
4. It is obvious how to provide multiple distributions (e.g. API and file-based access).
5. More information can be provided on each file if there is a need.
6. Data portals are not polluted with many unnecessary datasets.

It could be argued that repeating dcat:downloadURL is bad because the property is not intended to be used that way. The specification says "the downloadable file", which does seem to imply a cardinality of one.

However, I think we should investigate whether the approach can easily be tweaked to be compliant.
For instance, DCAT could allow a dcat:downloadPartURL as an alternative to dcat:downloadURL, perhaps introduce a class like dcat:File, and simply suggest that a dcterms:title may be provided on it.
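
To make the suggestion concrete, here is a rough sketch of how it might look. Note that dcat:downloadPartURL and dcat:File do not exist in DCAT; they are only the hypothetical terms suggested above, and the ex: namespace is a placeholder.

```
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .   # placeholder namespace

# NOTE: dcat:downloadPartURL and dcat:File are hypothetical,
# they are NOT defined in the DCAT vocabulary.
ex:distribution1 a dcat:Distribution ;
     dcterms:title "Access via CSV files" ;
     dcat:downloadPartURL ex:file1, ex:file2 .

ex:file1 a dcat:File ;
     dcterms:title "Budget 2019" .

ex:file2 a dcat:File ;
     dcterms:title "Budget 2020" .
```

With a separate property for parts, dcat:downloadURL could keep its apparent cardinality of one while a distribution split over several files is still expressed explicitly.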

I think the approach above should be considered a more lightweight alternative to the dataset series approach.
It is clear that in some situations people really have data that they want to highlight as independent datasets while still indicating that they belong to a series. Hence the dataset series is needed, but from what I have seen in Sweden, the lightweight approach would be used (and already is) much more often.
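
For comparison, the same two budget files modelled as a dataset series might look roughly like the sketch below. I am assuming the dcat:DatasetSeries class and the dcat:inSeries property from the DCAT 3 drafts here, and all ex: names are made up for illustration.

```
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .   # placeholder namespace

# Assumes dcat:DatasetSeries and dcat:inSeries as in the DCAT 3 drafts
ex:budget a dcat:DatasetSeries ;
     dcterms:title "Budget" .

ex:budget2019 a dcat:Dataset ;
     dcterms:title "Budget 2019" ;
     dcat:inSeries ex:budget ;
     dcat:distribution ex:budget2019csv .

ex:budget2019csv a dcat:Distribution ;
     dcat:downloadURL ex:file1 .

ex:budget2020 a dcat:Dataset ;
     dcterms:title "Budget 2020" ;
     dcat:inSeries ex:budget ;
     dcat:distribution ex:budget2020csv .

ex:budget2020csv a dcat:Distribution ;
     dcat:downloadURL ex:file2 .
```

Each new file then requires its own dataset and distribution description, whereas in the lightweight approach it only requires one more dcat:downloadURL value.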


-- 
GitHub Notification of comment by matthiaspalmer
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1429#issuecomment-986375863 using your GitHub account


Received on Monday, 6 December 2021 02:12:58 UTC