Re: [dxwg] Dataset series (#868) from Jakub Klímek via GitHub on 2019-09-20 (public-dxwg-wg@w3.org from September 2019)

From: Jakub Klímek via GitHub <sysbot+gh@w3.org>
Date: Fri, 20 Sep 2019 13:45:23 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issue_comment.created-533560007-1568987121-sysbot+gh@w3.org>

@kcoyle This case, i.e. mirrors of exactly the same file, is the only one where having multiple downloadURLs makes sense to me, that is, when all metadata of the dataset and distribution apply to every linked file.
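To illustrate the mirror case, here is a minimal sketch in plain Python, with triples modelled as tuples rather than through an RDF library. The example.org IRIs and the helper names are hypothetical; only the `dcat:` property names come from DCAT.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical distribution IRI, used for illustration only.
dist = "https://example.org/dataset/budget-2019/csv"

triples = [
    (dist, RDF_TYPE, DCAT + "Distribution"),
    (dist, DCAT + "mediaType", "https://www.iana.org/assignments/media-types/text/csv"),
    # Two mirrors of exactly the same file (hypothetical URLs).
    (dist, DCAT + "downloadURL", "https://mirror-a.example.org/budget-2019.csv"),
    (dist, DCAT + "downloadURL", "https://mirror-b.example.org/budget-2019.csv"),
]


def download_urls(triples, distribution):
    """Return every dcat:downloadURL stated on the distribution."""
    return [o for s, p, o in triples
            if s == distribution and p == DCAT + "downloadURL"]


def shared_metadata(triples, distribution):
    """Return the metadata stated once on the distribution.

    In the mirror case this metadata applies to every download URL alike.
    """
    return {p: o for s, p, o in triples
            if s == distribution and p != DCAT + "downloadURL"}
```

Because the files are identical, nothing in `shared_metadata` needs to be qualified per URL, which is what makes this case unproblematic.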

@matthiaspalmer When the files differ, such as budgets for different years, I lean towards splitting this into separate datasets and defining their relations properly. Splitting a distribution into multiple files arbitrarily makes it harder to interpret how those files relate to each other. Regarding your notes:

I think your approach actually makes DCAT catalogs harder to interpret, because it adds another level (Dataset-Distribution-File) where metadata may be present, when what you want to describe can already be expressed with what we have (Dataset-Distribution, Dataset relations). Implementations would therefore have to look for relevant metadata in one more place.
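As a rough sketch of the Dataset-Distribution plus dataset relations alternative, the yearly budget case could be modelled like this in plain Python tuples. The choice of `dct:isPartOf` as the relation, the example.org IRIs, and the function names are assumptions made for illustration, not something prescribed in this thread.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
BASE = "https://example.org/dataset/"  # hypothetical catalog namespace


def yearly_budget_triples(years):
    """Build one dataset per year, each with a single one-file distribution."""
    parent = BASE + "budget"
    triples = [(parent, RDF_TYPE, DCAT + "Dataset")]
    for year in years:
        dataset = f"{BASE}budget-{year}"
        distribution = f"{dataset}/csv"
        triples += [
            (dataset, RDF_TYPE, DCAT + "Dataset"),
            # Assumed relation: each yearly dataset is part of the parent.
            (dataset, DCT + "isPartOf", parent),
            (dataset, DCAT + "distribution", distribution),
            (distribution, RDF_TYPE, DCAT + "Distribution"),
            (distribution, DCAT + "downloadURL",
             f"https://example.org/files/budget-{year}.csv"),
        ]
    return triples


def parts_of(triples, parent):
    """List the datasets that declare themselves part of the parent."""
    return sorted(s for s, p, o in triples
                  if p == DCT + "isPartOf" and o == parent)


triples = yearly_budget_triples([2018, 2019])
```

A consumer only needs to inspect datasets and distributions here; there is no third, file-level place where metadata could hide.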

> Note 1: I would argue that the solution I outlined is very natural as it is well aligned with the basic semantics of RDF. I.e. you make statements about resources referenced in the subject position of a triple. All other constructions require additional semantics to be specified.
> Note 2: What I am suggesting is much simpler and requires less overhead for maintainers of data catalogs. Alternatively, if solved by the tool, much less complexity in implementations.

I agree that it is natural from the RDF point of view, but my comments regarding additional complexity apply here. It would be easier only if your approach were recommended instead of the one already specified. Offered in addition, it adds complexity, and I can foresee implementations that implement only parts of DCAT because of this, which would make them non-interoperable in the end.

> Note 3: If the solution I outlined is not enough, you need to say more things about an individual downloadable file, then nothing prohibits the other option to be used in that case if the way to express relations between datasets can be clarified. However, I fail to see a usecase where you need the full strength of the metadata on a dataset to describe the difference between two downloadable files.

What if you actually want to describe (for machine readability) the nature of the relationship between the files, e.g. that it is in fact a time series, not a split according to geospatial features? This is exactly what DCAT2 already does, at least partially, at the level of datasets.
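One way a consumer might tell a time series from a geospatial split, once each part is a dataset with its own metadata, is to check which coverage property varies between the parts. This is a hypothetical sketch in plain Python, not how DCAT2 itself defines the relationship; the period and place IRIs and the function name are invented for the example.

```python
DCT = "http://purl.org/dc/terms/"

# Hypothetical dataset, period, and place IRIs.
d2018 = "https://example.org/dataset/budget-2018"
d2019 = "https://example.org/dataset/budget-2019"

triples = [
    (d2018, DCT + "temporal", "https://example.org/period/2018"),
    (d2019, DCT + "temporal", "https://example.org/period/2019"),
    (d2018, DCT + "spatial", "https://example.org/place/cz"),
    (d2019, DCT + "spatial", "https://example.org/place/cz"),
]


def varying_properties(triples, parts):
    """Return the coverage properties whose values differ across the parts."""
    varying = []
    for prop in (DCT + "temporal", DCT + "spatial"):
        values = {o for s, p, o in triples if s in parts and p == prop}
        if len(values) > 1:
            varying.append(prop)
    return varying
```

Here only `dct:temporal` varies, which suggests a split by time rather than by geography. The same check is not possible if the files are just multiple downloadURLs on one distribution, since there is nowhere to attach per-file coverage.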







-- 
GitHub Notification of comment by jakubklimek
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/868#issuecomment-533560007 using your GitHub account

Received on Friday, 20 September 2019 13:45:24 UTC