Re: [dxwg] Best practice for a loosely-structured catalog

As this discussion moved from the mailing list to this issue, for completeness I'm adding the other messages from the mailing list in this thread.

@makxdekkers said:



>This is indeed an issue that came up in the development of DCAT-AP. In
particular, CKAN is quite liberal in what it accepts as "Resource" related
to a Dataset. The discussion was whether you could map CKAN Resource to DCAT
Distribution, and it was clear that such mapping would have unwanted
effects. This is also related to my earlier question on how "similar"
distributions need to be, which led to a statement that they need to be
"informationally equivalent" (


>I support your proposed solution to use dct:relation as a catch-all and to
allow for further specialisation whenever necessary and possible.



and @andrea-perego said:

>Makx, Simon,

>In the extension of DCAT-AP we use in the JRC Data Catalogue, besides distributions we typically have (a) related publications and (b) "other resources" (a catch-all category covering everything that is not a distribution or a publication). As I said elsewhere [1,2], related publications are specified via dct:isReferencedBy, whereas "other resources" are linked with dct:relation (used as a generic relationship to link a dataset with any kind of related resource). So, this use case may support the idea of making dcat:distribution a subproperty of dct:relation.
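To make the pattern quoted above concrete, here is a minimal sketch in plain Python (no RDF library). It models the three kinds of links as triples and shows what would change for a consumer if dcat:distribution were declared a subproperty of dct:relation. All `ex:` identifiers are hypothetical, the subproperty axiom is the proposal under discussion rather than an established fact, and the helper names are made up for illustration.

```python
# Property names taken from the quoted message.
DCT_RELATION = "dct:relation"
DCT_IS_REFERENCED_BY = "dct:isReferencedBy"
DCAT_DISTRIBUTION = "dcat:distribution"

# ASSUMPTION: the proposed axiom (dcat:distribution subPropertyOf dct:relation).
SUBPROPERTY_OF = {DCAT_DISTRIBUTION: DCT_RELATION}

# Hypothetical catalogue record: one distribution, one publication, one "other resource".
triples = {
    ("ex:dataset", DCAT_DISTRIBUTION, "ex:dataset-csv"),
    ("ex:dataset", DCT_IS_REFERENCED_BY, "ex:paper"),
    ("ex:dataset", DCT_RELATION, "ex:user-guide"),
}


def infer_superproperties(triples, subproperty_of):
    """Return triples plus (s, q, o) for each (s, p, o) where p is a (transitive) subproperty of q."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}
        parent = subproperty_of.get(p)
        while parent is not None and parent not in seen:
            inferred.add((s, parent, o))
            seen.add(parent)
            parent = subproperty_of.get(parent)
    return inferred


def related_resources(triples, dataset):
    """List everything reachable from the dataset via dct:relation."""
    return sorted(o for s, p, o in triples if s == dataset and p == DCT_RELATION)


closed = infer_superproperties(triples, SUBPROPERTY_OF)
print(related_resources(triples, "ex:dataset"))  # without the axiom
print(related_resources(closed, "ex:dataset"))   # with the axiom applied
```

Under this assumption, a client that only understands dct:relation would still discover the distribution, which is the sense in which dct:relation acts as a catch-all that can be specialised.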

>BTW, this pattern is reflected in our CKAN extension – see, e.g.:


>That the majority of data catalogues use a simple metadata pattern is also my experience. Hierarchical "is part of" relationships are far from common. There may be a number of reasons. For instance, if metadata are created manually (as is still usually the case), maintaining such relationships would require a high effort. Also in the geospatial domain, where this notion exists explicitly ("dataset series"), what is documented is just the "root" dataset, and the children are not even linked to. Another issue may be the limitations of catalogue platforms – which typically do not support this feature – or the usability problems that result from burdening users with choosing among a long list of datasets which are almost identical but for some variables (e.g., spatial and/or temporal coverage).

>It is also worth noting that the approach used for specifying hierarchical relationships depends very much on the domain and on specific characteristics of a dataset. We have to deal quite often with this issue in the JRC Data Catalogue, and the approaches used are very different – e.g.: (a) one dataset with a distribution for each of its children; (b) one dataset for each child dataset, and no record for the parent.
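The two layouts mentioned in the quoted paragraph can be sketched as data structures. This is only an illustration: the `Dataset` and `Distribution` classes, the function names, and the yearly slices are hypothetical, and giving each child record a single distribution in the second layout is an assumption made to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    title: str


@dataclass
class Dataset:
    title: str
    distributions: list = field(default_factory=list)


def parent_with_child_distributions(series_title, children):
    """Layout 1: a single dataset record; each child becomes a distribution of it."""
    return [Dataset(series_title, [Distribution(f"{series_title} {c}") for c in children])]


def one_record_per_child(series_title, children):
    """Layout 2: one dataset record per child and no record for the parent."""
    # ASSUMPTION: each child record carries exactly one distribution.
    return [
        Dataset(f"{series_title} {c}", [Distribution(f"{series_title} {c}")])
        for c in children
    ]


# Hypothetical series sliced by year.
children = ["2016", "2017", "2018"]
layout_1 = parent_with_child_distributions("Land cover", children)
layout_2 = one_record_per_child("Land cover", children)

print(len(layout_1), len(layout_1[0].distributions))
print(len(layout_2), [d.title for d in layout_2])
```

In the first layout the catalogue shows one entry and the children are only visible as distributions; in the second the user sees several near-identical entries and nothing ties them together.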

>So we should probably take this situation into account when providing recommendations on how to model hierarchical/subsetting relationships, and propose alternative options, depending on the specific use case.




GitHub Notification of comment by agbeltran

Received on Tuesday, 12 June 2018 06:04:17 UTC