[dxwg] dcterms:hasPart in the context of nested Catalogs (#1454)

andreasgeissner has just created a new issue for https://github.com/w3c/dxwg:

== dcterms:hasPart in the context of nested Catalogs ==

Dear DCAT Team,

When trying to model our institutional research data repository, I came across something that, to the eye of a semantic web amateur, looks like an inconsistency in the definition of dcterms:hasPart in the DCAT 3 public draft of 11 January 2022.

We have a tree-shaped system of nested categories in our DSpace repository, called communities, subcommunities, and collections that I want to model in DCAT. Any DSpace item (so bitstreams and metadata) is assigned to exactly one collection, which in turn belongs to exactly one subcommunity and so on. So this is unlike many other repositories where categories are more like additional subjects that can be mixed and matched. 

Let’s say we have a category A that has two subcategories B and C. B and C contain (the metadata of) DSpace items B1,…,Bn and C1,…,Cn, respectively. A would then contain the metadata of both B1,…,Bn and C1,…,Cn. I would consider all of them to be (at least) dcat:Datasets, as they have metadata assigned to them and contain a clearly defined amount of data.

From a pure dataset perspective, it should be possible to state explicitly which data is in which dataset, and that A is a dataset containing the data of B and C (disregarding the domain of dcat:dataset for now):

```
B dcat:dataset B1,…,Bn .
C dcat:dataset C1,…,Cn .
A dcterms:hasPart B, C .
```

This is because dcterms:hasPart can be used to split datasets into multiple subdatasets, as far as I understand “multi-part datasets” in Issue #1205. The higher-level dataset should contain at least all the information/data that the subdatasets do. The current version of example C.1 (loosely structured catalog) uses this functionality.
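
For concreteness, the three statements above could be written out as valid Turtle roughly as follows. This is only a sketch under assumptions: the ex: namespace and the resource names are made up, n = 2 is chosen for brevity, and A, B, and C are typed only as dcat:Dataset, leaving aside what the domain of dcat:dataset would imply.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <https://example.org/repository/> .   # hypothetical namespace

# Subcategories list their items (n = 2 assumed for brevity).
ex:B a dcat:Dataset ;
    dcat:dataset ex:B1, ex:B2 .

ex:C a dcat:Dataset ;
    dcat:dataset ex:C1, ex:C2 .

# The top-level category is split into its two subcategories.
ex:A a dcat:Dataset ;
    dcterms:hasPart ex:B, ex:C .
```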

However, §5.1 introduces a dcat:Catalog as “a dataset in which each individual item is a metadata record describing some resource”, meaning A, B, and C would be dcat:Catalogs as well. This also follows from the domain of dcat:dataset being dcat:Catalog, but even if I could get around using that property, from a definition standpoint it seems inevitable to me.

In this context, dcterms:hasPart is defined as “An item that is listed in the catalog.”, which is similar to the respective definitions of, for example, dcat:dataset or dcat:service. My interpretation is that the metadata of the item is an entry in the catalog, not the item data. Am I right? (https://github.com/w3c/dxwg/commits/gh-pages/dcat/rdf/dcat-external.ttl restricts this definition to DCAT 2.0, but I assume this just has not been updated, unless I have overlooked something.)
```
A dcterms:hasPart B, C .
```
would then mean that B and C themselves are listed in A, and not their datasets. It seems a bit inconsistent to me that DCAT 3 was designed to allow breaking any kind of dataset into parts unless it happens to contain only metadata records, which makes it a dcat:Catalog. Do I misunderstand this? Is there any intention to change this, or is it intentionally designed so that dcat:Catalogs cannot be split? A dcat:entry or something similar would make more sense to me for “An item that is listed in the catalog”. Of course, backwards compatibility might be a big issue here.

You could instead explicitly use the following (losing the information about the relation between A and B, C? As stated above, I’m not a semantic web expert; I don’t know what inference would do in this situation):

```
A dcat:dataset B1,…,Bn,C1,…,Cn .
B dcat:dataset B1,…,Bn .
C dcat:dataset C1,…,Cn .
```


I also don’t like this version in light of the statement for dcat:Catalog that “A Web-based data catalog is typically represented as a single instance of this class”. Using dcterms:hasPart, you would still have one umbrella catalog with a clear structure, so that you can look at parts of it. Here, you would just have a heap of catalogs that are not explicitly related.

Furthermore, not being able to model the subcategories as their own Datasets (and Datasets of metadata records are Catalogs) would also preclude linking exported information of the “subcatalogs”, e.g. as XML files, as “Catalog Distributions”.
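
As a sketch, such a “Catalog Distribution” for subcatalog B might look like this. The ex: namespace, the title, and the download URL are invented for illustration; dcat:distribution, dcat:downloadURL, and dcat:mediaType are existing DCAT terms.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <https://example.org/repository/> .   # hypothetical namespace

# Subcategory B as a catalog of its own, with an XML export attached.
ex:B a dcat:Catalog ;
    dcterms:title "Subcommunity B"@en ;
    dcat:distribution ex:B-export-xml .

ex:B-export-xml a dcat:Distribution ;
    dcat:downloadURL <https://example.org/repository/export/B.xml> ;   # made-up URL
    dcat:mediaType <https://www.iana.org/assignments/media-types/application/xml> .
```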

Workarounds to get information on the category structure into the RDF data might be DatasetSeries (but they are designed for datasets that can be split in a predictable fashion) or having one Catalog with the categories as a themeTaxonomy. It would seem a waste to lose this structure in the RDF.
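
The themeTaxonomy workaround could be sketched as follows, assuming a single catalog for the whole repository and a SKOS concept scheme that mirrors the category tree. All ex: names are hypothetical, and only one item per collection is shown.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <https://example.org/repository/> .   # hypothetical namespace

# One catalog for the whole repository; the categories become a taxonomy.
ex:catalog a dcat:Catalog ;
    dcat:themeTaxonomy ex:categories ;
    dcat:dataset ex:B1, ex:C1 .

ex:categories a skos:ConceptScheme ;
    skos:hasTopConcept ex:catA .

ex:catA a skos:Concept ; skos:inScheme ex:categories .
ex:catB a skos:Concept ; skos:inScheme ex:categories ; skos:broader ex:catA .
ex:catC a skos:Concept ; skos:inScheme ex:categories ; skos:broader ex:catA .

# Each item points to the one category it is assigned to.
ex:B1 a dcat:Dataset ; dcat:theme ex:catB .
ex:C1 a dcat:Dataset ; dcat:theme ex:catC .
```

In this sketch the tree survives via skos:broader, but the categories are concepts rather than datasets, so they carry no distributions of their own.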

Thanks for your great work on the vocabulary!

Cheers,
Andreas


Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1454 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Friday, 18 February 2022 08:06:42 UTC