Re: [dxwg] Please provide an explanation of why dcat:Catalog is a subclass of dcat:Dataset (#1634) from joachimnielandt via GitHub on 2025-10-03 (public-dxwg-wg@w3.org from October 2025)

From: joachimnielandt via GitHub <noreply@w3.org>
Date: Fri, 03 Oct 2025 12:08:00 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issue_comment.created-3365441096-1759493277-noreply@w3.org>

We are currently having similar discussions about the `Catalog`<`Dataset` inheritance. In my view, this inheritance introduces a number of problems:
- It's not clear to me what "dataset identity" the Catalog has. Is it a dataset like any other, or is it a _meta_ dataset of meta Resources? Both options seem to have a different meaning and result in different usage of the entities.
- If the inheritance is taken literally, all Dataset relations are available on a Catalog.
  - Do they have the same meaning?
  - Can you assign a Distribution to a Catalog? What does it distribute? The recursive contents of the Catalog, or some aggregate of the metadata Resources contained within?
- If the inheritance is taken literally, all attributes of Dataset are available on Catalog.
  - Do they have the same meaning? If not, a lot of clarification is needed to remove ambiguity. For example, `dct:modified` (available on both entities explicitly?): does it describe when the collection of Datasets contained in the Catalog was modified, or when the children of the `Catalog` were last modified?
  - Similar questions arise for inherited attributes at the content level: does the attribute apply to the entity itself, or does it aggregate values from the lower levels? For example, is the Catalog's `dct:spatial` an indication of what the `Catalog` is intended to describe geographically, or an aggregate of all the `dct:spatial` values found in its children?
- Implementing the inheritances (`Catalog`<`Dataset`, `DatasetSeries`<`Dataset`) as stated adds a lot of complexity to a frontend application. In theory, you could have a mishmash of nested `Catalog`, `DatasetSeries` and `Dataset` entities in all sorts of configurations. This would not be legible for users of a catalogue, nor easily interpretable for consumers of an API.
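
To make the ambiguity concrete, here is a minimal Python sketch of the two readings of an inherited attribute. It is not DCAT tooling: the class and function names (`modified_own`, `modified_aggregate`, and so on) are hypothetical, and only `dct:modified` and `dct:spatial` are modelled. It assumes the inheritance is implemented literally, so a Catalog carries every Dataset field and may contain other Catalogs.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class Dataset:
    """Hypothetical stand-in for dcat:Dataset (two properties only)."""
    title: str
    modified: Optional[date] = None                 # dct:modified
    spatial: Set[str] = field(default_factory=set)  # dct:spatial


@dataclass
class Catalog(Dataset):
    """Stand-in for dcat:Catalog as a subclass: inherits all Dataset fields."""
    members: List[Dataset] = field(default_factory=list)  # may nest Catalogs


def modified_own(catalog: Catalog) -> Optional[date]:
    """Reading 1: the value describes the Catalog entity itself."""
    return catalog.modified


def modified_aggregate(catalog: Catalog) -> Optional[date]:
    """Reading 2: latest dct:modified among the members, recursively.

    Assumption: a nested Catalog contributes its members' dates, not its
    own. Whether it should is exactly the kind of open question above.
    """
    dates = []
    for member in catalog.members:
        if isinstance(member, Catalog):
            found = modified_aggregate(member)
        else:
            found = member.modified
        if found is not None:
            dates.append(found)
    return max(dates) if dates else None


def spatial_aggregate(catalog: Catalog) -> Set[str]:
    """Reading 2 for dct:spatial: union of the members' values, recursively."""
    result: Set[str] = set()
    for member in catalog.members:
        if isinstance(member, Catalog):
            result |= spatial_aggregate(member)
        else:
            result |= member.spatial
    return result


inner = Catalog(
    "inner",
    modified=date(2024, 1, 1),
    members=[Dataset("a", modified=date(2025, 6, 1), spatial={"BE"})],
)
outer = Catalog(
    "outer",
    modified=date(2023, 5, 5),
    spatial={"EU"},
    members=[inner, Dataset("b", modified=date(2024, 3, 3), spatial={"NL"})],
)

print(modified_own(outer), modified_aggregate(outer))
print(outer.spatial, spatial_aggregate(outer))
```

In this toy example the two readings give different answers for the same Catalog, and a consumer cannot tell from the value alone which one the publisher intended.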

In my view, the conceptual model would be a lot cleaner and easier to interpret if the above inheritances were removed and it were stated explicitly which attributes are available on which entity, making their meaning unambiguous. Curiously, DCAT-AP does not seem to put the inheritances front and center, judging by the UML (https://semiceu.github.io/DCAT-AP/releases/3.0.0/html/overview-annotated.jpg). The text, however, indicates that they still apply, following the vocabulary.

--
GitHub Notification of comment by joachimnielandt
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1634#issuecomment-3365441096 using your GitHub account

Received on Friday, 3 October 2025 12:08:01 UTC