- From: Bert Van Nuffelen via GitHub <sysbot+gh@w3.org>
- Date: Mon, 10 Feb 2025 21:15:49 +0000
- To: public-dxwg-wg@w3.org
My personal opinion is that mixing literals and structured values is generally bad practice and should be avoided as much as possible. Enabling it affects not only your local data catalogue but, through harvesting, the whole network of data catalogues.

Considerations to take into account:

1. Any harvesting data portal will be faced with the situation that _some_ values of dcat:keyword are URIs and thus have to be resolved to obtain a string value (a mixture of values).
2. You have to decide which string value is attached to the URI. That creates an even larger problem for the harvesting data portal, as it does not know your decision. E.g. Wikidata concepts are not all skos:Concept, so you cannot rely on the assumption that skos:prefLabel is the way to find the label.
3. Any Swiss city data portal using current DCAT as the reference for its data scheme must adapt its software to match your profile decision. Thus you impose costs: it now has to support all cases.
4. You cannot avoid blank nodes.
5. Every profile has to write extensive usage notes to explain its choices. Such usage notes are a source of inter-profile conflicts.

In the example below I put all possible values that could technically be provided when opening up the range. I have seen them all on the same property.

```
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://swisstopo/opendata/dataset/1234> a dcat:Dataset ;
    dcat:keyword
        <https://register.ld.admin.ch/termdat/215878>,             # resolve externally using curl
        <https://register.ld.admin.ch/termdat/215878-humpydumpy>,  # a non-existing value to be resolved externally using curl
        <http://www.eionet.europa.eu/gemet/concept/100>,           # resolve internally in the provided data
        _:node123,                                                 # resolve internally with rdfs:label for the string value
        _:node124,                                                 # resolve internally with schema:name for the string value
        "hochwasser"@de,                                           # language string
        "water",                                                   # plain string
        "https://register.ld.admin.ch/termdat/215878"^^xsd:anyURI . # a URI encoded as a string value, to be resolved externally

<http://www.eionet.europa.eu/gemet/concept/100> skos:prefLabel "administrative body"@en .
_:node123 rdfs:label "administrative body" .      # Is the decision for rdfs:label made by W3C, a local profile or the maintainer of the codelist?
_:node124 schema:name "administrative body"@en .  # Is the decision for schema:name made by W3C, a local profile or the maintainer of the codelist?
```

In your guidelines you make a lot of assumptions but still leave the door open for any of the above cases. Which of the cases do you not want to support? The problem for me is that your selection is arbitrary: it will fit your needs, but another data portal would like it differently. It is already very challenging with only structured values. Adding string values into the game makes it even more complicated.

My point of concern is that dcat:keyword is about an "uncontrolled use of range of values". That is best captured by a string/langstring approach. It is the simplest and most naive method to tag datasets to increase their findability. We should not give up simple methods if there are already valid approaches in the specification that could support a more structured approach. Observe that any (semi)controlled approach is mappable into this representation.
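To make considerations 1 and 2 concrete, here is a minimal sketch of the normalisation step a harvesting portal would need once the range of dcat:keyword is opened up. It uses only the Python standard library and a toy in-memory model instead of an RDF library. The names `Literal`, `Node`, `LABEL_PREDICATES` and `keyword_to_literal` are hypothetical, and the order in which label predicates are tried is an assumption: it is exactly the kind of local decision that another portal cannot know. External dereferencing is left out, so anything not described in the harvested data stays unresolved.

```python
from dataclasses import dataclass
from typing import Optional

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SCHEMA_NAME = "https://schema.org/name"
XSD_ANYURI = "http://www.w3.org/2001/XMLSchema#anyURI"

# Assumed lookup order; another profile could legitimately choose a different one.
LABEL_PREDICATES = (SKOS_PREF_LABEL, RDFS_LABEL, SCHEMA_NAME)


@dataclass(frozen=True)
class Literal:
    text: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


@dataclass(frozen=True)
class Node:
    ident: str  # an IRI, or a blank node label such as "_:node123"


def keyword_to_literal(value, labels, predicates=LABEL_PREDICATES):
    """Return a string-valued Literal for one keyword value, or None if unresolved.

    `labels` maps (node identifier, predicate IRI) to a list of Literals
    found in the harvested data itself.
    """
    if isinstance(value, Literal):
        if value.datatype != XSD_ANYURI:
            return value  # plain or language-tagged string: nothing to do
        value = Node(value.text)  # a URI hidden in a literal: treat it as a node
    for predicate in predicates:
        found = labels.get((value.ident, predicate))
        if found:
            return found[0]
    return None  # would need dereferencing, which this sketch does not attempt


# Labels available inside the harvested data, mirroring the Turtle example above.
labels = {
    ("http://www.eionet.europa.eu/gemet/concept/100", SKOS_PREF_LABEL): [
        Literal("administrative body", "en")
    ],
    ("_:node123", RDFS_LABEL): [Literal("administrative body")],
    ("_:node124", SCHEMA_NAME): [Literal("administrative body", "en")],
}

keywords = [
    Node("https://register.ld.admin.ch/termdat/215878"),
    Node("http://www.eionet.europa.eu/gemet/concept/100"),
    Node("_:node123"),
    Node("_:node124"),
    Literal("hochwasser", "de"),
    Literal("water"),
    Literal("https://register.ld.admin.ch/termdat/215878", datatype=XSD_ANYURI),
]

resolved = [keyword_to_literal(k, labels) for k in keywords]
```

In this sketch the two references to the termdat URI come back as `None`, because the harvested data carries no label for them, while the literals pass through unchanged. A portal that only ever receives strings needs none of this code.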
Two relatively simple approaches to meet your requirements:

a) When the context is the following: the only way data gets into your portal is through an editorial form controlled by the portal. You encode the codelists in the selection boxes of the form and turn the selected values into string values on storage.

b) If harvesting from other portals is involved, create subproperties of dct:subject with the appropriate codelist as range restriction, and turn the values into literal values when DCAT is requested.

Because you mention that the termdat system has a legal basis for use, I am even more in favor of the second case. There you can make a straightforward distinction between the ID use <https://register.ld.admin.ch/termdat/215878> and the equivalent literal use "hochwasser"@de. (From a SKOS perspective they are highly equivalent, as SKOS imposes that each prefLabel uniquely identifies a concept.)

--
GitHub Notification of comment by bertvannuffelen
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1585#issuecomment-2649257996 using your GitHub account

--
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Monday, 10 February 2025 21:15:49 UTC