Re: [dxwg] Reflect all 'Usage notes' into DCAT RDF representation (#725) from Jakub Klímek via GitHub on 2019-07-20 (public-dxwg-wg@w3.org from July 2019)

From: Jakub Klímek via GitHub <sysbot+gh@w3.org>
Date: Sat, 20 Jul 2019 06:37:49 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issue_comment.created-513442046-1563604668-sysbot+gh@w3.org>

> adding their own guidance for usage seems logical, as long as there is a property that distinguishes it as belonging to their usage

@kcoyle I agree, it is important to agree that we want to say "this is how this property is used in DCAT", not "this property means something DCAT-specific universally". There is a huge difference. And I don't think either SHACL or ShEx will help us here - they work on top of existing data, i.e. when the damage is already done.

@dr-shorthair And I disagree with your argument on multiple points.

1. `anyone can say anything about anything` holds universally for RDF - anyone can have a statement about anything they want in their data. But this is not OWA.
2. OWA does not say `anyone can say anything about anything`. OWA says `The fact that there is not a particular statement does not entail that the statement is false, it is just unknown`. Translated to the example given above, when DCAT says `dcterms:issued` is `formal issuance of distribution`, other people using `dcterms:issued` may not know about this, but it is still out there. And there is no machine-readable description of `this is how DCAT uses it`. It just reads `this is what dcterms:issued is` for everyone.
3. I view a file just as a container for triples/quads. It has no representation in the RDF data model, no way of saying `what is in this file is just a view of DCAT`. If we view the data model just as triples, then all triples stated everywhere hold. According to OWA, we may just not know about them yet. But they still should be semantically correct, otherwise, when we get to know them, we get an error/conflict. In this way, I think there is no such thing as a context.
4. If we view the RDF data model as quads, then this should be stated explicitly, and the DCAT file should be `trig`, not `ttl`, so that everyone gets the same graph IRI. In addition, there should be some machine-readable description of what this graph means.
5. However, I view the named graphs just as names for sets of triples, used by SPARQL endpoint administrators to structure their data in the endpoint according to one of [several use cases](http://patterns.dataincubator.org/book/named-graphs.html) and not something to be used in specifications to alter the meaning of triples. Also, the data should be usable using the [union graph](http://patterns.dataincubator.org/book/union-graph.html) pattern, where you disregard the named graphs and merge all statements. There we would get conflicts anyway.
6. The same problem is visible on a smaller scale even within the single `dcat.ttl` file, as illustrated above - there is no machine-readable way to distinguish which annotation goes with which usage of the property, which creates a mess. For example, if my application is a simple RDF browser, which, in addition, displays `skos:definition`s for properties used on instances, I will see `"The date of listing (i.e. formal recording) of the corresponding dataset or service in the catalog."@en` among other `skos:definition`s on an instance of `dcat:Distribution`. The application has no way of knowing the correct context of those annotations.
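
Point 6 can be sketched in a few lines of plain Python, with triples modeled as string tuples rather than with a real RDF library. This is only an illustration under assumptions: `ex:dist1`, its date value, and the wording of `ISSUANCE_DEF` are hypothetical, and `definitions_shown_for` is a made-up helper standing in for the simple RDF browser described above.

```python
# Triples are modeled as plain (subject, predicate, object) tuples of strings.
LISTING_DEF = ("The date of listing (i.e. formal recording) of the "
               "corresponding dataset or service in the catalog.")
# Hypothetical wording, standing in for a distribution-specific usage note.
ISSUANCE_DEF = "Date of formal issuance of the distribution."

# Annotations attached directly to the global property, as in one vocabulary file.
vocabulary = {
    ("dcterms:issued", "skos:definition", LISTING_DEF),
    ("dcterms:issued", "skos:definition", ISSUANCE_DEF),
}

# Hypothetical instance data.
data = {
    ("ex:dist1", "rdf:type", "dcat:Distribution"),
    ("ex:dist1", "dcterms:issued", "2019-07-20"),
}


def definitions_shown_for(instance, triples):
    """Collect every skos:definition of every property used on `instance`."""
    used = {p for (s, p, o) in triples if s == instance}
    return sorted(o for (s, p, o) in triples
                  if s in used and p == "skos:definition")


# Merging vocabulary and data is what a union graph does.
shown = definitions_shown_for("ex:dist1", vocabulary | data)
for text in shown:
    print(text)
```

Both definitions come back for the distribution, including the catalog-listing one, because nothing in the triples ties either annotation to a particular usage of `dcterms:issued`.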

> You say that you have multiple applications accessing 'the same endpoint' so you are trying to switch context in one place (the application) while not also switching context in the RDF graph that you are accessing. You are attempting to shortcut setting the right context and that is what is creating the unexpected side-effect.

I am saying an application has no way of knowing how to switch contexts, as those are not described in a machine-readable way.

IMHO it still holds that when DCAT says something about the global `dcterms:issued`, it should be something applicable in all contexts where `dcterms:issued` is used. The DCAT-specific usage notes are not, and therefore DCAT should not state them about the global property. Other users who are not using DCAT may not know about that statement. But once they do, it should not introduce a mess or conflicts.

Another example to illustrate my point:
If I say `dcterms:title rdfs:label "Dataset title"`, I am saying _"The globally used property dcterms:title has a label "Dataset title" and all that use it can also use this label"_, which is incorrect. What I wanted to say is _I want to use "dcterms:title" in DCAT as "Dataset title"_, which is something different. And there are 2 ways of saying that. Either create a subproperty, which is truly a Dataset title, and not a generic Title, or devise some other way of saying that, e.g. having an "annotation" entity, linked to `dcterms:title` and to the usage note coming from DCAT. How about using the [Web Annotation Vocabulary](https://www.w3.org/TR/annotation-vocab/) for that?
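
To make the two alternatives concrete, here is a rough sketch, again with triples as plain Python tuples. The names `ex:datasetTitle` and `ex:note1` are hypothetical and not part of DCAT; `oa:Annotation`, `oa:hasTarget` and `oa:bodyValue` are terms from the Web Annotation Vocabulary, but whether they are the right fit for usage notes is exactly the open question.

```python
# Direct statement: the label lands on the global property itself.
direct = {
    ("dcterms:title", "rdfs:label", "Dataset title"),
}

# Option 1: a subproperty carries the DCAT-specific label.
# ex:datasetTitle is a hypothetical name used only for illustration.
subproperty = {
    ("ex:datasetTitle", "rdfs:subPropertyOf", "dcterms:title"),
    ("ex:datasetTitle", "rdfs:label", "Dataset title"),
}

# Option 2: a separate annotation entity points at the property.
# ex:note1 is hypothetical; the oa: terms come from the Web Annotation Vocabulary.
annotation = {
    ("ex:note1", "rdf:type", "oa:Annotation"),
    ("ex:note1", "oa:hasTarget", "dcterms:title"),
    ("ex:note1", "oa:bodyValue", "Dataset title"),
}


def labels_of(prop, triples):
    """Return the rdfs:label values asserted directly on `prop`."""
    return sorted(o for (s, p, o) in triples
                  if s == prop and p == "rdfs:label")


print(labels_of("dcterms:title", direct))       # the global property is relabeled
print(labels_of("dcterms:title", subproperty))  # the global property is untouched
print(labels_of("dcterms:title", annotation))   # the global property is untouched
```

In both alternatives, a consumer that only knows `dcterms:title` sees no DCAT-specific label on it, while a DCAT-aware application can still find the note through the subproperty or the annotation.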

I also think this is quite a fundamental disagreement and I would like to get views on this from the group, @makxdekkers ?

--
GitHub Notification of comment by jakubklimek
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/725#issuecomment-513442046 using your GitHub account

Received on Saturday, 20 July 2019 06:37:53 UTC