Re: Blog post about "Provenance in RDF-star" from Pierre-Antoine Champin on 2022-02-07 (public-rdf-star@w3.org from February 2022)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Mon, 7 Feb 2022 14:07:27 +0100
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <6a3e0163-00d3-25ce-c6be-39425d513d17@ercim.eu>
Hi Antoine,

yes, this blog post is meant to provide guidance on how to use (or not 
use) RDF-star.
no, it is not meant to exhaust the topic of modelling provenance in 
RDF-star. More about this below, but granted, the title of the blog post 
was misleading as to its intent. I changed it to "RDF-star patterns for 
provenance", hopefully making it clearer.

The main point was to illustrate that annotating edges directly, 
although attractive, does not always lead to a correct/satisfactory 
modeling. Provenance is mostly used here as a setting for illustrating 
this point. Actually, the initial plan was to illustrate these patterns with 
*different* use cases, but the post was already long enough once the 
provenance examples were described. So we decided to scale it down to 
this particular use case, and describe others in future posts.

Also for the sake of brevity and focus, we decided to leave aside other 
(nonetheless interesting) questions, such as

- whether the properties in the examples are "transparency-enabling" or 
not (the distinction between "Lex said 'Superman can fly'" and "Lex said 
that Superman can fly" -- see 
https://www.w3.org/2021/12/rdf-star.html#selective-ref-transparency)

- what existing vocabulary can be used instead of the toy vocabulary 
used in the examples (and yes, your proposal based on PROV is definitely 
a path worth exploring)

These will also hopefully be described in future posts.
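For readers unfamiliar with the distinction in the first item above, a toy Python model may help. It is not the RDF-star semantics and uses no RDF library; the `Triple` type and the `co_referring` and `annotated` helpers are hypothetical names. A quoted triple is treated as a term compared by its exact constituents (the opaque "Lex said 'Superman can fly'" reading), while the transparent "Lex said that Superman can fly" reading also accepts triples whose terms are assumed, via sameAs pairs, to co-refer.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A quoted triple, used as a term; equality compares the exact terms."""
    s: str
    p: str
    o: str


def co_referring(term, same_as):
    """Return every term assumed to denote the same thing as `term`
    (including `term` itself): the closure of the symmetric `same_as` pairs."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for a, b in same_as:
            for x, y in ((a, b), (b, a)):
                if x == current and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen


def annotated(graph, quoted, prop, value, same_as, transparent):
    """Check whether `graph` annotates `quoted` with (`prop`, `value`).

    Opaque reading: only an annotation on the exact same quoted triple counts.
    Transparent reading: an annotation on any quoted triple whose terms
    co-refer, position by position, with those of `quoted` also counts.
    """
    for subj, p, v in graph:
        if p != prop or v != value or not isinstance(subj, Triple):
            continue
        if subj == quoted:
            return True
        if transparent and all(
            a in co_referring(b, same_as) for a, b in zip(subj, quoted)
        ):
            return True
    return False


# Hypothetical data: Lex said "Superman can fly"; Clark Kent is Superman.
said = Triple(":superman", ":can", ":fly")
graph = {(said, ":saidBy", ":lex")}
same_as = {(":clarkKent", ":superman")}
query = Triple(":clarkKent", ":can", ":fly")

print(annotated(graph, query, ":saidBy", ":lex", same_as, transparent=False))  # False
print(annotated(graph, query, ":saidBy", ":lex", same_as, transparent=True))   # True
```

With :clarkKent and :superman declared co-referring, asking whether Lex said << :clarkKent :can :fly >> fails under the opaque reading and succeeds under the transparent one.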

   best


On 04/02/2022 11:22, Antoine Zimmermann wrote:
> Pierre-Antoine,
>
>
> Le 27/01/2022 à 17:43, Pierre-Antoine Champin a écrit :
>> Hi Antoine,
>>
>> jump to the very end of your message for my reply.
>
> Scroll down for my answer to your reply.
>
>>
>> On 27/01/2022 10:30, Antoine Zimmermann wrote:
>>> Pierre-Antoine,
>>>
>>>
>>> I think the descriptions of the intended meaning of the RDF-star 
>>> graphs given in this post are not aligned with the formal meaning 
>>> given in the spec. Or, at least, I think the presentation misleads 
>>> the reader into misusing quoted triples for provenance (or for 
>>> anything, for that matter).
>>>
>>> Bear with me for a moment, as I have to lay out my arguments one at 
>>> a time before concluding.
>>>
>>> You give this example:
>>>
>>> """
>>> PREFIX : <http://www.example.org/>
>>>
>>> :employee38 :familyName "Smith" .
>>> << :employee38 :jobTitle "Assistant Designer" >> :accordingTo 
>>> :employee22 .
>>> """
>>>
>>> and say: "The intended meaning of this small RDF-star graph is: 
>>> “employee #38 is named Smith, and employee #22 claims that employee 
>>> #38 is an assistant designer”."
>>>
>>> The problem here is that a reader may conclude that, if they want to 
>>> say “employee #38 is named Smith, and employee #22 claims that 
>>> employee #38 is an assistant designer”, among other things, they can 
>>> just take your example and integrate it in their data set. This may 
>>> not be sensible, depending on what they want to say about the claim, 
>>> and most importantly, what they *don't* want to say about it.
>>>
>>> The issue is that, by saying "The intended meaning of this RDF-star 
>>> graph is [explanation]", you actually want to say "As part of the 
>>> intended meaning of this RDF-star graph, we have that 
>>> [explanation]". But this is not the full meaning of the RDF-star 
>>> graph. Indeed, due to the RDF-star semantics, there is additional 
>>> meaning imposed by the spec itself.
>>>
>>> The spec says that this RDF-star graph also carries the meaning that 
>>> the claim is related to the URIs ":employee38" and ":jobTitle" in a 
>>> specific way, and related to the string literal 
>>> "Assistant Designer"^^xsd:string. If one merely wants to say that 
>>> "employee 22 claims that employee 38 is an assistant designer", one 
>>> perhaps *does not* want to relate this claim to the URI ":jobTitle".
>>>
>>> When you define the intended meaning, you can say whatever you like 
>>> about what the URIs denote, as long as they are not among the 
>>> standard URIs of the spec. So you can say, for instance, that 
>>> ":accordingTo" denotes the relation that exists between a claim and 
>>> the people who make the claim. But you cannot define the intended 
>>> meaning of a structure of the language, like quoted triples, which 
>>> is defined by the spec.
>>>
>>> As an analogous example, consider standard RDF and the following 
>>> RDF-graph:
>>>
>>> """
>>> :claim1 :accordingTo "Pierre-Antoine".
>>> """
>>>
>>> You can say that ":accordingTo" is intended to mean the relation 
>>> between a claim and a person, but you cannot say that the intended 
>>> meaning of this triple is that ":claim1" is claimed by a person 
>>> named "Pierre-Antoine". Given the intention that ":accordingTo" 
>>> relates a claim to a person, this graph is implying that the 
>>> character string "Pierre-Antoine" is a person, which is absurd.[*]
>>>
>>> With such examples and explanations in your post, you are suggesting 
>>> to the audience that they can use your RDF-star examples as templates 
>>> for the intended meanings you present. So you are telling the 
>>> audience that they can use RDF-star graphs in ways that clash with 
>>> the formal semantics. In other words, you are openly showing that 
>>> the RDF-star semantics can be safely ignored.
>>>
>>> As a consequence, I do not see how there could be, and why there 
>>> should be, any support for the current formal semantics of the spec. 
>>> Either throw it to the bin (allowing anyone to form their own 
>>> interpretations of what quoted triples entail) or revise it such 
>>> that it matches the intended meanings suggested by its authors.
>>>
>>>
>>>
>>> [*] of course, one could interpret ":accordingTo" as: "the relation 
>>> between a claim and the first name of a person that makes the claim". 
>>
>> Yes, that's exactly what I was about to argue. I would even go 
>> further, and argue that many (all?) properties can be seen, from some 
>> perspective, as the kind of "shortcut" that you describe above. 
>> Consider foaf:givenName:
>>
>>      :az foaf:givenName "Antoine".
>>
>> While it is convenient to conflate your given name with the sequence 
>> of characters used to write it, this design prevents me from 
>> expressing some things, for example the fact that the given name 
>> `Antoine` is derived from the Latin name `Antonius`.
>
> The difference here is that most people, I believe, would accept that 
> a name can be a character string (and vice versa). If I consider the 
> character string 's', 't', 'a', 'r', I'm happy to say that it is a 
> word in English. Likewise, I'm happy to say that 'A', 'n', 't', 'o', 
> 'i', 'n', 'e' is a name of Latin origin. We identify names and 
> character strings all the time, and it is fine. If you are working in 
> the field of lexicography and philology, you may want to identify 
> words, word representations, word senses, etc. with individual URIs, 
> but I'd say it is beside the point.
>
>>
>> The same goes for properties that apply to quoted triples, in my 
>> opinion.
>>
>>> Similarly, one could interpret ":accordingTo" as "the relation 
>>> between a claim that's attached to certain terms in subject, 
>>> predicate, and object positions, and a person who makes a claim with 
>>> these terms".
>>> But presenting the blog post in this way would ruin the 
>>> attractiveness of RDF-star very much.
>>
>> Could you develop why?
>
> There are two things I'd like to develop: the first one is the way the 
> meaning of the quoted triples is presented in the blog post; the 
> second is the way provenance is supposedly modelled in this blog post.
>
> Concerning the meaning of RDF-star triples like:
>
> << :emp38 :jobTitle "Assistant Designer" >> :accordingTo :emp22 .
>
> we can make an analogy. Suppose we have the following sentence:
>
> """
> << Clark Kent is the same person as Superman >> said Lex.
> """
>
> Describing the meaning of this sentence would go like this:
>
> "This sentence means that Lex used the words in between the quotes, in 
> this order."
>
> It would be misleading to describe it like:
>
> "This sentence means that Lex claims that Clark Kent and Superman are 
> just one person."
>
> Of course, the sentence *implies* such a claim, but it is not the full 
> meaning of it. Someone who's not familiar with quotes may understand 
> that this is equivalent to:
>
> """
> << Superman is the same person as Clark Kent >> said Lex.
> """
>
> because this equally implies that Lex claims that Clark Kent and 
> Superman are just one person.
>
> The distinction may be subtle, but in the case of this blog post, you 
> are not merely explaining what some data out there is about. You are 
> telling people how to use RDF-star for provenance, with your RDF-star 
> spec editor hat on. I regard your post as advocating good (best?) 
> practices.
>
> RDF-star quoted triples are a lot like quotes in sentences, they refer 
> to specific RDF terms, not to mere "claims".
>
>
> The second point is about provenance. Provenance is an important topic 
> in computer science, data management, and even before the existence of 
> computers, provenance was a thing for historical documents and pieces 
> of art. It's important enough to have its own field of study, its 
> models and theories, its tools, its practices.
> There is a standard for provenance specifically made to be used in RDF 
> data. You could easily reuse the PROV model and the PROV-O ontology, 
> which would make your examples not only more recommendable, but in 
> fact literally *recommended* by the W3C.
>
> The way I would write the examples you describe is the following: 
> Employee 22 claims that Employee 38 is an assistant designer. This is 
> the fact we want to model. So let us have the URIs :emp22, :emp38, and 
> :claim1 denote, respectively, Employee 22, Employee 38 and the claim 
> made by Employee 22. This claim can be encoded as an RDF triple in 
> this way:
>
> :emp38 :jobTitle "Assistant Designer" .
>
> where :jobTitle denotes the relation between a person and a human 
> readable name of the job, to be encoded as a character string. Then I 
> can say, using the provenance model:
>
> << :emp38 :jobTitle "Assistant Designer" >> prov:wasDerivedFrom :claim1 .
> :claim1 prov:wasAttributedTo :emp22 .
>
> This not only fits the definitions of the terms prov:wasDerivedFrom, 
> prov:wasAttributedTo, and :jobTitle, it also strictly fits with the 
> formal semantics in the RDF-star spec, and finally uses a 
> well-established model of provenance.
>
> Now, again, you could have a property that is equivalent to the 
> composition of prov:wasDerivedFrom and prov:wasAttributedTo, and call 
> it ":accordingTo". But you would have to describe it as such. In your 
> post, nothing says that :accordingTo is intended to mean a triple 
> derived from a claim attributed to a person. Also, the name 
> :accordingTo is misleading, as it does not suggest that it is about a 
> triple of RDF terms.
>
> If, instead, you had described :accordingTo in a precise way that 
> agrees with the semantics, it would have led to a more complex and 
> confusing explanation. If, as I believe, one of the aims of the blog 
> post is to point the audience of RDF-star to the simplicity of the 
> model, having a complex explanation is detrimental to the 
> attractiveness of RDF-star.
>
> In fact, it is not even very clear to me what ought to be the intended 
> meaning of ":accordingTo" if it was to be compliant with the formal 
> semantics. Is it, as I suggest with the example above, that :emp22 
> made a claim, and the claim is encoded as the triple 
> << :emp38 :jobTitle "Assistant Designer" >>? Or is it that :emp22 
> used/generated the triple somehow, regardless of whether they actually 
> believe or claim the underlying statement?
>
>
> If I had to express provenance of RDF data using RDF-star, I would use 
> the PROV model as I did above. However, if I merely wanted to say 
> "Employee 22 claims that Employee 38's job title is 'Assistant 
> Designer'", I would rather use something like:
>
> @prefix s: <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> .
> @prefix p: <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> .
> @prefix o: <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> .
> @prefix : <http://example.com/> .
> [s: :emp38; p: :jobTitle; o: "Assistant Designer"] :accordingTo :emp22 .
>
> which makes the claim unrelated to the URIs used, or any syntax for 
> that matter.
>
>
> --AZ
>
>
> PS: "quoted triples" in RDF-star are very much like triples that are 
> quoted, but not exactly. You cannot properly quote triples with blank 
> nodes. This has strange consequences.
>
>>
>>>
>>>
>>>
>>> Best,
>>> --AZ
>>>
>>>
>>>
>>>
>>>
>>> Le 26/01/2022 à 21:34, Pierre-Antoine Champin a écrit :
>>>> Dear all,
>>>>
>>>> following a discussion during our last two calls, I published a 
>>>> post about "Provenance in RDF-star":
>>>>
>>>> https://www.w3.org/community/rdf-dev/2022/01/26/provenance-in-rdf-star/ 
>>>>
>>>>
>>>> quoting the intro:
>>>>
>>>>  > In this post, we present some lessons learned by the group 
>>>> through discussions and exchanges. This is meant to give some 
>>>> insight about the rationale behind RDF-star, and some guidelines 
>>>> about how to best use it for modeling provenance data.
>>>>
>>>> Many thanks to all the participants of the RDF-star group for their 
>>>> reviews and feedback on this post.
>>>>
>>>>    pa
>>>>
>>>
>>>
>
>
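Antoine's suggestion that :accordingTo could be defined as the composition of prov:wasDerivedFrom and prov:wasAttributedTo can be sketched in plain Python. This is an illustrative sketch only: triples are tuples of prefixed-name strings, `Triple` and `derive_according_to` are hypothetical names, and no RDF library or reasoner is assumed.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A quoted triple, used as a term in subject position."""
    s: str
    p: str
    o: str


def derive_according_to(graph):
    """Materialise :accordingTo as the composition of prov:wasDerivedFrom
    (quoted triple -> claim) and prov:wasAttributedTo (claim -> agent)."""
    derived = set()
    for x, p1, claim in graph:
        if p1 != "prov:wasDerivedFrom":
            continue
        # Follow the claim to every agent it is attributed to.
        for c, p2, agent in graph:
            if c == claim and p2 == "prov:wasAttributedTo":
                derived.add((x, ":accordingTo", agent))
    return derived


# The PROV-based pattern from the thread, as plain tuples.
quoted = Triple(":emp38", ":jobTitle", '"Assistant Designer"')
graph = {
    (quoted, "prov:wasDerivedFrom", ":claim1"),
    (":claim1", "prov:wasAttributedTo", ":emp22"),
}

for triple in derive_according_to(graph):
    print(triple)
```

The derived :accordingTo statement still has the quoted triple of terms as its subject, so a property defined this way inherits the term-level reading of quoted triples discussed in the thread.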
Received on Monday, 7 February 2022 13:07:31 UTC