Re: Blog post about "Provenance in RDF-star" from Antoine Zimmermann on 2022-02-04 (public-rdf-star@w3.org from February 2022)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Fri, 4 Feb 2022 11:22:43 +0100
To: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>, Pierre-Antoine Champin <pierre-antoine@w3.org>
Cc: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <130e2554-6659-94fe-73f9-0842d47d365d@emse.fr>
Pierre-Antoine,


On 27/01/2022 at 17:43, Pierre-Antoine Champin wrote:
> Hi Antoine,
> 
> jump to the very end of your message for my reply.

Scroll down for my answer to your reply.

> 
> On 27/01/2022 10:30, Antoine Zimmermann wrote:
>> Pierre-Antoine,
>>
>>
>> I think the description of the intended meaning of the RDF-star graphs 
>> given in this post are not aligned with the formal meaning given in 
>> the spec. Or, at least, that the presentation is misleading the reader 
>> into misusing quoted triples for provenance (or for anything, for that 
>> matter).
>>
>> Bear with me for a moment, as I have to lay out my arguments one at a 
>> time before concluding.
>>
>> You give this example:
>>
>> """
>> PREFIX : <http://www.example.org/>
>>
>> :employee38 :familyName "Smith" .
>> << :employee38 :jobTitle "Assistant Designer" >> :accordingTo 
>> :employee22 .
>> """
>>
>> and say: "The intended meaning of this small RDF-star graph is: 
>> “employee #38 is named Smith, and employee #22 claims that employee 
>> #38 is an assistant designer”."
>>
>> The problem here is that a reader may conclude that, if they want to 
>> say “employee #38 is named Smith, and employee #22 claims that 
>> employee #38 is an assistant designer”, among other things, they can 
>> just take your example and integrate it in their data set. This may 
>> not be sensible, depending on what they want to say about the claim, 
>> and most importantly, what they *don't* want to say about it.
>>
>> The issue is that, by saying "The intended meaning of this RDF-star 
>> graph is [explanation]", you actually want to say "As part of the 
>> intended meaning of this RDF-star graph, we have that [explanation]". 
>> But this is not the full meaning of the RDF-star graph. Indeed, due to 
>> the RDF-star semantics, there is additional meaning imposed by the 
>> spec itself.
>>
>> The spec says that this RDF-star graph also carries the meaning that 
>> the claim is related to the URIs ":employee38" and ":jobTitle" in a 
>> specific way, and related to the string literal 
>> "Assistant Designer"^^xsd:string. If one merely wants to say that 
>> "employee 22 claims that employee 38 is an assistant designer", one 
>> perhaps *does not* want to relate this claim to the URI ":jobTitle".
>>
>> When you define the intended meaning, you can say whatever you like 
>> about what the URIs denote, as long as they are not among the standard 
>> URIs of the spec. So you can say, for instance, that ":accordingTo" 
>> denotes the relation that exists between a claim and the people who 
>> make the claim. But you cannot define the intended meaning of a 
>> structure of the language, like quoted triples, which is defined by 
>> the spec.
>>
>> As an analogous example, consider standard RDF and the following 
>> RDF-graph:
>>
>> """
>> :claim1 :accordingTo "Pierre-Antoine".
>> """
>>
>> You can say that ":accordingTo" is intended to mean the relation 
>> between a claim and a person, but you cannot say that the intended 
>> meaning of this triple is that ":claim1" is claimed by a person named 
>> "Pierre-Antoine". Given the intention that ":accordingTo" relates a 
>> claim to a person, this graph is implying that the character string 
>> "Pierre-Antoine" is a person, which is absurd.[*]
>>
>> With such examples and explanations in your post, you are suggesting 
>> to the audience that they can use your RDF-star examples as templates 
>> for the intended meanings you present. So you are telling the audience 
>> that they can use RDF-star graphs in ways that clash with the formal 
>> semantics. In other words, you are openly showing that the RDF-star 
>> semantics can be safely ignored.
>>
>> As a consequence, I do not see how there could be, and why there 
>> should be, any support for the current formal semantics of the spec. 
>> Either throw it to the bin (allowing anyone to form their own 
>> interpretations of what quoted triples entail) or revise it such that 
>> it matches the intended meanings suggested by its authors.
>>
>>
>>
>> [*] of course, one could interpret ":accordingTo" as: "the relation 
>> between a claim and the first name of a person that makes the claim". 
> 
> Yes, that's exactly what I was about to argue. I would even go further, 
> and argue that many (all?) properties can be seen, from some 
> perspective, as the kind of "shortcut" that you describe above. 
> Consider foaf:givenName:
> 
>      :az foaf:givenName "Antoine".
> 
> While it is convenient to conflate your given name with the sequence of 
> characters used to write it, this design prevents me from expressing 
> some things, for example the fact that the given name `Antoine` is 
> derived from the Latin name `Antonius`.

The difference here is that most people, I believe, would accept that a 
name can be a character string (and vice versa). If I consider the 
character string 's', 't', 'a', 'r', I'm happy to say that it is a word 
in English. Likewise, I'm happy to say that 'A', 'n', 't', 'o', 'i', 
'n', 'e' is a name of Latin origin. We identify names with character 
strings all the time, and it is fine. If you are working in the field of 
lexicography or philology, you may want to identify words, word 
representations, word senses, etc. with individual URIs, but I'd say it 
is beside the point.

> 
> The same goes for properties that apply to quoted triples, in my opinion.
> 
>> Similarly, one could interpret ":accordingTo" as "the relation between 
>> a claim that's attached to certain terms in subject, predicate, and 
>> object positions, and a person who makes a claim with these terms".
>> But presenting the blog post in this way would considerably reduce 
>> the attractiveness of RDF-star.
> 
> Could you develop why?

There are two things I'd like to develop: the first one is the way the 
meaning of the quoted triples is presented in the blog post; the second 
is the way provenance is supposedly modelled in this blog post.

Concerning the meaning of RDF-star triples like:

<< :emp38 :jobTitle "Assistant Designer" >> :accordingTo :emp22 .

we can make an analogy. Suppose we have the following sentence:

"""
<< Clark Kent is the same person as Superman >> said Lex.
"""

Describing the meaning of this sentence would go like this:

"This sentence means that Lex used the words in between the quotes, in 
this order."

It would be misleading to describe it like:

"This sentence means that Lex claims that Clark Kent and Superman are 
just one person."

Of course, the sentence *implies* such a claim, but it is not the full 
meaning of it. Someone who's not familiar with quotes may understand 
that this is equivalent to:

"""
<< Superman is the same person as Clark Kent >> said Lex.
"""

because this equally implies that Lex claims that Clark Kent and 
Superman are just one person.
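The analogy can be made concrete with a toy Python model (an illustration only, not part of any spec). A quote is modelled as the exact sequence of words used, in order; the claim "X is the same person as Y" is modelled, by assumption, as an unordered pair of names, since the relation is symmetric. The two sentences then yield different quotes but the same claim.

```python
from typing import FrozenSet, Tuple

Quote = Tuple[str, ...]


def quote(sentence: str) -> Quote:
    """A quote is the exact sequence of words used, in order."""
    return tuple(sentence.split())


def claim(name_a: str, name_b: str) -> FrozenSet[str]:
    """Toy claim that two names refer to one person. 'Is the same person as'
    is symmetric, so the order of the names is irrelevant (an assumption
    of this model)."""
    return frozenset({name_a, name_b})


q1 = quote("Clark Kent is the same person as Superman")
q2 = quote("Superman is the same person as Clark Kent")

c1 = claim("Clark Kent", "Superman")
c2 = claim("Superman", "Clark Kent")

print(q1 == q2)  # False: different words in a different order
print(c1 == c2)  # True: one and the same claim
```

Reading the quote as the claim is what makes the two sentences look interchangeable; keeping them apart is what the quotation marks are for.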

The distinction may be subtle, but in the case of this blog post, you 
are not merely explaining what some data out there is about. You are 
telling people how to use RDF-star for provenance, with your RDF-star 
spec editor hat on. I regard your post as advocating good (best?) practices.

RDF-star quoted triples are a lot like quotes in sentences, they refer 
to specific RDF terms, not to mere "claims".
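A minimal Python sketch of this point, under stated assumptions: IRIs are plain strings, a quoted triple is modelled as the tuple of its three terms (the referentially opaque reading this message attributes to the spec), and `:role` is a hypothetical IRI assumed to denote the same relation as `:jobTitle`. An `:accordingTo` annotation attached to one quoted triple does not apply to the other, even though both would assert the same fact.

```python
from typing import Dict, NamedTuple, Set

EX = "http://www.example.org/"


class QuotedTriple(NamedTuple):
    """Toy quoted triple: identified only by its three terms."""
    s: str
    p: str
    o: str


# Hypothetical denotation map: ex:role is an illustrative IRI assumed to
# denote the same relation as ex:jobTitle (e.g. via owl:equivalentProperty).
denotation: Dict[str, str] = {
    EX + "jobTitle": "job-title relation",
    EX + "role": "job-title relation",
}

q_job = QuotedTriple(EX + "emp38", EX + "jobTitle", "Assistant Designer")
q_role = QuotedTriple(EX + "emp38", EX + "role", "Assistant Designer")

# :accordingTo annotations, keyed by the quoted triple they are attached to.
according_to: Dict[QuotedTriple, Set[str]] = {q_job: {EX + "emp22"}}

same_relation = denotation[q_job.p] == denotation[q_role.p]
same_quoted_triple = q_job == q_role

print(same_relation)           # True: the two predicates co-denote
print(same_quoted_triple)      # False: the quoted triples use different IRIs
print(q_role in according_to)  # False: the annotation does not carry over
```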


The second point is about provenance. Provenance is an important topic 
in computer science and data management and, even before computers 
existed, it mattered for historical documents and works of art. It is 
important enough to have its own field of study, with its own models and 
theories, tools, and practices.
There is a standard for provenance specifically made to be used in RDF 
data. You could easily reuse the PROV model and the PROV-O ontology, 
which would make your examples not only more recommendable, but in fact 
literally *recommended* by the W3C.

The way I would write the examples you describe is the following: 
Employee 22 claims that Employee 38 is an assistant designer. This is 
the fact we want to model. So let us have the URIs :emp22, :emp38, and 
:claim1 denote, respectively, Employee 22, Employee 38 and the claim 
made by Employee 22. This claim can be encoded as an RDF triple in this way:

:emp38 :jobTitle "Assistant Designer" .

where :jobTitle denotes the relation between a person and a human 
readable name of the job, to be encoded as a character string. Then I 
can say, using the provenance model:

<< :emp38 :jobTitle "Assistant Designer" >> prov:wasDerivedFrom :claim1 .
:claim1 prov:wasAttributedTo :emp22 .

This not only fits the definitions of the terms prov:wasDerivedFrom, 
prov:wasAttributedTo, and :jobTitle; it also strictly fits the formal 
semantics of the RDF-star spec, and it uses a well-established model of 
provenance.

Now, again, you could have a property that is equivalent to the 
composition of prov:wasDerivedFrom and prov:wasAttributedTo, and call it 
":accordingTo". But you would have to describe it as such. In your post, 
nothing says that :accordingTo is intended to mean a triple derived from 
a claim attributed to a person. Also, the name :accordingTo is 
misleading, as it does not suggest that it is about a triple of RDF terms.
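A toy Python sketch of that composition, assuming triples are plain tuples and the quoted triple is represented by the tuple of its terms. The helper `according_to_pairs` is hypothetical (it is not part of PROV): it follows prov:wasDerivedFrom and then prov:wasAttributedTo. In OWL 2, the same chain could be declared on a property with owl:propertyChainAxiom.

```python
from typing import Set, Tuple

PROV = "http://www.w3.org/ns/prov#"
EX = "http://www.example.org/"

# A triple whose subject may itself be a quoted triple (a nested tuple).
Triple = Tuple[object, str, object]

# The quoted triple << :emp38 :jobTitle "Assistant Designer" >>, as its terms.
quoted = (EX + "emp38", EX + "jobTitle", "Assistant Designer")

graph: Set[Triple] = {
    (quoted, PROV + "wasDerivedFrom", EX + "claim1"),
    (EX + "claim1", PROV + "wasAttributedTo", EX + "emp22"),
}


def according_to_pairs(g: Set[Triple]) -> Set[Tuple[object, object]]:
    """Pairs (x, agent) such that x prov:wasDerivedFrom c and
    c prov:wasAttributedTo agent: the property-chain composition."""
    derived = [(s, o) for (s, p, o) in g if p == PROV + "wasDerivedFrom"]
    attributed = [(s, o) for (s, p, o) in g if p == PROV + "wasAttributedTo"]
    return {(x, agent)
            for (x, c) in derived
            for (c2, agent) in attributed
            if c == c2}


print(according_to_pairs(graph) == {(quoted, EX + "emp22")})  # True
```

Note that the subject of each resulting pair is the triple of terms, not a term-independent claim.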

If, instead, you had described :accordingTo in a precise way that agrees 
with the semantics, it would have led to a more complex and confusing 
explanation. If, as I believe, one of the aims of the blog post is to 
point the audience of RDF-star to the simplicity of the model, having a 
complex explanation is detrimental to the attractiveness of RDF-star.

In fact, it is not even very clear to me what the intended meaning of 
":accordingTo" ought to be if it were to comply with the formal 
semantics. Is it, as I suggest with the example above, that :emp22 made 
a claim, and the claim is encoded as the triple << :emp38 :jobTitle 
"Assistant Designer" >>? Or is it that :emp22 used/generated the triple 
somehow, regardless of whether they actually believe or claim the 
underlying statement?


If I had to express provenance of RDF data using RDF-star, I would use 
the PROV model as I did above. However, if I merely wanted to say 
"Employee 22 claims that Employee 38's job title is 'Assistant 
Designer'", I would rather use something like:

@prefix s: <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> .
@prefix p: <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> .
@prefix o: <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> .
@prefix : <http://example.com/> .
[s: :emp38; p: :jobTitle; o: "Assistant Designer"] :accordingTo :emp22 .

which makes the claim unrelated to the URIs used, or any syntax for that 
matter.
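A toy Python sketch of why this description is term-independent, under stated assumptions. In the RDF semantics, rdf:subject, rdf:predicate and rdf:object relate the blank node to the resources the IRIs denote. The sketch models that with a hypothetical interpretation map in which `:role` is an illustrative IRI assumed to denote the same relation as `:jobTitle`; literals not in the map are taken to denote themselves. Two descriptions written with different but co-denoting IRIs then point to the same resources.

```python
from typing import Dict

EX = "http://example.com/"

# Hypothetical interpretation: maps each IRI to the resource it denotes.
# ex:role is assumed, for illustration, to co-denote with ex:jobTitle.
interpretation: Dict[str, str] = {
    EX + "emp38": "employee #38",
    EX + "jobTitle": "job-title relation",
    EX + "role": "job-title relation",
}


def described_statement(s: str, p: str, o: str) -> Dict[str, str]:
    """What the blank node's rdf:subject/rdf:predicate/rdf:object values are:
    the resources denoted by the terms, not the terms themselves.
    Strings absent from the interpretation are treated as literals that
    denote themselves (a simplification of this toy model)."""
    return {
        "subject": interpretation.get(s, s),
        "predicate": interpretation.get(p, p),
        "object": interpretation.get(o, o),
    }


a = described_statement(EX + "emp38", EX + "jobTitle", "Assistant Designer")
b = described_statement(EX + "emp38", EX + "role", "Assistant Designer")
print(a == b)  # True: both descriptions point to the same resources
```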


--AZ


PS: "quoted triples" in RDF-star are very much like triples that are 
quoted, but not exactly. You cannot properly quote triples with blank 
nodes. This has strange consequences.

> 
>>
>>
>>
>> Best,
>> --AZ
>>
>>
>>
>>
>>
>> On 26/01/2022 at 21:34, Pierre-Antoine Champin wrote:
>>> Dear all,
>>>
>>> following a discussion during our two last calls, I published a post 
>>> about "Provenance in RDF-star":
>>>
>>> https://www.w3.org/community/rdf-dev/2022/01/26/provenance-in-rdf-star/
>>>
>>> quoting the intro:
>>>
>>>  > In this post, we present some lessons learned by the group through 
>>> discussions and exchanges. This is meant to give some insight about 
>>> the rationale behind RDF-star, and some guidelines about how to best 
>>> use it for modeling provenance data.
>>>
>>> Many thanks to all the participants of the RDF-star group for their 
>>> reviews and feedback on this post.
>>>
>>>    pa
>>>
>>
>>


-- 
Antoine Zimmermann
ISI - Institut Henri Fayol
École des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tel: +33(0)4 77 42 66 03
https://www.emse.fr/~zimmermann/
Received on Friday, 4 February 2022 10:23:11 UTC