Re: Blog post about "Provenance in RDF-star" from Pierre-Antoine Champin on 2022-02-19 (public-rdf-star@w3.org from February 2022)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Sat, 19 Feb 2022 19:06:26 +0100
To: antoine.zimmermann@emse.fr
Cc: "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-ID: <70a32417-e159-6b24-74cf-9d132c5ad450@ercim.eu>
Antoine,

On 15/02/2022 17:32, Antoine Zimmermann wrote:
> Pat,
>
> The fact that something is beside the point depends on one's opinion 
> and perspective, but it also depends on what is "the point".
>
> Here it seems you are arguing about a different point. I completely 
> agree with the fact that there are different ways of modelling or 
> designing a representation for many things, and many of them are 
> valid. I never pretended otherwise. Your examples are certainly 
> relevant (atom/city/etc), but again, this is beside the point I'm 
> trying to make.
>
> My point is the following: if someone defines a property "foaf:name" 
> and describes its intended meaning as "the relation between a person 
> and their name", followed by an example such as:
>
> az:me  foaf:name  "Antoine Zimmermann" .
>
> then it comes to no one's surprise that a string literal is used in 
> the object position. Other models are valid, for sure, where an IRI is 
> used to identify a name, and more details about the names can be 
> expressed. But at least, the choice of a literal here is not regarded 
> as a bad design.

... by *most* people, because it does not break their use cases. It is 
bad design in the (admittedly niche) use-case of linguists or 
terminologists for whom "names" are complex entities distinct from their 
textual encoding.

>
> However, if someone defines a property "foaf:knows" and describes its 
> intended meaning as "a relation between a person and someone they 
> know", followed by an example such as:
>
> az:me  foaf:knows  "Pat Hayes" .
>
> then it is legitimate to consider that the example is mistaken.

... again, by *most* people, because it will break quickly in many 
common use-cases. However, conversely to the previous example, I believe 
there are a few situations where this design is sufficient, and will not 
break (because the needs are simple enough).
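
A minimal Python sketch of this trade-off, assuming a graph held as a plain set of (subject, predicate, object) tuples. The IRI and Literal classes, the identifiers az:pat and az:pac, and both helper functions are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str

# Hypothetical terms; only az:me, foaf:name and foaf:knows appear in the thread.
ME, PAT, PAC = IRI("az:me"), IRI("az:pat"), IRI("az:pac")
KNOWS, NAME = IRI("foaf:knows"), IRI("foaf:name")

def known_names(graph, person):
    """Simple need: list the names of the people `person` knows."""
    names = set()
    for s, p, o in graph:
        if s == person and p == KNOWS:
            if isinstance(o, Literal):
                names.add(o.value)  # the string itself is all we have
            else:
                names.update(n.value for s2, p2, n in graph
                             if s2 == o and p2 == NAME)
    return sorted(names)

def friends_of_friends(graph, person):
    """Richer need: follow foaf:knows for two hops."""
    direct = {o for s, p, o in graph if s == person and p == KNOWS}
    # A Literal never occurs as a subject, so a string object is a dead end.
    return {o2 for s2, p2, o2 in graph if s2 in direct and p2 == KNOWS}

# "knows as String" design versus the usual IRI design.
string_design = {
    (ME, KNOWS, Literal("Pat Hayes")),
    (PAT, NAME, Literal("Pat Hayes")),
    (PAT, KNOWS, PAC),
}
iri_design = {
    (ME, KNOWS, PAT),
    (PAT, NAME, Literal("Pat Hayes")),
    (PAT, KNOWS, PAC),
}
```

Under these assumptions, known_names gives the same answer for both graphs, while friends_of_friends returns an empty set for string_design: the string design holds up only as long as nobody needs to follow the link.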

You seem to consider that there is a crisp and absolute boundary between 
bad design and good design. I claim that it is instead relative and blurry.

Granted, the "knows as String" example above is *very* brittle (it will 
almost certainly break). And granted also, it would be much more 
appropriate, in that case, to define the property as "a relation 
between a person and THE NAME OF someone they know". Which leads me to 
the following comment...

> This kind of confusion between a casual name and an identifier is 
> regarded as a mistake in databases, in object oriented programming, 
> and in modelling knowledge in general. If someone has a relational 
> table "Employee", with a numeric primary key, and there is a column 
> "manager", where manager is intended to identify an employee that 
> manages (or supervises) the given employee, then it is a mistake to use 
> a String. One should use a foreign key that refers to the primary key 
> of the Employee table.
> Similarly, if there is a class Employee in an object oriented 
> programme, and one wants to introduce an attribute "knows" that 
> connects the employees to the list of people they know, it is a bad 
> design to define "knows" as an array of strings. One should use an 
> array (or hashSet, or List, or any kind of collection) of objects of 
> type Employee.
> Any student who is learning DBMS or OOP would be considered mistaken 
> if they used strings to model the reference to employees.
>
> Following the same idea, I claim that: if Pierre-Antoine defines the 
> property "ex:accordingTo" as intended to represent a relation between 
> a claim and a person,
The blog post does not contain any such description of 
'ex:accordingTo'.  On the contrary, the blog post shows how this design 
may eventually break, and provides an alternative where the distinction 
between 'statement' and 'claim' is explicitly discussed.
> followed by the example:
>
> <<:empl38 :jobTitle "Assistant Designer">> ex:accordingTo :empl22 .
>
> then there is a mistake in the example (or in the model). The RDF-star 
> node <<:empl38 :jobTitle "Assistant Manager">> denotes a structure 
> that involves the specific URIs ":empl38" and ":jobTitle", and 
> involves the specific string literal "\"Assistant 
> Manager\"^^xsd:string", which can arguably be considered a bad way of 
> modelling a claim.
Yes, and this is what the whole second half of the post is discussing.
>
>
> But in the case of RDF-star, the fact that a quoted triple does not 
> denote a claim in the broad sense of the term [*] is hidden in the 
> formal definition of the semantics that few people will be diligent 
> enough to check (See 
> https://w3c.github.io/rdf-star/cg-spec/editors_draft.html#rdf-star-semantics). 


Agreed. And one of the main points of this post was to highlight this 
issue, in a hopefully more accessible way.


> This kind of blog post that Pierre-Antoine authored may lead to 
> inconsiderate usage of the RDF-star model, and therefore, I consider 
> it to be misleading.
>
>
> The problem in RDF-star documentation and literature, is that this 
> confusion between merely being a claim, and being a thing attached to 
> specific URIs and literals, is maintained throughout all examples.

I understand that we may disagree on the quality of the first examples 
in the post (intrinsically bad vs. good enough for some use-cases), but 
I don't think it is fair to say that "this confusion (...) is maintained 
throughout all examples".

Quoting the blog post: "The problem with the last example above is that 
we are not talking about the triple << :employee38 :jobTitle "Assistant 
Designer" >> (which is uniquely identified by its subject, predicate 
and object). We want to talk about two similar but distinct *claims*, 
each claim with its own identity, and its own properties." Followed by 
*other examples* where claims are explicitly modelled, as a distinct 
entity from the quoted triple.
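
A minimal Python sketch of why this matters, assuming quoted triples behave like values compared by subject, predicate and object. The QuotedTriple class, the ex:since and ex:content properties, the blank-node labels, the years, and every IRI other than :employee38 and :jobTitle are hypothetical and not taken from the blog post.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class QuotedTriple:
    """Value object: two instances with equal s/p/o are the same node."""
    subject: str
    predicate: str
    obj: str

def values(graph, node, prop):
    """All objects of `prop` attached to `node`, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == node and p == prop)

job = QuotedTriple(":employee38", ":jobTitle", '"Assistant Designer"')
same_job = QuotedTriple(":employee38", ":jobTitle", '"Assistant Designer"')

# Design 1: annotate the quoted triple directly (hypothetical properties).
flat = {
    (job, "ex:accordingTo", ":employee22"),
    (job, "ex:since", "2021"),
    (same_job, "ex:accordingTo", ":employee45"),
    (same_job, "ex:since", "2022"),
}
flat_nodes = {s for s, _, _ in flat}
# Both sources and both years hang off one node, so any pairing is a guess.
flat_pairs = list(product(values(flat, job, "ex:accordingTo"),
                          values(flat, job, "ex:since")))

# Design 2: one node per claim, each pointing at the same quoted triple.
claims = {
    ("_:claim1", "ex:content", job),
    ("_:claim1", "ex:accordingTo", ":employee22"),
    ("_:claim1", "ex:since", "2021"),
    ("_:claim2", "ex:content", same_job),
    ("_:claim2", "ex:accordingTo", ":employee45"),
    ("_:claim2", "ex:since", "2022"),
}
claim_nodes = sorted(s for s, p, o in claims
                     if p == "ex:content" and o == job)
claim_pairs = [(values(claims, c, "ex:accordingTo")[0],
                values(claims, c, "ex:since")[0]) for c in claim_nodes]
```

Under these assumptions, flat_nodes contains a single node and flat_pairs holds four candidate (source, year) pairings, whereas claim_pairs recovers exactly the two intended ones, because each claim has its own identity.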

     pa


> Modelling the claims themselves (and not the RDF representation of 
> them) is a desirable feature for many use cases. So, if RDF-star is 
> presented as able to address these use cases, a lot of people may be 
> attracted to it with the wrong assumptions on what RDF-star can truly 
> represent.
>
>
> --AZ
>
>
> [*] To be precise, a quoted triple denotes something that is related 
> to a specific predicate IRI, at least, and possibly a subject IRI, an 
> object IRI, and/or an object literal. So, a quoted triple, rather than 
> denoting a claim in the common understanding, denotes "a claim 
> attached to a certain predicate IRI". This nuance is hidden in the 
> documentation.
>
>
>
> Le 07/02/2022 à 20:32, Patrick J. Hayes a écrit :
>>
>>
>>> On Feb 4, 2022, at 2:22 AM, Antoine Zimmermann 
>>> <antoine.zimmermann@emse.fr> wrote:
>>>
>>> ... most people, I believe, would accept that a name can be a 
>>> character string (and vice versa). ... We identify names and 
>>> character strings all the time, and it is fine. If you are working 
>>> in the field of lexicography and philology, you may want to identify 
>>> words, word representations, word senses, etc. with individual URIs, 
>>> but I'd say it is beside the point.
>>
>> But we can say the same about many topics, perhaps all of them.
>>
>> Most people would accept that one atom of an element is just like 
>> another. We do that all the time, and it is fine. If you are working 
>> in the field of nuclear chemistry you may want to identify different 
>> isotopes, such as C-14 and C-12, with individual URIs, but I'd say it 
>> is beside the point.
>>
>> Most people would accept that a city is just a place. We do that all 
>> the time, and it is fine. If you are working in the field of civic 
>> administration you may want to identify different administrative 
>> entities, such as the City of London and Greater London, with 
>> individual URIs, but I'd say it is beside the point.
>>
>> And so on.
>>
>> "I'd say it is beside the point" simply means "I am not interested in 
>> that particular distinction and am happy to ignore it", but of course 
>> others may differ. Whenever you take that position you implicitly 
>> exclude some users, by making it impossible for them to speak of what 
>> interests them.
>>
>> Pat
>
Received on Saturday, 19 February 2022 18:06:32 UTC