Re: RDF* and conjectures from thomas lörtsch on 2021-09-24 (public-rdf-star@w3.org from September 2021)

From: thomas lörtsch <tl@rat.io>
Date: Fri, 24 Sep 2021 13:25:46 +0200
To: Fabio Vitali <fabio.vitali@unibo.it>
Cc: Andy Seaborne <andy@apache.org>, "public-rdf-star@w3.org" <public-rdf-star@w3.org>
Message-Id: <E0536929-84E5-4AF2-8212-2308E39DC2D9@rat.io>
> On 24. Sep 2021, at 11:19, Fabio Vitali <fabio.vitali@unibo.it> wrote:
> 
> Hello Thomas! 
> 
>> [b.t.w.: i find your intervention very productive. thank you for your enthusiasm :) ]
> 
> Thanks. This means a lot to me. Sometimes I fail to notice when I am becoming a nuisance or a bore, and statements like these are reassuring...
> 
> 
>> Am 23. September 2021 12:59:58 MESZ schrieb Fabio Vitali <fabio.vitali@unibo.it>:
>>> Dear Thomas, all. 
>>> 
>>> My own impression is that the semantics of named graphs is problematic in a way that cannot be fixed, and that it is better to find a way out along a totally different path. 
>>> 
>>> We have a proverb where I come from, that says: if you glue back together a broken vase, you still have a broken vase. Dataset semantics is a broken vase. 
>>> 
>>> Personally, I see a clear parallel with RDF reification, a complicated and pedantic mechanism for an important need: although it has clear semantics, over time people have variously enforced or ignored it, so it is not reliably found in the wild. 
>> 
>> its syntax is underspecified. its semantics very clearly define the reified triple as a referentially transparent occurrence, but it lacks a property to describe the location where that occurrence occurs. 
>> 
>>> So Olaf and you guys 
>> 
>> please don't count me in :-/ i never made a secret of the fact that i find an identifier like RDF/XML's ID attribute much more practical than the verbose embedded statement for the purpose of statement annotation, and quite frankly it is beyond me how RDF* gained so much support. 
> 
> My point of view is that this syntax or another is irrelevant as long as it concretely reduces the visual clutter and allows one to express the relevant use cases rapidly and concisely. 
> 
> I am not a big fan of formal languages for ordinary humans, but this is where we are: I think that RDF linearizations should use labels to denote entities rather than URIs, which guarantee unambiguity at the expense of clarity, but this is a digression, I think, in this group. 

Given that this group bets on quoted triples, yes. I repeatedly tried to convince this group of the usefulness of short statement identifiers, maybe as syntactic sugar on top of the rather verbose quoted triple, but to no avail.
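For comparison, RDF/XML already offers a short statement identifier: an rdf:ID attribute on a property element asserts the triple and additionally reifies it under that ID with the four standard reification triples. The following is a minimal Python sketch of that expansion, assuming plain string tuples instead of an RDF library; the function name expand_statement_id and the :t1 identifier are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def expand_statement_id(stmt_id: str, s: str, p: str, o: str) -> List[Triple]:
    """Mimic RDF/XML's rdf:ID on a property element: assert the triple,
    then reify it under stmt_id with the four standard reification triples."""
    return [
        (s, p, o),  # the statement itself is asserted
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]


# Hypothetical short identifier :t1 for the statement ":alice :buys :car".
triples = expand_statement_id(":t1", ":alice", ":buys", ":car")
# The annotation refers to the identifier, not to a quoted triple.
triples.append((":t1", ":on", ":friday"))

for s, p, o in triples:
    print(s, p, o, ".")
```

Annotations then attach to the short identifier rather than to a repeated << :alice :buys :car >> term; the price is the reification vocabulary behind it.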

>> but statement annotation (especially with an eye towards property graph compatibility and competitiveness) is a whole different story than what the proposed semantics (and the original Named Graphs proposal, and N3, and you) focus on, and shoehorning one into the other won't work out if one is not very scrupulous - which the editors clearly aren't. 
>> 
>>> said: RDF reification is a broken vase, and even if we fix it we are still stuck with a broken vase. Instead, let's get a new and shiny vase and use it alongside the old one for our purposes. 
>> 
>> RDF standard reification is just a vocabulary and therefore relatively easy to replace. named graphs OTOH use curly braces, and we only have a very limited set of such delimiters available. we simply can't afford to waste them. ask Olaf how hard it was to come up with the chevron syntax. 
>> 
>> also, named graphs are predominantly used with the semantics implicit in SPARQL. also, it is possible to define a vocabulary that allows one to declare the semantics of a dataset. in a world where it is possible to garner broad support for embedded triples just because they seem to bring sound and concise statement annotation (which of course now they don't, but they could) everything is possible... 
> 
> Not sure what you are implying here. RDF*, at least in the communities I know of, is being advertised as the next big thing not because of what it is, but because it seems to be the best available way to express what they have needed for a long time, i.e., the ability to assign a provenance to a statement regardless of whether one agrees with the author of the statement. 

If the statement has an author, then it is an occurrence of the statement type. In that case you first have to declare a reference to the occurrence using the tedious occurrence vocabulary, and the syntax immediately has much less appeal: 2 extra triples, vs. 4 extra triples with standard reification. 

> Doing so without RDF* is a pain in the neck, and with RDF* is not ideal, but considerably better.

I don’t think that win is worth the effort, but YMMV. We could, however, achieve a much better result with a slight modification of the syntax, for example by adding an extra operator to enforce referential opacity, like in the << s p o * >> example that I gave (in my other mail to you from yesterday, I think).

> At some point they'll notice that if they have individual triples RDF* is perfect, but more complicated situations, such as attributing (without asserting) structures like graphs (e.g. nanopublications) or sequences of triples (e.g. n-ary relationships), will require that they quote and assign provenance to every single triple, and they will not be happy. Happier than before RDF*, but not happy. 

For sets of triples you can use hacks like collecting them in a list. Not ideal, of course, but a workaround. If I interpret Pierre-Antoine's response to you correctly, he regards this as predominantly a political issue, and I expect a push for quoted graphs once quoted triples are standardized. Maybe. Hard to tell for sure.
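The list workaround can be sketched with the Hamlet triples from Fabio's example (quoted further down): the quoted triples become members of an rdf:List, and a single :accordingTo annotation is attached to the list node instead of to each triple. This is a minimal Python sketch that only emits Turtle-star-like strings; the helper names and the _:l1, _:l2, ... blank node labels are hypothetical, and nothing here gives the list any standardized meaning as a quoted graph.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def quoted(s: str, p: str, o: str) -> str:
    """Render a quoted triple term in Turtle-star syntax."""
    return f"<< {s} {p} {o} >>"


def list_of_quoted(members: List[Triple]) -> Tuple[str, List[Triple]]:
    """Encode members as an rdf:List of quoted triples.

    Returns the head node and the rdf:first/rdf:rest triples of the list.
    """
    nodes = [f"_:l{i + 1}" for i in range(len(members))]
    out: List[Triple] = []
    for i, (s, p, o) in enumerate(members):
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        out.append((nodes[i], "rdf:first", quoted(s, p, o)))
        out.append((nodes[i], "rdf:rest", rest))
    head = nodes[0] if nodes else "rdf:nil"
    return head, out


hamlet = [
    (":CreationOfHamlet", "crm:P14_carried_out_by", ":WilliamShakespeare"),
    (":CreationOfHamlet", "crm:P4_has_time-span", ":Year1603"),
    (":CreationOfHamlet", "crm:P215_has_reliability", ":High"),
]
head, annotated = list_of_quoted(hamlet)
# One annotation for the whole set instead of one per quoted triple.
annotated.append((head, ":accordingTo", ":SamuelJohnson"))

for s, p, o in annotated:
    print(s, p, o, ".")
```

The quoted triples stay unasserted, but the grouping itself is only a convention: a consumer has to know that an annotation on the list head is meant to apply to every member.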

> E.g.: 
> 
>  :assertion {
>    << wd:Q2 wdt:P571 "-4004-10-23"^^xsd:date>> :accordingTo wd:Q333481 . 
> 
>    # Ussher said that the Earth was created in 4004 BC
>  } 
> 
>  :provenance { 
>    :assertion prov:wasAttributedTo wd:Q333481 . 
>  } 
> 
>  :pubInfo {... }
> 
>  :Head {
>    : a np:Nanopublication .
>    : np:hasAssertion :assertion .
>    : np:hasProvenance :provenance .
>    : np:hasPublicationInfo :pubInfo .
>  }
>
> This graph is far from ideal: Ussher is the author of the whole assertion, not merely of one of its triples. Yet, since RDF* cannot quote whole graphs, people will need to make do with this. Similarly: 
> 
>    :Hamlet crm:P94i_was_created_by :CreationOfHamlet. 
> 
>    :CreationOfHamlet a crm:E65_Creation. 
>    <<:CreationOfHamlet crm:P14_carried_out_by :WilliamShakespeare>> :accordingTo :SamuelJohnson .
>    <<:CreationOfHamlet crm:P4_has_time-span :Year1603>> :accordingTo :SamuelJohnson .
>    <<:CreationOfHamlet crm:P215_has_reliability :High>> :accordingTo :SamuelJohnson .
> 
> If I want to attribute the triples of an n-ary relationship without accepting them, I need to quote every single triple, because, unless we extend quoting to graphs, there is no way to wrap them and quote them all without at the same time stating them. 
> 
> Better than before, but... it is possible to improve. 
> 
>>> So with RDF*  we now have 
>>> 1a) a nice and compact syntax for stated triples s p o 
>>> 1b) which corresponds, for those so inclined, to _:x a rdf:Statement; rdf:Subject s; rdf:Predicate p; rdf:Object o.   
>>> and  
>>> 2a) a nice and compact syntax for non-stated triples <<s p o>> 
>>> 2b) which corresponds, for those so inclined, to _:x unstar:Subject s; unstar:Predicate p; unstar:Object o, etc..   
>>> 
>>> Things are clear: the truth state of quoted triples in RDF* is clearly non-asserted, and it is impossible to confuse the two types of statements, either in syntax or in semantics. Nice and clean: if there is a doubt in interpretation, create something new, so different from the old that there is no way to mix them up again. Good. 
>> 
>> i agree that this syntactic feature is very important for usability. but as i said in my other mail: it is not new anymore, it is already defined in practice as referentially opaque occurrence. the cow paths are laid out, no matter if one thinks that's good or not. 
> 
> I am not sure I follow: you do not like RDF* to be referentially opaque and would have preferred it to be referentially transparent? 

Absolutely, because almost all use cases call for referential transparency, which is not at all surprising given the nature of the semantics of the semantic web. RDF-star originally was not conceived as a syntax to enable unasserted assertions but to annotate statements. The embedded triple is the hinge between a triple and its annotation. The triple and its annotation have referentially transparent semantics, just like every other standard piece of RDF, but the connector, the hinge between them, does not, because the proposed semantics caters not to the established use cases but to a completely orthogonal concern. IMO this is a disaster, or rather a blunder, because this mess would be entirely avoidable.

> I have yet to understand the big deal about this. As a mathematician by background, I am used to thinking that names of unbounded variables can change at will, so if you need referential opacity you just use fresh variable names. 

RDF is foremost a data integration technology. To facilitate integration without prior agreement of all parties on identifiers and vocabularies, it differentiates between the syntactic representation of a name and what the name means. This is essential to enable such useful things as co-denotation via owl:sameAs, and it is woven into the very fabric of the semantic web. Annotating quoted statements breaks that integration facility. E.g.:

:alice :buys :car .
<< :alice :buys :car >> :on :friday .
:car owl:sameAs :automobile .

If you use OWL reasoning you’ll get 

:alice :buys :automobile .

and you would probably expect that the following annotation also holds: 

<< :alice :buys :automobile >> :on :friday .

but it doesn’t, because the proposed semantics defines the embedded triple as referentially opaque. And that IMO is a blunder. If you want this annotation to hold, sort of, you’ll have to jump through some not so intuitive hoops (and quite different ones, depending on whether you want to annotate a type or an occurrence), as discussed in recently popular issues like https://github.com/w3c/rdf-star/issues/209, https://github.com/w3c/rdf-star/issues/169 and https://github.com/w3c/rdf-star/issues/200.
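To make the difference concrete, here is a toy Python sketch of owl:sameAs substitution under the two readings. It is not an OWL reasoner: it only rewrites terms pairwise using the sameAs links present in the input graph, and the opaque flag is a simplified stand-in for the proposed semantics, modelled here as "never rewrite inside a quoted triple". The names Quoted, substitute and saturate are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Quoted:
    """A quoted (embedded) triple used as a term."""
    s: Term
    p: Term
    o: Term


Term = Union[str, Quoted]
Triple = Tuple[Term, Term, Term]


def substitute(term: Term, old: str, new: str, into_quotes: bool) -> Term:
    """Replace old by new; look inside quoted triples only if into_quotes."""
    if isinstance(term, Quoted):
        if not into_quotes:
            return term  # opaque reading: the quoted triple is left untouched
        return Quoted(
            substitute(term.s, old, new, into_quotes),
            substitute(term.p, old, new, into_quotes),
            substitute(term.o, old, new, into_quotes),
        )
    return new if term == old else term


def saturate(graph: Set[Triple], opaque: bool) -> Set[Triple]:
    """Add every triple obtainable by swapping owl:sameAs-linked IRIs.

    The sameAs links are collected once from the input graph.
    """
    g = set(graph)
    links = {
        (s, o) for s, p, o in g
        if p == "owl:sameAs" and isinstance(s, str) and isinstance(o, str)
    }
    links |= {(o, s) for s, o in links}  # owl:sameAs is symmetric
    changed = True
    while changed:
        changed = False
        for s, p, o in list(g):
            for old, new in links:
                t = tuple(substitute(x, old, new, not opaque) for x in (s, p, o))
                if t not in g:
                    g.add(t)
                    changed = True
    return g


graph: Set[Triple] = {
    (":alice", ":buys", ":car"),
    (Quoted(":alice", ":buys", ":car"), ":on", ":friday"),
    (":car", "owl:sameAs", ":automobile"),
}
annotation = (Quoted(":alice", ":buys", ":automobile"), ":on", ":friday")
transparent_g = saturate(graph, opaque=False)
opaque_g = saturate(graph, opaque=True)
print("transparent:", annotation in transparent_g)  # transparent: True
print("opaque:", annotation in opaque_g)  # opaque: False
```

Both runs derive :alice :buys :automobile; only the transparent reading carries the :on :friday annotation over to the co-denoting name, which is the behaviour argued for above.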

> Also, what Lois Lane THINKS of Clark Kent is immaterial. The individual we denote with Clark Kent, being the same individual we denote with Superman, can certainly fly. The problem is not in the specification of Lois Lane's opinion but in the statement :ClarkKent owl:sameAs :Superman, which should not be true [*]. Remove this triple from your dataset and everything falls naturally into place. 

Yes, :ClarkKent and :Superman have not been portrayed as real persons even in the comics; they are personas of an extraterrestrial being named Kal-El [0]. Consequently it’s Kal-El who can fly, and he could fly wearing his Clark Kent disguise too if he wanted. But that’s not the point.
We want to capture Lois Lane’s incomplete knowledge and her wrong assumptions while using the same identifiers that she uses to model our better knowledge. So we need identifiers to point to two different things depending on context. And as we are on the semantic web, where everything has to be easy, minting new identifiers is not an option ;-) You are surely aware of fluents: that proposal is sound but never flew, as it is just too cumbersome to author, read, query... So we contextualize. Or quote. Or create new sub-properties. This whole endeavour, of which RDF-star is just the current incarnation, is really a quest to find the most intuitive and concise abstraction. Won’t be over soon...

> Conjectures do not mess with identifiers of subjects and objects, they just create new predicates, so they are referentially transparent because they can't help being so. I need to find a realistic use case for why this is NOT good. 

You want to shield one part of your data from another part. Controlling when the data enters the realm of interpretation by quoting it is one way to do this, and it is the way the proposed RDF-star semantics chooses. My problem with this is that it forces every use of RDF-star as a mere annotation facility, one not concerned with the need to separate worldviews etc., to make some very cumbersome unquoting moves that completely kill the simplicity promised by the syntax.

> Interested in your opinion on this. 

I read your initial long mail introducing your proposal, but I haven’t really understood it in depth yet (so I can’t provide helpful comments just now; hopefully later).


> Ciao
> 
> Fabio
> 
> ---
> 
> 
> [*] I would ontologically disagree with the assertion of this triple. Clark Kent and Superman are not the same individual. Superman wears a cape, Clark Kent wears glasses, etc. These characteristics need to be differentiated, e.g. by introducing an intermediate class, call it a Persona, a Disguise, a Portrayal, and the individual that we are talking about uses one Persona or the other in different moments. Flying is an attribute of the :Superman Persona, and not of the :ClarkKent one. Solved without introducing referential opacity. 
> 
> I somewhat believe that if you introduce an intermediate class to shield the individual, you solve 99% of the ontological disputes that exist around here. 

Yes, but you lose the usability feature of the semantic web’s naming mechanism, and that IMO would be a very big loss.


Thomas


[0] https://en.wikipedia.org/wiki/Superman


> ----
> 
>> also, syntactically Antoine Zimmermann's proposal to define an RDF literal datatype is even more convincing: nothing looks and feels more like a quote than a quote. implementing SPARQL for such a datatype shouldn't be too hard. 
>> 
>> sorry again for my terseness! 
>> 
>> ciao 
>> thomas 
>> 
>> 
>>> 
>>> Now let's come to named graphs. This situation is better from the syntactic point of view, since graph syntax is actually quite reasonable, but worse from the semantic point of view, since no semantics is generally accepted. 
>>> 
>>> This is a broken vase, and even if you manage to glue back all that is wrong in the current situation, it would still be a broken vase. Let's learn the lesson from RDF*, get a new and shiny vase and use it alongside the old one for our purposes. 
>>> 
>>> Then we would have
>>> 1c) a nice and compact syntax for usual named graphs, whatever semantics you want to associate to them, as before (the broken vase)
>>> 2c) a nice and compact syntax for non-stated named graphs (the new vase)
>>> 
>>> Things would be clear: the truth state of quoted graphs would be clearly non-asserted, and it would be impossible to confuse the two types of graphs, either in syntax or in semantics. Nice and clean. 
>>> 
>>> This is what I wish to create: a new and shiny vase for graphs corresponding to the one that RDF* is becoming for reification. 
>>> 
>>> Ciao
>>> 
>>> Fabio
>>> 
>>> --
>>> 
>>>> On 22 Sep 2021, at 23:47, thomas lörtsch <tl@rat.io> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On 21. Sep 2021, at 19:08, Andy Seaborne <andy@apache.org> wrote:
>>>>> 
>>>>> The more appropriate text for RDF-star is probably that in
>>>>> "RDF 1.1 Concepts and Abstract Syntax"
>>>>> 
>>>>> 1.6 Working with Multiple RDF Graphs
>>>>> https://www.w3.org/TR/rdf11-concepts/#managing-graphs
>>>>> 
>>>>> and the definition:
>>>>> 
>>>>> 4. RDF Datasets
>>>>> https://www.w3.org/TR/rdf11-concepts/#section-dataset
>>>>> 
>>>>> which has the note:
>>>>> 
>>>>> """
>>>>> Despite the use of the word “name” in “named graph”, the graph name is not required to denote the graph. It is merely syntactically paired with the graph. RDF does not place any formal restrictions on what resource the graph name may denote, nor on the relationship between that resource and the graph.
>>>>> """
>>>> 
>>>> The failure of the RDF 1.1 WG to standardize a named graphs semantics is well known and is the very reason for this soul-searching expedition into the semantics of SPARQL as a normative practical force. 
>>>> 
>>>> As a co-editor of SPARQL 1.0 and 1.1 and a participant in the RDF 1.1 WG (and co-editor of TriG, as I just noticed) and probably numerous other RDF-related standardization efforts, you should be in a formidable position to shed some light on the question of which model-theoretic semantics might best describe the semantics of SPARQL. 
>>>> 
>>>> You might also comment on whether the RDF 1.1 WG discussed standardizing a model-theoretic semantics as close as possible to the operational semantics of SPARQL and, if that was deemed impossible, whether for technical or "political" (read: conflicts with vendor interests) reasons. 
>>>> 
>>>> These are just two ideas of how you could help flatten the knowledge differences in this CG.
>>>> 
>>>> Thomas
>>>> 
>>>> 
>>>>> 
>>>>> Andy
>>>>> 
>>>> 
>>>> 
>>> 
>
Received on Friday, 24 September 2021 11:26:14 UTC