Re: Consolidating triple/edges from Thomas Lörtsch on 2023-12-17 (public-rdf-star-wg@w3.org from December 2023)

From: Thomas Lörtsch <tl@rat.io>
Date: Mon, 18 Dec 2023 00:06:00 +0100
To: Andy Seaborne <andy@apache.org>
Cc: RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-Id: <BC0836C3-7AA1-454F-BDBF-110952D05E99@rat.io>
> On 15. Dec 2023, at 13:57, Andy Seaborne <andy@apache.org> wrote:
> 
> Thomas,
> 
> Responding to some more of your points:
> 
> On 14/12/2023 16:46, Thomas Lörtsch wrote:
>> In principle I agree, although I have a few things to add and modify. Basically, if we decide to go the pragmatic route and standardize only an LPG-oriented subset of annotation functionality, we still have to be sure that we don’t block future extensions to a complete solution. That requires us to think many things through (like graphs, quotation). However, actually standardizing those other things is then not much extra work.
>> A few comments inline, a more coherent take at the end.
>>> On 12. Dec 2023, at 21:59, Andy Seaborne <andy@apache.org> wrote:
>>> 
>>> Here is an attempt to write out the details of what I think has been said recently.
>>> 
>>> It is addressing "publishing information about multi-edges".
>>> (Ideas here are from WG members - the mistakes are mine.)
>>> 
>>> 
>>> Multiple edges with the same label are handled as multiple occurrences - the predicate URI of the RDF triple is thought of as a conceptual relationship - with multiple sets of annotations.
>>> 
>>> This preserves the uniqueness of triples in a graph, and allows independent collections of assertions about a relationship. Such collections of assertions do not get entangled on merge.
>> Just to emphasize that there is one more important differentiation to make: does the occurrence identify a specific triple or does it refer to the abstract statement.
> 
> It is the (potential) usage of the triple. I think this was called a "claim" as contrasted to an "assertion" (i.e. a fact) in early RDF (~1.0) discussions.
> 
> Triples (abstract/type) _occur_ or "are used" in graphs, and it is that usage/occurrence that is being annotated separately from the universal concept the triple represents.
> 
> There may be better names than "occurrence" for this concept. Previously, "usage" and "mention" have come up.

I mean something different, not philosophical. There are two ways to identify an occurrence/token/claim:

direct, e.g      :s :p :o  {| id:1 | :said :Alice ; :on :Monday } 
                 :s :p :o  {| id:2 | :said :Bob ; :on :Tuesday } 

indirect, e.g.   :s :p :o .
                 << id:1 | :s :p :o >> :said :Alice ; :on :Monday .
                 << id:2 | :s :p :o >> :said :Bob ; :on :Tuesday . 

I think it’s obvious that there is a difference. However, this difference is only in syntax, not in formalization. As RDF and RDF-star are defined, the asserted triple is stored only once and the relation between the triple and its annotation(s) is an indirect one. The consequence is that updates on annotated statements, especially deletes, will have to check whether the statement is annotated multiple times. That seems doable and also a pragmatic design, since early de-duplication benefits querying (and deletes should be much less frequent than lookups).
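To make the delete check concrete, here is a minimal Python sketch under stated assumptions: a toy in-memory store where each asserted triple is kept once (set semantics) and occurrences such as id:1 and id:2 point at it indirectly. The class and method names (AnnotatedStore, delete_occurrence) are hypothetical and not taken from any RDF-star draft or triple store, and the delete policy shown is only one possible choice.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


class AnnotatedStore:
    """Hypothetical store: each asserted triple is kept once (set semantics);
    occurrences and their annotations refer to it indirectly by id."""

    def __init__(self) -> None:
        self.asserted: set[Triple] = set()
        self.occurrence_of: dict[str, Triple] = {}  # occurrence id -> triple
        self.annotations: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, triple: Triple, occ_id: str, *annotations: tuple[str, str]) -> None:
        self.asserted.add(triple)  # re-adding the same triple is a no-op
        self.occurrence_of[occ_id] = triple
        self.annotations[occ_id].extend(annotations)

    def occurrences(self, triple: Triple) -> list[str]:
        return [o for o, t in self.occurrence_of.items() if t == triple]

    def delete_occurrence(self, occ_id: str) -> None:
        """Delete an annotated statement: its annotations always go, but the
        asserted triple is removed only if no other occurrence still uses it."""
        triple = self.occurrence_of.pop(occ_id)
        self.annotations.pop(occ_id, None)
        if not self.occurrences(triple):
            self.asserted.discard(triple)


store = AnnotatedStore()
t = (":s", ":p", ":o")
store.add(t, "id:1", (":said", ":Alice"), (":on", ":Monday"))
store.add(t, "id:2", (":said", ":Bob"), (":on", ":Tuesday"))
store.delete_occurrence("id:1")
print(t in store.asserted)  # True: id:2 still refers to the triple
```

The extra lookup in delete_occurrence is the cost of de-duplicating early; lookups stay cheap because the triple exists only once.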

I re-checked the relevant use case 3.1 in the OneGraph [3] paper and satisfied myself that, with asserted triples, direct and indirect identification are both good enough to disambiguate annotations on different tokens, even if the different tokens are only represented as types in the database.


However, add unasserted triples to the mix and the situation gets more complex. Imagine we merge data from two sources:

# source 1
    :s :p :o .
    << id:1 | :s :p :o >> :said :Alice ; :on :Monday .

# source 2
    << id:2 | :s :p :o >> :said :Bob ; :on :Tuesday .

# merge 1+2
    :s :p :o .
    << id:1 | :s :p :o >> :said :Alice ; :on :Monday .
    << id:2 | :s :p :o >> :said :Bob ; :on :Tuesday .

The merge loses the information that source 2 didn’t assert the statement it annotates. One might be tempted to think that the shorthand syntax (the direct version above) would disambiguate the two cases, but everything has to map to the explicit term syntax used in the second example, so that’s not a viable solution.
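The information loss in the merge can be shown with a small Python sketch. It assumes a toy model in which a graph is just a set of asserted triples plus a map from occurrence ids to the triples they name (as in the `<< id:2 | :s :p :o >>` syntax); the annotations themselves are omitted because they don't affect the point. Graph, merge and is_asserted are illustrative names only, not any library's API, and merge is plain set union as in an ordinary RDF merge.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class Graph:
    asserted: set[Triple] = field(default_factory=set)
    occurrences: dict[str, Triple] = field(default_factory=dict)  # occurrence id -> triple

    def is_asserted(self, occ_id: str) -> bool:
        """Does this graph also assert the triple that the occurrence names?"""
        return self.occurrences[occ_id] in self.asserted


def merge(g1: Graph, g2: Graph) -> Graph:
    # Set union, as in an ordinary RDF merge; nothing records which
    # source asserted which triple.
    return Graph(g1.asserted | g2.asserted, {**g1.occurrences, **g2.occurrences})


t = (":s", ":p", ":o")
source1 = Graph({t}, {"id:1": t})
source2 = Graph(set(), {"id:2": t})  # annotates, but does not assert, :s :p :o
merged = merge(source1, source2)

print(source2.is_asserted("id:2"))  # False
print(merged.is_asserted("id:2"))   # True: the distinction is gone
```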

Since I doubt that many people will read this mail and since I also began a similar discussion in an exchange with Olaf, I’m stopping here. However, I hope it’s clear now that there is a problem and while I know how to solve it with nested named graphs, I’m not sure yet how to tackle it with RDF-star.


>> That has ramifications in use cases for unasserted assertions (multiple annotations, some of them meant to refer to an unasserted statement), updates (delete the statement along with the annotation, or is another annotation still referring to it?) etc.
> 
>> In more abstract terms the question is whether statements are understood as types already when authoring/storing or only later when querying/reasoning. It’s essentially a question of early vs late optimization, and RDF’s set semantics - while practical, fundamental to RDF and not to be impaired - is only a result of the former, nothing holy in itself. So working around it IS okay.
> 
> In the abstract data model of RDF, it's types.
> 
> Occurrences are a resource in the domain of discourse.
> They are literal-like in that they self-describe.

I don’t think so. In the Semantics Task Force call on Friday we pretty much all agreed that we want occurrences, expressed in RDF-star, to be referentially transparent (like they are e.g. in RDF standard reification). Then they are not literal-like but denote something in the realm of interpretation. Maybe you work with another interpretation of "literal-like"?

Also, my remark was very much practical - at what point de-duplication is performed - and not concerned with the semantics of RDF, which I don’t want to change. We all know that there is a gap between the abstract semantics and data model of RDF and the real world [4]. The question is which point is the best to bridge it.

>> ## SPARQL sugar
>>
>> You compare the occurrence-based shortcut relation to syntactic sugar for RDF lists, which is fine, except that querying those lists is a hardship. Same for RDF/XML’s syntactic support for RDF standard reification. Any kind of RDF syntactic sugar also needs proper support in SPARQL to be effective in practice.
> 
> Yes - SPARQL needs syntax support.
> 
> The Turtle syntax would be replicated in SPARQL. "Turtle with holes".
> 
> As per the CG SPARQL-star approach, the fixed form of a triple term means accessor functions can be defined.
> 
>> ## Graph Terms vs Named Graphs
>> I like Adrian’s example [0] of a complicated named-graph-based application and I’m taking it seriously. However, it should also be clear that triple/graph terms in the end are always stored in a way very similar to named graphs. There is just no other way in a quad-based system. Triple/graph terms can be represented as named graphs, and named graphs can be represented as graph terms. It’s a practical question of how to encode belonging/membership: syntactically as nested graphs, via a new term type as in RDF-star that transforms a triple into a term at the surface (but NOT in the underlying storage layer, for obvious performance reasons)
> 
> FYI Jena stores triple terms in the term table, not in named graphs.
> Other systems do this as well.

I’m all ears for more details! I admit that I was too rash with my claim. However, more detailed information is not that easy to come by. I checked a recent paper by Ruben Taelman [5] about indexing strategies for quoted triples and I get the impression (read: I don’t fully understand it without more effort) that all quoted triples are indexed into the same index. So information does leak if one doesn’t put in some extra effort. That is certainly possible, but it is just as possible for a named-graph-based solution.

I certainly don’t grasp all the technical details, but the claim has been made by you and others that named graphs can’t be used to implement an annotation mechanism because they are already used for application-specific purposes. That claim is only made in rather sweeping terms. You provide almost no detail; Adrian has put some effort into describing scenarios which he sees as difficult. I answered him with some comments [6] and look forward to his response. You have yet to come up with anything substantial.

As I said in my response [7] to Ora on this issue, if we take the stance that it is too risky to use named graphs for sound modelling, then we had better devise some other solution. But there is nothing. With the same sweeping attitude with which named graphs are dismissed, others dismiss graph terms, which could be an alternative, on equally shaky grounds.

>> , via explicit binding relations as Niklas proposes [1] (and as Dydra implements nested graphs), etc. The main question is how to ensure that those binding relations don’t get lost in the process, but that IMHO is true for any solution. Nested graphs can be serialized to graph terms, which are just an extension of triple terms. That requires an additional en/de-coding step to fit them into an environment that reserves named graphs to its own purposes. That extra step is the price that those applications have to pay for being so particular about their use of named graphs. That’s only fair, and probably still economical for them.
>> ## Term types vs Datatypes
>> The most fundamental grievance with RDF-star is the introduction of a new term type when a new datatype of type RDF/TTL would suffice.
> 
> A way in which triple terms are not literals+datatype is that a new datatype would make the triple term opaque. c.f. ""^^xsd:anyURI.

We could introduce one extra step in which we include the RDF data from an RDF literal. That inclusion can then also control the semantics under which the data is included: transparent, opaque, unasserted, etc. That’s how NNG does it.

Thomas


[3] https://content.iospress.com/articles/semantic-web/sw223273
[4] https://lists.w3.org/Archives/Public/public-rdf-wg/2011Feb/0060.html
[5] https://rubensworks.github.io/article-quoted-triples-index/
[6] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Dec/0046.html
[7] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Dec/0028.html



>    Andy
> 
>> Best,
>> Thomas
>> [0] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Dec/0019.html
>> [1] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Nov/0032.html
>>>    Andy
>>> 
>>
Received on Sunday, 17 December 2023 23:06:16 UTC