Re: Adaptation of the semantics from Andy Seaborne on 2021-03-19 (public-rdf-star@w3.org from March 2021)

From: Andy Seaborne <andy@apache.org>
Date: Fri, 19 Mar 2021 18:04:50 +0000
To: public-rdf-star@w3.org
Message-ID: <9c251b8f-126d-a5ab-8fc3-5bae7ea1577e@apache.org>
On 19/03/2021 16:35, Pierre-Antoine Champin wrote:
> (I realise that I originally only included Peter in my reply;
>   reposting my reply here for archiving; sorry Peter for the duplicate)
> 
> Dear Peter,
> 
> On 11/03/2021 23:55, Peter F. Patel-Schneider wrote:
>  > Turtle* processing can be simple in just about any scheme if the 
> output is an RDF* graph. There is no need for a Turtle* system to 
> canonicalize RDF* graphs on input, nor is there a need for Turtle* 
> systems to canonicalize on output.
>  >
>  > If it is possible to encode an embedded predicate in Turtle* then it 
> is possible to have two structurally different Turtle* documents for the 
> same RDF* graph.  But this is already possible in RDF - consider all the 
> shorthands, particularly for containers and collections.
> I should have been more careful with the wording. My point was indeed 
> not about *parsing* Turtle*, but more about *storing* the result of this 
> parsing in an RDF-star-aware system.
>  >
>  > It is true that a complete triple* store becomes somewhat more 
> complex in this case, but only if the system wants to store all embedded 
> triples directly.  Storing embedded triples directly is likely to help 
> in SPARQL* performance, but any complete triple* store (let alone a 
> performant one) is already quite complex so a bit more complexity 
> isn't going to be noticeable.
> 
> I am not so sure about that. From what I understand, most (if not all) 
> implementers of RDF* chose to represent embedded triples directly. I'm 
> guessing there is a reason.

Jena has embedded triple RDF terms. I think they are unavoidable - the 
tokenizer emits tokens, one kind of token being an RDF term, and the 
parsers emit asserted triples.

It could index embedded triples - the goal so far has been "no 
persistent storage change". Changing on-disk formats is a significant step.

Currently it does not index embedded triples themselves (it used to, for 
a PG-like model), and it will not until the need is there. It already 
has the machinery for triple indexes, so the work is not that large.

Jena does not directly store triples in persistent storage. It has 
indexes (3- or 4-tuples of fixed-length internal ids). I don't think it 
is that unusual.

In memory, it stores triples/quads but then it can use pointers.
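
To illustrate the general shape (a minimal sketch only, not Jena's 
actual code; the names Store, node_id, spo and pos are made up for this 
example): terms, including embedded triple terms, are mapped to internal 
ids, and the indexes hold only tuples of those ids. An embedded triple 
gets an id like any other term, so the index layout does not change, 
and the embedded triple is not itself asserted or indexed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class EmbeddedTriple:
    # An embedded triple is an RDF term in its own right,
    # not an asserted triple.
    s: "Term"
    p: "Term"
    o: "Term"


Term = Union[IRI, EmbeddedTriple]


class Store:
    def __init__(self):
        self.ids = {}     # term -> internal id (a plain int in this sketch)
        self.terms = []   # internal id -> term
        self.spo = set()  # index of (s, p, o) id tuples
        self.pos = set()  # index of (p, o, s) id tuples

    def node_id(self, term):
        """Return the internal id for a term, allocating one on first use."""
        if term not in self.ids:
            self.ids[term] = len(self.terms)
            self.terms.append(term)
        return self.ids[term]

    def add(self, s, p, o):
        """Assert a triple; only id tuples go into the indexes."""
        si, pi, oi = self.node_id(s), self.node_id(p), self.node_id(o)
        self.spo.add((si, pi, oi))
        self.pos.add((pi, oi, si))

    def contains(self, s, p, o):
        """True if the triple is asserted; unknown terms cannot match."""
        try:
            key = (self.ids[s], self.ids[p], self.ids[o])
        except KeyError:
            return False
        return key in self.spo


store = Store()
alice, knows, bob, said_by, carol = (
    IRI(f"http://example/{name}")
    for name in ("alice", "knows", "bob", "saidBy", "carol")
)
claim = EmbeddedTriple(alice, knows, bob)
store.add(claim, said_by, carol)

print(store.contains(claim, said_by, carol))  # True: the asserted triple
print(store.contains(alice, knows, bob))      # False: only embedded
```

Indexing the embedded triples themselves would, under the same 
assumptions, amount to adding further sets of id tuples next to spo 
and pos.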

     Andy

> 
>  >
>  > So I don't see that the claim of making Turtle* processing easier is 
> correct.
>  >
>  > In my mind important properties are what results are returned, not 
> the form of these results.  Here is where hidden predicates have a role. 
>    A basic question, I think, is whether it is possible for an RDF graph 
> to entail an RDF* graph containing an embedded triple.
> 
> In the current semantics: yes, since the mapping defines an *equivalent* 
> RDF graph for any RDF* graph.
> 
> In the new proposal: yes, since G1 U G2 entails G (where (G1, G2) = 
> unstar(G)). However, no RDF graph can be *equivalent* to G (if G 
> contains embedded triples).
> 
>  >
>  > A slightly less basic question is whether SPARQL* results can return 
> RDF triples that arise from embedded triples
> 
> But why should any RDF triple necessarily arise from embedded triples?? 
> According to the abstract syntax, they are terms, just like, e.g., 
> datatyped literals. Their internal structure should not have to "leak" 
> into the graph (in simple entailment, at least).
> 
> The current semantics uses such "leaking" triples for the purpose of 
> defining semantics, and tries to keep them hidden (using hidden 
> predicates). But as you know, it fails to hide them entirely in SPARQL. 
> Adding ad-hoc constraints in SPARQL-star to ensure that hidden 
> predicates remain hidden sounds like an ugly solution (and by the way, 
> we would be back to using two hiding operations: secret IRIs, and 
> special SPARQL processing of those).
> 
> In the proposed semantics, the 7 "reification triples" entail the 
> embedded triple, but not the other way around, so there is no need to 
> hide them, neither in serialization syntaxes (hidden IRIs) nor in SPARQL 
> results (by creating special cases). I find this much cleaner.
> 
>    pa
> 
> PS: I still have to use one hidden IRI, I explained the rationale in my 
> previous email. I agree that it is not ideal. And actually, we can 
> separate the two issues. I
> 
>  > even if no RDF graph can entail an RDF* graph containing an embedded 
> triple. If the answer is no, then hidden triples are not allowed because 
> the pattern ?s ?p ?o. returns all triples, unless there is special 
> treatment of these predicates.  It might be reasonable to just make 
> these hidden triples not be targets for matching by query variables. 
> (One problem with this is that it doesn't handle generalized RDF triples 
> with blank nodes as predicates.)
>  >
>  > peter
>  >
>  >
>  > On 3/11/21 1:34 PM, Pierre-Antoine Champin wrote:
>  >> On 11/03/2021 18:28, Peter F. Patel-Schneider wrote:
>  >>
>  >>> I don't see this as much of an advance.
>  >>>
>  >>>
>  >>> The semantics now uses two graphs and also uses a hidden predicate.
>  >>>
>  >>>
>  >>> As far as I can see, the reason to use a hidden predicate is to 
> prevent RDF* graphs with embedded triples from being equivalent to an RDF 
> graph.  I'm not in favour of this attribute of RDF*, as it prevents the 
> construction of variations on embedded triples.
>  >> Note that the new proposal does not use hidden predicate to describe 
> the internal structure of an embedded triple. So you can now build 
> something that is very similar to (the "expanded" description of) an 
> embedded triple -- how is it not a "variation on embedded triples"?
>  >>
>  >> What you can not do is to assemble a "proper" embedded triple by 
> using those predicates. In my view, this is a feature of RDF-star, not a 
> bug. Similarly, [ rdf:value "42"; rdf:type xsd:integer ] is quite 
> similar to the literal "42"^^xsd:integer, but it is not that literal. 
> Requiring RDF systems to recognize that literal whenever it is 
> described like that would be a big hurdle for implementers.
>  >>
>  >>> As far as I can see, the reason to use two graphs is to hide the 
> triples generated by embedded triples from SPARQL.
>  >>>
>  >>> So there are two hiding operations.  Are two really needed?
>  >>
>  >> You are right, in a way they are two hiding operations. But they 
> play different roles.
>  >>
>  >> * using 2 graphs instead of one avoids embedded triples to entail 
> their "expanded" representation (i.e. the 7 triples generated in G2 for 
> that embedded triple) -- that way, SPARQL-star engines (but also 
> RDF-star systems supporting simple entailment) do not need to construct 
> the "expanded" representation for every embedded triple they contain.
>  >>
>  >> * using a hidden IRI avoids the "expanded" representation to entail 
> the embedded triple (more precisely, the complete expanded 
> representation DOES entail the embedded triple, but it can not be 
> serialized) -- that way, Turtle-star parsers are not required to analyze 
> all "reification" triples to re-construct "expanded".
>  >>
>  >> So to sum it up, the first hiding operation is for the sake of 
> SPARQL-star, the second one is for the sake of Turtle-star.
>  >>
>  >>   pa
>  >>
>  >>>
>  >>>
>  >>> peter
>  >>>
>  >>>
>  >>>
>  >>> On 3/5/21 1:00 PM, Pierre-Antoine Champin wrote:
>  >>>>
>  >>>> Hi all,
>  >>>>
>  >>>> I just pushed a pull-request adapting the semantics:
>  >>>>
>  >>>> https://github.com/w3c/rdf-star/pull/127
>  >>>>
>  >>>> I believe it has some advantages over the current version:
>  >>>>
>  >>>>   * it does not rely anymore on "hidden" predicates (see issue #101
>  >>>>     <https://github.com/w3c/rdf-star/issues/101>)
>  >>>>   * it does not have the "merging" issue warned about in §6.3.1
>  >>>> 
> <https://w3c.github.io/rdf-star/cg-spec/2021-02-18.html#combining-rdf-star-graphs> 
> 
>  >>>>   * I think that it allows us to align SPARQL query semantics with 
> simple
>  >>>>     entailment (as newly defined)
>  >>>>   * I think that it allows the Interpolation Lemma
>  >>>> <https://www.w3.org/TR/rdf11-mt/#dfn-interpolation> to extend to 
> RDF-star
>  >>>>
>  >>>> (I didn't formally prove the last two items, hence "I think"...)
>  >>>>
>  >>>> The trick is that we do not map anymore RDF-star graphs to a 
> single, semantically equivalent RDF graph.
>  >>>> Instead, we map it to a pair of RDF graphs, which can be thought 
> of as a "lower and upper bound" of the RDF-star graph, in terms of 
> entailment. The semantics of the RDF-star graph is defined through the 
> semantics of its "bounds", reusing RDF semantics as is (as we currently 
> do).
>  >>>>
>  >>>> In this new semantics, a strict RDF-star graph (i.e. one that 
> contains embedded triples) has no exactly equivalent RDF graph, so it 
> still can not be conveyed exactly using RDF syntaxes (but we do not rely 
> anymore on hidden predicates for that). However, either of the two 
> "bounds" can be used to approximate the RDF-star graph in legacy RDF. 
> The "lower bound" will produce correct but incomplete inferences. The 
> "upper bound" will produce complete inferences, with a few spurious (but 
> generally harmless) ones.
>  >>>>
>  >>>> I am curious to get some feedback on this.
>  >>>>
>  >>>>
>  >>>
>  >>
>  >
> 
>
Received on Friday, 19 March 2021 18:05:06 UTC