Re: Adaptation of the semantics from Pierre-Antoine Champin on 2021-03-19 (public-rdf-star@w3.org from March 2021)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Fri, 19 Mar 2021 17:35:39 +0100
To: public-rdf-star@w3.org
Message-ID: <fe5c78f1-b17b-d3c8-247e-ee5f03ec72b6@ercim.eu>
(I realise that I originally only included Peter in my reply;
  reposting my reply here for archiving; sorry Peter for the duplicate)

Dear Peter,

On 11/03/2021 23:55, Peter F. Patel-Schneider wrote:
 > Turtle* processing can be simple in just about any scheme if the 
output is an RDF* graph. There is no need for a Turtle* system to 
canonicalize RDF* graphs on input, nor is there a need for Turtle* 
systems to canonicalize on output.
 >
 > If it is possible to encode an embedded predicate in Turtle* then it 
is possible to have two structurally different Turtle* documents for the 
same RDF* graph.  But this is already possible in RDF - consider all the 
shorthands, particularly for containers and collections.
I should have been more careful with the wording. My point was indeed 
not about *parsing* Turtle*, but more about *storing* the result of this 
parsing in an RDF-star-aware system.
 >
 > It is true that a complete triple* store becomes somewhat more 
complex in this case, but only if the system wants to store all embedded 
triples directly.  Storing embedded triples directly is likely to help 
in SPARQL* performance, but any complete triple* store (let alone a 
performant one) is already quite complex so a bit more complexity 
isn't going to be noticeable.

I am not so sure about that. From what I understand, most (if not all) 
implementers of RDF* chose to represent embedded triples directly. I'm 
guessing there is a reason.

 >
 > So I don't see that the claim of making Turtle* processing easier is 
correct.
 >
 > In my mind important properties are what results are returned, not 
the form of these results.  Here is where hidden predicates have a role. 
   A basic question, I think, is whether it is possible for an RDF graph 
to entail an RDF* graph containing an embedded triple.

In the current semantics: yes, since the mapping defines an *equivalent* 
RDF graph for any RDF* graph.

In the new proposal: yes, since G1 U G2 entails G (where (G1, G2) = 
unstar(G)). However, no RDF graph can be *equivalent* to G (if G 
contains embedded triples).
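
To make the pair (G1, G2) a little more concrete, here is a minimal Python sketch of a mapping in that spirit. It is not the mapping defined in the pull request: encoding embedded triples as nested tuples, replacing each one by a fresh blank node in G1, and describing it in G2 with three `ex:` predicates are all assumptions made for illustration (the proposal itself generates 7 triples per embedded triple).

```python
from itertools import count


def toy_unstar(graph):
    """Split an RDF-star graph into two plain RDF graphs (toy version).

    Terms are strings; an embedded triple is a nested (s, p, o) tuple.
    """
    labels = {}       # embedded triple -> blank node label standing in for it
    fresh = count()
    g1, g2 = set(), set()

    def plain(term):
        # Plain terms pass through unchanged.
        if not isinstance(term, tuple):
            return term
        if term not in labels:
            s, p, o = (plain(part) for part in term)
            node = labels[term] = f"_:t{next(fresh)}"
            # Hypothetical predicates describing the embedded triple.
            g2.update({
                (node, "ex:subject", s),
                (node, "ex:predicate", p),
                (node, "ex:object", o),
            })
        return labels[term]

    for s, p, o in graph:
        g1.add((plain(s), p, plain(o)))
    return g1, g2


claim = ("ex:alice", "ex:age", "42")
g = [(claim, "ex:source", "ex:census"), ("ex:bob", "ex:doubts", claim)]
g1, g2 = toy_unstar(g)
```

Both `g1` and `g2` contain only plain terms, so either one could be handed to a legacy RDF tool; whether they satisfy the entailment relations stated above depends on the actual definitions in the pull request, which this toy does not reproduce.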

 >
 > A slightly less basic question is whether SPARQL* results can return 
RDF triples that arise from embedded triples

But why should any RDF triple necessarily arise from embedded triples? 
According to the abstract syntax, they are terms, just like, e.g., 
datatyped literals. Their internal structure should not have to "leak" 
into the graph (in simple entailment, at least).

The current semantics uses such "leaking" triples for the purpose of 
defining semantics, and tries to keep them hidden (using hidden 
predicates). But as you know, it fails to hide them entirely in SPARQL. 
Adding ad-hoc constraints in SPARQL-star to ensure that hidden 
predicates remain hidden sounds like an ugly solution (and by the way, 
we would be back to using two hiding operations: secret IRIs, and 
special SPARQL processing of those).
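
The leak can be illustrated with a small Python sketch. The graph encoding, the `hidden:` predicate names and the `match` helper are all hypothetical; the point is only that a match-all pattern such as ?s ?p ?o returns the helper triples unless the matcher special-cases them.

```python
# Hypothetical names for the hidden predicates.
HIDDEN = {"hidden:subject", "hidden:predicate", "hidden:object"}


def match(pattern, graph, hide=frozenset()):
    """Return one binding per triple matching the pattern.

    Pattern terms starting with '?' are variables; triples whose
    predicate is in `hide` are skipped (the ad-hoc special case).
    """
    results = []
    for triple in graph:
        if triple[1] in hide:
            continue
        binding = {}
        for pat, term in zip(pattern, triple):
            if pat.startswith("?"):
                # A repeated variable must bind to the same term.
                if binding.setdefault(pat, term) != term:
                    break
            elif pat != term:
                break
        else:
            results.append(binding)
    return results


graph = [
    ("_:t", "ex:source", "ex:census"),
    ("_:t", "hidden:subject", "ex:alice"),
    ("_:t", "hidden:predicate", "ex:age"),
    ("_:t", "hidden:object", "42"),
]
everything = match(("?s", "?p", "?o"), graph)           # helper triples leak
visible = match(("?s", "?p", "?o"), graph, hide=HIDDEN)  # special-cased
```

Here `everything` holds four bindings and `visible` only one. Filtering on predicate names is exactly the kind of special case discussed above, and it would not cover generalized RDF triples with blank nodes as predicates.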

In the proposed semantics, the 7 "reification triples" entail the 
embedded triple, but not the other way around, so there is no need to 
hide them, neither in serialization syntaxes (hidden IRIs) nor in SPARQL 
results (by creating special cases). I find this much cleaner.

   pa

PS: I still have to use one hidden IRI; I explained the rationale in my 
previous email. I agree that it is not ideal. And actually, we can 
separate the two issues. I

 > even if no RDF graph can entail an RDF* graph containing an embedded 
triple. If the answer is no, then hidden triples are not allowed because 
the pattern ?s ?p ?o. returns all triples, unless there is special 
treatment of these predicates.  It might be reasonable to just make 
these hidden triples not be targets for matching by query variables. 
(One problem with this is that it doesn't handle generalized RDF triples 
with blank nodes as predicates.)
 >
 > peter
 >
 >
 > On 3/11/21 1:34 PM, Pierre-Antoine Champin wrote:
 >> On 11/03/2021 18:28, Peter F. Patel-Schneider wrote:
 >>
 >>> I don't see this as much of an advance.
 >>>
 >>>
 >>> The semantics now uses two graphs and also uses a hidden predicate.
 >>>
 >>>
 >>> As far as I can see, the reason to use a hidden predicate is to 
prevent RDF* graphs with embedded triples from being equivalent to an RDF 
graph.  I'm not in favour of this attribute of RDF*, as it prevents the 
construction of variations on embedded triples.
 >> Note that the new proposal does not use hidden predicates to describe 
the internal structure of an embedded triple. So you can now build 
something that is very similar to (the "expanded" description of) an 
embedded triple -- how is it not a "variation on embedded triples"?
 >>
 >> What you can not do is to assemble a "proper" embedded triple by 
using those predicates. In my view, this is a feature of RDF-star, not a 
bug. Similarly, [ rdf:value "42"; rdf:type xsd:integer ] is quite 
similar to the literal "42"^^xsd:integer, but it is not that literal. 
Requiring RDF systems to recognize that literal whenever it is 
described like that would be a big hurdle for implementers.
 >>
 >>> As far as I can see, the reason to use two graphs is to hide the 
triples generated by embedded triples from SPARQL.
 >>>
 >>> So there are two hiding operations.  Are two really needed?
 >>
 >> You are right, in a way they are two hiding operations. But they 
play different roles.
 >>
 >> * using 2 graphs instead of one prevents embedded triples from entailing 
their "expanded" representation (i.e. the 7 triples generated in G2 for 
that embedded triple) -- that way, SPARQL-star engines (but also 
RDF-star systems supporting simple entailment) do not need to construct 
the "expanded" representation for every embedded triple they contain.
 >>
 >> * using a hidden IRI prevents the "expanded" representation from entailing 
the embedded triple (more precisely, the complete expanded 
representation DOES entail the embedded triple, but it cannot be 
serialized) -- that way, Turtle-star parsers are not required to analyze 
all "reification" triples to re-construct "expanded".
 >>
 >> So to sum it up, the first hiding operation is for the sake of 
SPARQL-star, the second one is for the sake of Turtle-star.
 >>
 >>   pa
 >>
 >>>
 >>>
 >>> peter
 >>>
 >>>
 >>>
 >>> On 3/5/21 1:00 PM, Pierre-Antoine Champin wrote:
 >>>>
 >>>> Hi all,
 >>>>
 >>>> I just pushed a pull-request adapting the semantics:
 >>>>
 >>>> https://github.com/w3c/rdf-star/pull/127
 >>>>
 >>>> I believe it has some advantages over the current version:
 >>>>
 >>>>   * it does not rely anymore on "hidden" predicates (see issue #101
 >>>>     <https://github.com/w3c/rdf-star/issues/101>)
 >>>>   * it does not have the "merging" issue warned about in §6.3.1
 >>>>     <https://w3c.github.io/rdf-star/cg-spec/2021-02-18.html#combining-rdf-star-graphs>
 >>>>   * I think that it allows us to align SPARQL query semantics with simple
 >>>>     entailment (as newly defined)
 >>>>   * I think that it allows the Interpolation Lemma
 >>>>     <https://www.w3.org/TR/rdf11-mt/#dfn-interpolation> to extend to RDF-star
 >>>>
 >>>> (I didn't formally prove the last two items, hence "I think"...)
 >>>>
 >>>> The trick is that we no longer map an RDF-star graph to a 
single, semantically equivalent RDF graph.
 >>>> Instead, we map it to a pair of RDF graphs, which can be thought 
of as a "lower and upper bound" of the RDF-star graph, in terms of 
entailment. The semantics of the RDF-star graph is defined through the 
semantics of its "bounds", reusing RDF semantics as is (as we currently do).
 >>>>
 >>>> In this new semantics, a strict RDF-star graph (i.e. one that 
contains embedded triples) has no exactly equivalent RDF graph, so it 
still cannot be conveyed exactly using RDF syntaxes (but we no longer rely 
on hidden predicates for that). However, either of the two 
"bounds" can be used to approximate the RDF-star graph in legacy RDF. 
The "lower bound" will produce correct but incomplete inferences. The 
"upper bound" will produce complete inferences, with a few spurious (but 
generally harmless) ones.
 >>>>
 >>>> I am curious to get some feedback on this.
 >>>>
 >>>>
 >>>
 >>
 >
Received on Friday, 19 March 2021 16:35:44 UTC