Re: Adaptation of the semantics from Peter F. Patel-Schneider on 2021-03-21 (public-rdf-star@w3.org from March 2021)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Sun, 21 Mar 2021 06:20:00 -0400
To: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>, public-rdf-star@w3.org
Message-ID: <91031b94-aa84-61bc-d2a6-cdd43e48854f@gmail.com>
On 3/19/21 12:35 PM, Pierre-Antoine Champin wrote:
> (I realise that I originally only included Peter in my reply;
>  reposting my reply here for archiving; sorry Peter for the duplicate)
>
> Dear Peter,
>
> On 11/03/2021 23:55, Peter F. Patel-Schneider wrote:
> > Turtle* processing can be simple in just about any scheme if the output is 
> an RDF* graph. There is no need for a Turtle* system to canonicalize RDF* 
> graphs on input, nor is there a need for Turtle* systems to canonicalize 
> on output.
> >
> > If it is possible to encode an embedded predicate in Turtle* then it is 
> possible to have two structurally different Turtle* documents for the same 
> RDF* graph.  But this is already possible in RDF - consider all the 
> shorthands, particularly for containers and collections.
> I should have been more careful with the wording. My point was indeed not 
> about *parsing* Turtle*, but more about *storing* the result of this parsing 
> in an RDF-star-aware system.
> >
> > It is true that a complete triple* store becomes somewhat more complex in 
> this case, but only if the system wants to store all embedded triples 
> directly.  Storing embedded triples directly is likely to help in SPARQL* 
> performance, but any complete triple* store (let alone a performant one) 
> is already quite complex so a bit more complexity isn't going to be noticeable.
>
> I am not so sure about that. From what I understand, most (if not all) 
> implementers of RDF* chose to represent embedded triples directly. I'm 
> guessing there is a reason.
>
> >
> > So I don't see that the claim of making Turtle* processing easier is correct.
> >
> > In my mind important properties are what results are returned, not the 
> form of these results.  Here is where hidden predicates have a role.   A 
> basic question, I think, is whether it is possible for an RDF graph to 
> entail an RDF* graph containing an embedded triple.
>
> In the current semantics: yes, since the mapping defines an *equivalent* RDF 
> graph for any RDF* graph.
>
> In the new proposal: yes, since G1 U G2 entails G (where (G1, G2) = 
> unstar(G)). However, no RDF graph can be *equivalent* to G (if G contains 
> embedded triples).

I should have been more precise - A basic question, I think, is whether it is 
possible for an *accessible* RDF graph (i.e., one that users can create) to 
entail an RDF* graph containing an embedded triple. Using hidden IRIs makes 
this not possible.

>
> >
> > A slightly less basic question is whether SPARQL* results can return RDF 
> triples that arise from embedded triples
>
> But why should any RDF triple necessarily arise from embedded triples?? 
> According to the abstract syntax, they are terms, just like, e.g., datatyped 
> literals. Their internal structure should not have to "leak" into the graph 
> (in simple entailment, at least).
But why not?  There is no need for embedded triples to be a new thing in the 
semantics.
>
> The current semantics uses such "leaking" triples for the purpose of 
> defining semantics, and tries to keep them hidden (using hidden predicates). 
> But as you know, it fails to hide them entirely in SPARQL. Adding ad-hoc 
> constraints in SPARQL-star to ensure that hidden predicates remain hidden 
> sounds like an ugly solution (and by the way, we would be back to using two 
> hiding operations: secret IRIs, and special SPARQL processing of those).
>
> In the proposed semantics, the 7 "reification triples" entail the embedded 
> triple, but not the other way around, so there is no need to hide them, 
> neither in serialization syntaxes (hidden IRIs) nor in SPARQL results (by 
> creating special cases). I find this much cleaner.
>
>   pa
>
> PS: I still have to use one hidden IRI, I explained the rationale in my 
> previous email. I agree that it is not ideal. And actually, we can separate 
> the two issues. I
>
> > even if no RDF graph can entail an RDF* graph containing an embedded 
> triple. If the answer is no, then hidden triples are not allowed because the 
> pattern ?s ?p ?o. returns all triples, unless there is special treatment of 
> these predicates.  It might be reasonable to just make these hidden triples 
> not be targets for matching by query variables. (One problem with this is 
> that it doesn't handle generalized RDF triples with blank nodes as predicates.)
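
A minimal sketch of the "special treatment" mentioned above, assuming a graph is modelled as a set of (subject, predicate, object) tuples. The IRI in HIDDEN_PREDICATES and the function name match_all are hypothetical and do not come from any RDF-star draft.

```python
# Hypothetical IRI standing in for a "hidden" predicate; the real
# name, if any, is not given in this thread.
HIDDEN_PREDICATES = {"urn:example:hidden#subject"}


def match_all(graph, hide_hidden=False):
    """Return every triple matching the pattern ?s ?p ?o.

    With hide_hidden=True, triples whose predicate is in
    HIDDEN_PREDICATES are not offered as matches.
    """
    return {
        (s, p, o)
        for (s, p, o) in graph
        if not (hide_hidden and p in HIDDEN_PREDICATES)
    }


graph = {
    ("ex:alice", "ex:knows", "ex:bob"),
    # A triple generated from an embedded triple (illustrative only).
    ("_:t1", "urn:example:hidden#subject", "ex:alice"),
}

leaky = match_all(graph)                        # hidden triple is returned
filtered = match_all(graph, hide_hidden=True)   # hidden triple is skipped
```

A filter that compares predicate names, as this one does, would not catch a generalized triple whose predicate is a blank node, which is the problem noted in the parenthesis above.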
> >
> > peter
> >
> >
> > On 3/11/21 1:34 PM, Pierre-Antoine Champin wrote:
> >> On 11/03/2021 18:28, Peter F. Patel-Schneider wrote:
> >>
> >>> I don't see this as much of an advance.
> >>>
> >>>
> >>> The semantics now uses two graphs and also uses a hidden predicate.
> >>>
> >>>
> >>> As far as I can see, the reason to use a hidden predicate is to prevent 
> RDF* graphs with embedded triples from being equivalent to an RDF graph.  I'm 
> not in favour of this attribute of RDF*, as it prevents the construction of 
> variations on embedded triples.
> >> Note that the new proposal does not use a hidden predicate to describe the 
> internal structure of an embedded triple. So you can now build something 
> that is very similar to (the "expanded description" of) an embedded triple -- 
> how is it not a "variation on embedded triples"?
> >>
> >> What you can not do is to assemble a "proper" embedded triple by using 
> those predicates. In my view, this is a feature of RDF-star, not a bug. 
> Similarly, [ rdf:value "42"; rdf:type xsd:integer ] is quite similar to the 
> literal "42"^^xsd:integer, but it is not that literal. Requiring RDF systems 
> to recognize that literal whenever it is described like that would be a big 
> hurdle for implementers.
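
The literal analogy above can be sketched as follows, assuming terms are plain Python values and a graph is a set of tuples. The Literal class and the recognize_literal helper are illustrative names only; they show the extra pass a system would need if it were required to treat the description as the literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


# The literal "42"^^xsd:integer, held as a single term.
literal = Literal("42", "xsd:integer")

# The description [ rdf:value "42"; rdf:type xsd:integer ],
# held as two triples about a blank node.
description = {
    ("_:b0", "rdf:value", "42"),
    ("_:b0", "rdf:type", "xsd:integer"),
}


def recognize_literal(graph, node):
    """Scan the graph for a description of `node` and rebuild a Literal.

    Returns None unless exactly one rdf:value and one rdf:type are found.
    """
    values = [o for (s, p, o) in graph if s == node and p == "rdf:value"]
    types = [o for (s, p, o) in graph if s == node and p == "rdf:type"]
    if len(values) == 1 and len(types) == 1:
        return Literal(values[0], types[0])
    return None


# No term appearing in the description is the literal itself.
terms_in_description = {term for triple in description for term in triple}
rebuilt = recognize_literal(description, "_:b0")
```

Here the literal is not among the terms of the description; it only appears once recognize_literal has scanned the triples, and that scan is the kind of work the paragraph above argues implementers should not be required to do.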
> >>
> >>> As far as I can see, the reason to use two graphs is to hide the triples 
> generated by embedded triples from SPARQL.
> >>>
> >>> So there are two hiding operations.  Are two really needed?
> >>
> >> You are right, in a way they are two hiding operations. But they play 
> different roles.
> >>
> >> * using 2 graphs instead of one prevents embedded triples from entailing their 
> "expanded" representation (i.e. the 7 triples generated in G2 for that 
> embedded triple) -- that way, SPARQL-star engines (but also RDF-star systems 
> supporting simple entailment) do not need to construct the "expanded" 
> representation for every embedded triple they contain.
> >>
> >> * using a hidden IRI prevents the "expanded" representation from entailing the 
> embedded triple (more precisely, the complete expanded representation DOES 
> entail the embedded triple, but it can not be serialized) -- that way, 
> Turtle-star parsers are not required to analyze all "reification" triples to 
> re-construct "expanded".
> >>
> >> So to sum it up, the first hiding operation is for the sake of 
> SPARQL-star, the second one is for the sake of Turtle-star.
> >>
> >>   pa
> >>
> >>>
> >>>
> >>> peter
> >>>
> >>>
> >>>
> >>> On 3/5/21 1:00 PM, Pierre-Antoine Champin wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I just pushed a pull-request adapting the semantics:
> >>>>
> >>>> https://github.com/w3c/rdf-star/pull/127
> >>>>
> >>>> I believe it has some advantages over the current version:
> >>>>
> >>>>   * it does not rely anymore on "hidden" predicates (see issue #101
> >>>> <https://github.com/w3c/rdf-star/issues/101>)
> >>>>   * it does not have the "merging" issue warned about in §6.3.1
> >>>> 
> <https://w3c.github.io/rdf-star/cg-spec/2021-02-18.html#combining-rdf-star-graphs>
> >>>>   * I think that it allows us to align SPARQL query semantics with simple
> >>>>     entailment (as newly defined)
> >>>>   * I think that it allows the Interpolation Lemma
> >>>> <https://www.w3.org/TR/rdf11-mt/#dfn-interpolation> to extend to RDF-star
> >>>>
> >>>> (I didn't formally prove the last two items, hence "I think"...)
> >>>>
> >>>> The trick is that we no longer map RDF-star graphs to a single, 
> semantically equivalent RDF graph.
> >>>> Instead, we map each one to a pair of RDF graphs, which can be thought of as 
> a "lower and upper bound" of the RDF-star graph, in terms of entailment. The 
> semantics of the RDF-star graph is defined through the semantics of its 
> "bounds", reusing RDF semantics as is (as we currently do).
> >>>>
> >>>> In this new semantics, a strict RDF-star graph (i.e. one that contains 
> embedded triples) has no exactly equivalent RDF graph, so it still can not 
> be conveyed exactly using RDF syntaxes (but we do not rely anymore on hidden 
> predicates for that). However, either of the two "bounds" can be used to 
> approximate the RDF-star graph in legacy RDF. The "lower bound" will produce 
> correct but incomplete inferences. The "upper bound" will produce complete 
> inferences, with a few spurious (but generally harmless) ones.
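
One way to read the "bounds" described above, written as a hedged sketch rather than the definition in the pull request: L and U are illustrative names for the lower and upper bound of an RDF-star graph G, and H is any RDF graph. How L and U relate to the pair (G1, G2) returned by unstar(G) is not spelled out in this thread.

```latex
% L, U: illustrative names for the lower and upper bound of G.
% Both bounds are ordinary RDF graphs; \models is simple entailment.
\[
  U \models G \qquad\text{and}\qquad G \models L
\]
% Lower bound, correct but incomplete:
\[
  L \models H \;\Longrightarrow\; G \models H
\]
% Upper bound, complete but possibly with spurious inferences:
\[
  G \models H \;\Longrightarrow\; U \models H
\]
```

Both implications follow from the first line by transitivity of entailment.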
> >>>>
> >>>> I am curious to get some feedback on this.
> >>>>
> >>>>
> >>>
> >>
> >
>
>
Received on Sunday, 21 March 2021 10:20:15 UTC