Re: A question about referential opacity (again) from Niklas Lindström on 2023-10-24 (public-rdf-star-wg@w3.org from October 2023)

From: Niklas Lindström <lindstream@gmail.com>
Date: Tue, 24 Oct 2023 22:07:12 +0200
To: Doerthe Arndt <doerthe.arndt@tu-dresden.de>
Cc: Thomas Lörtsch <tl@rat.io>, RDF-star WG <public-rdf-star-wg@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Message-ID: <CADjV5jdWPG9vQo7ZX6_Fxujz8T54BRh=Uwp=UMqB0G4jynOM6g@mail.gmail.com>

Dear Dörthe,

On Tue, Oct 24, 2023 at 7:10 PM Doerthe Arndt
<doerthe.arndt@tu-dresden.de> wrote:
>
> Dear Niklas,
>
> >
> >
> > I assume that your worry is that for graph terms to work, you'd have
> > to match its signature (or arity)?
>
> It can be, depending on what the graph means. It is interesting to see what you’d expect, especially since I have another expectation (and so far, nothing is fixed, so we are both right ;) ).

I see your perspective better now (*closed* graph terms in Notation
3), so I understand your expectations here.

> > I don't think that's an issue. If
> > this:
> >
> >    << dbr:Linköping ex:locatedIn dbr:Sweden >> ex:statedAt "2023-10-23"^^xsd:date .
> >
> > was replaced with, or equivalent to (ignoring that this N3 cannot work
> > in TriG without lookahead parsing to deal with the ambiguity, due to
> > default graph blocks):
> >
> >    { dbr:Linköping ex:locatedIn dbr:Sweden } ex:statedAt "2023-10-23"^^xsd:date.
> >
> > And that thus, this is also possible:
> >
> >    {
> >      dbr:Linköping a ex:City ;
> >        ex:locatedIn dbr:Sweden
> >    } ex:statedAt "2023-10-23"^^xsd:date.
> >
> > Then I'd assume a query like (again ignoring that this syntax probably
> > won't fly in SPARQL):
> >
> >    SELECT ?p ?o ?date {
> >      { dbr:Linköping ?p ?o } ex:statedAt ?date
> >    }
> >
> > Would yield:
> >
> >    | ex:locatedIn | dbr:Sweden | "2023-10-23"^^xsd:date |
> >
> > In fact, this:
> >
> >    SELECT ?p ?o ?date {
> >      { dbr:Linköping ?p ?o. ?s1 ?p1 ?o1 } ex:statedAt ?date
> >    }
> >
> > should match too, just binding ?s1, ?p1 and ?o1 to each of the two
> > triples in turn (so an unperformant query, with unused redundant
> > results).
>
> Mmm, so basically, you implicitly include my predicate log:includes in the query? Note that the question here is (and I think that was also one of the questions for the different TriG semantics): is the graph we state as a graph term open or closed? I would expect (but, like all of us, I am biased) that a graph with no name at all is closed. So, if I state
>    {
>      dbr:Linköping a ex:City ;
>        ex:locatedIn dbr:Sweden
>    } ex:statedAt "2023-10-23"^^xsd:date.

Actually, I didn't; I thought of the syntax as no different from
"regular" SPARQL BGPs. I think Notation 3 is sufficiently different
from SPARQL (and TriG) here that, *if* graph terms (as types) were
supported (again, I go back and forth on this), the syntax
should be somewhat different; e.g. using a leading marker, like %{ ...
}. This would also reasonably signal that it is closed. And I think
you are correct in that expectation.

> I am talking about the exact graph { dbr:Linköping a ex:City ; ex:locatedIn dbr:Sweden } and not about, for example, a graph containing these two triples (and maybe more). So, in my view the graph above does not yield
>
>
>    {
>      dbr:Linköping a ex:City .
>    } ex:statedAt "2023-10-23"^^xsd:date.

Agreed, these two closed terms don't match.
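
To make the open/closed distinction concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: a graph term is modelled as a frozenset of (subject, predicate, object) tuples, and the names closed_match and open_match are hypothetical, not taken from any RDF library or from N3.

```python
# Hypothetical model: a graph term is a frozenset of (s, p, o) tuples.

def closed_match(term, other):
    """Closed reading: two graph terms match only if they hold exactly the same triples."""
    return term == other

def open_match(pattern, graph):
    """Open reading (BGP-like): every triple of the pattern occurs in the graph."""
    return pattern <= graph

full = frozenset({
    ("dbr:Linköping", "rdf:type", "ex:City"),
    ("dbr:Linköping", "ex:locatedIn", "dbr:Sweden"),
})
part = frozenset({("dbr:Linköping", "rdf:type", "ex:City")})

print(closed_match(part, full))  # False: the closed terms differ
print(open_match(part, full))    # True: the larger graph contains the smaller one
```

Under the closed reading the one-triple term is simply a different term; the open reading behaves roughly like an implicit containment check.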

> Note, that the predicate here is confusing and that I can find predicates where this would make no sense like:
>
> {:cat :is :alive, :dead } a :inconsitency.
>
> Should not yield
>
> {:cat :is :alive } a :inconsitency.

Also agreed. It does surface a thought I've had a lot lately: Notation
3 is sufficiently different from "just" RDF, in that it doesn't really
play by the same rules, right? Is entailment, for instance, expected to
play a role when parsing N3? I'm still under the assumption
that N3 is for implementing rules and inference. (E.g. RDFS and OWL
are often implemented using N3 rules?) I think these differences are
important to recognize in order to sort out our expectations and
assumptions (including, I think, on opacity).

> So, in a sense we are back to the point where we need to consider the intended use and I think examples can get far more complex with graphs (which is also why we want them in the first place).

Indeed, use cases are paramount here. I recently added a few where
quoting multiple triples is useful:
https://github.com/w3c/rdf-ucr/issues/26

That said, I think the bulk of what we've collected makes do with
quoting one arc. We should also contrast those with "coarse-grained"
provenance that already uses regular named graphs, though. (Here is
one example from our union library catalogue where the data is
"embellished" with snippets from other records, wrapped as named
graphs: https://niklasl.github.io/ldtr/demo/?url=https%3A//libris.kb.se/0xbdc9nj2qbd6dd/data.trig&edit=true)

> We do not need to follow N3, but there, graphs are closed and you can state relations between them and have predicates which help you to put them into relation. The reason is that otherwise you could never talk about a concrete graph as they do not have names.

Makes total sense (in that context).

> I think my point here is just: we all have expectations on how the graphs should behave and most likely they differ.

Indeed. :) I think distinguishing BGPs from graph terms is the crucial
thing here, then I believe our expectations will converge. And the
features and properties they bring must be tested against our use
cases, of course.

>
> >
> > This is based on what I think James also answered, that for named
> > graphs, if you have:
> >
> >    _:g1 {
> >      dbr:Linköping a ex:City ;
> >        ex:locatedIn dbr:Sweden
> >    }
> >    _:g1 ex:statedAt "2023-10-23"^^xsd:date ;
> >      ex:source wikipedia:Linköping .
> >
> > then this works:
> >
> >    SELECT ?p ?o ?date {
> >      graph ?g { dbr:Linköping ?p ?o. ?s1 ?p1 ?o1 }
> >      ?g ex:statedAt ?date .
> >    }
> >
> > This is because SPARQL BGPs in graph blocks match what's there;
> > they're not excluding graphs containing more triples. (I'm sure e.g.
> > Andy would phrase this much more correctly.)
>
> You mean
>
>    SELECT ?p ?o ?date {
>      graph ?g { dbr:Linköping ?p ?o.  }
>      ?g ex:statedAt ?date .
>    }
>
> Right?

I did! It makes no big difference though, as both would work; but mine
is redundant and less efficient.
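
As a rough illustration of why both forms work and why the longer one is redundant, here is a small Python sketch of basic graph pattern matching over the contents of _:g1. It is a toy model under stated assumptions: match_triple and match_bgp are hypothetical helper names, variables are plain strings starting with "?", and the GRAPH ?g / ?date part of the query is left out.

```python
def is_var(term):
    return term.startswith("?")

def match_triple(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None on a mismatch."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or reject if it is already bound to something else.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def match_bgp(patterns, graph, binding=None):
    """Yield every binding under which all patterns occur in graph."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match_triple(first, triple, binding)
        if extended is not None:
            yield from match_bgp(rest, graph, extended)

# The triples inside _:g1 from the example above.
g1 = [
    ("dbr:Linköping", "rdf:type", "ex:City"),
    ("dbr:Linköping", "ex:locatedIn", "dbr:Sweden"),
]

short = [("dbr:Linköping", "?p", "?o")]
redundant = short + [("?s1", "?p1", "?o1")]

rows_short = [(b["?p"], b["?o"]) for b in match_bgp(short, g1)]
rows_redundant = [(b["?p"], b["?o"]) for b in match_bgp(redundant, g1)]

print(len(rows_short))                         # 2
print(len(rows_redundant))                     # 4
print(set(rows_short) == set(rows_redundant))  # True
```

The extra pattern matches each triple of the graph in turn, so the projected rows are the same ones repeated.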

> I think the example here is easier because we have the graph name (even though it is a blank node) and this determines somehow which graph we mean. Here, you assume, that
>
>    _:g1 {
>      dbr:Linköping a ex:City ;
>        ex:locatedIn dbr:Sweden
>    }
>
> "Means" (in an informal sense) that there is the graph _:g1 and that this graph contains the triples
> dbr:Linköping a ex:City ;
>        ex:locatedIn dbr:Sweden.
>
> But _:g1 can contain more triples, it is open in that sense and if we want to talk about it, we use its label (the blank node _:g1).

Exactly.

> There are many possible points of view.

More than we can enumerate. :) What we need is to find the ones that
most effectively and efficiently cover the current and future use
cases. (Which is often surprisingly difficult; the result not seldom
looking quite different in hindsight. And while reaching agreements is
hard, doing it alone would be impossible.)

>
>
> >
> > This all said, I'm unconvinced of either triple or graph terms, as
> > they make it possible to talk about the abstract type itself, as
> > opposed to a reified occurrence thereof (which when talked about is a
> > token of the type).
>
> With this comment you just made clear for me what you mean by type vs. token in this context: you would like that in
>
> <<:a :b :c>> :p :o.
> <<:a :b :c>> :pp :oo.
>
> The two
>
> <<:a :b :c>> would refer to different instances? Right? If not, please correct me, because previously, I did not fully get that (always easier to know one's own point of view and "being right" than understanding someone else’s ;) ). That would be important for your use case? I think that this can make things complicated, but before I complain (and construct evil examples), I need to fully understand. Would you want the << >>-notation to only be syntactic sugar for reification?

Your view of my perspective is spot on in principle here; that is
exactly the difference I mean. And I do have a preference for
reification with sugar on top (albeit I think it has drawbacks (triple
explosion being one), and named graphs have features that can amend
that).

But actually, I might not want that for this particular notation! I
think that notation "affords" uniqueness, since it looks so much like
IRIs, as we're "trained" to read Turtle et al like that.

What I do want is to not use that, especially not as subjects, as
there is little I could say (beyond logical rules) about it. I'd
rather use blank graphs, which to our "TriG intuition" are *tokens* of
that singleton edge (as in reified occurrences which I can speak
about):

    [  :p :o ] { :a :b :c }
    [  :pp :oo ] { :a :b :c }

In this case, I'm still for my proposed "quotation dash" shorthand
(revisions could be made to make the annotation more palatable of
course, if the idea was to be accepted):

    :a :b -- :c {[  :p :o ] [  :pp :oo ]} .

meaning the above. (Both of which mean "one occurrence with :p :o, one
with :pp :oo, both claiming :a :b :c, neither of which are accepted as
asserted".)
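
The type/token difference can be sketched in a few lines of Python. This is only an analogy under stated assumptions: TripleTerm and Occurrence are hypothetical class names, value equality stands in for the abstract triple (the type), and object identity stands in for a reified occurrence (the token).

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TripleTerm:
    """The abstract triple (the type): equal whenever s, p and o are equal."""
    s: str
    p: str
    o: str

@dataclass(eq=False)
class Occurrence:
    """A reified occurrence (a token): compared by identity, like a fresh blank node."""
    triple: TripleTerm
    annotations: dict = field(default_factory=dict)

abc = TripleTerm(":a", ":b", ":c")
same = TripleTerm(":a", ":b", ":c")

occ1 = Occurrence(abc, {":p": ":o"})
occ2 = Occurrence(abc, {":pp": ":oo"})

print(abc == same)                 # True: one type, however often it is written
print(occ1 == occ2)                # False: two distinct tokens
print(occ1.triple == occ2.triple)  # True: both tokens are of the same type
```

Annotations attached to occ1 say nothing about occ2, which is the behaviour wanted from the blank-graph form.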

> > But I'll write more about that in another reply.
>
> Looking forward to this.

Thank you! I'll try to complete it before tomorrow.

All the best,
Niklas


> Kind regards,
> Dörthe
>
> >
> > All the best,
> > Niklas
> >
> > On Mon, Oct 23, 2023 at 6:07 PM Doerthe Arndt
> > <doerthe.arndt@tu-dresden.de> wrote:
> >>
> >> Dear Thomas, all,
> >>
> >> In addition to what Peter said about RDF-star semantics and opacity, I’d like to clarify the community group semantics a little bit more: remember that we talk about the meaning of triple terms and not of the constituents (subject, predicate, object) of these terms. What was done in the unstar-mapping was a kind of reification with which we represented the triple with a blank node and then connected the IRIs of the constituents to this blank node (using the correct predicates) and also the lexical representation of these constituents. With this "trick" we allowed the quoted triple interpretation to be aware of the lexical representation of the triple and, if needed, to differentiate between triples having different interpretations, but that was not forced and, as Peter also mentioned, the concrete interpretation was left open.
> >>
> >> For the working group semantics several possibilities have been discussed and they all rely on an interpretation function for the triple term (for example IT in Enrico’s case). This function maps to a resource (and it can do more, but does not need to). The interpretation function for the triple term can be applied on triples from the domain of discourse (then we can indeed combine it with IS or some alternative IS’), but it would for example also be possible to apply the IT function directly on the graphical representation of the triple (of course we need to be careful with blank nodes here). My point is just: please try to see the triple term as a whole also as a resource to better understand the opacity.
> >>
> >> To the rest of the discussion and the added complexity: apart from all the theoretical aspects we discuss here (and where I agree that graphs are more complex than triples), please also note that we would have to decide how to deal with quoted graph terms in practice. In SPARQL queries, it is relatively easy to search for a triple term having dbr:Linköping as subject, like:
> >>
> >> Select ?p ?o ?date
> >> {
> >>  << dbr:Linköping ?p ?o>> ex:statedAt ?date
> >> }
> >>
> >> But to make a similar query for graphs, we either need to know the exact structure of the graph (that is: how many triples does it contain?) or we need to come up with extra Filter functions for SPARQL.
> >> If we have
> >>
> >>  { dbr:Linköping a ex:City; ex:locatedIn dbr:Sweden }  ex:statedAt "2023-10-23"^^xsd:date.
> >>
> >> A query
> >>
> >> Select ?p ?o ?date
> >> {
> >>  {dbr:Linköping ?p ?o. ?s1 ?p1 ?o1} ex:statedAt ?date
> >> }
> >>
> >> Would fire, but
> >>
> >> Select ?p ?o ?date
> >> {
> >>  {dbr:Linköping ?p ?o. ?s1 ?p1 ?o1. ?s2 ?p2 ?o3} ex:statedAt ?date
> >> }
> >>
> >> would not. I am sure we can solve this problem together, but this adds complexity since we need to have a discussion on how we would like to solve it.
> >>
> >> Side note: in N3 we would have a predicate log:includes for that and while it makes this case easier, it also adds complexity simply because your graph terms can contain blank nodes and you are back to a problem of simple entailment… (and I will not go further unless you ask :) ). In N3 you would do something like (I try to make it "SPARQL-style" but I am not sure whether or not this makes it clear, so, feel free to ask):
> >>
> >> Select ?p ?o ?date
> >> {
> >>  ?graph  ex:statedAt ?date.
> >>  ?graph log:includes {dbr:Linköping ?p ?o. }.
> >> }
> >>
> >>
> >> The log:includes is some kind of function which can give you elements of your graph.
> >>
> >> I just added this here as one example to illustrate that Peter is right here: things get more complex if we have graph terms. I am sure that we can solve that together and I would like to do that with all of you, but at the same time I am worried that it will take too long…
> >>
> >> Kind regards,
> >> Dörthe
> >>
Received on Tuesday, 24 October 2023 20:07:46 UTC