Re: Consolidating triple/edges -- named occurrence version from Thomas Lörtsch on 2024-01-12 (public-rdf-star-wg@w3.org from January 2024)

From: Thomas Lörtsch <tl@rat.io>
Date: Fri, 12 Jan 2024 19:00:06 +0100
To: Pierre-Antoine Champin <pierre-antoine@w3.org>
Cc: RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-Id: <13B0C449-62BA-4760-8E8D-A6057F680D24@rat.io>
> On 12. Jan 2024, at 18:35, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
> 
> 
> On 12/01/2024 11:21, Thomas Lörtsch wrote:
>> 
>> 
>>> On 12. Jan 2024, at 08:29, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
>>> 
>>> Thomas,
>>> 
>>> My comments inline
>>> 
>>> On 11/01/2024 15:40, Thomas Lörtsch wrote:
>>> 
>>>> [ No, I’m not *this* productive today. I’m also finishing and sending out some drafts from the last 14 days. This is a reply to Andy Seaborne, 1. Jan 2024, at 17:52 (see below for the full citation), and I’m sorry that I didn’t send it earlier. It complements my other mails from today in that it outlines a strategy for standardization that preserves backwards compatibility and leaves room for future extensions to - tadaa - graphs. ]
>>>> 
>>>> 
>>>> Hi Andy,
>>>> 
>>>> you seem to have dropped the <<( :s :p :o )>> type syntax and I think that’s a good idea, because IMHO that syntax is just too involved. However, you haven’t dropped the occurrence syntax for triple terms, and that’s causing a lot of trouble down the line, starting with quad terms in the N-Triples syntax.
>>>> The point of this mail is to argue for a compact syntax capturing the mainstream use cases and to leave the technical decisions to implementors (but out of the core specifications).
>>>> 
>>>> I would argue to concentrate on the predominant use case, which is annotating triple occurrences via a concise annotation shorthand syntax:
>>>> 
>>>> :s :p :o {| :a :b |} # here implicitly named with a new blank node
>>>> 
>>>> Make this part of RDF 1.2, and nothing else (especially not triple terms).
>>>> 
>>> just to be on the same page, I'm assuming that you mean
>>> "Make this part of Turtle 1.2" (i.e. we are talking about the *concrete syntax*).
>>> 
>> TriG and SPARQL as well. I’m no expert on JSON-LD, but it should have something similar.
> yes, the point being: this is at the concrete syntax level.

Yes, it is, and that’s the nice thing about it. There is no need to extend the RDF model and semantics to support the functionality covered by the annotation syntax, which covers the vast majority of use cases.

>>>> Provide alternative ways to expand this syntax, e.g.:
>>>> 
>>> still to be on the same page, by "expand this syntax", I assume you mean "interpret this concrete syntax into the abstract syntax" ?
>>> 
>> I think that captures it, yes.
>> 
>> 
>>> If that is the case, could you please stick to N-Triples to express the abstract syntax, so as to avoid ambiguity?
>>> 
>> That’s case A) below, RDF standard reification. The example is given in Turtle, but the mapping to N-Triples is very straightforward.
> Straightforward can be misleading. Also, the fact that you are using Turtle makes it unclear whether you are focusing on the concrete syntax (the exact Turtle string) or the abstract syntax (the triples behind the string). Using N-Triples, there is less ambiguity about that.

It is so very straightforward that I’m just too lazy to do it. Anybody in this WG should be able to see how this maps to N-Triples.
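
For readers who want the mapping spelled out anyway, here is a minimal sketch in Python (not part of any proposal) that expands one annotated triple into the six N-Triples statements of option A, RDF standard reification. The `http://example/` expansion of the empty prefix, the blank node label `_:r1`, and the helper names are assumptions made for illustration only.

```python
# Illustrative sketch only: expands `:s :p :o {| :a :b |}` into N-Triples
# using RDF standard reification. The namespace for the empty prefix and the
# blank node label are assumptions, not taken from the thread.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example/"  # assumed expansion of the ":" prefix


def iri(namespace, local):
    """Format an IRI in N-Triples syntax."""
    return f"<{namespace}{local}>"


def expand_annotation(s, p, o, annotations, bnode="_:r1"):
    """Return N-Triples lines for the asserted triple, its reification, and its annotations."""
    lines = [f"{s} {p} {o} ."]
    lines.append(f"{bnode} {iri(RDF, 'type')} {iri(RDF, 'Statement')} .")
    lines.append(f"{bnode} {iri(RDF, 'subject')} {s} .")
    lines.append(f"{bnode} {iri(RDF, 'predicate')} {p} .")
    lines.append(f"{bnode} {iri(RDF, 'object')} {o} .")
    for a, b in annotations:
        lines.append(f"{bnode} {a} {b} .")
    return lines


lines = expand_annotation(
    iri(EX, "s"), iri(EX, "p"), iri(EX, "o"),
    annotations=[(iri(EX, "a"), iri(EX, "b"))],
)
print("\n".join(lines))
```

The first line is the asserted triple itself; the remaining five hang the reification vocabulary and the annotation off a single blank node, mirroring the Turtle given for case A.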

>>> Finally, since B and C are marked as optional, I assume that A is considered mandatory.
>>> 
>> I wrote NORMATIVE, but MANDATORY might be the more fitting term.
> Sorry, I missed the "normative". "normative" is in fact more precise in the case of a specification.

Okay, thanks.

>>> Does it mean that if I want to implement C, I actually need to expand to A + C ?
>>> 
>> I don’t think that’s necessary, or rather I hope it isn’t, and the same goes for B. Let me phrase it differently:
>> 
>> - A) would be the baseline: every system would be required to be able to handle the annotation syntax at least by mapping it to RDF standard reification.
>> 
>> - The semantics of RDF standard reification provides the definition of what the annotation syntax means. Its specification captures the intuitive meaning of the annotation syntax very well: a statement has been asserted and its referentially transparent representation is annotated.
>> 
>> - Systems that want to implement this as quoted terms or nested graphs are free to do so. Any divergences in semantics (you know: the subtle deviations that creep into almost every design) are their own responsibility and don’t compromise the baseline A.
> But if option A is normative and options B and C are not, then the spec should only mention option A!

Of course. I wasn’t attempting to write a draft spec here, but to explain what should go where: the syntactic feature and its mapping to standard reification go into RDF proper, RDF-star as a possible alternative implementation goes into a semantic extension, NNG is a separate approach, etc.

> In fact, from your last point above, it seems that you are conflating the abstract syntax with ways to implement it (you describe B and C as "ways to implement this"). The abstract syntax is not an implementation guideline. People are always free to implement it how they see fit as long as the resulting implementation behaves the way it should per the spec!

The annotation syntax is not an abstract syntax (just like RDF standard reification isn’t), so I don’t quite follow your point.

>> AFAICT we have never discussed explicitly removing RDF standard reification from RDF 1.2, so the syntax will have to be supported anyway. Some implementations already seem to implement it differently from actual reification quads, as otherwise they could hardly support it with the performance they do. In that sense we are just building on established practice.
>> 
>> Thomas
>> 
>> 
>> 
>> 
>> 
>>> pa
>>> 
>>> 
>>>> 
>>>> 
>>>> A | RDF 1.2 standard reification
>>>> 
>>>> :s :p :o .
>>>> [] a rdf:Statement ;
>>>> rdf:subject :s ;
>>>> rdf:predicate :p ;
>>>> rdf:object :o ;
>>>> :a :b .
>>>> 
>>>> The reification quad is widely disliked, but it has provided a common denominator since RDF 1999.
>>>> For stores that don’t plan to support lots of annotations or heavy loads of LPG-style data this still offers an economical path to full RDF 1.2 compatibility (of course, some stores even support RDF standard reification *very* efficiently, despite its syntactic verbosity, so there’s no judgement involved at all).
>>>> For stores that implement RDF-star as named graphs it provides security that their implementation won’t be invalidated by some unforeseen usage of triple terms.
>>>> 
>>>> According to this proposal it would be NORMATIVE that a system claiming to support RDF 1.2 has to support the annotation shorthand syntax via support for RDF standard reification.
>>>> 
>>>> 
>>>> 
>>>> B | RDF-star semantic extension - OPTIONAL
>>>> 
>>>> :s :p :o .
>>>> [] rdf-star:occurrenceOf << :s :p :o >> ;
>>>> :a :b .
>>>> 
>>>> The triple term refers to the abstract type and all occurrences have to be explicitly created. This does away with any optimization at the N-Triples level, e.g. occurrence terms - avoiding many if not all of the problems you list - and still is pretty concise. It leaves room for further extensions towards explicitly asserted or unasserted statements etc.
>>>> Maybe named occurrences are the better solution than this extra triple, but I’d prefer to not be the judge on that. These are details of solutions to problems that the next proposal just doesn’t have in the first place.
>>>> 
>>>> 
>>>> 
>>>> C | Nested Graphs - OPTIONAL
>>>> 
>>>> [] nng:asserts ":s :p :o."^^nng:ttl ; # which entails < :s :p :o . >
>>>> :a :b .
>>>> 
>>>> Of course we have our own syntactic sugar
>>>> 
>>>> []{ :s :p :o } :a :b .
>>>> 
>>>> but we also support the RDF-star shorthand syntax.
>>>> 
>>>> Obviously Nested Graphs provide the shortest expansion because they also entail the asserted triple, but that arrangement is still under discussion (see below). Less obviously they also are much easier to query than a combination of RDF named graphs and RDF-star triple terms (or graph terms, for that matter).
>>>> Note that Nested Graphs don’t require a semantic extension (at least that’s the current understanding) because they get by without a change to the abstract model and semantics. Beyond the syntactic sugar and a new RDF datatype they merely push boundaries and make some common implicit assumptions explicit and configurable.
>>>> 
>>>> 
>>>> Best,
>>>> Thomas
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 1. Jan 2024, at 17:52, Andy Seaborne <andy@apache.org> wrote:
>>>>> 
>>>>> In the named occurrence proposal, can the blank node of a named occurrence RDF term be used on its own as the named occurrence?
>>>>> 
>>>>> << _:a | :s :p :o >> :starts 1989 .
>>>>> _:a :finishes 1990 .
>>>>> 
>>>>> 
>>>>> As the name can be a URI,
>>>>> 
>>>>> << <http://example/occ1> | :s :p :o >> :starts 1999 .
>>>>> <http://example/occ1> :finishes 2000 .
>>>>> 
>>>>> In SPARQL: does this match the examples above?
>>>>> 
>>>>> SELECT * {
>>>>> ?X :starts ?start .
>>>>> ?X :finishes ?finish .
>>>>> }
>>>>> 
>>>>> If yes,
>>>>> either
>>>>> _:a :finishes 1990 .
>>>>> is actually
>>>>> << _:a | :s :p :o >> :finishes 1990 .
>>>>> 
>>>>> or if
>>>>> _:a :starts 1989 .
>>>>> _:a :finishes 1990 .
>>>>> then how does the application find :s :p :o?
>>>>> 
>>>>> In the proposed semantics: [1]
>>>>> 
>>>>> [I+A](r) = IS(r) if r is a iri
>>>>> [I+A](r) = A(r) if r is a BlankNode
>>>>> [I+A](r) = [I+A](r.id) if r is a tripleOccurrence
>>>>> 
>>>>> so
>>>>> 
>>>>> [I+A](r) = A(r.id) if r is a tripleOccurrence
>>>>> and r.id is a blank node
>>>>> [I+A](r) = IS(r.id) if r is a tripleOccurrence
>>>>> and r.id is a URI.
>>>>> 
>>>>> This has an impact on implementations and APIs.
>>>>> 
>>>>> Take Apache CommonsRDF [2] as an example.
>>>>> 
>>>>> The accessor function on a graph is
>>>>> 
>>>>> Stream<? extends Triple>
>>>>> stream(BlankNodeOrIRI subject, IRI predicate, RDFTerm object)
>>>>> 
>>>>> where subject/predicate/object can be a constant or a wildcard.
>>>>> 
>>>>> So if the application is given <http://example/occ1>, how does it determine whether the URI is a named occurrence and if so, how does it find the triple subject/predicate/object?
>>>>> 
>>>>> Scanning all triples to find named occurrences and looking at the id of a named occurrence is expensive.
>>>>> 
>>>>> Expecting an additional function x -> triple just for occurrences is a big step.
>>>>> 
>>>>> The triple-term version has rdf:occurrenceOf, so there is a triple that maps the blank node / URI to the 3-tuple of s, p, o, which has the effect of OT.
>>>>> 
>>>>> Andy
>>>>> 
>>>>> [1] https://github.com/w3c/rdf-star-wg/wiki/Semantics:-Andy's-proposal#semantics
>>>>> 
>>>>> [2] Apache CommonsRDF : https://commons.apache.org/proper/commons-rdf/
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
Received on Friday, 12 January 2024 18:00:19 UTC