Re: Consolidating triple/edges -- named occurrence version from Thomas Lörtsch on 2024-01-15 (public-rdf-star-wg@w3.org from January 2024)

From: Thomas Lörtsch <tl@rat.io>
Date: Mon, 15 Jan 2024 12:39:20 +0100
To: Pierre-Antoine Champin <pierre-antoine@w3.org>
Cc: RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-Id: <F5B3FB71-4DB4-418E-96AD-C32AA9A9FDB3@rat.io>
> On 15. Jan 2024, at 11:18, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
> 
> 
> On 12/01/2024 19:00, Thomas Lörtsch wrote:
>> 
>> 
>>> On 12. Jan 2024, at 18:35, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
>>> 
>>> 
>>> On 12/01/2024 11:21, Thomas Lörtsch wrote:
>>> 
>>>> 
>>>> 
>>>>> On 12. Jan 2024, at 08:29, Pierre-Antoine Champin <pierre-antoine@w3.org> wrote:
>>>>> 
>>>>> Thomas,
>>>>> 
>>>>> My comments inline
>>>>> 
>>>>> On 11/01/2024 15:40, Thomas Lörtsch wrote:
>>>>> 
>>>>> 
>>>>>> [ No, I’m not *this* productive today. I’m also finishing and sending out some drafts from the last 14 days. This is a reply to Andy Seaborne, 1. Jan 2024, at 17:52 (see below for the full citation), and I’m sorry that I didn’t send it earlier. It complements my other mails from today in that it outlines a strategy for standardization that preserves backwards compatibility and leaves room for future extensions to - tadaa - graphs. ]
>>>>>> 
>>>>>> 
>>>>>> Hi Andy,
>>>>>> 
>>>>>> you seem to have dropped the <<( :s :p :o )>> type syntax and I think that’s a good idea, because IMHO that syntax is just too involved. However, you haven’t dropped the occurrence syntax for triple terms, and that’s causing lots of trouble down the line, starting with quad terms in the N-Triples syntax.
>>>>>> The point of this mail is to argue for a compact syntax capturing the mainstream use cases and to leave the technical decisions to implementors (but out of the core specifications).
>>>>>> 
>>>>>> I would argue to concentrate on the predominant use case, which is annotating triple occurrences via a concise annotation shorthand syntax:
>>>>>> 
>>>>>> :s :p :o {| :a :b |} # here implicitly named with a new blank node
>>>>>> 
>>>>>> Make this part of RDF 1.2, and nothing else (especially not triple terms).
>>>>>> 
>>>>>> 
>>>>> just to be on the same page, I'm assuming that you mean
>>>>> "Make this part of Turtle 1.2" (i.e. we are talking about the *concrete syntax*).
>>>>> 
>>>>> 
>>>> TriG and SPARQL as well. I’m no expert on JSON-LD, but it should have something similar.
>>>> 
>>> yes, the point being: this is at the concrete syntax level.
>>> 
>> Yes, it is, and that’s the nice thing about it. No need to extend the RDF model and semantics to support the functionality covered by the annotation syntax, which covers a vast majority of use cases.
>> 
>>>>>> Provide alternative ways to expand this syntax, e.g.:
>>>>>> 
>>>>>> 
>>>>> still to be on the same page, by "expand this syntax", I assume you mean "interpret this concrete syntax into the abstract syntax" ?
>>>>> 
>>>>> 
>>>> I think that captures it, yes.
>>>> 
>>>> 
>>>> 
>>>>> If that is the case, could you please stick to N-Triples to express the abstract syntax, so as to avoid ambiguity?
>>>>> 
>>>>> 
>>>> That’s case A) below, RDF standard reification. The example is given in Turtle, but the mapping to N-Triples is very straightforward.
>>>> 
>>> Straightforward can be misleading. Also, the fact that you are using Turtle makes it unclear whether you are focusing on the concrete syntax (the exact Turtle string) or the abstract syntax (the triples behind the string). Using N-Triples, there is less ambiguity about that.
>>> 
>> It is so very straightforward that I’m just too lazy to do it. Anybody in this WG should be able to see how this maps to N-Triples.
> Let me rephrase my argument above, as you are answering beside the point:
> I'm not asking you to write N-Triples to make my life easier in seeing the triples, I can indeed do that with Turtle.
> I'm asking you to write N-Triples to make it clear that you are focusing on the abstract syntax, not the concrete syntax.
> Such clarity would have spared both of us two rounds of email, and served our respective laziness better :-)
> Note also that, for toy examples, the N-Triples does not even need to use "real" IRIs. For example
>   _:b <rdf:subject> <ex:a>.
>   _:b <rdf:predicate> <ex:b>.
>   _:b <rdf:object> <ex:c>.
> is "N-Triple-ish" enough that everyone understands that you are talking about the triples, not about how they are written down...

But Turtle is easier to read, and a passing look at what I provided will reveal that there are no two ways to map it to N-Triples.
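
For readers who want to see that mapping spelled out, here is a minimal sketch in Python of how the annotation shorthand `:s :p :o {| :a :b |}` could be expanded into option A (RDF standard reification, quoted further down) and printed in the "N-Triple-ish" toy style suggested above. The function names, the `_:bN` blank node labels and the `<ex:...>` terms are assumptions made for illustration; this is not taken from any specification text or existing parser.

```python
from itertools import count

# Hypothetical generator of fresh blank node labels (_:b0, _:b1, ...).
_blank_ids = count()

def expand_annotation(s, p, o, annotations, ids=_blank_ids):
    """Expand `s p o {| a b |}` into the asserted triple plus the
    reification triples of option A, all attached to one new blank node."""
    node = f"_:b{next(ids)}"
    triples = [
        (s, p, o),                                # the statement itself stays asserted
        (node, "<rdf:type>", "<rdf:Statement>"),  # toy IRIs, not the real rdf: namespace
        (node, "<rdf:subject>", s),
        (node, "<rdf:predicate>", p),
        (node, "<rdf:object>", o),
    ]
    # Every annotation pair is attached to the reification node.
    triples += [(node, a, b) for a, b in annotations]
    return triples

def to_ntriples(triples):
    """Serialize (s, p, o) tuples one per line, N-Triples style."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

print(to_ntriples(
    expand_annotation("<ex:s>", "<ex:p>", "<ex:o>", [("<ex:a>", "<ex:b>")])
))
```

Run as is, the sketch prints six lines: the asserted triple, the four reification triples on `_:b0`, and the annotation `_:b0 <ex:a> <ex:b> .`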

>>>>> Finally, since B and C are marked as optional, I assume that A is considered mandatory.
>>>>> 
>>>>> 
>>>> I wrote NORMATIVE, but MANDATORY might be the more fitting term.
>>>> 
>>> Sorry, I missed the "normative". "normative" is in fact more precise in the case of a specification.
>>> 
>> Okay, thanks.
>> 
>> 
>>>>> Does it mean that if I want to implement C, I actually need to expand to A + C ?
>>>>> 
>>>>> 
>>>> I don’t think that’s necessary, or rather I hope it isn’t, and the same goes for B. Let me phrase it differently:
>>>> 
>>>> - A) would be the baseline: every system would be required to be able to handle the annotation syntax at least by mapping it to RDF standard reification.
>>>> 
>>>> - The semantics of RDF standard reification provides the definition of what the annotation syntax means. Its specification captures the intuitive meaning of the annotation syntax very well: a statement has been asserted and its referentially transparent representation is annotated.
>>>> 
>>>> - Systems that want to implement this as quoted terms or nested graphs are free to do so. Eventual divergences in semantics (you know: the subtle deviations that creep into almost every design) are their own responsibility and don’t compromise the baseline A.
>>>> 
>>> But if option A is normative and options B and C are not, then the spec should only mention option A!
>>> 
>> Of course. I didn’t attempt to write a draft spec here, but to explain what should go where: the syntactic feature and its mapping to standard reification into RDF proper, RDF-star as a possible alternative implementation into a semantic extension, NNG as a separate approach, etc.
>>> In fact, from your last point above, it seems that you are conflating the abstract syntax with ways to implement it (you describe B and C as "ways to implement this"). The abstract syntax is not an implementation guideline. People are always free to implement it how they see fit as long as the resulting implementation behaves the way it should per the spec!
>>> 
>> The annotation syntax is not an abstract syntax (just like RDF standard reification isn’t), so I don’t quite follow your point.
> The primary role of the WG is to write specifications, not to tell people how to implement them.

I’m discussing a strategy here, not drafting a spec.

> So option B and C are mostly out-of-scope for the WG.

So RDF-star is out of scope of the WG? I think you are going a bit too far with your argument.

I wouldn’t be averse to the WG standardizing RDF-star as a semantic extension to RDF, honoring all the work that has already been poured into it. What is important for me is that RDF standard reification provides the baseline against which any approach - RDF-star, NNG, etc - has to show its compliance.

> It might be an interesting discussion to have eventually, but we should not mix up the two (and we should probably prioritize spec-discussion, as we are late on our schedule already). When you present 3 options, one of them being spec-related and two of them being implem-related, this is confusing (at least for me).

I’m proposing a strategy of how the pieces can be made to fit together, and where to place which.

It doesn’t place RDF-star into the core of RDF anymore, it makes it optional. Some people might be disappointed by that, some might be delighted, but most importantly: IMO even RDF-star profits from such a resolution.

I would have liked the WG to discuss the NNG proposal in more detail, as it surely could use more input and scrutiny (and of course uptake ;-) If that is not the case, well, then that’s how it is. NNG will not go away that quickly however and we’ll see how things work out in the long run.

> That being said: I think what you propose in Option A makes a lot of sense to me. The more we try to prevent users from using triple terms directly, the more I think we are better off without them in the first place.
> But again, many people hate reification so much that we might get significant push back... :-/

But again, many people hate RDF-star triple terms. The strategy I propose leaves it to implementers in which way they want to support the annotation syntax. That should ease any hard feelings pretty well.

Remember that the RDF-star CG report defines the unstar-mapping. That doesn’t mean that implementations have to store triple terms as reification quads (or even septuples). RDF 1.2 could go the same way, just leaving it open which implementation a store chooses. If it chooses the RDF-star semantic extension, then the implementation will be triple terms (or quad terms, or whatever comes out of the current discussions).
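
To illustrate the distinction between what the mapping defines and how a store keeps the data, here is a minimal sketch in Python. It assumes a toy in-memory store that records each annotated occurrence compactly and only derives the reification triples when asked for them. The class `AnnotationStore` and all of its method names are hypothetical and do not correspond to any existing store or API.

```python
class AnnotationStore:
    """Toy store: compact internal records, reification triples derived on demand."""

    def __init__(self):
        self.asserted = set()   # plain asserted triples
        self.occurrences = {}   # occurrence id -> (s, p, o)
        self.annotations = {}   # occurrence id -> list of (property, value)

    def annotate(self, occ, triple, a, b):
        """Record `triple {| a b |}` under the occurrence identifier `occ`."""
        self.asserted.add(triple)
        self.occurrences[occ] = triple
        self.annotations.setdefault(occ, []).append((a, b))

    def triple_of(self, occ):
        """Direct lookup from an occurrence id to its triple, without scanning."""
        return self.occurrences.get(occ)

    def reification_view(self):
        """Yield the standard reification triples that the stored data stands for."""
        for occ, (s, p, o) in self.occurrences.items():
            yield (occ, "rdf:type", "rdf:Statement")
            yield (occ, "rdf:subject", s)
            yield (occ, "rdf:predicate", p)
            yield (occ, "rdf:object", o)
            for a, b in self.annotations[occ]:
                yield (occ, a, b)

store = AnnotationStore()
store.annotate("_:b0", (":s", ":p", ":o"), ":a", ":b")
print(store.triple_of("_:b0"))
for t in store.reification_view():
    print(t)
```

In this sketch nothing is stored as reification triples, yet the store can still produce them, which is all a baseline defined via standard reification would require.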


Can we switch to discussing the pull request called miniCore that I created [0]? It provides a more self-contained version of the proposal and I wrote it down because people found the mail threads too hard to follow.

Best,
Thomas


[0] https://github.com/w3c/rdf-star-wg/pull/102/files?short_path=fc706e6#diff-fc706e6e2a3735265c66fc9233a42aaa5b230441ba81e30a8c26be3c582eba21



>>>> AFAICT we have never discussed explicitly removing RDF standard reification from RDF 1.2, so the syntax will have to be supported anyway. Some implementations already seem to implement it differently from actual reification quads, as otherwise they could hardly support it with the performance they do. In that sense we are just building on established practice.
>>>> 
>>>> Thomas
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> pa
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> A | RDF 1.2 standard reification
>>>>>> 
>>>>>> :s :p :o .
>>>>>> [] a rdf:Statement ;
>>>>>> rdf:subject :s ;
>>>>>> rdf:predicate :p ;
>>>>>> rdf:object :o ;
>>>>>> :a :b .
>>>>>> 
>>>>>> The reification quad is well disliked, but it provides a common denominator since RDF 1999.
>>>>>> For stores that don’t plan to support lots of annotations or heavy loads of LPG-style data this still offers an economical path to full RDF 1.2 compatibility (of course, some stores even support RDF standard reification *very* efficiently, despite its syntactic verbosity, so there’s no judgement involved at all).
>>>>>> For stores that implement RDF-star as named graphs it provides security that their implementation won’t be invalidated by some unforeseen usage of triple terms.
>>>>>> 
>>>>>> According to this proposal it would be NORMATIVE that a system claiming to support RDF 1.2 has to support the annotation shorthand syntax via support for RDF standard reification.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> B | RDF-star semantic extension - OPTIONAL
>>>>>> 
>>>>>> :s :p :o .
>>>>>> [] rdf-star:occurrenceOf << :s :p :o >> ;
>>>>>> :a :b .
>>>>>> 
>>>>>> The triple term refers to the abstract type and all occurrences have to be explicitly created. This does away with any optimization at the N-Triples level, e.g. occurrence terms - avoiding many if not all of the problems you list - and still is pretty concise. It leaves room for further extensions towards explicitly asserted or unasserted statements etc.
>>>>>> Maybe named occurrences are the better solution than this extra triple, but I’d prefer to not be the judge on that. These are details of solutions to problems that the next proposal just doesn’t have in the first place.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> C | Nested Graphs - OPTIONAL
>>>>>> 
>>>>>> [] nng:asserts ":s :p :o."^^nng:ttl ; # which entails < :s :p :o . >
>>>>>> :a :b .
>>>>>> 
>>>>>> Of course we have our own syntactic sugar
>>>>>> 
>>>>>> []{ :s :p :o } :a :b .
>>>>>> 
>>>>>> but we also support the RDF-star shorthand syntax.
>>>>>> 
>>>>>> Obviously Nested Graphs provide the shortest expansion because they also entail the asserted triple, but that arrangement is still under discussion (see below). Less obviously they also are much easier to query than a combination of RDF named graphs and RDF-star triple terms (or graph terms, for that matter).
>>>>>> Note that Nested Graphs don’t require a semantic extension (at least that’s the current understanding) because they get by without a change to the abstract model and semantics. Beyond the syntactic sugar and a new RDF datatype they merely push boundaries and make some common implicit assumptions explicit and configurable.
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> Thomas
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 1. Jan 2024, at 17:52, Andy Seaborne <andy@apache.org> wrote:
>>>>>>> 
>>>>>>> In the named occurrence proposal, can the blank node of a named occurrence RDF term be used on its own as the named occurrence?
>>>>>>> 
>>>>>>> << _:a | :s :p :o >> :starts 1989 .
>>>>>>> _:a :finishes 1990 .
>>>>>>> 
>>>>>>> 
>>>>>>> As the name can be a URI,
>>>>>>> 
>>>>>>> << <http://example/occ1> | :s :p :o >> :starts 1999 .
>>>>>>> <http://example/occ1> :finishes 2000 .
>>>>>>> 
>>>>>>> In SPARQL: does this match the examples above?
>>>>>>> 
>>>>>>> SELECT * {
>>>>>>> ?X :starts ?start .
>>>>>>> ?X :finishes ?finish .
>>>>>>> }
>>>>>>> 
>>>>>>> If yes,
>>>>>>> either
>>>>>>> _:a :finishes 1990 .
>>>>>>> is actually
>>>>>>> << _:a | :s :p :o >> :finishes 1990 .
>>>>>>> 
>>>>>>> or if
>>>>>>> _:a :starts 1989 .
>>>>>>> _:a :finishes 1990 .
>>>>>>> then how does the application find :s :p :o?
>>>>>>> 
>>>>>>> In the proposed semantics: [1]
>>>>>>> 
>>>>>>> [I+A](r) = IS(r) if r is a iri
>>>>>>> [I+A](r) = A(r) if r is a BlankNode
>>>>>>> [I+A](r) = [I+A](r.id) if r is a tripleOccurrence
>>>>>>> 
>>>>>>> so
>>>>>>> 
>>>>>>> [I+A](r) = A(r.id) if r is a tripleOccurrence
>>>>>>> and r.id is a blank node
>>>>>>> [I+A](r) = IS(r.id) if r is a tripleOccurrence
>>>>>>> and r.id is a URI.
>>>>>>> 
>>>>>>> This has an impact on implementations and APIs.
>>>>>>> 
>>>>>>> Take Apache CommonsRDF [2] as an example.
>>>>>>> 
>>>>>>> The accessor function on a graph is
>>>>>>> 
>>>>>>> Stream<? extends Triple>
>>>>>>> stream(BlankNodeOrIRI subject, IRI predicate, RDFTerm object)
>>>>>>> 
>>>>>>> where subject/predicate/object can be a constant or a wildcard.
>>>>>>> 
>>>>>>> So if the application is given <http://example/occ1>, how does it determine whether the URI is a named occurrence and if so, how does it find the triple subject/predicate/object?
>>>>>>> 
>>>>>>> Scanning all triples to find named occurrences and looking at the id of a named occurrence is expensive.
>>>>>>> 
>>>>>>> Expecting an additional function x -> triple just for occurrences is a big step.
>>>>>>> 
>>>>>>> The triple-term version has rdf:occurrenceOf, so there is a triple that maps the blank node / URI to the 3-tuple of s,p,o that had the effect of OT.
>>>>>>> 
>>>>>>> Andy
>>>>>>> 
>>>>>>> [1] https://github.com/w3c/rdf-star-wg/wiki/Semantics:-Andy's-proposal#semantics
>>>>>>> 
>>>>>>> [2] Apache CommonsRDF : https://commons.apache.org/proper/commons-rdf/
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
Received on Monday, 15 January 2024 11:39:35 UTC