Re: Consolidating triple/edges from Thomas Lörtsch on 2023-12-14 (public-rdf-star-wg@w3.org from December 2023)

From: Thomas Lörtsch <tl@rat.io>
Date: Thu, 14 Dec 2023 17:46:17 +0100
To: Andy Seaborne <andy@apache.org>
Cc: RDF-star Working Group <public-rdf-star-wg@w3.org>
Message-Id: <25B38382-7409-46A1-B36D-78D633E74DFE@rat.io>
In principle I agree, although I have a few things to add and modify. Basically, if we decide to go the pragmatic route and standardize only an LPG-oriented subset of annotation functionality, we still have to be sure that we don’t block future extensions to a complete solution. That requires us to think many things through (like graphs, quotation). However, actually standardizing those other things is then not much extra work.

A few comments inline, a more coherent take at the end.

> On 12. Dec 2023, at 21:59, Andy Seaborne <andy@apache.org> wrote:
> 
> Here is an attempt to write out the details of what I think has been said recently.
> 
> It is addressing "publishing information about multi-edges".
> (Ideas here are from WG members - the mistakes are mine.)
> 
> 
> Multiple edges with the same label are handled as multiple occurrences - the predicate URI of the RDF triple is thought as a conceptual relationship - with multiple sets of annotations.
> 
> This preserves the uniqueness of triples in a graph, and allows independent collections of assertions about a relationship. Such collections of assertions do not get entangled on merge.

Just to emphasize that there is one more important differentiation to make: does the occurrence identify a specific triple, or does it refer to the abstract statement? That has ramifications in use cases for unasserted assertions (multiple annotations, some of them meant to refer to an unasserted statement), updates (delete the statement with the annotation, or is another annotation still referring to it?) etc. In more abstract terms the question is whether statements are understood as types already when authoring/storing or only later when querying/reasoning. It’s essentially a question of early vs late optimization, and RDF’s set semantics - while practical, fundamental to RDF and not to be impaired - is only a result of the former, nothing holy in itself. So working around it IS okay.
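
To make the token/type distinction concrete, here is a minimal sketch in Python. It is not part of any proposal: the tuple encoding and the helper names `merge` and `occurrences_of` are assumptions made purely for illustration. A graph is modelled as a set of triples, a triple term as a nested tuple, and an occurrence as a node related to that term via rdfx:occurrenceOf.

```python
# Illustrative model only: a graph is a set of (s, p, o) tuples and a
# triple term is simply a nested (s, p, o) tuple used in the object slot.
OCCURRENCE_OF = "rdfx:occurrenceOf"  # property name taken from the thread

def merge(*graphs):
    """Set union; blank-node renaming is omitted because only IRIs are used here."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged

def occurrences_of(graph, triple):
    """Return the names of all occurrences (tokens) of one triple (type)."""
    return {s for (s, p, o) in graph if p == OCCURRENCE_OF and o == triple}

marriage = (":liz", ":spouse", ":dick")

g1 = {
    marriage,
    ("id:1", OCCURRENCE_OF, marriage),
    ("id:1", ":start", 1964),
    ("id:1", ":end", 1974),
}
g2 = {
    marriage,
    ("id:2", OCCURRENCE_OF, marriage),
    ("id:2", ":start", 1975),
    ("id:2", ":end", 1976),
}

merged = merge(g1, g2)
# Set semantics: the asserted triple exists once, the tokens stay distinct.
asserted_count = sum(1 for t in merged if t == marriage)
tokens = occurrences_of(merged, marriage)
```

In this model the merged graph holds the asserted triple exactly once, while both occurrence names survive with their own annotations. Whether a delete of id:1 should also remove the asserted triple is exactly the question that the model leaves open.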

> ## Turtle
> 
> Add to Turtle a new statement (grammar rule 2):
> 
>   << occurrenceName | :s :p :o >> .
> 
> This names an occurrence of the triple s p o.
> 
> The triple is not asserted, keeping "assertion" and "occurrence" as orthogonal concepts even if they might commonly be used together.
> 
> occurrenceName is a URI or blank node, including [] (the ANON terminal rule 47 in Turtle - no triples inside the []).
> 
> It is better to have the name first to allow for split lines and modified annotation syntax below.
> 
> 
> ## N-Triples
> 
> In N-Triples, reflecting the RDF abstract data model, there is a property to relate occurrence to a triple term.
> 
>   :occurrenceName rdf:occurrenceOf << :s :p :o >> .
> 
> There are triple terms in the data model
> (RDF-Concepts - section 3.1 : editors draft [2]).
> 
> Renaming "quoted triple" as "triple term" would be better because it has less implication of the usage.
> 
> The NT syntax would be available in Turtle in the same way that rdf:first is available in Turtle - and with the same expectation that it would rarely be used.

except that in SPARQL there’s no way around it

> ## RDF Graph Merge
> 
> Graph merge happens as before - blank nodes need to be kept apart.
> 
> 
> ## Annotation
> 
> This gives the modified annotation syntax as per Thomas's email [1]:
> 
>>    :liz :spouse :dick { id:1 | :start 1964; :end 1974 |} .
>>    :liz :spouse :dick { id:2 | :start 1975; :end 1976 |} .
> 
> Slight syntax tweak: For SPARQL, reusing { has to be careful because { is a group start.
> 
> :liz :spouse :dick {| id:1 | :start 1964; :end 1974 |} .
> :liz :spouse :dick {| id:2 | :start 1975; :end 1976 |} .

ack

>> which would map to
>>
>>    id:1 rdfx:occurrenceOf << :liz :spouse :dick >> ;
>>         :start 1964; :end 1974 .
>>    id:2 rdfx:occurrenceOf << :liz :spouse :dick >> ;
>>         :start 1975; :end 1976 .
> 
> and asserting:
> 
>   :liz :spouse :dick .
> 
> 
> ## Named occurrences in term slots
> 
> << occurrenceName | :s :p :o >> could also be used in a subject or object slot with the occurrenceName being the RDF term for subject or object (c.f. RDF collections and predicate object lists) for use with unasserted triples:
> 
> << [] | :s :p :o >>
>      :start 1964 ;
>      :end 1974 .


## Tokens vs Types

I’d like to completely turn the tables on tokens vs types: a reference to the type has to explicitly address the type. A relation reciprocal to rdfx:occurrenceOf can achieve that, e.g.

   :T rdfx:typeOf << :s :p :o >>

OTOH, any reference to << :s :p :o >> is defined to implicitly reference a token, and it either provides a custom name or is given a new blank node to name the reference.
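
A minimal Python sketch of that rule, under stated assumptions: only rdfx:occurrenceOf and rdfx:typeOf come from this thread, while the tuple encoding, the helper names `token_reference` and `type_reference`, and the `_:bN` blank node labels are hypothetical.

```python
import itertools

# Hypothetical helpers; only the two rdfx: property names come from the thread.
_fresh = itertools.count(1)

def token_reference(triple, name=None):
    """Name one token (occurrence) of `triple`; mint a blank node if no name is given."""
    token = name if name is not None else f"_:b{next(_fresh)}"
    return token, (token, "rdfx:occurrenceOf", triple)

def type_reference(name, triple):
    """Explicitly address the type, i.e. the abstract statement itself."""
    return (name, "rdfx:typeOf", triple)

spo = (":s", ":p", ":o")

named, named_triple = token_reference(spo, "id:1")  # custom name supplied
anon_a, _ = token_reference(spo)                    # fresh blank node
anon_b, _ = token_reference(spo)                    # another, distinct blank node
type_triple = type_reference(":T", spo)
```

Two unnamed references to the same triple yield two different blank nodes here, so they denote two tokens; only the explicit rdfx:typeOf relation addresses the shared type.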


## Syntax

We should try to make the naming syntactically as uniform and predictable as possible. The nested graph proposal uses a pair of square brackets [] prepended to constructs to indicate the name. If a custom name is given, it is entered into that pair. That violates the rules for [] in Turtle/TriG but seems to parse unambiguously. Not providing any name syntactically and still assuming the presence of a blank node name is a bit more tricky.

    :liz :spouse :dick [id:1]{| :start 1964; :end 1974 |} .
    :liz :spouse :dick {| :start 1975; :end 1976 |} .       # _:id2

    [] << :s :p :o >> :start 1964 ; :end 1974 .

In any case: if it doesn’t parse without a prepended name, then prepend a [].


## Unasserted vs Asserted

Why not define a property that not only references a token, but also creates the triple, e.g.:

   :liz :spouse :dick [id:1]{| :start 1964; :end 1974 |} .

mapping to

    id:1 rdfx:assertionOf << :liz :spouse :dick >> ;
        :start 1964; :end 1974 .

instead of 

    id:1 rdfx:occurrenceOf << :liz :spouse :dick >> ;
        :start 1964; :end 1974 .
    :liz :spouse :dick .

That way we get identifiers for each triple occurrence together with the triple being asserted - direct identification, not early optimization. See above why that is important.
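
The relation between the two forms can be sketched in Python. This assumes one possible reading, namely that rdfx:assertionOf implies both the occurrence and the asserted triple; the thread does not specify that semantics, and the function name `expand_assertions` and the tuple encoding are hypothetical.

```python
ASSERTION_OF = "rdfx:assertionOf"    # property suggested in this message
OCCURRENCE_OF = "rdfx:occurrenceOf"  # property from the earlier mapping

def expand_assertions(graph):
    """Rewrite each assertionOf triple into occurrenceOf plus the asserted triple.

    Assumed reading only: the thread does not define this entailment.
    """
    expanded = set()
    for s, p, o in graph:
        if p == ASSERTION_OF:
            expanded.add((s, OCCURRENCE_OF, o))
            expanded.add(o)  # the triple inside the triple term becomes asserted
        else:
            expanded.add((s, p, o))
    return expanded

marriage = (":liz", ":spouse", ":dick")
compact = {
    ("id:1", ASSERTION_OF, marriage),
    ("id:1", ":start", 1964),
    ("id:1", ":end", 1974),
}
long_form = expand_assertions(compact)
```

The expansion only runs in one direction: once the long form is merged with other data, nothing records which occurrence carried the assertion, which seems to be the identification that the single property is meant to preserve.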

All this unasserted business may seem a bit eccentric, but it’s the key to any sort of configurable semantics like quotation etc. It therefore has huge potential - if done right.


## SPARQL sugar

You compare the occurrence-based shortcut relation to syntactic sugar for RDF lists, which is fine, except that querying those lists is a hardship. Same for RDF/XML’s syntactic support for RDF standard reification. Any kind of RDF syntactic sugar also needs proper support in SPARQL to be effective in practice.


## Triple terms vs Graph terms

Just for completeness: all of this can easily be expanded to graph terms. The syntax

    []{ :s :p :o. :u :v :w }

is explored in the nested graph proposal. 


## Graph Terms vs Named Graphs

I like Adrian’s example [0] of a complicated named graph based application and I’m taking that seriously. However, it should also be clear that triple/graph terms are in the end always stored in a way very similar to named graphs. There is just no other way in a quad-based system. Triple/graph terms can be represented as named graphs, and named graphs can be represented as graph terms. It’s a practical question of how to encode belonging/membership: syntactically as nested graphs, via a new term type as in RDF-star that transforms a triple into a term at the surface (but NOT in the underlying storage layer, for obvious performance reasons), via explicit binding relations as Niklas proposes [1] (and as Dydra implements nested graphs), etc. The main question is how to ensure that those binding relations don’t get lost in the process, but that IMHO is true for any solution. Nested graphs can be serialized to graph terms, which are just an extension of triple terms. That requires an additional en/de-coding step to fit them into an environment that reserves named graphs for its own purposes. That extra step is the price that those applications have to pay for being so particular about their use of named graphs. That’s only fair, and probably still economical for them.
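
The en/de-coding step can be illustrated with a small Python sketch. It assumes the simplest possible encoding, where the graph name is the fourth component of a quad; the helper names and the `_:g1` label are hypothetical, and real systems (binding relations, nested graphs) would carry more information than this.

```python
def graph_term_to_quads(name, graph_term):
    """Encode a graph term as quads whose fourth component is the graph name."""
    return {(s, p, o, name) for (s, p, o) in graph_term}

def quads_to_graph_term(quads, name):
    """Decode the named graph `name` back into a graph term (a frozenset of triples)."""
    return frozenset((s, p, o) for (s, p, o, g) in quads if g == name)

# The graph term from the nested graph syntax example: []{ :s :p :o. :u :v :w }
term = frozenset({(":s", ":p", ":o"), (":u", ":v", ":w")})

quads = graph_term_to_quads("_:g1", term)
round_trip = quads_to_graph_term(quads, "_:g1")
```

The round trip is lossless only as long as the name that binds the quads to the term is kept, which is the concern about binding relations getting lost.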


## Term types vs Datatypes

The most fundamental grievance with RDF-star is the introduction of a new term type when a new datatype of type RDF/TTL would suffice. All I proposed above is readily implementable in the nested graph proposal, which does map to TriG and regular N-Quads and such a datatype (and even Turtle and N-Triples, but that’s another discussion).


Best,
Thomas



[0] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Dec/0019.html
[1] https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Nov/0032.html


>    Andy
> 
> [1]
> https://lists.w3.org/Archives/Public/public-rdf-star-wg/2023Dec/0024.html
> 
> [2] https://w3c.github.io/rdf-concepts/spec/#section-triples
>    (as of 2023-12-10)
>
Received on Thursday, 14 December 2023 16:46:29 UTC