Re: Consolidating triple/edges -- named occurrence version

[ No, I’m not *this* productive today. I’m also finishing and sending out some drafts from the last 14 days. This is a reply to Andy Seaborne, 1. Jan 2024, at 17:52 (see below for the full citation), and I’m sorry that I didn’t send it earlier. It complements my other mails from today in that it outlines a strategy for standardization that preserves backwards compatibility and leaves room for future extensions to - tadaa - graphs. ]


Hi Andy,

you seem to have dropped the <<( :s :p :o )>> type syntax and I think that’s a good idea, because IMHO that syntax is just too involved. However, you haven’t dropped the occurrence syntax for triple terms, and that’s causing a lot of trouble down the line, starting with quad terms in the N-Triples syntax.
The point of this mail is to argue for a compact syntax capturing the mainstream use cases and to leave the technical decisions to implementors (but out of the core specifications).

I would argue to concentrate on the predominant use case, which is annotating triple occurrences via a concise annotation shorthand syntax:

:s :p :o {| :a :b |}       # here implicitly named with a new blank node

Make this part of RDF 1.2, and nothing else (especially not triple terms).
Provide alternative ways to expand this syntax, e.g.:



A | RDF 1.2 standard reification

:s :p :o .
[] a rdf:Statement ;
  rdf:subject :s ;
  rdf:predicate :p ;
  rdf:object :o ;
  :a :b .

The reification quad is widely disliked, but it has provided a common denominator since RDF 1999.
For stores that don’t plan to support lots of annotations or heavy loads of LPG-style data this still offers an economical path to full RDF 1.2 compatibility (of course, some stores even support RDF standard reification *very* efficiently, despite its syntactic verbosity, so there’s no judgement involved at all).
For stores that implement RDF-star as named graphs it provides security that their implementation won’t be invalidated by some unforeseen usage of triple terms.

According to this proposal it would be NORMATIVE that a system claiming to support RDF 1.2 has to support the annotation shorthand syntax via RDF standard reification.
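
To make expansion A concrete, here is a minimal sketch in Python of how a parser might rewrite one annotated triple into the asserted triple plus standard reification. It is only an illustration under stated assumptions: terms are plain strings, the function name expand_annotation and the blank node label "_:r1" are hypothetical, and "rdf:type" stands for the "a" keyword used in the Turtle above.

```python
def expand_annotation(s, p, o, annotations, node="_:r1"):
    """Expand  s p o {| a b |}  into the asserted triple plus RDF standard reification.

    `node` is the (hypothetical) label of the fresh blank node that names
    the occurrence; a real parser would generate a new label per annotation.
    """
    triples = [
        (s, p, o),                            # the asserted triple itself
        (node, "rdf:type", "rdf:Statement"),  # the reification quad
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]
    # Attach every annotation to the reifying node.
    triples.extend((node, a, b) for a, b in annotations)
    return triples


# :s :p :o {| :a :b |}
for triple in expand_annotation(":s", ":p", ":o", [(":a", ":b")]):
    print(" ".join(triple), ".")
```

A system that only understands RDF standard reification would need nothing beyond this rewriting step to accept the shorthand.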



B | RDF-star semantic extension - OPTIONAL

:s :p :o .
[] rdf-star:occurrenceOf << :s :p :o >> ;
  :a :b .

The triple term refers to the abstract type and all occurrences have to be explicitly created. This does away with any optimization at the N-Triples level, e.g. occurrence terms - avoiding many if not all of the problems you list - and is still pretty concise. It leaves room for further extensions towards explicitly asserted or unasserted statements etc.
Maybe named occurrences are the better solution than this extra triple, but I’d prefer to not be the judge on that. These are details of solutions to problems that the next proposal just doesn’t have in the first place.
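
Because expansion B links each occurrence to its triple term by an ordinary triple, an application can resolve an occurrence identifier with a plain lookup on that predicate instead of a dedicated accessor. A minimal sketch in Python, assuming a graph held as a list of tuples in which a triple term is modelled as a nested tuple; the function name build_occurrence_index and this in-memory representation are hypothetical and not part of any proposal.

```python
OCCURRENCE_OF = "rdf-star:occurrenceOf"


def build_occurrence_index(triples):
    """Map each occurrence identifier to the (s, p, o) triple term it refers to.

    Assumes each occurrence has exactly one occurrenceOf triple; if there
    were several, the last one seen would win in this simple sketch.
    """
    return {s: o for s, p, o in triples if p == OCCURRENCE_OF}


graph = [
    (":s", ":p", ":o"),
    ("_:occ1", OCCURRENCE_OF, (":s", ":p", ":o")),  # triple term as nested tuple
    ("_:occ1", ":a", ":b"),
]

index = build_occurrence_index(graph)
print(index["_:occ1"])
```

A store with an index on the predicate can answer the same question without building a separate map or scanning every triple.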



C | Nested Graphs - OPTIONAL

[] nng:asserts ":s :p :o."^^nng:ttl ;      # which entails < :s :p :o . >
  :a :b .

Of course we have our own syntactic sugar

[]{ :s :p :o } :a :b .

but we also support the RDF-star shorthand syntax.

Obviously Nested Graphs provide the shortest expansion because they also entail the asserted triple, but that arrangement is still under discussion (see below). Less obviously they are also much easier to query than a combination of RDF named graphs and RDF-star triple terms (or graph terms, for that matter).
Note that Nested Graphs don’t require a semantic extension (at least that’s the current understanding) because they get by without a change to the abstract model and semantics. Beyond the syntactic sugar and a new RDF datatype they merely push boundaries and make some common implicit assumptions explicit and configurable.


Best,
Thomas 




> On 1. Jan 2024, at 17:52, Andy Seaborne <andy@apache.org> wrote:
> 
> In the named occurrence proposal, can the blank node of a named occurrence RDF term be used on its own as the named occurrence?
> 
> << _:a | :s :p :o >> :starts 1989 .
> _:a :finishes 1990 .
> 
> 
> As the name can be a URI,
> 
> << <http://example/occ1> | :s :p :o >> :starts 1999 .
> <http://example/occ1> :finishes 2000 .
> 
> In SPARQL: does this match the examples above?
> 
> SELECT * {
>  ?X :starts ?start .
>  ?X :finishes ?finish .
> }
> 
> If yes,
> either
> _:a :finishes 1990 .
> is actually
> << _:a | :s :p :o >> :finishes 1990 .
> 
> or if
> _:a :starts 1989 .
> _:a :finishes 1990 .
> then how does the application find :s :p :o?
> 
> In the proposed semantics: [1]
> 
> [I+A](r) = IS(r) if r is a iri
> [I+A](r) = A(r) if r is a BlankNode
> [I+A](r) = [I+A](r.id) if r is a tripleOccurrence
> 
> so
> 
> [I+A](r) = A(r.id)   if r is a tripleOccurrence
>                    and r.id is a blank node
> [I+A](r) = IS(r.id)   if r is a tripleOccurrence
>                     and r.id is a URI.
> 
> This has an impact on implementations and APIs.
> 
> Take Apache CommonsRDF [2] as an example.
> 
> The accessor function on a graph is
> 
> Stream<? extends Triple>
>  stream(BlankNodeOrIRI subject, IRI predicate, RDFTerm object)
> 
> where subject/predicate/object can be a constant or a wildcard.
> 
> So if the application is given <http://example/occ1>, how does it determine whether URI is named occurrence and if so, how does it find the triple subject/predicate/object?
> 
> Scanning all triples to find named occurrences and looking at the id of a named occurrence is expensive.
> 
> Expecting an addition function x -> triple just for occurrences is a big step.
> 
> In the triple-term version has rdf:occurrenceOf so there is a triple to maps the blank node / URI to the 3-tuple of s,p,o that had the effect of OT.
> 
>   Andy
> 
> [1] https://github.com/w3c/rdf-star-wg/wiki/Semantics:-Andy's-proposal#semantics
> 
> [2] Apache CommonsRDF : https://commons.apache.org/proper/commons-rdf/
> 

Received on Thursday, 11 January 2024 14:40:46 UTC