Re: Apache Jena support for RDF* from Andy Seaborne on 2020-08-05 (public-rdf-star@w3.org from August 2020)

From: Andy Seaborne <andy@apache.org>
Date: Wed, 5 Aug 2020 15:15:07 +0100
To: public-rdf-star@w3.org
Message-ID: <ba141725-1521-9752-d4b1-6bcc34710437@apache.org>
Implementation notes:

ARQ supports <<>> in CONSTRUCT, VALUES, expressions and SPARQL Update.

----
There is one new production, "TripleTerm", and it is used in
DataBlockValue (VALUES) and VarOrTerm (which covers BGPs, paths, update
templates and expressions).
----

<<>> is a new kind of RDFTerm.

Because in Jena RDFTerms are immutable, you can't create cycles.
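Here is a minimal Python sketch of why immutability rules out cycles. It assumes a hypothetical TripleTerm class modelled as a frozen dataclass, and it is not Jena's Java API. All of a term's components must exist before the term is built, so no term can ever contain itself.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Union

# Hypothetical model (not Jena's API): plain strings stand for IRIs and literals.
Term = Union[str, "TripleTerm"]


@dataclass(frozen=True)
class TripleTerm:
    """An immutable << s p o >> term: its parts must exist before it is created."""
    s: Term
    p: Term
    o: Term

    def depth(self) -> int:
        # Nesting depth; recursion terminates because no term can contain itself.
        inner = [t.depth() for t in (self.s, self.p, self.o) if isinstance(t, TripleTerm)]
        return 1 + max(inner, default=0)


base = TripleTerm(":s", ":p", ":o")
nested = TripleTerm(base, ":q", "123")   # << << :s :p :o >> :q 123 >>

try:
    base.o = nested  # would create a cycle if terms were mutable
except FrozenInstanceError:
    print("triple terms are immutable")
```

The frozen dataclass also provides value equality and hashing, so two separately built TripleTerm(":s", ":p", ":o") objects compare equal. That is the behaviour you want from RDF terms.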

----

There is one new operator in the algebra (TR in the paper), called
"(find)". It matches a <<>> pattern recursively and assigns the
top-level match to a variable.
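Here is a rough Python model of the (find) operator. It assumes a graph held as a list of asserted triples, variables written as "?name" strings, and hypothetical match/find helpers. It follows the description above and is not ARQ's code. The pattern is matched recursively into nested triple terms. Each matching asserted triple produces one row, with the whole top-level triple bound to the target variable. Zero, one or many rows can come back.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Union


class Triple(NamedTuple):
    s: "Term"
    p: "Term"
    o: "Term"


Term = Union[str, Triple]      # strings starting with "?" are variables in patterns
Binding = Dict[str, Term]


def is_var(t: Term) -> bool:
    return isinstance(t, str) and t.startswith("?")


def match(pattern: Term, term: Term, b: Binding) -> Optional[Binding]:
    """Match one pattern term against a data term, recursing into nested triples."""
    if is_var(pattern):
        if pattern in b:                       # already bound: must agree
            return b if b[pattern] == term else None
        return {**b, pattern: term}
    if isinstance(pattern, Triple):
        if not isinstance(term, Triple):
            return None
        for pt, dt in zip(pattern, term):      # s, p, o in turn
            b = match(pt, dt, b)
            if b is None:
                return None
        return b
    return b if pattern == term else None      # constant IRI or literal


def find(pattern: Triple, graph: Iterable[Triple], var: str) -> Iterator[Binding]:
    """Hypothetical (find): one row per asserted triple matching `pattern`,
    with the matched top-level triple itself bound to `var`."""
    for triple in graph:
        b = match(pattern, triple, {})
        if b is not None:
            yield {**b, var: triple}


inner = Triple(":s", ":p", ":o")
graph = [inner, Triple(inner, ":q", "123"), Triple(":a", ":p", ":b")]
for row in find(Triple("?x", ":p", "?y"), graph, "?t"):
    print(row)
```

Because the number of rows depends on the data, this cannot be written as a BIND-style function of its arguments.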

Because this is fundamentally different to BIND -- (find) is multivalued 
and not a function of its arguments -- the syntax calls this FIND. This 
leaves open the possibility of writing <<>> in SPARQL expressions.

Using BIND for FIND would block this, because "BIND(<<>> AS ?T)" would
be ambiguous in meaning.

Jena supports functions on triple terms, so triple terms can appear in
expressions, either indirectly via variables or written directly.

For example, accessors:

    afn:subject(<<:s :p :o>>) ==> :s

and a constructor:

    afn:triple(?s, ?p, ?o) ==> << ?s ?p ?o >> if ?s, ?p and ?o are all bound,

which is what happens in CONSTRUCT.
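Here is a small Python sketch of the accessor/constructor pair and its link to CONSTRUCT. The names subject, triple, construct and ExprEvalError are hypothetical stand-ins and are not Jena's function library. None models an unbound variable. A failed construction skips that solution row, the way CONSTRUCT skips template triples with unbound variables.

```python
from typing import Dict, List, NamedTuple, Optional


class Triple(NamedTuple):
    s: object
    p: object
    o: object


class ExprEvalError(Exception):
    """Stands in for a SPARQL expression evaluation error."""


def subject(t: object) -> object:
    # Accessor, analogous to afn:subject: only defined on triple terms.
    if not isinstance(t, Triple):
        raise ExprEvalError("not a triple term")
    return t.s


def triple(s: Optional[object], p: Optional[object], o: Optional[object]) -> Triple:
    # Constructor, analogous to afn:triple: None models an unbound variable.
    if s is None or p is None or o is None:
        raise ExprEvalError("unbound argument")
    return Triple(s, p, o)


def construct(rows: List[Dict[str, object]]) -> List[Triple]:
    """Instantiate the template << ?s ?p ?o >> for each solution row,
    skipping rows where construction fails (some variable is unbound)."""
    out = []
    for row in rows:
        try:
            out.append(triple(row.get("?s"), row.get("?p"), row.get("?o")))
        except ExprEvalError:
            continue
    return out


print(subject(Triple(":s", ":p", ":o")))  # :s
```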

Writing a grammar that distinguishes "BIND(<<>> AS ?T)" means it can't
be treated as plain assignment. If <<>> is also to be allowed in
expressions, the grammar becomes complicated at this point: either
several extra productions, if we stick to the simple requirements of
SPARQL (LL(1)), or several tokens of lookahead, which for some parser
generators is a burden (not for ARQ, which uses JavaCC).

A different keyword removes all these problems.

The keywords MBIND (M=multiple) or TBIND were also considered.
TRIPLETERM is a bit too long!
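Here is a toy sketch of why a separate keyword helps. It assumes a whitespace-tokenised clause and a hypothetical parse_assignment function, and ARQ's real grammar is JavaCC, not this. The first token alone picks the operator, so the parser never has to look inside <<>> to decide between assignment and matching.

```python
from typing import List, Tuple


def parse_assignment(tokens: List[str]) -> Tuple[str, List[str], str]:
    """Toy parser for KEYWORD ( body AS ?var ); hypothetical, not ARQ's grammar.

    The first token alone decides the operator, so the body can go through
    the same term/expression rules whether the clause is BIND or FIND.
    """
    if len(tokens) < 6:
        raise SyntaxError("clause too short")
    keyword = tokens[0]
    if keyword == "BIND":
        op = "bind"   # single-valued: evaluate an expression, assign the result
    elif keyword == "FIND":
        op = "find"   # multivalued: match the pattern, one row per match
    else:
        raise SyntaxError(f"unexpected token {keyword!r}")
    if tokens[1] != "(" or tokens[-3] != "AS" or tokens[-1] != ")":
        raise SyntaxError("expected KEYWORD ( ... AS ?var )")
    return op, tokens[2:-3], tokens[-2]


print(parse_assignment("FIND ( << ?s ?p ?o >> AS ?t )".split()))
# ('find', ['<<', '?s', '?p', '?o', '>>'], '?t')
```

If BIND were used for both, the parser would have to make the same decision from what follows the opening parenthesis. That is the lookahead problem described above.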

----

The use case for separate annotations means that parsing uses SA
(separate-assertions) mode:

<<:s :p :o>> :q 123 .

is one triple. It does not also assert :s :p :o.

This works naturally in N-Triples, where "one line, one triple" is the
norm: "wc -l" still counts triples on real-world data, and database
dumps are more portable.

It also means that DELETE does not need special handling.

DELETE DATA { :s :p :o. }

would otherwise have a conditional side effect of

DELETE WHERE { << :s :p :o >> ?p ?o } ;
DELETE WHERE { ?s ?p << :s :p :o >> }

depending on the whole update operation. Combined with multiple
operations in the same update request, this effectively blocks streaming.

Looking up termified triples all the time seems expensive, at least
without some machinery to know when a lookup isn't necessary.
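Here is a Python sketch contrasting the two delete behaviours. It assumes a graph held as a plain set of triples and uses hypothetical function names. It also ignores the "depending on the whole update operation" condition. Under SA, deleting a triple touches only that triple. Under the alternative, every delete must also find triples that quote it as subject or object, which are the two DELETE WHERE forms above. Without an index, that is a full scan on every delete, even when nothing quotes the triple.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    s: object
    p: object
    o: object


def delete_sa(graph: Set[Triple], t: Triple) -> None:
    # SA mode: DELETE DATA { :s :p :o } removes exactly that triple, nothing else.
    graph.discard(t)


def delete_dependent(graph: Set[Triple], t: Triple) -> List[Triple]:
    """Hypothetical alternative: also remove triples that quote t as subject
    (<< t >> ?p ?o) or object (?s ?p << t >>). Returns the extra triples removed."""
    graph.discard(t)
    # No index from quoted triple to quoting triples: scan the whole graph.
    quoting = [x for x in graph if x.s == t or x.o == t]
    for x in quoting:
        graph.discard(x)
    return quoting


base = Triple(":s", ":p", ":o")
graph = {base, Triple(base, ":q", "123")}
delete_sa(graph, base)
print(len(graph))  # 1: the annotation triple survives in SA mode
```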

---

These are decisions that seemed natural at the time - I'd expect Jena 
users at the moment to care more about compatibility across implementations.

     Andy

On 04/08/2020 11:40, Andy Seaborne wrote:
> Jena version 3.16.0 completes the support for RDF* and SPARQL*.
> 
> This is a "deep integration" - it is available by default in various 
> syntaxes and in Fuseki. The application does not need to enable it.
> 
> It is supported in:
> 
> text/turtle
> application/n-triples
> text/trig
> application/n-quads
> 
> and for storage in-memory, and persistently in TDB (both TDB1 and TDB2).
> 
> For SPARQL results, it is available in the JSON, XML, TSV,
> RDF Thrift (binary) and text formats.
> 
> 
>      https://jena.apache.org/documentation/rdfstar/
> 
>      Andy
Received on Wednesday, 5 August 2020 14:15:23 UTC