Re: Apache Jena support for RDF*

On 09/08/2020 09:14, Olaf Hartig wrote:
> Hi Andy,
> 
> Thanks a lot for these implementation notes!!
> 
> Some remarks and questions inline...
> 
> On onsdag 5 augusti 2020 kl. 15:15:07 CEST Andy Seaborne wrote:
>> [...]
>> Using BIND for FIND blocks this because "BIND(<<>> AS ?T)" is ambiguous
>> in meaning.
> 
> This very expression is not possible by the SPARQL* grammar as defined in
> Section 5.1 of http://arxiv.org/pdf/1406.3399

I use <<>> to mean "a triple term goes here". The contents :s :p :o are 
omitted as not important.

This is used in several places in the email.

> 
> More specifically, the corresponding grammar rules are:
> 
> Bind ::= ’BIND’ ’(’ ExpressionOrEmbTP ’AS’ Var ’)’
> 
> ExpressionOrEmbTP ::= Expression | EmbTP
> 
> EmbTP ::= ’<<’ VarOrBlankNodeOrIriOrLitOrEmbTP Verb
> VarOrBlankNodeOrIriOrLitOrEmbTP ’>>’

Quite: EmbTP can't appear in an expression ... except that it can arise by evaluation:

BIND(<<:s :p :o>> AS ?T)
FILTER(?T = <<:s :p :o>>)

which seems strange.

Also - you can return triple-terms from functions.

> 
> In other words, you must have a triple pattern in between the '<<' and the
> '>>'.
> Or are you saying that '<<>>' can also be parsed as an Expression?

In Jena - yes.

>   
>> [...]
>> Writing a grammar that distinguishes "BIND(<<>> AS ?T)" means it can't
>> be plain assignment. If <<>> is also to be allowed in expressions, the
>> grammar becomes complicated (several extra productions) at this point if
>> we stick to the simple requirements of SPARQL (LL(1)) or several steps of
>> lookahead which for some parser generators is a burden (not for ARQ
>> which uses JavaCC).
>>
>> A different keyword removes all these problems.
>>
>> The keywords MBIND (M=multiple) or TBIND were also considered.
>> TRIPLETERM is a bit too long!
> 
> I am not sure I understand the exact problem you want to highlight here. Is
> the problem an issue with the SPARQL* grammar or is it an issue that the BIND
> clause in SPARQL* becomes multivalued (can result in multiple solution
> mappings) or both?

Both.

Expressions can't have <<:s :p :o>> in them which breaks symmetry. You 
can't write <<?s :p :o>> with ?s bound by an earlier part of the query. 
Seems odd to me.

This particular BIND is not a function: it is not purely defined by its 
arguments, because the matching looks at the state (the graph), and it is 
multivalued. That has an effect on optimization and query rewriting.
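
To illustrate, here is a minimal Python sketch. It is a toy model, not Jena
code, and the names match_embedded and GRAPH are made up for this example.
It shows that binding ?T to a triple-term pattern containing a variable
depends on the graph contents and can give zero, one or many solutions:

```python
# Toy model: a graph is a set of (s, p, o) tuples; variables start with "?".
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_embedded(pattern, graph, binding=None):
    """Yield one solution per graph triple that matches the embedded pattern.

    Each solution extends `binding` with the pattern's variables and binds
    ?T to the matched triple (standing in for the triple term).
    """
    binding = dict(binding or {})
    for triple in sorted(graph):
        solution = dict(binding)
        ok = True
        for pat, val in zip(pattern, triple):
            if is_var(pat):
                # An already-bound variable must agree with the graph value.
                if solution.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            solution["?T"] = triple
            yield solution

GRAPH = {(":s1", ":p", ":o"), (":s2", ":p", ":o"), (":s3", ":q", ":o")}

# BIND(<<?s :p :o>> AS ?T): two solutions, because two triples match.
solutions = list(match_embedded(("?s", ":p", ":o"), GRAPH))
# Same pattern, different graph state: no solutions at all.
empty = list(match_embedded(("?s", ":p", ":o"), set()))
```

An ordinary SPARQL expression would give exactly one value (or an error)
for each input row, whatever the graph contains.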

> 
>> ----
>>
>> The use case for separate annotations means that parsing is SA.
>>
>> <<:s :p :o>> :q 123 .
>>
>> is one triple.
>>
>> This flows in N-triples because "one line - one triple" is natural
>> there. "wc -l" works on real world data and database dumps are more
>> portable.
>>
>> It also means that DELETE does not need special handling.
>> [...]
>> Looking up termified triples all the time seems expensive, at least
>> without some machinery to know when a look up isn't necessary.
> 
> Is it fair to summarize these remarks as: an efficient implementation of SA
> mode is more straightforward than an efficient implementation of PG mode?

The decision was - for now - not to change the triple/quad tables and 
only to introduce a new RDF term for RDF* triple terms. The latter is 
necessary anyway in either mode to support result formats.
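
As a rough sketch of that shape (Python for brevity; the class names IRI,
Literal and TripleTerm are illustrative and are not Jena's API), a triple
term is just one more kind of term, so the table rows stay (s, p, o):

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    lexical: str

@dataclass(frozen=True)
class TripleTerm:
    # A triple used as a term; subject and object may themselves be triple terms.
    subject: "Term"
    predicate: IRI
    object: "Term"

Term = Union[IRI, Literal, TripleTerm]

# The triple table keeps its shape: each row is still (subject, predicate, object).
table = set()
inner = TripleTerm(IRI(":s"), IRI(":p"), IRI(":o"))
table.add((inner, IRI(":q"), Literal("123")))      # <<:s :p :o>> :q 123 .

# Nesting needs nothing extra: a triple term inside a triple term.
nested = TripleTerm(inner, IRI(":r"), IRI(":x"))
table.add((nested, IRI(":q"), Literal("456")))
```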

If there is demand and a stable target, then that change can be done. 
The choice was to add RDF* incrementally and see what uptake there is.

Changing the triple/quad tables has a bigger impact, including on users 
not using RDF*. The current support means Jena users do not have to 
reload their data. (And because users aren't always the system 
operators, that's a big deal.)

Otherwise, with an on-disk change of format, data has to be 
migrated/reloaded. This is a big step and there is no reason to ask users 
to do that if they are not using RDF*.

(Or there would be flags for storage variants, which adds complexity to 
operating the database.)

> Speaking of these modes, the document page you mentioned earlier does not
> provide any indication from which it would be possible to infer which mode
> your current implementation supports. Which mode is it?

Internally SA.
   Implementation in storage requires adding triple terms and nothing 
more; it is not a table format change.

No cascading delete.

If the parser/INSERT generates the implied triple, it is PG without 
delete-cascading.

Nested triple-terms in subject and object position.

Annotations scoped to dataset, not graph.
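
A toy Python sketch of the first three points (not Jena code; the insert
and delete helpers and the pg_mode flag are invented for the example, and
triples are plain tuples):

```python
# Toy store: a set of (s, p, o) tuples; a triple term is a nested tuple.
def insert(store, triple, pg_mode=False):
    """Add a triple; in PG mode also add any triple term found in
    subject or object position (the "implied" triple)."""
    store.add(triple)
    if pg_mode:
        for term in (triple[0], triple[2]):
            if isinstance(term, tuple):
                insert(store, term, pg_mode=True)

def delete(store, triple):
    """Remove exactly one triple; annotations that mention it stay (no cascade)."""
    store.discard(triple)

annotated = ((":s", ":p", ":o"), ":q", "123")    # <<:s :p :o>> :q 123 .

sa = set()
insert(sa, annotated)                   # SA: only the annotation triple is stored

pg = set()
insert(pg, annotated, pg_mode=True)     # PG: :s :p :o is stored as well
pg_size_before_delete = len(pg)
delete(pg, (":s", ":p", ":o"))          # the annotation is not removed with it
```

After the delete, the PG store holds the same single triple as the SA store.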

 HTH
 Andy

> 
> Thanks,
> Olaf
> 
> 
>> ---
>>
>> These are decisions that seemed natural at the time - I'd expect Jena
>> users at the moment to care more about compatibility across implementations.
>>
>>       Andy
>>
>> On 04/08/2020 11:40, Andy Seaborne wrote:
>>> Jena version 3.16.0 completes the support for RDF* and SPARQL*.
>>>
>>> This is a "deep integration" - it is available by default in various
>>> syntaxes and in Fuseki. The application does not need to enable it.
>>>
>>> It is supported in:
>>>
>>> text/turtle
>>> application/n-triples
>>> text/trig
>>> application/n-quads
>>>
>>> and for storage in-memory, and persistently in TDB (both TDB1 and TDB2).
>>>
>>> For SPARQL results, it is available in formats
>>>
>>>     JSON, XML, TSV, and RDF Thrift (binary), text.
>>>   
>>>   
>>>       https://jena.apache.org/documentation/rdfstar/
>>>   
>>>       Andy
> 
> 

Received on Sunday, 9 August 2020 17:34:19 UTC