Re: Annotation syntax [was: SPARQL* test suite]

Dear all,

I am intrigued by the idea of adding a second option to the Turtle* syntax such 
that PG mode and SA mode can be explicitly distinguished from one another. In 
fact, now I wonder whether such a distinction can even be built into the RDF* 
data model (it can, see below).

The only issue with this new "PG mode option" for Turtle* is that it works 
only for annotations that have the annotated triple in the subject position. 
Take, for instance, Holger's example:

> ... instead of
> 
>      :bob :age 23 .
>      <<:bob :age 23>> :certainty 0.9 .
> 
> we can simply (alternatively) write
> 
>     :bob :age 23 {| :certainty 0.9 |} .

That works. However, the following Turtle* expression (assuming SA mode) 
cannot be written by using the proposed alternative syntax option.

     :bob :age 23 .
     :alice :disbelieves <<:bob :age 23>> .

Perhaps this limitation is not an issue. The notion of edge properties in 
Property Graphs has the same limitation after all. What do you think?

Ignoring this potential issue, I thought a bit about my question from above: 
Can such an explicit distinction between SA mode and PG mode be built into the 
RDF* data model itself?

Here is an idea: Take the notion of an RDF* graph as defined in my papers 
(i.e., a set of nested triples), but now interpreted under the SA mode 
assumption (i.e., a triple that appears inside another triple is not 
considered to be asserted, unless it is also contained directly in the RDF* 
graph). Now, to integrate PG mode explicitly, we may define the notion of an 
"annotated RDF* graph" as a pair (G,a) where G is an RDF* graph and a is a 
partial function that maps some (potentially nested) triples in G to finite, 
nonempty sets of pairs (p,o) with p an IRI and o an RDF term (IRI, blank node, 
or literal). For instance, the aforementioned alternative Turtle* expression 
that Holger proposes can be captured by an annotated RDF* graph H=(G,a) such 
that

G = { (:bob,:age,23) }
dom(a) = { (:bob,:age,23) }
a( (:bob,:age,23) ) = { (:certainty,0.9) } .

Clearly, this is only meant to be an abstract syntax for formalizing the 
(extended) RDF* data model.
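
To make the definition concrete, here is a minimal Python sketch of an 
annotated RDF* graph (G,a). The encoding is purely an assumption for 
illustration: triples are Python tuples, IRIs are plain strings such as 
":bob", literals are Python numbers, and the helper names triples_in and 
is_annotated_graph are hypothetical.

```python
def triples_in(graph):
    """Yield every triple of an RDF* graph, including nested ones."""
    stack = list(graph)
    while stack:
        t = stack.pop()
        yield t
        s, _, o = t
        for term in (s, o):
            if isinstance(term, tuple):  # a nested triple
                stack.append(term)


def is_annotated_graph(graph, a):
    """Check that a is a partial function from (possibly nested) triples
    of the graph to nonempty sets of pairs (p, o), with p an IRI and
    o an RDF term that is not itself a triple."""
    occurring = set(triples_in(graph))
    for t, pairs in a.items():
        if t not in occurring or len(pairs) == 0:
            return False
        for p, o in pairs:
            if not isinstance(p, str) or isinstance(o, tuple):
                return False
    return True


# Holger's example as the annotated RDF* graph H = (G, a)
G = {(":bob", ":age", 23)}
a = {(":bob", ":age", 23): {(":certainty", 0.9)}}
print(is_annotated_graph(G, a))  # True
```

A Python dict works as the partial function here because dom(a) is simply 
the set of keys.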

A similar extension can then be defined for the abstract syntax of SPARQL. 
Here, the notion of a BGP* can be extended into an "annotated BGP*" that is a 
pair (B,a) where B is a BGP* and a is a partial function that maps some triple* 
patterns in B to finite, nonempty sets of pairs (p,o) with p an IRI or a 
variable and o an RDF term or a variable. As an example, the annotated BGP* 
(B,a) with

B = { (?x,:age,23) }
dom(a) = { (?x,:age,23) }
a( (?x,:age,23) ) = { (:certainty,?y) } 

captures the following WHERE clause of the user-facing syntax of SPARQL* with 
Holger's proposed alternative writing for PG mode:

WHERE {
   ?x :age 23 {| :certainty ?y |} .
}
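
The evaluation semantics of an annotated BGP* is not spelled out here, so 
the following Python sketch is only one plausible reading: a triple* pattern 
matches a triple of G, and each annotation pattern attached to it must match 
some pair in a(t) of the matched triple t. The tuple encoding, the "?" prefix 
for variables, and the function names are assumptions, and nested triple* 
patterns are not handled.

```python
def is_var(x):
    return isinstance(x, str) and x.startswith("?")


def unify(pattern, value, binding):
    """Extend binding so that pattern matches value; None on failure."""
    if is_var(pattern):
        if pattern in binding:
            return binding if binding[pattern] == value else None
        return {**binding, pattern: value}
    return binding if pattern == value else None


def unify_all(patterns, values, binding):
    for p, v in zip(patterns, values):
        binding = unify(p, v, binding)
        if binding is None:
            return None
    return binding


def evaluate(B, a_B, G, a_G):
    """Evaluate the annotated BGP* (B, a_B) over the annotated graph (G, a_G)."""
    solutions = [{}]
    for tp in B:
        extended = []
        for binding in solutions:
            for t in G:
                b = unify_all(tp, t, binding)
                if b is None:
                    continue
                partial = [b]
                # every annotation pattern must match some annotation of t
                for ap in a_B.get(tp, ()):
                    partial = [b2
                               for b1 in partial
                               for pair in a_G.get(t, ())
                               for b2 in [unify_all(ap, pair, b1)]
                               if b2 is not None]
                extended.extend(partial)
        solutions = extended
    return solutions


B = {("?x", ":age", 23)}
a_B = {("?x", ":age", 23): {(":certainty", "?y")}}
G = {(":bob", ":age", 23)}
a_G = {(":bob", ":age", 23): {(":certainty", 0.9)}}
print(evaluate(B, a_B, G, a_G))  # [{'?x': ':bob', '?y': 0.9}]
```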

Four more observations about my idea to extend the abstract syntax of the RDF* 
data model as described above:

1/ Observe that the function a maps to sets of pairs (p,o) rather than to 
single pairs. This is necessary to be able to annotate an RDF* triple with 
multiple key-value pairs, as is possible with Holger's proposed extension of 
Turtle*:

:bob :age 23 {| :certainty 0.9 |} .
:bob :age 23 {| :source <http://bob.name/index.html> |} .

Now, we may even consider an option to shorten this extended Turtle* 
expression as follows:

:bob :age 23 {| :certainty 0.9 ; 
                :source <http://bob.name/index.html> |} .

2/ It is not difficult to define mappings that map an annotated RDF* graph 
into an RDF* graph and vice versa (and, similarly, for annotated BGP*s). For 
instance, the annotated RDF* graph H in the example above can be mapped to the 
following RDF* graph:

G' = { (:bob,:age,23),  ((:bob,:age,23),:certainty,0.9) } .

Such mappings provide a formal foundation for SA mode systems to support 
Holger's proposed PG-focused extension of Turtle* and SPARQL*.
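
As a rough illustration of such mappings, the Python sketch below turns an 
annotated RDF* graph into an RDF* graph and back. The tuple encoding and the 
function names are assumptions. The reverse direction is only one possible 
choice: it moves a nested triple into the annotation function only if the 
annotated triple is itself asserted and the object is not a triple.

```python
def to_rdf_star(graph, a):
    """Map an annotated RDF* graph (G, a) to a plain RDF* graph G'."""
    out = set(graph)
    for t, pairs in a.items():
        for p, o in pairs:
            out.add((t, p, o))  # annotation becomes a nested triple
    return out


def from_rdf_star(graph):
    """One possible inverse: pull annotations of asserted triples out of G'."""
    g, a = set(), {}
    for s, p, o in graph:
        if isinstance(s, tuple) and s in graph and not isinstance(o, tuple):
            a.setdefault(s, set()).add((p, o))
        else:
            g.add((s, p, o))
    return g, a


G = {(":bob", ":age", 23)}
a = {(":bob", ":age", 23): {(":certainty", 0.9)}}

G_prime = to_rdf_star(G, a)
# G_prime contains (:bob,:age,23) and ((:bob,:age,23),:certainty,0.9)
print(from_rdf_star(G_prime) == (G, a))  # True
```

Note that a triple such as (:alice, :disbelieves, (:bob,:age,23)) stays in 
the graph under this reverse mapping, since its nested triple is in the 
object position.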

3/ Related to these mappings, it is now also possible to introduce a notion of 
a "redundancy-free annotated RDF* graph" (G,a) which satisfies the constraint 
that none of the annotations in the function a is also captured as a nested 
triple in G. For instance, the aforementioned annotated RDF* graph H is 
redundancy free, whereas H'=(G',a') with

G' = { (:bob,:age,23),  ((:bob,:age,23),:certainty,0.9) }
dom(a') = { (:bob,:age,23) }
a'( (:bob,:age,23) ) = { (:certainty,0.9) }

is not.
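
The constraint can be checked mechanically. The Python sketch below assumes 
the same illustrative tuple encoding as before and reads "captured as a 
nested triple in G" as direct membership of (t,p,o) in G; the function name 
is hypothetical.

```python
def is_redundancy_free(graph, a):
    """True if no annotation (p, o) of a triple t also occurs in the
    graph as the nested triple (t, p, o)."""
    return all((t, p, o) not in graph
               for t, pairs in a.items()
               for p, o in pairs)


bob_age = (":bob", ":age", 23)

# H = (G, a) from the earlier example
G = {bob_age}
a = {bob_age: {(":certainty", 0.9)}}

# H' = (G', a') repeats the annotation as a nested triple
G_prime = {bob_age, (bob_age, ":certainty", 0.9)}
a_prime = {bob_age: {(":certainty", 0.9)}}

print(is_redundancy_free(G, a))              # True
print(is_redundancy_free(G_prime, a_prime))  # False
```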

4/ Continuing on this line of thought, another possible restriction regarding 
the notion of annotated RDF* graphs can be the following: Given an annotated 
RDF* graph (G,a), if G does not contain any nested RDF* triples (i.e., G 
happens to be a pure RDF graph), then we call (G,a) an "annotated RDF graph." 
Now, this restriction may be the formal foundation for systems that support 
only the PG mode!
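
A PG-mode-only system could enforce this restriction with a check along the 
following lines. This is a sketch under the same assumed tuple encoding, 
with a hypothetical function name.

```python
def is_annotated_rdf_graph(graph, a):
    """True if G contains no nested triples and a only annotates triples of G."""
    no_nesting = all(not isinstance(s, tuple) and not isinstance(o, tuple)
                     for s, _, o in graph)
    # without nesting, dom(a) must be a subset of G itself
    return no_nesting and all(t in graph for t in a)


bob_age = (":bob", ":age", 23)

print(is_annotated_rdf_graph({bob_age}, {bob_age: {(":certainty", 0.9)}}))  # True
print(is_annotated_rdf_graph({bob_age, (bob_age, ":certainty", 0.9)}, {}))  # False
```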

I am looking forward to hearing your thoughts about these ideas to integrate 
the explicit distinction between SA mode and PG mode into the abstract RDF* 
data model.

Thanks,
Olaf


On Wednesday, 2 September 2020, 11:07:56 CEST Jeen Broekstra wrote:
> > Yes, I think so and apologies if I didn't communicate this clearly.
> > 
> > The point here is to *add* an alternative short cut so that instead of
> > 
> >     :bob :age 23 .
> >     
> >     <<:bob :age 23>> :certainty 0.9 .
> > 
> > we can simply (alternatively) write
> > 
> >    :bob :age 23 {| :certainty 0.9 |} .
> > 
> > This would serve as syntactic sugar for the (common) use case of both
> > asserting and annotating a triple, while still allowing free-standing
> > annotations. The short cut will not only make files significantly shorter,
> > but also make editing more user-friendly. The cost is for implementers
> > though, who would have to cover an additional case (both in parser and
> > serializer).
> 
> Thanks for clarifying  - in that case I think it's actually a very good
> idea. The main issue I see with supporting it is in the serialization side,
> which will be tricky to do for any streaming writer. However, we could
> support that kind of thing under the moniker of "pretty printing", which is
> already something that requires buffering anyway.
> 
> Jeen

Received on Wednesday, 2 September 2020 08:04:56 UTC