Can SA be extension of RDF+reification? [was Re: PG mode and SA mode] from Jerven Bolleman on 2019-09-25 (public-rdf-star@w3.org from September 2019)

From: Jerven Bolleman <jerven.bolleman@sib.swiss>
Date: Wed, 25 Sep 2019 10:59:27 +0200
To: public-rdf-star@w3.org
Message-ID: <aa78ab90-2e63-0811-8db5-8d2911b6c31b@sib.swiss>
Hi Olaf, All,

RDF* uses the triple t' itself instead of a name (id) for the triple.
I think this can be pure syntactic sugar.

The key part is that there needs to be a set of defined mappings from 
a triple to a name, each of which can be computed with a simple function.
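Below is a minimal Python sketch of one such mapping. It is an illustration, not a proposed standard. It assumes every term of the triple is an IRI, serializes the triple N-Triples style, and percent-encodes the whole string under the hypothetical `urn:triple:raw:` prefix. Literals and blank nodes are deliberately out of scope, and `triple_to_urn`/`urn_to_triple` are illustrative names.

```python
import re
from urllib.parse import quote, unquote

URN_PREFIX = "urn:triple:raw:"  # hypothetical URN namespace, not registered

def triple_to_urn(s: str, p: str, o: str) -> str:
    """Name an IRI-only triple by percent-encoding its N-Triples-style text."""
    text = f"<{s}> <{p}> <{o}>"
    # safe="" makes quote() escape '/', ':', '#', '<', '>' and ' ' as well.
    return URN_PREFIX + quote(text, safe="")

def urn_to_triple(urn: str) -> tuple[str, str, str]:
    """Invert triple_to_urn, recovering the (subject, predicate, object) IRIs."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError("not a urn:triple:raw: name")
    text = unquote(urn[len(URN_PREFIX):])
    match = re.fullmatch(r"<([^<>\s]*)> <([^<>\s]*)> <([^<>\s]*)>", text)
    if match is None:
        raise ValueError("payload is not three IRI terms")
    return match.group(1), match.group(2), match.group(3)

name = triple_to_urn(
    "http://purl.uniprot.org/uniprot/P05067",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://purl.uniprot.org/core/Protein",
)
print(name)
```

Because the name depends only on the triple, any store can recompute it without a lookup table. `urn_to_triple` recovers the triple from the name.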

The hack that allows this is to introduce a new URN type. As an 
exemplar of the idea, I propose:

urn:triple:raw:%3Chttp%3A%2F%2Fpurl.uniprot.org%2Funiprot%2FP05067%3E%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type%3E%20%3Chttp%3A%2F%2Fpurl.uniprot.org%2Fcore%2FProtein%3E

Assume a statement like << <http://purl.uniprot.org/uniprot/P05067> 
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
<http://purl.uniprot.org/core/Protein> >>.

This would entail the following in existing reification Turtle:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://purl.uniprot.org/uniprot/P05067>
  a
  <http://purl.uniprot.org/core/Protein> .
<urn:triple:raw:%3Chttp%3A%2F%2Fpurl.uniprot.org%2Funiprot%2FP05067%3E%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type%3E%20%3Chttp%3A%2F%2Fpurl.uniprot.org%2Fcore%2FProtein%3E> 
a rdf:Statement
; rdf:subject <http://purl.uniprot.org/uniprot/P05067>
; rdf:predicate rdf:type
; rdf:object <http://purl.uniprot.org/core/Protein> .
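Since the statement name is a pure function of the triple, this expansion is mechanical. The following is a minimal Python sketch that emits N-Triples for the asserted triple plus its reification quad. It assumes IRI-only terms, and the helper names are hypothetical.

```python
from urllib.parse import quote

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def triple_to_urn(s: str, p: str, o: str) -> str:
    """Hypothetical content-derived name: percent-encoded N-Triples text."""
    return "urn:triple:raw:" + quote(f"<{s}> <{p}> <{o}>", safe="")

def expand_to_reification(s: str, p: str, o: str) -> list[str]:
    """Return N-Triples lines for the asserted triple plus its reification quad."""
    stmt = triple_to_urn(s, p, o)
    triples = [
        (s, p, o),                                # the asserted triple itself
        (stmt, RDF + "type", RDF + "Statement"),  # the four reification triples,
        (stmt, RDF + "subject", s),               # all hanging off the
        (stmt, RDF + "predicate", p),             # content-derived name
        (stmt, RDF + "object", o),
    ]
    return [f"<{a}> <{b}> <{c}> ." for a, b, c in triples]

lines = expand_to_reification(
    "http://purl.uniprot.org/uniprot/P05067",
    RDF + "type",
    "http://purl.uniprot.org/core/Protein",
)
for line in lines:
    print(line)
```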

The key thing is that such a raw syntax can be made presentable in 
existing RDF/XML, and of course it need not be horrid in Turtle*, as 
there is no need to write the syntax as is.

For uniprot.org we have started to use content-derived identifiers for 
our reification [1]. If there were a default technique we could use to 
avoid the current materialization overhead, that would be great :)

In this way each triple is named by itself, and this ugliness can be 
hidden behind an abstract machine.

Now, blank nodes are of course a problem, as they always are ;) 
Unidentifiable nodes cannot be addressed in this way. I think that is 
ok. For those, the mapping would indeed be that << [] a up:Protein >> 
leads to [] a rdf:Statement ; rdf:predicate rdf:type ; rdf:object 
up:Protein . and the identifier of the triple would be a blank node. 
The usual workarounds would apply.
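The following is a minimal Python sketch of the fallback: triples made only of IRIs get the content-derived URN, and any triple containing a blank node gets a fresh blank node as its statement identifier. It assumes blank nodes are written as `_:label` and every other term is an IRI. `statement_id` and `is_blank` are hypothetical names.

```python
import itertools
from urllib.parse import quote

_fresh_ids = itertools.count()  # source of fresh blank-node labels

def is_blank(term: str) -> bool:
    # Assumption: blank nodes are written "_:label"; every other term is an IRI.
    return term.startswith("_:")

def statement_id(s: str, p: str, o: str) -> str:
    """Content-derived URN for an all-IRI triple, a fresh blank node otherwise."""
    if any(is_blank(t) for t in (s, p, o)):
        # A blank node label is local to its document, so the triple cannot be
        # named from its content; the statement node stays anonymous instead.
        return f"_:stmt{next(_fresh_ids)}"
    return "urn:triple:raw:" + quote(f"<{s}> <{p}> <{o}>", safe="")

TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROTEIN = "http://purl.uniprot.org/core/Protein"
print(statement_id("_:b0", TYPE, PROTEIN))
print(statement_id("http://purl.uniprot.org/uniprot/P05067", TYPE, PROTEIN))
```

Returning a new blank node on every call reflects the point that such triples have no stable name. The same blank-node triple seen twice gets two different statement nodes.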

I want to reaffirm that this is an ugly syntax that should be well 
hidden under the default RDF* beauty.

The benefit of allowing this is that existing RDF stores could very 
quickly adapt to SPARQL* without touching their storage layer. That 
makes adoption fast, because no one needs to do a lot of work to have 
it "working", and everyone can then spend time on making it fast.

For data providers like me who have used reification a lot, it is also 
nice because it allows existing SPARQL queries that use reification 
patterns to be interpreted as SPARQL*. For example, this means that 
we can switch to RDF* from day one and not wait until the last of our 
users has upgraded their RDF database.


PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX up:<http://purl.uniprot.org/core/>
SELECT ?annotationEvidence WHERE {
   ?p up:annotation ?a .
   [] a rdf:Statement ;
      rdf:subject ?p ;
      rdf:predicate up:annotation ;
      rdf:object ?a ;
      up:attribution/up:evidence ?annotationEvidence .
}

This can then be mechanically transformed into:

PREFIX up:<http://purl.uniprot.org/core/>
SELECT ?annotationEvidence WHERE {
   << ?p up:annotation ?a >>
      up:attribution/up:evidence ?annotationEvidence .
}

This works without changing the semantics of our data model or affecting existing users.
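The rewrite is mechanical enough to sketch over a simplified basic graph pattern. This is a minimal Python sketch with several assumptions:

- Triple patterns are plain string tuples that use the query's prefixed names.
- The `[]` node is written as a labelled blank node such as `_:r`.
- The reification node is not needed elsewhere in the query.
- A group is rewritten only when it is complete and its triple is asserted.

Real SPARQL parsing is out of scope, and `to_quoted_patterns` is a hypothetical name.

```python
Pattern = tuple  # (subject, predicate, object); a subject may be a quoted triple

REIF_KEYS = ("rdf:subject", "rdf:predicate", "rdf:object")

def to_quoted_patterns(bgp: list[Pattern]) -> list[Pattern]:
    """Rewrite asserted reification groups into SPARQL*-style quoted patterns."""
    statements = {s for s, p, o in bgp
                  if p in ("a", "rdf:type") and o == "rdf:Statement"}
    result = list(bgp)
    for node in statements:
        parts = {p: o for s, p, o in bgp if s == node and p in REIF_KEYS}
        if len(parts) != 3:
            continue  # incomplete group: leave it untouched
        quoted = tuple(parts[k] for k in REIF_KEYS)
        if quoted not in bgp:
            continue  # triple not asserted in the query: leave it untouched
        # Drop the asserted pattern and the whole reification group ...
        drop = {quoted, (node, "a", "rdf:Statement"), (node, "rdf:type", "rdf:Statement")}
        drop |= {(node, k, parts[k]) for k in REIF_KEYS}
        # ... and put the quoted triple wherever the reification node was used.
        result = [tuple(quoted if t == node else t for t in pat)
                  for pat in result if pat not in drop]
    return result

bgp = [
    ("?p", "up:annotation", "?a"),
    ("_:r", "a", "rdf:Statement"),
    ("_:r", "rdf:subject", "?p"),
    ("_:r", "rdf:predicate", "up:annotation"),
    ("_:r", "rdf:object", "?a"),
    ("_:r", "up:attribution/up:evidence", "?annotationEvidence"),
]
print(to_quoted_patterns(bgp))
```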

This way there is compatibility between RDF and RDF* without needing a 
flag day and convincing everyone to change at once.


The next question is how to deal with incomplete reification quads, 
including those for which there is no asserted triple. Pragmatically, 
triplestores can deal with those in different ways. The first is to not 
allow them. The second is to store them and, if any are present in the 
store, execute a query like the following.

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX up:<http://purl.uniprot.org/core/>
SELECT ?annotationEvidence WHERE {
   {
     << ?p up:annotation ?a >>
       up:attribution/up:evidence ?annotationEvidence .
   } UNION {
     [] a rdf:Statement ;
      rdf:subject ?p ;
      rdf:predicate up:annotation ;
      rdf:object ?a ;
      up:attribution/up:evidence ?annotationEvidence .
   }
}
Considering that incomplete and non-asserted reification triples are 
very rare in RDF corpora in the wild, I think that, for commercial 
practicality, most vendors will simply treat them as an unsupported 
operation.
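This minimal sketch shows how a store could detect these cases at load time. It assumes the graph is a set of (s, p, o) string tuples using prefixed names. `classify_reifications` and its three labels are hypothetical. A store taking the "not supported" route could reject data whenever a label other than `asserted` appears. A store taking the second route would learn whether it needs the UNION form of the query.

```python
RDF_TYPE, RDF_STATEMENT = "rdf:type", "rdf:Statement"
KEYS = ("rdf:subject", "rdf:predicate", "rdf:object")

def classify_reifications(graph: set[tuple[str, str, str]]) -> dict[str, str]:
    """Label every reification node as 'asserted', 'not-asserted' or 'incomplete'."""
    nodes = {s for s, p, o in graph
             if p in KEYS or (p == RDF_TYPE and o == RDF_STATEMENT)}
    labels = {}
    for node in nodes:
        parts = {p: o for s, p, o in graph if s == node and p in KEYS}
        if len(parts) < 3:
            labels[node] = "incomplete"      # some of subject/predicate/object missing
        elif tuple(parts[k] for k in KEYS) in graph:
            labels[node] = "asserted"        # the described triple is in the graph
        else:
            labels[node] = "not-asserted"    # complete, but the triple is absent
    return labels

graph = {
    ("uniprotkb:P05067", "rdf:type", "up:Protein"),
    ("_:s1", "rdf:type", "rdf:Statement"),
    ("_:s1", "rdf:subject", "uniprotkb:P05067"),
    ("_:s1", "rdf:predicate", "rdf:type"),
    ("_:s1", "rdf:object", "up:Protein"),
    ("_:s2", "rdf:object", "uniprotkb:P05067"),
    ("_:s3", "rdf:subject", "ex:a"),
    ("_:s3", "rdf:predicate", "ex:b"),
    ("_:s3", "rdf:object", "ex:c"),
}
print(classify_reifications(graph))
```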

Another option is to introduce, for each incomplete reification quad, a 
triple containing blank nodes.

e.g.
   [] rdf:object uniprotkb:P05067

leads to
   << [] [] uniprotkb:P05067 >> in the store.
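The following is a minimal Python sketch of this completion step: every missing position of an incomplete reification quad is filled with a fresh blank node. It assumes the graph is a set of string tuples, and `complete_with_blanks` is a hypothetical name. RDF 1.1 allows only IRIs in predicate position, so storing `<< [] [] uniprotkb:P05067 >>` assumes a store that accepts generalized triples.

```python
import itertools
from typing import Iterator, Optional

KEYS = ("rdf:subject", "rdf:predicate", "rdf:object")

def complete_with_blanks(graph: set[tuple[str, str, str]], node: str,
                         fresh: Optional[Iterator[int]] = None) -> tuple:
    """Build the (s, p, o) triple described by `node`, using fresh blank nodes
    for any of rdf:subject / rdf:predicate / rdf:object that is missing."""
    if fresh is None:
        fresh = itertools.count()
    parts = {p: o for s, p, o in graph if s == node and p in KEYS}
    # Positions are filled in subject, predicate, object order.
    return tuple(parts[k] if k in parts else f"_:b{next(fresh)}" for k in KEYS)

graph = {("_:r", "rdf:object", "uniprotkb:P05067")}
print(complete_with_blanks(graph, "_:r"))
```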

For those of us still using RDF/XML, we would only need an update to 
section 2.17 of the spec to allow us to be RDF* without needing to 
change our writers at all.


Regards, and apologies for the ugly URL-escaped syntax before many of you 
had your morning coffee,

Jerven

[1] 
https://sparql.uniprot.org/sparql?query=PREFIX+rdf%3a%3chttp%3a%2f%2fwww.w3.org%2f1999%2f02%2f22-rdf-syntax-ns%23%3e+%0d%0aPREFIX+up%3a%3chttp%3a%2f%2fpurl.uniprot.org%2fcore%2f%3e+%0d%0aSELECT+%3freif%0d%0aFROM+%3chttp%3a%2f%2fsparql.uniprot.org%2fcitationmapping%3e%0d%0aWHERE%0d%0a%7b%0d%0a++++%3freif+a+rdf%3aStatement+.%0d%0a++%09FILTER(strlen(str(%3freif))+%3c+120)%0d%0a%7d



On 9/25/19 9:27 AM, Olaf Hartig wrote:
> On Wed, 2019-09-25 at 00:00 -0500, Patrick J Hayes wrote:
>>> On Sep 20, 2019, at 3:56 AM, Olaf Hartig <olaf.hartig@liu.se>
>>> wrote:
>>> [...]
>>> In fact, in RDF* there is no need for such a naming convention
>>> because, when talking about a triple t'=(s,p,o) in some other
>>> triple t, the idea of RDF* is to directly use the triple t' itself
>>> instead of using a name for that triple.
>>>
>>> t = ( (s,p,o), p2, o2 )
>>
>> I understand, and agree. But this does mean that your often-repeated
>> claim to somehow reduce RDF* to RDF reification is not accurate. RDF*
>> is a genuine extension to RDF.
> 
> Indeed, it is an actual extension. So, you are right: Reducing RDF* to
> RDF reification requires the introduction of identifiers for triples,
> as a result of which the reification description is not semantically
> linked anymore to the described triple (due to the limitation of RDF
> reification).
> 
> Thanks,
> Olaf
> 
>
Received on Wednesday, 25 September 2019 09:00:04 UTC