
Yes it can and should ;) Re: Can RDFstar be defined as only syntactic sugar on top of RDF (Re: weakness of embedded triples)

From: Jerven Bolleman <jerven.bolleman@sib.swiss>
Date: Wed, 28 Oct 2020 12:29:14 +0100
To: public-rdf-star@w3.org
Message-ID: <1214e182-1cf7-dfe6-f61c-4321ccb8b648@sib.swiss>
Hi All,

Yes, it can be defined as syntactic sugar, and IMO it should be,
considering that RDF* was originally defined in terms of RDF
reification. It can also be implemented that way, e.g. as in the POC
done for rdflib.

Which gets us back to the question: what do we want to achieve with
RDFstar? I think most of us want reification without the hassle of
writing out a quad every time, plus the possibility of optimizations at
the storage level and in the query engine.

A solution for triples consisting only of IRIs and/or literals is simply
to generate an implicit IRI for them.

For the triples with a blank node I think the simplest is to generate an
implicit new blank node for them. (Which would be skolemized in some 
form in any triple store anyway).
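
A minimal sketch of the two rules above, using only the Python standard
library rather than rdflib. The IRI, Literal and BNode classes, the
implicit_term function and the urn:example:triple: scheme are all
hypothetical names chosen for illustration; hashing the serialized
triple is just one possible way to get a stable implicit IRI.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str

    def nt(self):
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    value: str

    def nt(self):
        # Escape backslashes and quotes so the serialization is unambiguous.
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'


@dataclass(frozen=True)
class BNode:
    label: str

    def nt(self):
        return f"_:{self.label}"


_fresh = itertools.count()


def implicit_term(s, p, o):
    """Return the implicit term standing for the triple (s, p, o)."""
    if any(isinstance(t, BNode) for t in (s, p, o)):
        # Triples containing a blank node get a new blank node (assumption:
        # one fresh node per call, no attempt at canonical labelling).
        return BNode(f"triple{next(_fresh)}")
    # Triples of only IRIs/literals get a deterministic IRI, so the same
    # triple always maps to the same implicit IRI.
    key = " ".join(t.nt() for t in (s, p, o))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return IRI(f"urn:example:triple:{digest}")
```

Whether a triple with a blank node should get a fresh node on every
occurrence or one node per distinct triple is left open here; the sketch
picks the former only because it is the simplest to write down.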

Syntax-wise, for SPARQL and RDF, this would be simple. API-wise this
would also be easy in Python rdflib: we basically get two new
subclasses, one of URIRef and one of BNode.


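from rdflib import BNode, URIRef

class TripleUriRef(URIRef):  # implicit IRI for a triple of IRIs/literals
    pass
class TripleBnode(BNode):  # implicit bnode for a triple containing a bnode
    pass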

In Java (rdf4j) it would probably be something like:

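import org.eclipse.rdf4j.model.BNode;
import org.eclipse.rdf4j.model.IRI;

// Marker for terms that stand for an embedded triple.
interface Triple {
}
interface TripleIRI extends IRI, Triple {
}
interface TripleBNode extends BNode, Triple {
}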

Semantics stay the same; we only get a new syntax for reification.
The question of PG vs. SA mode stays unanswered by doing this.
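
To make "only a new syntax for reification" concrete, here is a rough
sketch of the expansion, with triples as plain Python tuples of strings.
The desugar function and its node argument (the implicit IRI or blank
node standing for the embedded triple) are hypothetical; only the
rdf:Statement, rdf:subject, rdf:predicate and rdf:object terms come from
standard RDF reification.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def desugar(embedded, predicate, obj, node):
    """Expand '<< s p o >> predicate obj' into plain RDF triples.

    `node` is the implicit term that stands for the embedded triple.
    The first four triples are the usual reification quad; the last one
    is the annotation itself.
    """
    s, p, o = embedded
    return [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
        (node, predicate, obj),
    ]
```

A store is of course free to keep the embedded triple in some compact
internal form and only produce these triples when asked for plain RDF.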

For me as a user, what benefits would a different semantics for RDFstar
bring?

Going the syntactic sugar route, we don't need a specification for
RDFstar itself, just for TurtleStar, RDF/XMLstar, SPARQLstar etc.,
which we would need anyway.

The problem then becomes that the lack of a WG means we don't have a
good way to determine consensus and actually record a decision.

Still, mapping RDFstar in terms of RDF reification leaves open the issue
of referential opacity. I think that this is a red herring: the
superman problem is a data modelling issue, which should not
complicate our entire tech stack. I.e. the assertion
:superman owl:sameAs :clarkKent is faulty, and that fault makes it
impossible to correctly express what :louislane believes.
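
As a rough illustration of the superman example, assuming a referentially
transparent reading in which owl:sameAs allows substitution everywhere,
including inside reified statements. The substitute_same_as function and
the :believes, :can and :fly terms are made up for this sketch.

```python
def substitute_same_as(triples, same_as):
    """Rewrite every term through the sameAs mapping (transparent reading)."""
    return {tuple(same_as.get(t, t) for t in triple) for triple in triples}


# :louislane believes the reified statement ':superman :can :fly'.
reified = {
    ("_:b", "rdf:subject", ":superman"),
    ("_:b", "rdf:predicate", ":can"),
    ("_:b", "rdf:object", ":fly"),
    (":louislane", ":believes", "_:b"),
}

# Applying ':superman owl:sameAs :clarkKent' changes what she is said to
# believe: the statement is now about :clarkKent.
rewritten = substitute_same_as(reified, {":superman": ":clarkKent"})
```

The unwanted belief appears only because of the owl:sameAs assertion,
which is the sense in which the problem sits in the data model rather
than in the reification mapping.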

Regards,
Jerven

PS. Regarding PG vs. SA mode, I am a fan of going for PG, given the
UniProt experience with RDF/XML rdf:ID, which is a PG syntax. That
rdf:ID is a PG syntax is important for the internal code we need to
generate and read our RDF/XML. Not having this kind of sugar in other
syntaxes is why we still prefer to ship RDF/XML for UniProt.

PPS. About 15% of UniProt triples are reification quads. Being able to
get these out of the quad table would be nice, especially given how
many joins are needed to use them and how badly these quads fit into
current indexes.




On 10/28/20 9:57 AM, Pierre-Antoine Champin wrote:
> Holger,
> 
> (I did what should have been done a long time ago: rename this subthread 
> to something more relevant)
> 
> On 27/10/2020 23:31, Holger Knublauch wrote:
>> On 10/28/2020 1:53 AM, Pierre-Antoine Champin wrote:
>>> Holger,
>>>
>>> Now I'm confused. This thread (which should have been renamed a long
>>> time ago) is, in my understanding, about Martynas' question raised here
>>>
>>> <https://www.w3.org/mid/CAE35Vmy3vbThwHnKjbhMQuwKkH0BhNoxr_Gp15Ri5LfOdedsSA@mail.gmail.com>
>>>
>>>> Does RDF* need new semantics at all?
>>> While I believe the answer is "yes", I concede that answering "no" to
>>> that question would be convenient, because it would mean that existing
>>> implementations of RDF could handle RDF* at the syntactical level only,
>>> i.e. parse Turtle* and store it as standard RDF triples.
>>>
>>> In your examples below, however, you propose to extend existing
>>> implementations -- which defeats the purpose of fitting RDF* into
>>> standard RDF semantics...
>>
>> The current RDF* draft requires introducing a 4th term type "RDF* 
>> triples":
>>
>> > IRIs <https://www.w3.org/TR/rdf11-concepts/#dfn-iri>, literals
>> <https://www.w3.org/TR/rdf11-concepts/#dfn-literal>, blank nodes
>> <https://www.w3.org/TR/rdf11-concepts/#dfn-blank-node> and RDF* triples
>> <https://w3c.github.io/rdf-star/#dfn-triple> are collectively known
>> as RDF* terms.
>>
> Correct.
> 
> In my understanding, introducing a new subclass of Node (TripleNode) was 
> the implementation counterpart of this extension of the abstract syntax, 
> but it seems that I was misunderstanding.
> 
>> The approaches based on (long) URIs avoid this and therefore are 
>> likely much less concerning w.r.t. existing implementations. A 
>> syntactic mapping means that existing APIs can represent these triples 
>> as normal URI nodes. From our own experience moving to this design was 
>> not disruptive (although some users have raised concerns about 
>> exposing the ugly long URIs in unexpected places such as exporting 
>> them to plain Turtle).
>>
>> Having said this, *some* implementations may represent these triple 
>> nodes differently, e.g. using the internal data structure I outlined 
>> below. This avoids ever storing these long URIs, makes it arguably 
>> easier to index and search over them, and probably keeps the issues 
>> with changing bnode identifiers at bay. But this is an implementation 
>> detail to me.
>>
> Agreed, but conversely, implementing the current draft using long IRIs 
> can be considered an implementation detail...
> 
> I think we also agree that the aim of the spec is to be as clear and 
> simple as possible, and that implementations may use their own 
> different internal models (for various reasons: backward compatibility, 
> optimization...) as long as they behave according to the spec.
> 
>> The additional semantics of interpreting these special URI nodes 
>> differently would be local to the RDF* specs and would not require 
>> adaptations to existing specs.
>>
> Again, agreed: this would be much less disruptive to the entire 
> ecosystem, and much less work for us in writing this document ;-).
> 
> But again I think that blank nodes in embedded triples make this 
> approach very hard. Defining the correct behaviour of "long IRIs" when 
> they actually represent embedded blank nodes, if even possible, would be 
> extremely cumbersome (as opposed to the clear and simple way that the 
> spec should aim for). That being said, any PR proving me wrong is welcome.
> 
>> Holger
>>
Received on Wednesday, 28 October 2020 11:29:40 UTC
