Re: Yes it can and should ;) Re: Can RDFstar be defined as only syntactic sugar on top of RDF (Re: weakness of embedded triples) from Pierre-Antoine Champin on 2020-10-28 (public-rdf-star@w3.org from October 2020)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Wed, 28 Oct 2020 17:36:48 +0100
To: public-rdf-star@w3.org
Message-ID: <5d4b293a-0f45-d8c7-ac0c-66c8fb88a464@ercim.eu>

Jerven,

thanks for your feedback.

On 28/10/2020 12:29, Jerven Bolleman wrote:
> Hi All,
>
> Yes it can be defined as syntactic sugar, and IMO it should.
> Considering in the beginning that RDF* was defined in terms of RDF
> reification.

To be precise, RDF* was defined with an abstract syntax of its own, but
no specific semantics (instead, a projection to RDF standard reification).

What motivated us to propose a dedicated semantics was, mostly

* that standard reification has no specific semantics in standard RDF, and

* to provision for referential opacity (more on this below).

I'll discuss the second item below, since you raise it later in your
message.

> And it can be implemented that way e.g. as POC done for rdflib.
>
> Which gets us back to the question: what do we want to achieve with
> RDFstar? I think most of us want to have reification without the
> hassle of writing out a quad all the time. Plus having optimizations
> possible at the storage level/query engine.
>
> A solution for the triples with all IRI and/or Literals is simply
> to generate an implicit IRI for them.
>
> For the triples with a blank node I think the simplest is to generate an
> implicit new blank node for them. (Which would be skolemized in some
> form in any triple store anyway).
>
> SPARQL and RDF syntax wise this would be simple. API wise this would
> also be easy in python rdflib. We have two new subclasses of URIRef,
> and BNode basically.
>
>
> class TripleUriRef(URIRef):
>     pass
>
> class TripleBnode(BNode):
>     pass
>
> in java rdf4j it would probably be something like
>
> interface Triple{
> }
> interface TripleIRI extends IRI, Triple {
> }
> interface TripleBNode extends BNode, Triple {
> }
>
> Semantics stay the same, we only get a new syntax for reification.
> Question about PG|SA stays unanswered by doing this.

I agree that this strategy would probably work if it wasn't for
referential opacity.
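
To make sure we are talking about the same thing, here is a minimal Python sketch of how I read that strategy. It is not taken from rdflib or any existing implementation: terms are plain N-Triples-style strings, and the names `implicit_name` and `TRIPLE_IRI_PREFIX` are hypothetical.

```python
import hashlib
import itertools

# Hypothetical prefix for implicit triple IRIs (urn:example is reserved for examples).
TRIPLE_IRI_PREFIX = "urn:example:triple:"

_fresh_ids = itertools.count()


def is_blank(term: str) -> bool:
    """Terms are N-Triples-style strings; blank nodes start with '_:'."""
    return term.startswith("_:")


def implicit_name(triple: tuple[str, str, str]) -> str:
    """Return the implicit node standing for `triple` in the plain RDF graph."""
    if any(is_blank(term) for term in triple):
        # A triple mentioning a blank node gets a fresh blank node on every call.
        return f"_:t{next(_fresh_ids)}"
    # A ground triple always gets the same IRI, derived from its serialization.
    digest = hashlib.sha256(" ".join(triple).encode("utf-8")).hexdigest()
    return TRIPLE_IRI_PREFIX + digest
```

With this sketch, naming the same ground triple twice yields the same IRI, while a triple containing a blank node gets a new blank node each time it is named.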

However, you would still need to specify a precise semantics for
rdf:subject, rdf:predicate and rdf:object, to specify how to handle
corner cases such as:

    :alice :says [ rdf:subject :bob ].

    :alice :says [ rdf:subject :bob, :charlie;  rdf:predicate :age;
                   rdf:object 42 ].

    :alice :says [ rdf:subject :bob;  rdf:predicate :age;  rdf:object 42 ].
    :bob :says [ rdf:subject :bob;  rdf:predicate :age;  rdf:object 42 ].
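
To illustrate what such a semantics would have to decide, here is a small Python sketch (my own illustration, not part of any spec or library). It assumes triples are plain tuples of strings and that a reification node is "well-formed" only when it has exactly one rdf:subject, one rdf:predicate and one rdf:object. The function names and status labels are hypothetical.

```python
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REIFICATION_PROPS = (RDF + "subject", RDF + "predicate", RDF + "object")


def collect_reifications(triples):
    """Map each node to the values of its rdf:subject/predicate/object."""
    values = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p in REIFICATION_PROPS:
            values[s][p].add(o)
    return values


def classify_reifications(triples):
    """Label each reification node well-formed, incomplete or ambiguous."""
    status = {}
    for node, props in collect_reifications(triples).items():
        counts = [len(props.get(p, ())) for p in REIFICATION_PROPS]
        if any(c > 1 for c in counts):
            status[node] = "ambiguous"      # e.g. two rdf:subject values
        elif any(c == 0 for c in counts):
            status[node] = "incomplete"     # e.g. only rdf:subject given
        else:
            status[node] = "well-formed"
    return status


def group_by_reified_triple(triples):
    """Group well-formed nodes by the triple they describe."""
    status = classify_reifications(triples)
    groups = defaultdict(set)
    for node, props in collect_reifications(triples).items():
        if status[node] == "well-formed":
            key = tuple(next(iter(props[p])) for p in REIFICATION_PROPS)
            groups[key].add(node)
    return dict(groups)


EX = "http://example.org/"
BOB_AGE = [(RDF + "subject", EX + "bob"),
           (RDF + "predicate", EX + "age"),
           (RDF + "object", "42")]
example = (
    [("_:b1", RDF + "subject", EX + "bob")]
    + [("_:b2", p, o) for p, o in BOB_AGE]
    + [("_:b2", RDF + "subject", EX + "charlie")]
    + [("_:b3", p, o) for p, o in BOB_AGE]
    + [("_:b4", p, o) for p, o in BOB_AGE]
)
```

Here `_:b1` is incomplete, `_:b2` is ambiguous, and `_:b3` and `_:b4` are two distinct nodes describing the same triple. A semantics for the vocabulary would have to say what each of these situations means, which this sketch only detects.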

>
> For me as a user, what benefits would a different semantics for
> RDFstar bring?

Calling it a "different semantics" is slightly unfair; it is an
/extension/ of the standard semantics.

In any case, you need to extend the semantics, either of the new
abstract syntax (the approach we chose), or of the specific vocabulary
(here, rdf:subject and co.) used to "encode" RDF* into RDF's abstract
syntax.

Granted, the second option would be less disruptive. But it assumes that
referential opacity is not a problem (again, see below).

>
> Going the syntactic sugar route we don't need a specification for
> RDFstar, just for TurtleStar, RDF/XMLstar and SPARQLstar etc.
> Which we would need anyway.
>
> The problem here becomes, lack of a WG means we don't have a good way
> to determine consensus and actually record a decision.

The idea was to mimic a WG process as much as possible, to help things
move forward (with a report, GitHub issues to track discussions and
decisions, and weekly calls).

>
> Still mapping RDFstar in terms of RDF reification leaves open the
> issue of referential opacity. I think that this is a red herring, I
> think the superman problem is an issue with datamodelling, which
> should not complicate our entire tech stack. i.e the :superman
> owl:sameAs :clarkKent is a faulty assertion and that fault leads to
> the impossibility to correctly express what :louislane believes.

owl:sameAs is a can of worms of its own. If you are not convinced by the
superman problem, what about:

    << dbr:Paris dbo:populationTotal 2229621 >> :assertedBy <http://dbpedia.org>.
    << geo:2229621 gn:population 2138551 >> :assertedBy <http://geonames.org>.
    dbr:Paris owl:sameAs geo:2229621.

Arguably, we don't want to infer that <http://geonames.org> says
anything about dbr:Paris (that is, if we want to preserve *which terms*
were used by each source, not only *which resource* they are describing).
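
The difference can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: terms are plain strings or integers, owl:sameAs is treated as a plain equivalence relation, and the function names (`transparent_reading`, `opaque_reading`) are mine, not part of any spec.

```python
from itertools import product


def equivalence_classes(same_as_pairs):
    """Map each term to the set of terms declared equal to it."""
    classes = {}
    for a, b in same_as_pairs:
        merged = classes.get(a, {a}) | classes.get(b, {b})
        for term in merged:
            classes[term] = merged
    return classes


def transparent_reading(annotations, same_as_pairs):
    """Substitute owl:sameAs equivalents inside embedded triples."""
    classes = equivalence_classes(same_as_pairs)
    result = set()
    for (s, p, o), source in annotations:
        for s2, o2 in product(classes.get(s, {s}), classes.get(o, {o})):
            result.add(((s2, p, o2), source))
    return result


def opaque_reading(annotations, same_as_pairs):
    """Keep embedded triples exactly as each source wrote them."""
    return set(annotations)


annotations = [
    (("dbr:Paris", "dbo:populationTotal", 2229621), "<http://dbpedia.org>"),
    (("geo:2229621", "gn:population", 2138551), "<http://geonames.org>"),
]
same_as = [("dbr:Paris", "geo:2229621")]

# The statement we would rather not derive:
unwanted = (("dbr:Paris", "gn:population", 2138551), "<http://geonames.org>")
```

Under the transparent reading, `unwanted` is derived, attributing to GeoNames a statement that uses a DBpedia term. Under the opaque reading it is not.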

  best

>
> Regards,
> Jerven
>
> PS. Regarding the PG or SA mode I am a fan of going for PG, given the
> UniProt experience with RDF/XML rdf:ID  which is a PG syntax. rdf:ID
> being a PG syntax is important for our internal code being needed to
> generate and read our rdf/xml. Not having this kind of sugar in other
> syntaxes is why we are still preferably shipping rdf/xml for UniProt.
>
> PPS. About 15% of UniProt triples are reification quads. Being able to
> get these out of the quad table would be nice. Especially the
> consequences for reducing the joins to use them and how badly these
> quads fit into current indexes.
>
>
>
>
> On 10/28/20 9:57 AM, Pierre-Antoine Champin wrote:
>> Holger,
>>
>> (I did what should have been done a long time ago: rename this
>> subthread to something more relevant)
>>
>> On 27/10/2020 23:31, Holger Knublauch wrote:
>>> On 10/28/2020 1:53 AM, Pierre-Antoine Champin wrote:
>>>> Holger,
>>>>
>>>> Now I'm confused. This thread (which should have been renamed a long
>>>> time ago) is, in my understanding, about Martynas' question raised
>>>> here
>>>>
>>>> <https://www.w3.org/mid/CAE35Vmy3vbThwHnKjbhMQuwKkH0BhNoxr_Gp15Ri5LfOdedsSA@mail.gmail.com>
>>>>
>>>>
>>>>> Does RDF* need new semantics at all?
>>>> While I believe the answer is "yes", I concede that answering "no" to
>>>> that question would be convenient, because it would mean that existing
>>>> implementations of RDF could handle RDF* at the syntactical level
>>>> only,
>>>> i.e. parse Turtle* and store it standard RDF triples.
>>>>
>>>> In your examples below, however, you propose to extend existing
>>>> implementations -- which defeats the purpose of fitting RDF* into
>>>> standard RDF semantics...
>>>
>>> The current RDF* draft requires introducing a 4th term type "RDF*
>>> triples":
>>>
>>> > IRIs <https://www.w3.org/TR/rdf11-concepts/#dfn-iri>, literals
>>> <https://www.w3.org/TR/rdf11-concepts/#dfn-literal>, blank nodes
>>> <https://www.w3.org/TR/rdf11-concepts/#dfn-blank-node> and RDF*
>>> triples <https://w3c.github.io/rdf-star/#dfn-triple> are collectively
>>> known as RDF* terms.
>>>
>> Correct.
>>
>> In my understanding, introducing a new subclass of Node (TripleNode)
>> was the implementation counterpart of this extension of the abstract
>> syntax, but it seems that I was misunderstanding.
>>
>>> The approaches based on (long) URIs avoid this and therefore are
>>> likely much less concerning w.r.t. existing implementations. A
>>> syntactic mapping means that existing APIs can represent these
>>> triples as normal URI nodes. From our own experience moving to this
>>> design was not disruptive (although some users have raised concerns
>>> about exposing the ugly long URIs in unexpected places such as
>>> exporting them to plain Turtle).
>>>
>>> Having said this, *some* implementations may represent these triple
>>> nodes differently, e.g. using the internal data structure I outlined
>>> below. This avoids ever storing these long URIs, makes it arguably
>>> easier to index and search over them, and probably keeps the issues
>>> with changing bnode identifiers at bay. But this is an
>>> implementation detail to me.
>>>
>> Agreed, but conversely, implementing the current draft using long
>> IRIs can be considered an implementation detail...
>>
>> I think we also agree that the aim of the spec is to be as clear and
>> simple as possible, and that implementations may use their own
>> different internal models (for various reasons: backward
>> compatibility, optimization...) as long as they behave according to
>> the spec.
>>
>>> The additional semantics of interpreting these special URI nodes
>>> differently would be local to the RDF* specs and would not require
>>> adaptations to existing specs.
>>>
>> Again, agreed: this would be much less disruptive to the entire
>> ecosystem –and much less work for us in writing this document ;-).
>>
>> But again I think that blank nodes in embedded triples make this
>> approach very hard. Defining the correct behaviour of "long IRIs" when
>> they actually represent embedded blank nodes, if even possible, would
>> be extremely cumbersome (as opposed to the clear and simple way that
>> the spec should aim for). That being said, any PR proving me wrong is
>> welcome.
>>
>>> Holger
>>>
>
>
Received on Wednesday, 28 October 2020 16:36:58 UTC