Re: Yes it can and should ;) Re: Can RDFstar be defined as only syntactic sugar on top of RDF (Re: weakness of embedded triples) from Pierre-Antoine Champin on 2020-10-29 (public-rdf-star@w3.org from October 2020)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Thu, 29 Oct 2020 10:55:51 +0100
To: Holger Knublauch <holger@topquadrant.com>, public-rdf-star@w3.org
Message-ID: <aa2e87b9-4c72-6f9d-1fa2-81f6de7e19fa@ercim.eu>
Holger,

If I follow your idea that RDF* must be only syntactic sugar on top of
RDF, it means that any RDF* graph is in fact a standard RDF graph, and
hence can be serialized as Turtle (or RDF/XML, etc.).

So these long URIs representing triples MAY end up exposed to some
systems that do not understand the Turtle* syntax.

Yet, it seems to me (but I am not entirely sure) that some of your
arguments below rely on the fact that these long URIs would only be used
internally, and never exposed, which only works if everyone adopts RDF*
/ Turtle* / SPARQL*. In that case I would argue that, although you are
reusing the same URI type in your implementation, the special treatment
you add for this type of URI makes it, in practice, a new type of node...

More comments below.

On 29/10/2020 04:11, Holger Knublauch wrote:
> On 10/28/2020 9:29 PM, Jerven Bolleman wrote:
>
>> Hi All,
>>
>> Yes it can be defined as syntactic sugar, and IMO it should. 
>
> I agree. And this is not merely a matter of allowing RDF* to be
> implemented by special IRIs as one implementation strategy among
> others, but it should become the only permitted implementation
> strategy. The reason is that operators such as isIRI() need to work
> consistently across implementations.
Agreed
> isIRI() should return true for embedded triples. 
I don't see this as a requirement for RDF* in general!
> There should not be a new RDF node type because that is simply not
> needed. No application should break because it encounters RDF* graphs.
> Whether it can make sense and interpret them as "reifications" (e.g.
> for display purposes) should be for them to decide incrementally. But
> by default they are just URIs to them.
Ok, so we agree that long URIs could be exposed outside the system that
generated them.
>
> Having said this, the spec could remain vague about how these long
> URIs are formed.

I disagree. If RDF* is really just syntactic sugar, then an RDF* graph
should be serializable equally well in Turtle* or in Turtle, without
loss of information. So two different RDF* systems should be able to
exchange RDF* graphs using Turtle*, /but also/ using Turtle.

I don't see how this could happen if the "encoding" of embedded triples
into URIs is not totally specified.
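To make this concrete, here is a minimal Python sketch of what a fully
specified encoding could look like. Everything in it is an assumption
for illustration, not a proposal from this thread: the "urn:triple:"
prefix is borrowed from Holger's suggestion further down, each term is
assumed to already be in canonical N-Triples form, the layout (three
percent-encoded terms separated by ':') is hypothetical, and the
example.org IRIs are placeholders. Blank nodes are rejected, because
their labels have no meaning outside one document or store.

```python
from urllib.parse import quote, unquote

# Hypothetical prefix, following the "urn:triple:" idea discussed in this thread.
PREFIX = "urn:triple:"


def encode_triple(s: str, p: str, o: str) -> str:
    """Encode an embedded triple as a single IRI (hypothetical scheme).

    Each term must already be in canonical N-Triples form, e.g.
    '<http://example.org/a>' or '"42"^^<http://www.w3.org/2001/XMLSchema#integer>'.
    quote(..., safe="") percent-encodes ':' as '%3A', so a bare ':' can
    separate the three terms unambiguously.
    """
    for term in (s, p, o):
        if term.startswith("_:"):
            # A blank node label is only meaningful inside one document/store.
            raise ValueError("no context-free encoding for blank node " + term)
    return PREFIX + ":".join(quote(term, safe="") for term in (s, p, o))


def is_triple_iri(iri: str) -> bool:
    """Distinguish encoded triples from 'normal' IRIs by their prefix."""
    return iri.startswith(PREFIX)


def decode_triple(iri: str) -> tuple[str, str, str]:
    """Recover (subject, predicate, object) in N-Triples form."""
    if not is_triple_iri(iri):
        raise ValueError("not an encoded triple: " + iri)
    parts = iri[len(PREFIX):].split(":")
    if len(parts) != 3:
        raise ValueError("malformed encoded triple: " + iri)
    s, p, o = (unquote(part) for part in parts)
    return s, p, o


if __name__ == "__main__":
    alice, knows, bob = (
        "<http://example.org/alice>",
        "<http://xmlns.com/foaf/0.1/knows>",
        "<http://example.org/bob>",
    )
    iri = encode_triple(alice, knows, bob)
    # In plain Turtle the embedded triple is just an ordinary IRI term:
    print(f"<{iri}> <http://example.org/certainty> 0.9 .")
    # An RDF*-aware receiver that knows the exact scheme can reverse it:
    assert decode_triple(iri) == (alice, knows, bob)
```

Serialized as plain Turtle, such a term passes through any RDF tool as
an ordinary IRI; only a receiver that implements exactly the same
scheme (prefix, term canonicalization, separator, escaping) can turn it
back into the embedded triple, so every one of these details would have
to be fixed by the spec.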

> This includes the question of how blank nodes are represented. I would
> simply map them to the serialization of the internal IDs. The APIs and
> SPARQL should provide a function to distinguish these special IRIs
> from "normal" ones, and then functions to extract subject, predicate
> and object, and vice versa. See
> http://datashapes.org/reification.html#tosh for an example
> implementation.
>
> A minimum practice that the spec could prescribe is that these IRIs
> need to start with, say, "urn:triple:" but this is mainly to make sure
> that no other applications use those URIs by accident (which is really
> unlikely in practice but still...) and for human readers who stumble
> upon them in raw form.
>
>> Consider that RDF* was originally defined in terms of RDF
>> reification, and that it can be implemented that way, e.g. as in the
>> POC done for rdflib.
>>
>> Which gets us back to the question: what do we want to achieve with
>> RDFstar? I think most of us want reification without the hassle of
>> writing out a quad all the time, plus optimizations made possible at
>> the storage/query-engine level.
>>
>> For triples whose terms are all IRIs and/or literals, a solution is
>> simply to generate an implicit IRI for them.
>>
>> For triples containing a blank node, I think the simplest approach is
>> to generate an implicit new blank node for them (which would be
>> skolemized in some form in any triple store anyway).
>
> Not sure why this is necessary. Why not use IRIs? A typical scenario
> would be:
>
> 1. RDF* file gets loaded into graph store
> 2. Graph store selects its new internal IDs for blank nodes
> 3. References to bnodes from embedded triples use these same (new)
> IDs. All good.
> 4. RDF* file gets saved back.
> 5. The cycle repeats on another machine that loads this new file.

This is all very good if only Turtle* (or another RDF*-aware syntax) is
used.

If at step 4 you serialize the RDF* graph in, say, Turtle, then the URI
encoding your internal ID is leaked to the outside world.

Even worse: another embedded triple, from another system using the same
internal ID, would be generated with the same IRI (even if the two
bnodes represent totally different things).
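The collision can be illustrated with a small Python sketch. It
assumes a hypothetical encoding that copies the store's internal blank
node label verbatim into a "urn:triple:" IRI (roughly what mapping
blank nodes to the serialization of internal IDs would produce); the
example.org names and the export_annotation / standardize_apart helpers
are made up for the illustration. Two unrelated stores both use
internal ID "b0", export plain Turtle, and the results are merged. As
in a normal RDF merge, blank nodes are renamed apart, but IRIs are left
untouched.

```python
from urllib.parse import quote

PREFIX = "urn:triple:"       # hypothetical prefix
EX = "http://example.org/"   # hypothetical namespace

Triple = tuple[str, str, str]


def naive_encode(s: str, p: str, o: str) -> str:
    """Hypothetical encoding that embeds internal blank node labels verbatim."""
    return PREFIX + ":".join(quote(term, safe="") for term in (s, p, o))


def export_annotation(bnode_id: str, source: str) -> set[Triple]:
    """Plain-Turtle export of << _:<bnode_id> ex:status "active" >> ex:source ex:<source>,
    where the embedded triple is also asserted."""
    asserted = ("_:" + bnode_id, f"<{EX}status>", '"active"')
    embedded = "<" + naive_encode(*asserted) + ">"
    return {asserted, (embedded, f"<{EX}source>", f"<{EX}{source}>")}


def standardize_apart(graph: set[Triple], suffix: str) -> set[Triple]:
    """Rename blank nodes before a merge; IRIs (including encoded triples) stay as-is."""
    def rename(term: str) -> str:
        return term + suffix if term.startswith("_:") else term
    return {(rename(s), rename(p), rename(o)) for s, p, o in graph}


# Two unrelated stores that both happen to use internal ID "b0"
# for entirely different things.
graph_a = export_annotation("b0", "crmA")
graph_b = export_annotation("b0", "crmB")
merged = standardize_apart(graph_a, "_a") | standardize_apart(graph_b, "_b")

# The asserted triples correctly end up with two distinct blank nodes...
status_subjects = {s for s, p, o in merged if p == f"<{EX}status>"}
# ...but both annotations are attached to one and the same IRI.
annotated = {s for s, p, o in merged if p == f"<{EX}source>"}

print(len(status_subjects), "blank nodes,", len(annotated), "annotated IRI")
# prints: 2 blank nodes, 1 annotated IRI
```

Note also that decoding the shared IRI would yield _:b0, which after
the merge names neither of the two blank nodes.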

>
> While such an RDF* graph is in memory (or database storage) the actual
> IDs don't matter to anyone, and shouldn't be relied on by any external
> graph. This is already the situation for blank nodes now - they are
> anonymous nodes within the current graph only.
>
> Likewise no external graph should rely on the specific syntax of these
> (possibly long) IRIs - they are only accessed and used via SPARQL*
> operators and corresponding functions.
>
>> (...)
>>
>> The problem here is that the lack of a WG means we don't have a good
>> way to determine consensus and actually record a decision.
>
> A while ago there was a poll on times for a first meeting. Is this
> still the plan?

We are not forgetting it. The decision should be made soon.

  best
Received on Thursday, 29 October 2020 09:56:02 UTC