Re: Yes it can and should ;) Re: Can RDFstar be defined as only syntactic sugar on top of RDF (Re: weakness of embedded triples) from Holger Knublauch on 2020-10-29 (public-rdf-star@w3.org from October 2020)

From: Holger Knublauch <holger@topquadrant.com>
Date: Thu, 29 Oct 2020 22:00:40 +1000
To: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>, public-rdf-star@w3.org
Message-ID: <c4627d2a-57ef-78a2-aff8-2fcc88c63806@topquadrant.com>
On 29/10/2020 7:55 pm, Pierre-Antoine Champin wrote:
>
> Holger,
>
> If I follow your idea that RDF* must be only syntactic sugar on top of 
> RDF, it means that any RDF* graph is in fact a standard RDF graph, 
> hence can be serialized as Turtle (or RDF/XML, etc...).
>
> So these long URIs representing triples MAY end up exposed to some 
> systems that do not understand the Turtle* syntax.
>
> Yet, it seems to me (but I am not entirely sure) that some of your 
> arguments below rely on the fact that these long URIs would only be 
> used internally, and never exposed, which only works if everyone 
> adopts RDF* / Turtle* / SPARQL*. In that case I would argue that, 
> although you are reusing the same URI type in your implementation, the 
> special treatment you add for this type of URI makes it, in practice, 
> a new type of node...
>
> More comments below.
>
> On 29/10/2020 04:11, Holger Knublauch wrote:
>> On 10/28/2020 9:29 PM, Jerven Bolleman wrote:
>>
>>> Hi All,
>>>
>>> Yes it can be defined as syntactic sugar, and IMO it should. 
>>
>> I agree. And this is not merely a matter of allowing RDF* to be 
>> implemented by special IRIs as one implementation strategy among 
>> others, but it should become the only permitted implementation 
>> strategy. The reason is that operators such as isIRI() need to work 
>> consistently across implementations.
> Agreed
>> isIRI() should return true for embedded triples. 
> I don't see this as a requirement for RDF* in general!
>> There should not be a new RDF node type because that is simply not 
>> needed. No application should break because it encounters RDF* 
>> graphs. Whether it can make sense of them and interpret them as 
>> "reifications" (e.g. for display purposes) should be for them to 
>> decide incrementally. But by default they are just URIs to them.
> Ok, so we agree that long URIs could be exposed out of the system that 
> generated them.
>>
>> Having said this, the spec could remain vague about how these long 
>> URIs are formed.
>
> I disagree. If RDF* is really just syntactic sugar, then an RDF* graph 
> should be serializable in Turtle* or in Turtle indifferently, without 
> loss of information. So two different RDF* systems should be able to 
> exchange RDF* graphs using Turtle*, /but also/ using Turtle.
>
> I don't see how this could happen if the "encoding" of embedded 
> triples into URIs is not totally specified.
>
I would be ok with that.
>
>> This includes the question of how blank nodes are represented. I 
>> would simply map them to the serialization of the internal IDs. The 
>> APIs and SPARQL should provide a function to distinguish these 
>> special IRIs from "normal" ones, and then functions to extract 
>> subject, predicate and object, and vice versa. See 
>> http://datashapes.org/reification.html#tosh for an example 
>> implementation.
>>
>> A minimum practice that the spec could prescribe is that these IRIs 
>> need to start with, say, "urn:triple:" but this is mainly to make 
>> sure that no other applications use those URIs by accident (which is 
>> really unlikely in practice but still...) and for human readers who 
>> stumble upon them in raw form.
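
A minimal sketch of what such an encoding and its accessor functions could look like. Only the "urn:triple:" prefix comes from the suggestion above. The percent-encoded, colon-separated layout and the function names encode_triple, is_triple_iri and decode_triple are assumptions made for illustration, not anything the spec or the tosh functions define. Terms are passed as N-Triples-style strings.

```python
from urllib.parse import quote, unquote

# Prefix suggested in the thread; the layout after it is an assumption.
PREFIX = "urn:triple:"


def encode_triple(s, p, o):
    """Pack three N-Triples-style term strings into a single IRI string."""
    # quote(..., safe="") percent-encodes ':' too, so ':' can act as separator.
    return PREFIX + ":".join(quote(term, safe="") for term in (s, p, o))


def is_triple_iri(iri):
    """Distinguish these special IRIs from 'normal' ones."""
    return iri.startswith(PREFIX) and iri[len(PREFIX):].count(":") == 2


def decode_triple(iri):
    """Extract subject, predicate and object from a triple IRI."""
    if not is_triple_iri(iri):
        raise ValueError("not a triple IRI: " + iri)
    s, p, o = (unquote(part) for part in iri[len(PREFIX):].split(":"))
    return s, p, o


inner = encode_triple("<http://example.org/alice>",
                      "<http://example.org/knows>",
                      "_:b0")
# An embedded triple can itself be the subject of another embedded triple.
outer = encode_triple("<" + inner + ">", "<http://example.org/source>", '"web"')
```

Because each term is percent-encoded before joining, nesting works without extra rules: the inner IRI is simply encoded once more when it appears as a term of the outer one.
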
>>
>>> Consider that RDF* was initially defined in terms of RDF 
>>> reification, and it can be implemented that way, e.g. as in the 
>>> POC done for rdflib.
>>>
>>> Which gets us back to the question: what do we want to achieve with 
>>> RDFstar? I think most of us want to have reification without the 
>>> hassle of writing out a quad all the time. Plus having optimizations 
>>> possible at the storage level/query engine.
>>>
>>> A solution for the triples made up only of IRIs and/or literals is 
>>> simply to generate an implicit IRI for them.
>>>
>>> For the triples with a blank node I think the simplest is to 
>>> generate an
>>> implicit new blank node for them. (Which would be skolemized in some 
>>> form in any triple store anyway).
>>
>> Not sure why this is necessary. Why not use IRIs? A typical scenario 
>> would be:
>>
>> 1. RDF* file gets loaded into graph store
>> 2. Graph store selects its new internal IDs for blank nodes
>> 3. References to bnodes from embedded triples use these same (new) 
>> IDs. All good.
>> 4. RDF* file gets saved back.
>> 5. The cycle repeats on another machine that loads this new file.
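
The load cycle above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular graph store works: triples are plain Python tuples, blank nodes are strings starting with "_:", an embedded triple is a nested tuple, and the load function and its "_:id<n>" naming scheme are hypothetical.

```python
import itertools


def load(triples, counter):
    """Copy triples into a 'store', renaming blank nodes to fresh internal IDs.

    The same mapping is applied inside embedded triples (nested tuples), so
    references to a blank node from an embedded triple get the same new ID.
    """
    mapping = {}

    def rename(term):
        if isinstance(term, tuple):      # embedded triple: rename inside it too
            return tuple(rename(t) for t in term)
        if term.startswith("_:"):        # blank node: assign or reuse a new ID
            if term not in mapping:
                mapping[term] = "_:id%d" % next(counter)
            return mapping[term]
        return term                      # IRIs and literals are kept as-is

    return [rename(triple) for triple in triples]


# Steps 1-3: a file with one blank node, also referenced from an embedded triple.
file_triples = [
    ("_:x", "ex:name", '"Alice"'),
    (("_:x", "ex:name", '"Alice"'), "ex:source", "ex:web"),
]
store_a = load(file_triples, itertools.count(1))

# Steps 4-5: the saved graph is loaded on another machine with its own IDs.
store_b = load(store_a, itertools.count(100))
```

The labels differ between the two stores, but within each store the asserted triple and the embedded triple still point at the same blank node.
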
>
> This is all very good if only Turtle* (or another RDF*-aware syntax) 
> is used.
>
> If at step 4 you serialize the RDF* graph in, say, Turtle, then the 
> URI encoding your internal ID is leaked to the outside world.
>
> Even worse: another embedded triple, from another system using the 
> same internal ID, would be generated with the same IRI (even if the 
> two bnodes represent totally different things).
>
So why would this be worse than a solution that introduces a new node 
type for triples? Those cannot be written to Turtle at all.

Holger


>>
>> While such an RDF* graph is in memory (or database storage) the 
>> actual IDs don't matter to anyone, and shouldn't be relied on by any 
>> external graph. This is already the situation for blank nodes now - 
>> they are anonymous nodes within the current graph only.
>>
>> Likewise no external graph should rely on the specific syntax of 
>> these (possibly long) IRIs - they are only accessed and used via 
>> SPARQL* operators and corresponding functions.
>>
>>> (...)
>>>
>>> The problem here is that the lack of a WG means we don't have a good
>>> way to determine consensus and actually record a decision.
>>
>> A while ago there was a poll on times for a first meeting. Is this 
>> still the plan?
>
> We are not forgetting it. The decision should be made soon.
>
>   best
>
>
Received on Thursday, 29 October 2020 12:00:56 UTC