Re: Yes it can and should ;) Re: Can RDFstar be defined as only syntactic sugar on top of RDF (Re: weakness of embedded triples) from Pierre-Antoine Champin on 2020-10-29 (public-rdf-star@w3.org from October 2020)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Thu, 29 Oct 2020 17:34:41 +0100
To: Holger Knublauch <holger@topquadrant.com>, public-rdf-star@w3.org
Message-ID: <71f6ed6c-897e-7e4a-deb2-c8bcf05ea430@ercim.eu>
Holger,

On 29/10/2020 13:00, Holger Knublauch wrote:
>
>
> On 29/10/2020 7:55 pm, Pierre-Antoine Champin wrote:
>>
>> Holger,
>>
>> If I follow your idea that RDF* must be only syntactic sugar on top
>> of RDF, it means that any RDF* graph is in fact a standard RDF graph,
>> hence can be serialized as Turtle (or RDF/XML, etc...).
>>
>> So these long URIs representing triples MAY end up exposed to some
>> systems that do not understand the Turtle* syntax.
>>
>> Yet, it seems to me (but I am not entirely sure) that some of your
>> arguments below rely on the fact that these long URIs would only be
>> used internally, and never exposed, which only works if everyone
>> adopts RDF* / Turtle* / SPARQL*. In that case I would argue that,
>> although you are reusing the same URI type in your implementation,
>> the special treatment you add for this type of URI makes it, in
>> practice, a new type of node...
>>
>> More comments below.
>>
>> On 29/10/2020 04:11, Holger Knublauch wrote:
>>> On 10/28/2020 9:29 PM, Jerven Bolleman wrote:
>>>
>>>> Hi All,
>>>>
>>>> Yes it can be defined as syntactic sugar, and IMO it should. 
>>>
>>> I agree. And this is not merely a matter of allowing RDF* to be
>>> implemented by special IRIs as one implementation strategy among
>>> others, but it should become the only permitted implementation
>>> strategy. The reason is that operators such as isIRI() need to work
>>> consistently across implementations.
>> Agreed
>>> isIRI() should return true for embedded triples. 
>> I don't see this as a requirement for RDF* in general!
>>> There should not be a new RDF node type because that is simply not
>>> needed. No application should break because it encounters RDF*
>>> graphs. Whether it can make sense and interpret them as
>>> "reifications" (e.g. for display purposes) should be for them to
>>> decide incrementally. But by default they are just URIs to them.
>> Ok, so we agree that long URIs could be exposed out of the system
>> that generated them.
>>>
>>> Having said this, the spec could remain vague about how these long
>>> URIs are formed.
>>
>> I disagree. If RDF* is really just syntactic sugar, then an RDF*
>> graph should be serializable in Turtle* or in Turtle indifferently,
>> without loss of information. So two different RDF* systems should be
>> able to exchange that RDF* graph using Turtle*, /but also/ using Turtle.
>>
>> I don't see how this could happen if the "encoding" of embedded
>> triples into URIs is not totally specified.
>>
> I would be ok with that.

By saying that "RDF* is merely syntactic sugar on top of RDF", I expect that

1. any Turtle* document can be translated into an equivalent Turtle
document, and

2. any RDF*-aware system could translate that Turtle document into the
original Turtle* document (or something close enough...).

If every system mints proprietary IRIs for triples, which other systems
are not guaranteed to parse as embedded triples, we fail to satisfy item
2. And then I would claim that RDF* is not just syntactic sugar. It is a
model of its own, that is captured by Turtle* but NOT by RDF syntaxes
(abstract or concrete).
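
To make the two items concrete, here is a minimal Python sketch of what a
fully specified encoding would have to provide. The scheme itself is purely
hypothetical (the "urn:triple:" prefix is borrowed from the suggestion quoted
below, and the percent-encoding and ":" separator are my own assumptions, not
part of any spec); terms are assumed to be given in N-Triples syntax.

```python
from urllib.parse import quote, unquote

# Hypothetical prefix; nothing in any spec mandates it.
PREFIX = "urn:triple:"

def encode_triple(s, p, o):
    """Encode three N-Triples terms as one IRI (hypothetical shared scheme)."""
    # Percent-encode every reserved character, including ':', so that ':'
    # can safely be used as the separator between the three terms.
    return PREFIX + ":".join(quote(term, safe="") for term in (s, p, o))

def decode_triple(iri):
    """Return (s, p, o) if the IRI follows the shared scheme, else None."""
    if not iri.startswith(PREFIX):
        return None
    parts = iri[len(PREFIX):].split(":")
    if len(parts) != 3:
        return None
    return tuple(unquote(part) for part in parts)
```

Item 1 corresponds to encode_triple, item 2 to decode_triple. The round trip
only works because both sides agree on every detail of the scheme; a system
that uses a different separator or escaping gets None (or garbage) back.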


>>> This includes the question of how blank nodes are represented. I
>>> would simply map them to the serialization of the internal IDs. The
>>> APIs and SPARQL should provide a function to distinguish these
>>> special IRIs from "normal" ones, and then functions to extract
>>> subject, predicate and object, and vice versa. See
>>> http://datashapes.org/reification.html#tosh for an example
>>> implementation.
>>>
>>> A minimum practice that the spec could prescribe is that these IRIs
>>> need to start with, say, "urn:triple:" but this is mainly to make
>>> sure that no other applications use those URIs by accident (which is
>>> really unlikely in practice but still...) and for human readers who
>>> stumble upon them in raw form.
>>>
>>>> Consider that, in the beginning, RDF* was defined in terms of RDF
>>>> reification. And it can be implemented that way, e.g. as in the POC
>>>> done for rdflib.
>>>>
>>>> Which gets us back to the question: what do we want to achieve with
>>>> RDFstar? I think most of us want to have reification without the
>>>> hassle of writing out a quad all the time. Plus having
>>>> optimizations possible at the storage level/query engine.
>>>>
>>>> A solution for the triples with only IRIs and/or literals is to
>>>> simply generate an implicit IRI for them.
>>>>
>>>> For the triples with a blank node I think the simplest is to
>>>> generate an
>>>> implicit new blank node for them. (Which would be skolemized in
>>>> some form in any triple store anyway).
>>>
>>> Not sure why this is necessary. Why not use IRIs? A typical scenario
>>> would be:
>>>
>>> 1. RDF* file gets loaded into graph store
>>> 2. Graph store selects its new internal IDs for blank nodes
>>> 3. References to bnodes from embedded triples use these same (new)
>>> IDs. All good.
>>> 4. RDF* file gets saved back.
>>> 5. The cycle repeats on another machine that loads this new file.
>>
>> This is all very good if only Turtle* (or another RDF*-aware syntax)
>> is used.
>>
>> If at step 4 you serialize the RDF* graph in, say, Turtle, then the
>> URI encoding your internal ID is leaked to the outside world.
>>
>> Even worse: another embedded triple, from another system using the
>> same internal ID, would be generated with the same IRI (even if the
>> two bnodes represent totally different things).
>>
> So why would this be worse than a solution that introduces a new node
> type for triples? These cannot even be written to Turtle at all.
>
It is worse because it creates the /illusion/ of interoperability.

I would rather have an RDF* system say "sorry, I cannot convert this
graph into plain Turtle, because it contains embedded triples", than
have it agree to do so and encode the embedded triples into proprietary
long URIs that will not be interpreted correctly by other RDF*-aware systems...
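
The blank node case mentioned above can be sketched in a few lines of
Python. Everything here is an assumption made for illustration: the
naive_triple_iri function, the "urn:triple:" prefix, and the internal ID
"b0" are hypothetical, standing in for whatever a given store does
internally.

```python
from urllib.parse import quote

def naive_triple_iri(s, p, o):
    """Hypothetical store-specific encoding: blank nodes are written
    using the store's own internal ID, with no skolemization."""
    return "urn:triple:" + ":".join(quote(t, safe="") for t in (s, p, o))

KNOWS = "<http://example.org/knows>"
BOB = "<http://example.org/bob>"

# System A: internal ID b0 happens to denote one anonymous resource.
iri_from_a = naive_triple_iri("_:b0", KNOWS, BOB)

# System B: the same internal ID b0 denotes a completely unrelated resource.
iri_from_b = naive_triple_iri("_:b0", KNOWS, BOB)

# Once both graphs are written to plain Turtle, the two IRIs are identical,
# so a consumer merging them would conflate two different embedded triples.
same_iri = iri_from_a == iri_from_b
```

Nothing in the resulting Turtle tells the consumer that the two IRIs came
from different blank node scopes, which is exactly where the apparent
interoperability breaks down.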

And to repeat myself, if you consider that some graphs cannot be
losslessly serialized into Turtle, then you acknowledge the fact that
the underlying model is an /extension/ of RDF (because by design, Turtle
is able to serialize any RDF graph), NOT just syntactic sugar.

  best

> Holger
>
>
>>>
>>> While such an RDF* graph is in memory (or database storage) the
>>> actual IDs don't matter to anyone, and shouldn't be relied on by any
>>> external graph. This is already the situation for blank nodes now -
>>> they are anonymous nodes within the current graph only.
>>>
>>> Likewise no external graph should rely on the specific syntax of
>>> these (possibly long) IRIs - they are only accessed and used via
>>> SPARQL* operators and corresponding functions.
>>>
>>>> (...)
>>>>
>>>> The problem here becomes, lack of a WG means we don't have a good way
>>>> to determine consensus and actually record a decision.
>>>
>>> A while ago there was a poll on times for a first meeting. Is this
>>> still the plan?
>>
>> We are not forgetting it. The decision should be made soon.
>>
>>   best
>>
>>
Received on Thursday, 29 October 2020 16:34:52 UTC