Re: weakness of embedded triples from thomas lörtsch on 2020-10-21 (public-rdf-star@w3.org from October 2020)

From: thomas lörtsch <tl@rat.io>
Date: Wed, 21 Oct 2020 12:51:56 +0200
To: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Cc: public-rdf-star@w3.org
Message-Id: <3A15B70D-71BE-4E76-B065-E8F476E2216F@rat.io>
> On 21. Oct 2020, at 10:44, Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu> wrote:
> 
> Dear Martynas,
> 
> On 19/10/2020 21:07, Martynas Jusevičius wrote:
>> It seems that a number of stores are already implementing RDF* using
>> unique triple IDs. And it works with blank nodes using skolemization.
>> 
>> Wouldn't it be easier to standardize this approach instead of drafting
>> completely new semantics?
>> 
>> Certainly less disruptive.
>> 
> Strictly speaking, while skolemization preserves a number of semantic properties (like satisfiability), it alters the semantics of the graph:
> 
>     The resulting formula is not necessarily equivalent to the original one, but is equisatisfiable with it
> 
>     -- https://en.wikipedia.org/wiki/Skolem_normal_form
> 
> So making skolemization the norm would be disruptive with respect to the existing semantics.


RDF 1.1 Concepts and Abstract Syntax, Section 3.5 [0] says:

"Blank nodes do not have identifiers in the RDF abstract syntax. The blank node identifiers introduced by some concrete syntaxes have only local scope and are purely an artifact of the serialization.
In situations where stronger identification is needed, systems MAY systematically replace some or all of the blank nodes in an RDF graph with IRIs. [...] This transformation does not appreciably change the meaning of an RDF graph [...]. "

Where is the disruption?

Thomas


[0] https://www.w3.org/TR/rdf11-concepts/

> Also, note that the semantics' goal is not to prescribe a particular implementation method; it is to ensure that different implementations remain interoperable.

>   pa
> 
>> 
>> On Mon, Oct 19, 2020 at 11:07 AM Pavel Klinov
>> <pavel@stardog.com>
>>  wrote:
>> 
>>> Yeah right. We have a mechanism in place to avoid using the same Skolem constant for bnodes with the same lexical form occurring in multiple RDF datasets (eg. when loading multiple files) but that's pretty much it. IIRC it's called something like "standardising apart" in one of the RDF docs.
>>> 
>>> Cheers,
>>> Pavel
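
A minimal sketch of what "standardising apart" during skolemization could look like, assuming terms are plain strings and blank nodes are written `_:label`. The `skolemize` helper and the `example.org` base are hypothetical, not Stardog's mechanism; only the `/.well-known/genid/` path comes from RDF 1.1 Concepts. Each loaded document gets its own scope, so the same blank node label in two files never maps to the same Skolem IRI.

```python
import uuid

# Hypothetical base; RDF 1.1 Concepts suggests the /.well-known/genid/ path
# for Skolem IRIs, the authority here is a placeholder.
GENID_BASE = "https://example.org/.well-known/genid/"


def skolemize(triples, base=GENID_BASE):
    """Replace blank nodes ("_:label") with fresh IRIs, scoped per call.

    One call corresponds to one document, so identical labels in
    different documents are kept apart.
    """
    scope = uuid.uuid4().hex  # fresh scope for this document
    mapping = {}

    def sk(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"{base}{scope}-{len(mapping)}"
            return mapping[term]
        return term  # IRIs and literals pass through unchanged

    return [tuple(sk(t) for t in triple) for triple in triples]


doc_a = [("_:b0", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
         ("_:b0", "http://xmlns.com/foaf/0.1/knows", "_:b1")]
doc_b = [("_:b0", "http://xmlns.com/foaf/0.1/name", '"Bob"')]

sk_a = skolemize(doc_a)
sk_b = skolemize(doc_b)
```

Within `doc_a` both occurrences of `_:b0` receive the same IRI, while `_:b0` in `doc_b` receives a different one.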
>>> 
>>> 
>>> 
>>> On Mon, Oct 19, 2020 at 10:54 AM Pierre-Antoine Champin
>>> <pierre-antoine.champin@ercim.eu>
>>>  wrote:
>>> 
>>>> Dear all,
>>>> 
>>>> Holger, Pavel: I assume that blank nodes are internally skolemized, so indeed, internally, you only have IRIs and literals. Correct?
>>>> 
>>>> On 19/10/2020 10:28, Holger Knublauch wrote:
>>>> 
>>>> Similar situation here at TopQuadrant, see
>>>> 
>>>> 
>>>> http://datashapes.org/reification.html#uriReification
>>>> 
>>>> 
>>>> Holger
>>>> 
>>>> 
>>>> On 10/19/2020 6:24 PM, Pavel Klinov wrote:
>>>> 
>>>> This is roughly how Stardog supports RDF* and so far we find it sufficient in the enterprise context. It's pretty easily understood by users familiar with edge properties in the property graph data model, which is one of the most important factors for us.
>>>> 
>>>> Cheers,
>>>> Pavel
>>>> 
>>>> On Sat, Oct 17, 2020 at 9:54 PM Martynas Jusevičius
>>>> <martynas@atomgraph.com>
>>>>  wrote:
>>>> 
>>>>> Does RDF* need new semantics at all? Couldn't it be a syntax-level
>>>>> convention for unique triple IDs?
>>>>> 
>>>>> E.g. <<s>, <p>, <o>> being syntactic sugar for
>>>>> uri(concat("urn:rdf:id:", hash(str(<s>)), hash(str(<p>)),
>>>>> hash(str(<o>)))).
>>>>> 
>>>>> For example, the triple
>>>>> 
>>>>> <
>>>>> <https://www.w3.org/People/Berners-Lee/card>
>>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>>>>> <https://www.w3.org/People/Berners-Lee/card#i>
>>>>> >
>>>>> 
>>>>> gives
>>>>> 
>>>>> URI(CONCAT("urn:rdf:id:",
>>>>> SHA1(STR(
>>>>> <https://www.w3.org/People/Berners-Lee/card>
>>>>> )),
>>>>> SHA1(STR(
>>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>>>>> )),
>>>>> SHA1(STR(
>>>>> <https://www.w3.org/People/Berners-Lee/card#i>
>>>>> ))))
>>>>> 
>>>>> gives
>>>>> 
>>>>> <urn:rdf:id:63874e34ff5f326e67e888f6818f72d5033ecb343cadd8c2120281d72cefce4481485c937b6a95a656beaa67c13db29f3d7be801328b7c9125976c5f>
>>>>> 
>>>>> which essentially would become the "5th element", in addition to quads.
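
The proposal above can be sketched in a few lines, assuming the three terms are IRIs given as plain strings and that `STR()` of an IRI is simply that string. The `triple_id` helper is a hypothetical name; literals and blank nodes are not handled. Subject, predicate, and object are hashed in turn, as in the expanded example.

```python
import hashlib


def sha1_hex(term: str) -> str:
    """Hex SHA-1 digest of the UTF-8 encoding of a term's string form."""
    return hashlib.sha1(term.encode("utf-8")).hexdigest()


def triple_id(s: str, p: str, o: str) -> str:
    """Build a urn:rdf:id: identifier from the hashes of s, p and o."""
    return "urn:rdf:id:" + sha1_hex(s) + sha1_hex(p) + sha1_hex(o)


card = "https://www.w3.org/People/Berners-Lee/card"
primary_topic = "http://xmlns.com/foaf/0.1/primaryTopic"
me = "https://www.w3.org/People/Berners-Lee/card#i"

tid = triple_id(card, primary_topic, me)
print(tid)
```

The identifier is deterministic, so any store can recompute it from the triple alone. The object has to take part in the hash: if it did not, two triples that differ only in their object would receive the same ID.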
>>>>> 
>>>>> On Thu, Oct 15, 2020 at 1:38 PM Pierre-Antoine Champin
>>>>> 
>>>>> <pierre-antoine.champin@ercim.eu>
>>>>>  wrote:
>>>>> 
>>>>>> 
>>>>>> On 14/10/2020 23:13, Peter F. Patel-Schneider wrote:
>>>>>> 
>>>>>> Let's make the height example even more stark.
>>>>>> 
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent :height "6.0"^^xsd:decimal >> .
>>>>>> 
>>>>>> 
>>>>>> does not imply
>>>>>> 
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent :height "6.00"^^xsd:decimal >> .
>>>>>> 
>>>>>> 
>>>>>> I would hope that any Tom, Dick, and Lois can realize that these two literals
>>>>>> are the same.
>>>>>> 
>>>>>> I see your point, but this is really a matter of deciding where you put the boundary...
>>>>>> 
>>>>>> So I would still prefer to be radical here and consider any lexical difference as potentially significant.
>>>>>> 
>>>>>> If you want to stick to literals that have to be supported in RDF
>>>>>> 
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-US >> .
>>>>>> 
>>>>>> 
>>>>>> does not imply
>>>>>> 
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-us >> .
>>>>>> 
>>>>>> Are "Clark"@en-US and "Clark"@en-us really different literals, for the abstract syntax??
>>>>>> 
>>>>>> I would have thought they are the same (and so the implication above would hold).
>>>>>> 
>>>>>> Reading the spec again, I realize that things are not so clear: "Lexical representations of language tags MAY be converted to lower case", and then literal term equality requires that language tags "compare equal, character by character". So these two literals MAY be considered equal, and the implication MAY hold... :-/ Add to this that BCP 47 explicitly states that language tags are case-insensitive... I'd say that we are in a gray area here.
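
The two literal examples above can be made concrete with a small sketch, assuming a hypothetical `Literal` record holding lexical form, datatype IRI, and optional language tag. It contrasts character-by-character term equality with value equality for xsd:decimal, and shows the effect of lowercasing language tags before comparison.

```python
from decimal import Decimal
from typing import NamedTuple, Optional

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class Literal(NamedTuple):
    lexical: str
    datatype: str
    lang: Optional[str] = None


def term_equal(a: Literal, b: Literal) -> bool:
    """Term equality: all three components compared character by character."""
    return a == b


def decimal_value_equal(a: Literal, b: Literal) -> bool:
    """Value equality for two xsd:decimal literals."""
    return Decimal(a.lexical) == Decimal(b.lexical)


def lowercase_lang(lit: Literal) -> Literal:
    """Apply the optional lowercasing of the language tag."""
    return lit._replace(lang=lit.lang.lower()) if lit.lang else lit


six_0 = Literal("6.0", XSD_DECIMAL)
six_00 = Literal("6.00", XSD_DECIMAL)
clark_US = Literal("Clark", RDF_LANGSTRING, "en-US")
clark_us = Literal("Clark", RDF_LANGSTRING, "en-us")

print(term_equal(six_0, six_00), decimal_value_equal(six_0, six_00))
print(term_equal(clark_US, clark_us),
      term_equal(lowercase_lang(clark_US), lowercase_lang(clark_us)))
```

Whether the embedded triples count as the same therefore depends on which comparison is applied, and, for language tags, on whether the implementation chose to lowercase them.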
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> peter
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 10/14/20 4:45 PM, Doerthe Arndt wrote:
>>>>>> 
>>>>>> Dear Peter,
>>>>>> 
>>>>>> you are right with both observations. The question is whether we want that
>>>>>> behavior or not.
>>>>>> 
>>>>>> In
>>>>>> https://w3c.github.io/rdf-star/
>>>>>>  there is a section on referential opacity.
>>>>>> The main claim there is that triples are referentially opaque.
>>>>>> 
>>>>>> 
>>>>>> But embedded triples are much weaker than just being referentially opaque.  To
>>>>>> see this consider the following RDF* graph under the RDF* version of RDF
>>>>>> entailment recognizing xsd:decimal and xsd:integer.
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:decimal >> .
>>>>>> 
>>>>>> In this semantics "6"^^xsd:decimal means the same as "6"^^xsd:integer so one
>>>>>> would expect that
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:integer >> .
>>>>>> 
>>>>>> is RDF*-entailed.
>>>>>> 
>>>>>> But it is not.  There are two reasons for this.
>>>>>> 
>>>>>> First, there is no requirement that satisfying interpretations for the first
>>>>>> graph map < :clarkKent :height "6"^^xsd:integer > to anything and if a
>>>>>> satisfying interpretation does map the triple there is no requirement that its
>>>>>> ITEXT mapping gives the triple its correct meaning.  (The value of ITEXT for
>>>>>> the triple could have the real number pi as its third element.)
>>>>>> 
>>>>>> Second, "6"^^xsd:integer is a different node from "6"^^xsd:decimal. So even if
>>>>>> the interpretation treats the second embedded triple nicely, and thus gives it
>>>>>> the same meaning as the first embedded triple, they are still two different
>>>>>> triples and :loisLane can believe one but not the other.  So very little of
>>>>>> the semantics of RDF gets into embedded triples.
>>>>>> 
>>>>>> We wanted different representations to be treated differently even
>>>>>> if they have the same meaning. The reason for that is that we expected that
>>>>>> RDF* would also be used to make statements about triples as they were
>>>>>> stated, for example to be able to explain the reasoning performed on the
>>>>>> triples but also for simple provenance. In these cases there should be a
>>>>>> difference between
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:decimal >> .
>>>>>> 
>>>>>> and
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:integer >> .
>>>>>> 
>>>>>> since we still talk about different representations.
>>>>>> 
>>>>>> Each triple is, in effect, its own context.  So, in an RDFS version of RDF*,
>>>>>> even if :loisLane believes several triples that should imply another, they
>>>>>> generally don't.  For example:
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent rdf:type :man >> .
>>>>>> :loisLane :believes << :man rdfs:subClassOf :human >> .
>>>>>> 
>>>>>> Does not imply
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent rdf:type :human >> .
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> So embedded triples are incredibly weak in RDF*.   Making them useful will
>>>>>> likely require quite a bit of work.
>>>>>> 
>>>>>> Here, "useful" depends again on your intended use. We wanted to have a
>>>>>> rather weak semantics which allows users with more complex use cases to add
>>>>>> their semantics. It is easier to make the semantics more complex by adding
>>>>>> extensions than to ignore certain parts. I for example remember that Jos De
>>>>>> Roo announced some time ago that his EYE reasoner supports rules on RDF*. Of
>>>>>> course that alone would not allow you to cover all cases, but it could be
>>>>>> very helpful in practice.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On the other hand, there are some unusual inferences that can be made in
>>>>>> RDF*.  In an RDF* version of RDFS++ it is possible to state that two triples
>>>>>> are the same.   The graph
>>>>>> 
>>>>>> :loisLane :believes << :superman :can :fly >>.
>>>>>> << :superman :can :fly >> owl:sameAs << :clarkKent :can :fly >> .
>>>>>> 
>>>>>> is consistent here and implies
>>>>>> 
>>>>>> :superman owl:sameAs :clarkKent .
>>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>>>>>> 
>>>>>> This last case is an interesting one. We indeed wanted the triple
>>>>>> 
>>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>>>>>> 
>>>>>> to be a consequence of your statements. The question is whether
>>>>>> 
>>>>>> :superman owl:sameAs :clarkKent .
>>>>>> 
>>>>>> should follow (it does indeed follow, just as you describe). We made the
>>>>>> semantics of embedded triples the way it is to be able to deal with blank
>>>>>> nodes. Here, I can't give a concrete answer whether (at least to my
>>>>>> understanding) it should be that way. I will think about it (and read
>>>>>> Pierre-Antoine's thoughts in the meantime, which just arrived as well) and
>>>>>> come back to you.
>>>>>> 
>>>>>> Kind regards,
>>>>>> Doerthe
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>
Received on Wednesday, 21 October 2020 10:52:44 UTC