Re: weakness of embedded triples from Holger Knublauch on 2020-10-19 (public-rdf-star@w3.org from October 2020)

From: Holger Knublauch <holger@topquadrant.com>
Date: Tue, 20 Oct 2020 09:05:45 +1000
To: public-rdf-star@w3.org
Message-ID: <7a8f5484-26b5-ad1a-f09b-e2c5d0ebb6de@topquadrant.com>
On 10/20/2020 5:07 AM, Martynas Jusevičius wrote:
> It seems that a number of stores are already implementing RDF* using
> unique triple IDs. And it works with blank nodes using skolemization.
>
> Wouldn't it be easier to standardize this approach instead of drafting
> completely new semantics?
>
> Certainly less disruptive.

In our case, adding support for long special URIs probably took just a 
couple of weeks. We haven't started trying to migrate to the new 
Node_Triple type (i.e. the latest RDF* implementation in Jena), but I 
anticipate that such a move will be a multi-month effort. 
There are countless places of the form

if(node.isURI()) {
     ...
}
else if(node.isBlank()) {
     ...
}
else { // Until now we assumed this must be a Literal
     ...
}

which will now need to deal with the new node kind. Many of these places 
will be hard to identify because they make such assumptions only 
implicitly in the program logic.
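
To make the failure mode concrete, here is a minimal sketch. It uses a 
hypothetical stand-in Node class with a hypothetical TRIPLE kind, not 
Jena's actual API. The legacy method silently classifies an 
embedded-triple node as a literal, while the strict variant checks each 
kind explicitly and fails loudly on anything it does not know.

```java
import java.util.Objects;

// Hypothetical stand-in types for illustration; NOT Jena's actual Node API.
final class NodeKindDemo {

    enum Kind { URI, BLANK, LITERAL, TRIPLE }

    static final class Node {
        final Kind kind;
        final String label;

        Node(Kind kind, String label) {
            this.kind = Objects.requireNonNull(kind);
            this.label = label;
        }

        boolean isURI()     { return kind == Kind.URI; }
        boolean isBlank()   { return kind == Kind.BLANK; }
        boolean isLiteral() { return kind == Kind.LITERAL; }
    }

    // Legacy pattern: whatever is neither URI nor blank node is assumed to be a literal.
    static String describeLegacy(Node node) {
        if (node.isURI()) {
            return "uri";
        } else if (node.isBlank()) {
            return "blank";
        } else {
            return "literal"; // wrong for a TRIPLE node, and nothing complains
        }
    }

    // Strict pattern: every known kind is tested; unknown kinds raise an error.
    static String describeStrict(Node node) {
        if (node.isURI()) return "uri";
        if (node.isBlank()) return "blank";
        if (node.isLiteral()) return "literal";
        throw new IllegalStateException("Unhandled node kind: " + node.kind);
    }

    public static void main(String[] args) {
        Node embedded = new Node(Kind.TRIPLE, "<< :s :p :o >>");
        System.out.println(describeLegacy(embedded));
        try {
            describeStrict(embedded);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The strict form does not find the affected places by itself, but it 
turns a silent misclassification into an error that shows up in testing.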

This doesn't mean that the long URIs are without problems - they are 
kind of a hack too. One issue is that they cannot efficiently answer all 
queries if S, P or O are unbound and no "real" triple is also asserted, 
unless there is an additional triple index. I wouldn't consider that to 
be a show stopper though.
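
The lookup problem can be sketched as follows, assuming the hash-based 
ID convention quoted further down (urn:rdf:id: followed by the SHA-1 
hashes of S, P and O) rather than any particular store's URI format. The 
class name, the side index and the match method are hypothetical and 
only for illustration. With all three positions bound, the ID can be 
recomputed directly; with any position unbound, the sketch has to 
consult a separate index of triple components.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of hash-based triple IDs plus a side index.
final class TripleIdDemo {

    // Lower-case hex SHA-1 of the UTF-8 bytes of the input.
    static String sha1Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // ID convention assumed from the thread: prefix + hash(S) + hash(P) + hash(O).
    static String tripleId(String s, String p, String o) {
        return "urn:rdf:id:" + sha1Hex(s) + sha1Hex(p) + sha1Hex(o);
    }

    // Side index from triple ID back to its components (assumed, not from any store).
    static final class Index {
        private final Map<String, String[]> byId = new HashMap<>();

        String add(String s, String p, String o) {
            String id = tripleId(s, p, o);
            byId.put(id, new String[] { s, p, o });
            return id;
        }

        // null means "unbound". Returns the IDs of all matching triples.
        List<String> match(String s, String p, String o) {
            List<String> ids = new ArrayList<>();
            if (s != null && p != null && o != null) {
                String id = tripleId(s, p, o); // fully bound: recompute the ID
                if (byId.containsKey(id)) ids.add(id);
                return ids;
            }
            for (Map.Entry<String, String[]> e : byId.entrySet()) { // otherwise: scan the index
                String[] t = e.getValue();
                if ((s == null || s.equals(t[0]))
                        && (p == null || p.equals(t[1]))
                        && (o == null || o.equals(t[2]))) {
                    ids.add(e.getKey());
                }
            }
            return ids;
        }
    }
}
```

A real store would replace the linear scan with proper SPO/POS/OSP 
indexes; the point is only that some such index has to exist.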

For me the jury is still out on how to proceed. But we are also in a 
more extreme situation than other groups in that we have a very large 
code base that will need migration.

Holger


>
> On Mon, Oct 19, 2020 at 11:07 AM Pavel Klinov <pavel@stardog.com> wrote:
>> Yeah right. We have a mechanism in place to avoid using the same Skolem constant for bnodes with the same lexical form occurring in multiple RDF datasets (eg. when loading multiple files) but that's pretty much it. IIRC it's called something like "standardising apart" in one of the RDF docs.
>>
>> Cheers,
>> Pavel
>>
>>
>>
>> On Mon, Oct 19, 2020 at 10:54 AM Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu> wrote:
>>> Dear all,
>>>
>>> Holger, Pavel: I assume that blank nodes are internally skolemized, so indeed, internally, you only have IRIs and literals. Correct?
>>>
>>> On 19/10/2020 10:28, Holger Knublauch wrote:
>>>
>>> Similar situation here at TopQuadrant, see
>>>
>>>      http://datashapes.org/reification.html#uriReification
>>>
>>> Holger
>>>
>>>
>>> On 10/19/2020 6:24 PM, Pavel Klinov wrote:
>>>
>>> This is roughly how Stardog supports RDF* and so far we find it sufficient in the enterprise context. It's pretty easily understood by users familiar with edge properties in the property graph data model, which is one of the most important factors for us.
>>>
>>> Cheers,
>>> Pavel
>>>
>>> On Sat, Oct 17, 2020 at 9:54 PM Martynas Jusevičius <martynas@atomgraph.com> wrote:
>>>> Does RDF* need new semantics at all? Couldn't it be a syntax-level
>>>> convention for unique triple IDs?
>>>>
>>>> E.g. <<s>, <p>, <o>> being syntactic sugar for
>>>> uri(concat("urn:rdf:id:", hash(str(<s>)), hash(str(<p>)),
>>>> hash(str(<o>)))).
>>>>
>>>> For example, the triple
>>>>
>>>> <<https://www.w3.org/People/Berners-Lee/card>
>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>>>> <https://www.w3.org/People/Berners-Lee/card#i>>
>>>>
>>>> gives
>>>>
>>>> URI(CONCAT("urn:rdf:id:",
>>>> SHA1(STR(<https://www.w3.org/People/Berners-Lee/card>)),
>>>> SHA1(STR(<http://xmlns.com/foaf/0.1/primaryTopic>)),
>>>> SHA1(STR(<https://www.w3.org/People/Berners-Lee/card#i>))))
>>>>
>>>> gives
>>>>
>>>> <urn:rdf:id:63874e34ff5f326e67e888f6818f72d5033ecb343cadd8c2120281d72cefce4481485c937b6a95a656beaa67c13db29f3d7be801328b7c9125976c5f>
>>>>
>>>> which essentially would become the "5th element", in addition to quads.
>>>>
>>>> On Thu, Oct 15, 2020 at 1:38 PM Pierre-Antoine Champin
>>>> <pierre-antoine.champin@ercim.eu> wrote:
>>>>>
>>>>> On 14/10/2020 23:13, Peter F. Patel-Schneider wrote:
>>>>>
>>>>> Let's make the height example even more stark.
>>>>>
>>>>>
>>>>> :loisLane :believes << :clarkKent :height "6.0"^^xsd:decimal >> .
>>>>>
>>>>>
>>>>> does not imply
>>>>>
>>>>>
>>>>> :loisLane :believes << :clarkKent :height "6.00"^^xsd:decimal >> .
>>>>>
>>>>>
>>>>> I would hope that any Tom, Dick, and Lois can realize that these two literals
>>>>> are the same.
>>>>>
>>>>> I see your point, but this is really a matter of deciding where you put the boundary...
>>>>>
>>>>> So I would still prefer to be radical here and consider any lexical difference as potentially significant.
>>>>>
>>>>> If you want to stick to literals that have to be supported in RDF
>>>>>
>>>>>
>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-US >> .
>>>>>
>>>>>
>>>>> does not imply
>>>>>
>>>>>
>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-us >> .
>>>>>
>>>>> Are "Clark"@en-US and "Clark"@en-us really different literals, for the abstract syntax??
>>>>>
>>>>> I would have thought they are the same (and so the implication above would hold).
>>>>>
>>>>> Reading the spec again, I realize that things are not so clear: "Lexical representations of language tags MAY be converted to lower case", and then Literal term equality requires that language tags "compare equal, character by character". So these 2 literals MAY be considered equal, and the implication MAY hold... :-/ Add to this that BCP47 explicitly states that language tags are case insensitive... I'd say that we are in a gray area here.
>>>>>
>>>>>
>>>>>
>>>>> peter
>>>>>
>>>>>
>>>>>
>>>>> On 10/14/20 4:45 PM, Doerthe Arndt wrote:
>>>>>
>>>>> Dear Peter,
>>>>>
>>>>> you are right with both observations. The question is whether we want that
>>>>> behavior or not.
>>>>>
>>>>> In https://w3c.github.io/rdf-star/ there is a section on referential opacity.
>>>>> The main claim there is that triples are referentially opaque.
>>>>>
>>>>>
>>>>> But embedded triples are much weaker than just being referentially opaque.  To
>>>>> see this consider the following RDF* graph under the RDF* version of RDF
>>>>> entailment recognizing xsd:decimal and xsd:integer.
>>>>>
>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:decimal >> .
>>>>>
>>>>> In this semantics "6"^^xsd:decimal means the same as "6"^^xsd:integer so one
>>>>> would expect that
>>>>>
>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:integer >> .
>>>>>
>>>>> is RDF*-entailed.
>>>>>
>>>>> But it is not.  There are two reasons for this.
>>>>>
>>>>> First, there is no requirement that satisfying interpretations for the first
>>>>> graph map < :clarkKent :height "6"^^xsd:integer > to anything and if a
>>>>> satisfying interpretation does map the triple there is no requirement that its
>>>>> ITEXT mapping gives the triple its correct meaning.  (The value of ITEXT for
>>>>> the triple could have the real number pi as its third element.)
>>>>>
>>>>> Second, "6"^^xsd:integer is a different node from "6"^^xsd:decimal. So even if
>>>>> the interpretation treats the second embedded triple nicely, and thus gives it
>>>>> the same meaning as the first embedded triple, they are still two different
>>>>> triples and :loisLane can believe one but not the other.  So very little of
>>>>> the semantics of RDF gets into embedded triples.
>>>>>
>>>>> We wanted different representations to be treated differently even
>>>>> if they have the same meaning. The reason for that is that we expected that
>>>>> RDF* would also be used to make statements about triples as they were
>>>>> stated, for example to be able to explain the reasoning performed on the
>>>>> triples but also for simple provenance. In these cases there should be a
>>>>> difference between
>>>>>
>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:decimal >> .
>>>>>
>>>>> and
>>>>>
>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:integer >> .
>>>>>
>>>>> since we still talk about different representations.
>>>>>
>>>>> Each triple is, in effect, its own context.  So, in an RDFS version of RDF*,
>>>>> even if :loisLane believes several triples that should imply another, they
>>>>> generally don't.  For example:
>>>>>
>>>>> :loisLane :believes << :clarkKent rdf:type :man >> .
>>>>> :loisLane :believes << :man rdfs:subClassOf :human >> .
>>>>>
>>>>> Does not imply
>>>>>
>>>>> :loisLane :believes << :clarkKent rdf:type :human >> .
>>>>>
>>>>>
>>>>>
>>>>> So embedded triples are incredibly weak in RDF*.   Making them useful will
>>>>> likely require quite a bit of work.
>>>>>
>>>>> Here, "useful" depends again on your intended use. We wanted to have a
>>>>> rather weak semantics which allows users with more complex use cases to add
>>>>> their semantics. It is easier to make the semantics more complex by adding
>>>>> extensions than to ignore certain parts. I for example remember that Jos De
>>>>> Roo announced some time ago that his EYE reasoner supports rules on RDF*. Of
>>>>> course that alone would not allow you to cover all cases, but it could be
>>>>> very helpful in practice.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On the other hand, there are some unusual inferences that can be made in
>>>>> RDF*.  In an RDF* version of RDFS++ it is possible to state that two triples
>>>>> are the same.   The graph
>>>>>
>>>>> :loisLane :believes << :superman :can :fly >>.
>>>>> << :superman :can :fly >> owl:sameAs << :clarkKent :can :fly >> .
>>>>>
>>>>> is consistent here and implies
>>>>>
>>>>> :superman owl:sameAs :clarkKent .
>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>>>>>
>>>>> This last case is an interesting one. We indeed wanted the triple
>>>>>
>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>>>>>
>>>>> to be a consequence of your statements. The question is whether
>>>>>
>>>>> :superman owl:sameAs :clarkKent .
>>>>>
>>>>> should follow (it does indeed follow, just as you describe). We made the
>>>>> semantics of embedded triples the way it is to be able to deal with blank
>>>>> nodes. Here, I can't give a concrete answer whether (at least to my
>>>>> understanding) it should be that way. I will think about it (and read
>>>>> Pierre-Antoine's thoughts in the mean time, which just arrived as well) and
>>>>> come back to you.
>>>>>
>>>>> Kind regards,
>>>>> Doerthe
>>>>>
>>>>>
>>>>>
Received on Monday, 19 October 2020 23:06:02 UTC