Re: weakness of embedded triples from Holger Knublauch on 2020-10-27 (public-rdf-star@w3.org from October 2020)

From: Holger Knublauch <holger@topquadrant.com>
Date: Wed, 28 Oct 2020 08:31:24 +1000
To: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>, public-rdf-star@w3.org
Message-ID: <6b0c3658-140e-45d2-8450-b22c184651a9@topquadrant.com>
On 10/28/2020 1:53 AM, Pierre-Antoine Champin wrote:
> Holger,
>
> Now I'm confused. This thread (which should have been renamed a long
> time ago) is, in my understanding, about Martynas' question raised here
>
> <https://www.w3.org/mid/CAE35Vmy3vbThwHnKjbhMQuwKkH0BhNoxr_Gp15Ri5LfOdedsSA@mail.gmail.com>
>
>> Does RDF* need new semantics at all?
> While I believe the answer is "yes", I concede that answering "no" to
> that question would be convenient, because it would mean that existing
> implementations of RDF could handle RDF* at the syntactical level only,
> i.e. parse Turtle* and store it as standard RDF triples.
>
> In your examples below, however, you propose to extend existing
> implementations -- which defeats the purpose of fitting RDF* into
> standard RDF semantics...

The current RDF* draft requires introducing a 4th term type "RDF* triples":

 > IRIs <https://www.w3.org/TR/rdf11-concepts/#dfn-iri>, literals
<https://www.w3.org/TR/rdf11-concepts/#dfn-literal>, blank nodes
<https://www.w3.org/TR/rdf11-concepts/#dfn-blank-node> and RDF* triples
<https://w3c.github.io/rdf-star/#dfn-triple> are collectively known
as RDF* terms.

The approaches based on (long) URIs avoid this and therefore are likely 
much less concerning w.r.t. existing implementations. A syntactic 
mapping means that existing APIs can represent these triples as normal 
URI nodes. From our own experience, moving to this design was not
disruptive (although some users have raised concerns about the ugly
long URIs showing up in unexpected places, such as exports to plain Turtle).
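
To make the syntactic mapping concrete, here is a minimal Java sketch of how a triple of IRIs could be packed into one long URI and unpacked again. The "urn:triple:" prefix follows the TripleNode example quoted further down; the percent-encoding via URLEncoder (Java 10+ overloads) and the class and method names are assumptions for illustration only, not the encoding any product actually uses.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: maps a triple of IRI strings to one long URI and back.
final class TripleUris {
    static final String PREFIX = "urn:triple:";

    // Percent-encode one term so that ':' never appears inside a part.
    static String encode(String term) {
        return URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    // Build the long URI for a triple.
    static String toUri(String s, String p, String o) {
        return PREFIX + encode(s) + ":" + encode(p) + ":" + encode(o);
    }

    // Recover the three terms from a URI produced by toUri.
    static String[] fromUri(String uri) {
        if (!uri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a triple URI: " + uri);
        }
        String[] parts = uri.substring(PREFIX.length()).split(":", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 parts: " + uri);
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = URLDecoder.decode(parts[i], StandardCharsets.UTF_8);
        }
        return parts;
    }
}
```

Because the result is an ordinary URI string, an existing API can hand it around as a normal URI node; only code that knows the convention needs to call something like fromUri.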

Having said this, *some* implementations may represent these triple 
nodes differently, e.g. using the internal data structure I outlined 
below. This avoids ever storing these long URIs, makes it arguably 
easier to index and search over them, and probably keeps the issues with 
changing bnode identifiers at bay. But this is an implementation detail 
to me.
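
As a rough illustration of such an internal structure, the following sketch keeps triple nodes as (subject, predicate, object) values and indexes them by each position, so no long URI string is ever stored. Terms are simplified to plain strings, and all class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical value type for a triple node; terms are simplified to strings.
final class TripleKey {
    final String subject;
    final String predicate;
    final String object;

    TripleKey(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof TripleKey)) return false;
        TripleKey t = (TripleKey) other;
        return subject.equals(t.subject)
                && predicate.equals(t.predicate)
                && object.equals(t.object);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }
}

// In-memory index over triple nodes, one map per position.
final class TripleNodeIndex {
    private final Map<String, Set<TripleKey>> bySubject = new HashMap<>();
    private final Map<String, Set<TripleKey>> byPredicate = new HashMap<>();
    private final Map<String, Set<TripleKey>> byObject = new HashMap<>();

    void add(TripleKey t) {
        bySubject.computeIfAbsent(t.subject, k -> new LinkedHashSet<>()).add(t);
        byPredicate.computeIfAbsent(t.predicate, k -> new LinkedHashSet<>()).add(t);
        byObject.computeIfAbsent(t.object, k -> new LinkedHashSet<>()).add(t);
    }

    Set<TripleKey> withSubject(String s) {
        return bySubject.getOrDefault(s, Collections.emptySet());
    }

    Set<TripleKey> withPredicate(String p) {
        return byPredicate.getOrDefault(p, Collections.emptySet());
    }

    Set<TripleKey> withObject(String o) {
        return byObject.getOrDefault(o, Collections.emptySet());
    }
}
```

Value-based equals/hashCode means two occurrences of the same embedded triple collapse into one entry, which is the behaviour one would want from a node that stands for "this triple".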

The additional semantics of interpreting these special URI nodes 
differently would be local to the RDF* specs and would not require 
adaptations to existing specs.

Holger


>
>    pa
>
>
> On 24/10/2020 03:26, Holger Knublauch wrote:
>> On 10/23/2020 9:58 PM, Pierre-Antoine Champin wrote:
>>> Holger,
>>>
>>> On 23/10/2020 00:48, Holger Knublauch wrote:
>>>>> (...)
>>>>> That's why we need to have an extended semantics for RDF*.
>>>> Not necessarily. I still think this can be solved by simply declaring
>>>> reification on bnode triples to be unsupported.
>>>>
>>>> Yes there are theoretically some scenarios where this might be useful,
>>>> but I'd rather say "if you want to use RDF*, use IRIs and no bnodes"
>>>> than having to extend the very core model of RDF just for this corner
>>>> case.
>>> Whether this is a corner case or not remains to be discussed, IMO.
>>>
>>> Would you mind creating an issue on github with this proposal; we can
>>> have a quick +1 / -1 poll on that issue, and see what the community
>>> thinks...
>>>
>>>> There are already similar constraints in place in the RDF world, e.g.
>>>> a reified statement cannot appear as predicate, and literals cannot be
>>>> subjects. Life goes on, people get used to these limitations. Relying
>>>> on bnodes for identification is a bad practice anyway,
>>> I don't think there is a consensus about that either. As you know, there
>>> has been a resurgence of this permathread last summer, and Brickley
>>> made, I think, a very good point in the defense of bnodes.
>>>
>>> https://lists.w3.org/Archives/Public/semantic-web/2020Jul/0003.html
>>>
>>> In fact, I could return your argument above : some people (me included)
>>> would like literals as subjects, but they are not allowed; life goes on.
>>> Some people do not like bnodes, but they are part of RDF; life goes on.
>>>
>>> But, granted, that does not mean that we need to "propagate" them into
>>> new RDF features if we collectively decide that we don't need to. Hence
>>> my proposal above to create a specific issue about that.
>> I am not sure I would want to go this far. My point that bnodes could
>> be declared off-limits was mainly in response to the technical issues
>> with generating long URIs. However, those long URIs do not even
>> require this limitation. From a programming perspective, most APIs
>> have some kind of function such as Node.getURI() returning a string. I
>> believe a decent implementation strategy for RDF* that would not
>> require introducing a new node type (for triples) would be to simply
>> have those APIs implement a different subclass of Nodes. For example,
>> assume a Java class in some imaginary RDF API such as
>>
>> class URINode extends Node {
>>      private String uri;
>>
>>      public String getURI() {
>>          return uri;
>>      }
>>
>>      public boolean isURI() {
>>          return true;
>>      }
>> }
>>
>> class TripleNode extends Node {
>>      private Node subject;
>>      private Node predicate;
>>      private Node object;
>>
>>      public String getURI() {
>>          // Don't store the URI string; compute it when needed
>>          // (which is probably not often the case).
>>          // The string could also be cached if performance is a problem.
>>          return "urn:triple:" + encode(subject) + ":"
>>                  + encode(predicate) + ":" + encode(object);
>>      }
>>
>>      public boolean isURI() {
>>          return true;
>>      }
>>      ...
>> }
>>
>> An RDF* parser could produce these nodes from the input document.
>> Then, if the file is reloaded, the bnode IDs might differ, yet the
>> references to them remain identical.
>>
>> Wouldn't this give the best of both worlds? This doesn't require
>> changes to the low levels of RDF and leaves implementations the same
>> room for internal optimizations, e.g. an internal index to quickly
>> find all TripleNodes with certain subject, predicate or object.
>>
>> Holger
>>
>>
>>>     best
>>>
>>>> and they already don't work across graph boundaries.
>>>>
>>>> Holger
>>>>
>>>>
>>>>> (more precisely: that's why the trick of encoding embedded triples
>>>>> into
>>>>> IRIs does not work. There might be a smarter encoding of RDF* into
>>>>> RDF,
>>>>> which would allow us to rely on the standard semantics, but I
>>>>> seriously
>>>>> doubt it)
>>>>>
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> [0] Aidan Hogan, 2017, Canonical Forms for Isomorphic and Equivalent
>>>>>> RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes
>>>>>>
>>>>>>
>>>>>>>      best
>>>>>>>
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>
>>>>>>>> [0] https://www.w3.org/TR/rdf11-concepts/
>>>>>>>>
>>>>>>>>> Also, note that the semantics' goal is not to prescribe a
>>>>>>>>> particular implementation method; it is to ensure that different
>>>>>>>>> implementations remain interoperable.
>>>>>>>>>     pa
>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2020 at 11:07 AM Pavel Klinov
>>>>>>>>>> <pavel@stardog.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yeah right. We have a mechanism in place to avoid using the
>>>>>>>>>>> same Skolem constant for bnodes with the same lexical form
>>>>>>>>>>> occurring in multiple RDF datasets (eg. when loading multiple
>>>>>>>>>>> files) but that's pretty much it. IIRC it's called something
>>>>>>>>>>> like "standardising apart" in one of the RDF docs.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Pavel
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2020 at 10:54 AM Pierre-Antoine Champin
>>>>>>>>>>> <pierre-antoine.champin@ercim.eu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>
>>>>>>>>>>>> Holger, Pavel: I assume that blank nodes are internally
>>>>>>>>>>>> skolemized, so indeed, internally, you only have IRIs and
>>>>>>>>>>>> literals. Correct?
>>>>>>>>>>>>
>>>>>>>>>>>> On 19/10/2020 10:28, Holger Knublauch wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Similar situation here at TopQuadrant, see
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> http://datashapes.org/reification.html#uriReification
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Holger
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/19/2020 6:24 PM, Pavel Klinov wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> This is roughly how Stardog supports RDF* and so far we find
>>>>>>>>>>>> it sufficient in the enterprise context. It's pretty easily
>>>>>>>>>>>> understood by users familiar with edge properties in the
>>>>>>>>>>>> property graph data model, which is one of the most important
>>>>>>>>>>>> factors for us.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Pavel
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Oct 17, 2020 at 9:54 PM Martynas Jusevičius
>>>>>>>>>>>> <martynas@atomgraph.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Does RDF* need new semantics at all? Couldn't it be a
>>>>>>>>>>>>> syntax-level
>>>>>>>>>>>>> convention for unique triple IDs?
>>>>>>>>>>>>>
>>>>>>>>>>>>> E.g. <<s>, <p>, <o>> being syntactic sugar for
>>>>>>>>>>>>> uri(concat("urn:rdf:id:", hash(str(<s>)), hash(str(<p>)),
>>>>>>>>>>>>> hash(str(<o>)))).
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, the triple
>>>>>>>>>>>>>
>>>>>>>>>>>>> <<
>>>>>>>>>>>>> <https://www.w3.org/People/Berners-Lee/card>
>>>>>>>>>>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>>>>>>>>>>>>> <https://www.w3.org/People/Berners-Lee/card#i> >>
>>>>>>>>>>>>>
>>>>>>>>>>>>> gives
>>>>>>>>>>>>>
>>>>>>>>>>>>> URI(CONCAT("urn:rdf:id:",
>>>>>>>>>>>>> SHA1(STR(
>>>>>>>>>>>>> <https://www.w3.org/People/Berners-Lee/card>
>>>>>>>>>>>>> )),
>>>>>>>>>>>>> SHA1(STR(
>>>>>>>>>>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>>>>>>>>>>>>> )),
>>>>>>>>>>>>> SHA1(STR(
>>>>>>>>>>>>> <https://www.w3.org/People/Berners-Lee/card#i>
>>>>>>>>>>>>> ))))
>>>>>>>>>>>>>
>>>>>>>>>>>>> gives
>>>>>>>>>>>>>
>>>>>>>>>>>>> <urn:rdf:id:63874e34ff5f326e67e888f6818f72d5033ecb343cadd8c2120281d72cefce4481485c937b6a95a656beaa67c13db29f3d7be801328b7c9125976c5f>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> which essentially would become the "5th element", in addition
>>>>>>>>>>>>> to quads.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 15, 2020 at 1:38 PM Pierre-Antoine Champin
>>>>>>>>>>>>>
>>>>>>>>>>>>> <pierre-antoine.champin@ercim.eu>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 14/10/2020 23:13, Peter F. Patel-Schneider wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let's make the height example even more stark.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height "6.0"^^xsd:decimal >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> does not imply
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height
>>>>>>>>>>>>>> "6.00"^^xsd:decimal >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would hope that any Tom, Dick, and Lois can realize that
>>>>>>>>>>>>>> these two literals
>>>>>>>>>>>>>> are the same.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see your point, but this is really a matter of deciding
>>>>>>>>>>>>>> where you put the boundary...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I would still prefer to be radical here and consider any
>>>>>>>>>>>>>> lexical difference as potentially significant.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you want to stick to literals that have to be supported
>>>>>>>>>>>>>> in RDF
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-US >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> does not imply
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-us >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are "Clark"@en-US and "Clark"@en-us really different
>>>>>>>>>>>>>> literals, for the abstract syntax??
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would have thought they are the same (and so the
>>>>>>>>>>>>>> implication above would hold).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Reading the spec again, I realize that things are not so
>>>>>>>>>>>>>> clear: "Lexical representations of language tags MAY be
>>>>>>>>>>>>>> converted to lower case", and then Literal term equality
>>>>>>>>>>>>>> requires that language tags "compare equal, character by
>>>>>>>>>>>>>> character". So these 2 literals MAY be considered equal, and
>>>>>>>>>>>>>> the implication MAY hold... :-/ Add to this that BCP47
>>>>>>>>>>>>>> explicitly states that language tags are case insensitive...
>>>>>>>>>>>>>> I'd say that we are in a gray area here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/14/20 4:45 PM, Doerthe Arndt wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dear Peter,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> you are right with both observations. The question is
>>>>>>>>>>>>>> whether we want that
>>>>>>>>>>>>>> behavior or not.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In
>>>>>>>>>>>>>> https://w3c.github.io/rdf-star/
>>>>>>>>>>>>>> there is a section on referential opacity.
>>>>>>>>>>>>>> The main claim there is that triples are referentially
>>>>>>>>>>>>>> opaque.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But embedded triples are much weaker than just being
>>>>>>>>>>>>>> referentially opaque.  To
>>>>>>>>>>>>>> see this consider the following RDF* graph under the RDF*
>>>>>>>>>>>>>> version of RDF
>>>>>>>>>>>>>> entailment recognizing xsd:decimal and xsd:integer.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:decimal >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In this semantics "6"^^xsd:decimal means the same as
>>>>>>>>>>>>>> "6"^^xsd:integer so one
>>>>>>>>>>>>>> would expect that
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:integer >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is RDF*-entailed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But it is not.  There are two reasons for this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> First, there is no requirement that satisfying
>>>>>>>>>>>>>> interpretations for the first
>>>>>>>>>>>>>> graph map << :clarkKent :height "6"^^xsd:integer >> to
>>>>>>>>>>>>>> anything and if a
>>>>>>>>>>>>>> satisfying interpretation does map the triple there is no
>>>>>>>>>>>>>> requirement that its
>>>>>>>>>>>>>> ITEXT mapping gives the triple its correct meaning.  (The
>>>>>>>>>>>>>> value of ITEXT for
>>>>>>>>>>>>>> the triple could have the real number pi as its third
>>>>>>>>>>>>>> element.)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Second, "6"^^xsd:integer is a different node from
>>>>>>>>>>>>>> "6"^^xsd:decimal. So even if
>>>>>>>>>>>>>> the interpretation treats the second embedded triple nicely,
>>>>>>>>>>>>>> and thus gives it
>>>>>>>>>>>>>> the same meaning as the first embedded triple, they are
>>>>>>>>>>>>>> still two different
>>>>>>>>>>>>>> triples and :loisLane can believe one but not the other.  So
>>>>>>>>>>>>>> very little of
>>>>>>>>>>>>>> the semantics of RDF gets into embedded triples.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We wanted different representations to be treated differently
>>>>>>>>>>>>>> even if they have the same meaning. The reason for that is that
>>>>>>>>>>>>>> we expected that
>>>>>>>>>>>>>> RDF* would also be used to make statements about triples as
>>>>>>>>>>>>>> they were
>>>>>>>>>>>>>> stated, for example to be able to explain the reasoning
>>>>>>>>>>>>>> performed on the
>>>>>>>>>>>>>> triples but also for simple provenance. In these cases there
>>>>>>>>>>>>>> should be a
>>>>>>>>>>>>>> difference between
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:decimal >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:integer >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> since we still talk about different representations.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Each triple is, in effect, its own context.  So, in an RDFS
>>>>>>>>>>>>>> version of RDF*,
>>>>>>>>>>>>>> even if :loisLane believes several triples that should imply
>>>>>>>>>>>>>> another, they
>>>>>>>>>>>>>> generally don't.  For example:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent rdf:type :man >> .
>>>>>>>>>>>>>> :loisLane :believes << :man rdfs:subClassOf :human >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does not imply
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent rdf:type :human >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So embedded triples are incredibly weak in RDF*.   Making
>>>>>>>>>>>>>> them useful will
>>>>>>>>>>>>>> likely require quite a bit of work.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here, "useful" depends again on your intended use. We wanted
>>>>>>>>>>>>>> to have a
>>>>>>>>>>>>>> rather weak semantics which allows users with more complex
>>>>>>>>>>>>>> use cases to add
>>>>>>>>>>>>>> their semantics. It is easier to make the semantics more
>>>>>>>>>>>>>> complex by adding
>>>>>>>>>>>>>> extensions than to ignore certain parts. I for example
>>>>>>>>>>>>>> remember that Jos De
>>>>>>>>>>>>>> Roo announced some time ago that his EYE reasoner supports
>>>>>>>>>>>>>> rules on RDF*. Of
>>>>>>>>>>>>>> course that alone would not allow you to cover all cases,
>>>>>>>>>>>>>> but it could be
>>>>>>>>>>>>>> very helpful in practice.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On the other hand, there are some unusual inferences that
>>>>>>>>>>>>>> can be made in
>>>>>>>>>>>>>> RDF*.  In an RDF* version of RDFS++ it is possible to state
>>>>>>>>>>>>>> that two triples
>>>>>>>>>>>>>> are the same.   The graph
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :superman :can :fly >>.
>>>>>>>>>>>>>> << :superman :can :fly >> owl:sameAs << :clarkKent :can :fly >> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is consistent here and implies
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :superman owl:sameAs :clarkKent .
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This last case is an interesting one. We indeed wanted the
>>>>>>>>>>>>>> triple
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> to be a consequence of your statements. The question is
>>>>>>>>>>>>>> whether
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> :superman owl:sameAs :clarkKent .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> should follow (it does indeed follow, just as you describe).
>>>>>>>>>>>>>> We made the
>>>>>>>>>>>>>> semantics of embedded triples the way it is to be able to
>>>>>>>>>>>>>> deal with blank
>>>>>>>>>>>>>> nodes. Here, I can't give a concrete answer whether (at
>>>>>>>>>>>>>> least to my
>>>>>>>>>>>>>> understanding) it should be that way. I will think about it
>>>>>>>>>>>>>> (and read
>>>>>>>>>>>>>> Pierre-Antoine's thoughts in the meantime, which just
>>>>>>>>>>>>>> arrived as well) and
>>>>>>>>>>>>>> come back to you.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>> Doerthe
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
Received on Tuesday, 27 October 2020 22:31:47 UTC