Re: weakness of embedded triples from Pierre-Antoine Champin on 2020-10-27 (public-rdf-star@w3.org from October 2020)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Tue, 27 Oct 2020 16:53:08 +0100
To: Holger Knublauch <holger@topquadrant.com>, public-rdf-star@w3.org
Message-ID: <b31a1e42-6cc9-548a-3a2e-7c8fec526b3d@ercim.eu>
Holger,

Now I'm confused. This thread (which should have been renamed a long
time ago) is, in my understanding, about Martynas' question raised here

<https://www.w3.org/mid/CAE35Vmy3vbThwHnKjbhMQuwKkH0BhNoxr_Gp15Ri5LfOdedsSA@mail.gmail.com>

> Does RDF* need new semantics at all?

While I believe the answer is "yes", I concede that answering "no" to
that question would be convenient, because it would mean that existing
implementations of RDF could handle RDF* at the syntactical level only,
i.e. parse Turtle* and store it as standard RDF triples.

In your examples below, however, you propose to extend existing
implementations -- which defeats the purpose of fitting RDF* into
standard RDF semantics...

  pa


On 24/10/2020 03:26, Holger Knublauch wrote:
>
> On 10/23/2020 9:58 PM, Pierre-Antoine Champin wrote:
>> Holger,
>>
>> On 23/10/2020 00:48, Holger Knublauch wrote:
>>>> (...)
>>>> That's why we need to have an extended semantics for RDF*.
>>> Not necessarily. I still think this can be solved by simply declaring
>>> reification on bnode triples to be unsupported.
>>>
>>> Yes there are theoretically some scenarios where this might be useful,
>>> but I'd rather say "if you want to use RDF*, use IRIs and no bnodes"
>>> than having to extend the very core model of RDF just for this corner
>>> case.
>> Whether this is a corner case or not remains to be discussed, IMO.
>>
>> Would you mind creating an issue on github with this proposal; we can
>> have a quick +1 / -1 poll on that issue, and see what the community
>> thinks...
>>
>>> There are already similar constraints in place in the RDF world, e.g.
>>> a reified statement cannot appear as predicate, and literals cannot be
>>> subjects. Life goes on, people get used to these limitations. Relying
>>> on bnodes for identification is a bad practice anyway,
>> I don't think there is a consensus about that either. As you know, there
>> has been a resurgence of this permathread last summer, and Brickley
>> made, I think, a very good point in the defense of bnodes.
>>
>> https://lists.w3.org/Archives/Public/semantic-web/2020Jul/0003.html
>>
>> In fact, I could return your argument above: some people (me included)
>> would like literals as subjects, but they are not allowed; life goes on.
>> Some people do not like bnodes, but they are part of RDF; life goes on.
>>
>> But, granted, that does not mean that we need to "propagate" them into
>> new RDF features if we collectively decide that we don't need to. Hence
>> my proposal above to create a specific issue about that.
>
> I am not sure I would want to go this far. My point that bnodes could
> be declared off-limits was mainly in response to the technical issues
> with generating long URIs. However, those long URIs do not even
> require this limitation. From a programming perspective, most APIs
> have some kind of function such as Node.getURI() returning a string. I
> believe a decent implementation strategy for RDF* that would not
> require introducing a new node type (for triples) would be to simply
> have those APIs implement a different subclass of Nodes. For example,
> assume a Java class in some imaginary RDF API such as
>
> class URINode extends Node {
>     private String uri;
>
>     public String getURI() {
>         return uri;
>     }
>
>     public boolean isURI() {
>         return true;
>     }
> }
>
> class TripleNode extends Node {
>     private Node subject;
>     private Node predicate;
>     private Node object;
>
>     public String getURI() {
>         // Don't store the URI string; compute it when needed
>         // (which is probably not often the case).
>         // The string could also be cached if this is a performance problem.
>         return "urn:triple:" + encode(subject) + ":"
>                 + encode(predicate) + ":" + encode(object);
>     }
>
>     public boolean isURI() {
>         return true;
>     }
>     ...
> }
>
> An RDF* parser could produce these nodes from the input document.
> Then, if the file is reloaded, the bnode IDs might differ, yet the
> references to them remain identical.
>
> Wouldn't this give the best of both worlds? This doesn't require
> changes to the low levels of RDF and leaves implementations the same
> room for internal optimizations, e.g. an internal index to quickly
> find all TripleNodes with certain subject, predicate or object.
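A self-contained variant of the sketch above may make the idea easier to try out. It is only an illustration: the class names follow Holger's imaginary API, while the constructors, the encode() helper (a simple escaping of '%' and ':'), and the lazy cache are assumptions added here. Blank nodes are deliberately left out, since how to encode them is exactly the open question in this thread.

```java
import java.util.Objects;

// Hypothetical minimal node hierarchy; not taken from any real RDF library.
abstract class Node {
    abstract String getURI();

    boolean isURI() {
        return true;
    }
}

final class URINode extends Node {
    private final String uri;

    URINode(String uri) {
        this.uri = Objects.requireNonNull(uri);
    }

    @Override
    String getURI() {
        return uri;
    }
}

final class TripleNode extends Node {
    private final Node subject;
    private final Node predicate;
    private final Node object;
    private String cached; // filled in on first call to getURI()

    TripleNode(Node subject, Node predicate, Node object) {
        this.subject = Objects.requireNonNull(subject);
        this.predicate = Objects.requireNonNull(predicate);
        this.object = Objects.requireNonNull(object);
    }

    // Assumed encoding: escape '%' first, then ':', so that an unescaped ':'
    // only ever appears as a separator in the resulting URN.
    static String encode(Node node) {
        return node.getURI().replace("%", "%25").replace(":", "%3A");
    }

    @Override
    String getURI() {
        if (cached == null) {
            cached = "urn:triple:" + encode(subject) + ":"
                    + encode(predicate) + ":" + encode(object);
        }
        return cached;
    }
}

class TripleNodeDemo {
    public static void main(String[] args) {
        Node s = new URINode("http://example.org/s");
        Node p = new URINode("http://example.org/p");
        Node o = new URINode("http://example.org/o");
        // prints urn:triple:http%3A//example.org/s:http%3A//example.org/p:http%3A//example.org/o
        System.out.println(new TripleNode(s, p, o).getURI());
    }
}
```

Because a TripleNode is itself a Node, it can be nested as the subject or object of another TripleNode; the inner URN is then escaped like any other URI.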
>
> Holger
>
>
>>
>>    best
>>
>>> and they already don't work across graph boundaries.
>>>
>>> Holger
>>>
>>>
>>>> (more precisely: that's why the trick of encoding embedded triples
>>>> into
>>>> IRIs does not work. There might be a smarter encoding of RDF* into
>>>> RDF,
>>>> which would allow us to rely on the standard semantics, but I
>>>> seriously
>>>> doubt it)
>>>>
>>>>> Thomas
>>>>>
>>>>>
>>>>> [0] Aidan Hogan, 2017, Canonical Forms for Isomorphic and Equivalent
>>>>> RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes
>>>>>
>>>>>
>>>>>>     best
>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>>
>>>>>>> [0] https://www.w3.org/TR/rdf11-concepts/
>>>>>>>
>>>>>>>> Also, note that the semantics' goal is not to prescribe a
>>>>>>>> particular implementation method; it is to ensure that different
>>>>>>>> implementations remain interoperable.
>>>>>>>>    pa
>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2020 at 11:07 AM Pavel Klinov
>>>>>>>>> <pavel@stardog.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yeah right. We have a mechanism in place to avoid using the
>>>>>>>>>> same Skolem constant for bnodes with the same lexical form
>>>>>>>>>> occurring in multiple RDF datasets (eg. when loading multiple
>>>>>>>>>> files) but that's pretty much it. IIRC it's called something
>>>>>>>>>> like "standardising apart" in one of the RDF docs.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Pavel
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2020 at 10:54 AM Pierre-Antoine Champin
>>>>>>>>>> <pierre-antoine.champin@ercim.eu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear all,
>>>>>>>>>>>
>>>>>>>>>>> Holger, Pavel: I assume that blank nodes are internally
>>>>>>>>>>> skolemized, so indeed, internally, you only have IRIs and
>>>>>>>>>>> literals. Correct?
>>>>>>>>>>>
>>>>>>>>>>> On 19/10/2020 10:28, Holger Knublauch wrote:
>>>>>>>>>>>
>>>>>>>>>>> Similar situation here at TopQuadrant, see
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://datashapes.org/reification.html#uriReification
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Holger
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10/19/2020 6:24 PM, Pavel Klinov wrote:
>>>>>>>>>>>
>>>>>>>>>>> This is roughly how Stardog supports RDF* and so far we find
>>>>>>>>>>> it sufficient in the enterprise context. It's pretty easily
>>>>>>>>>>> understood by users familiar with edge properties in the
>>>>>>>>>>> property graph data model, which is one of the most important
>>>>>>>>>>> factors for us.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Pavel
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Oct 17, 2020 at 9:54 PM Martynas Jusevičius
>>>>>>>>>>> <martynas@atomgraph.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Does RDF* need new semantics at all? Couldn't it be a
>>>>>>>>>>>> syntax-level
>>>>>>>>>>>> convention for unique triple IDs?
>>>>>>>>>>>>
>>>>>>>>>>>> E.g. <<s>, <p>, <o>> being syntactic sugar for
>>>>>>>>>>>> uri(concat("urn:rdf:id:", hash(str(<s>)), hash(str(<p>)),
>>>>>>>>>>>> hash(str(<o>)))).
>>>>>>>>>>>>
>>>>>>>>>>>> For example, the triple
>>>>>>>>>>>>
>>>>>>>>>>>> <<https://www.w3.org/People/Berners-Lee/card>
>>>>>>>>>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>>>>>>>>>>>> <https://www.w3.org/People/Berners-Lee/card#i>>
>>>>>>>>>>>>
>>>>>>>>>>>> gives
>>>>>>>>>>>>
>>>>>>>>>>>> URI(CONCAT("urn:rdf:id:",
>>>>>>>>>>>> SHA1(STR(
>>>>>>>>>>>> <https://www.w3.org/People/Berners-Lee/card>
>>>>>>>>>>>> )),
>>>>>>>>>>>> SHA1(STR(
>>>>>>>>>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>>>>>>>>>>>> )),
>>>>>>>>>>>> SHA1(STR(
>>>>>>>>>>>> <https://www.w3.org/People/Berners-Lee/card#i>
>>>>>>>>>>>> ))))
>>>>>>>>>>>>
>>>>>>>>>>>> gives
>>>>>>>>>>>>
>>>>>>>>>>>> <urn:rdf:id:63874e34ff5f326e67e888f6818f72d5033ecb343cadd8c2120281d72cefce4481485c937b6a95a656beaa67c13db29f3d7be801328b7c9125976c5f>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> which essentially would become the "5th element", in addition
>>>>>>>>>>>> to quads.
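The hashing convention proposed above can be sketched in a few lines of Java. This is an illustration under assumptions, not a reference implementation: the class and method names are made up here, the IRIs are hashed as UTF-8 strings, and the three SHA-1 digests are simply concatenated in lowercase hex after the "urn:rdf:id:" prefix.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper illustrating the "triple ID as hashed IRI" convention.
class HashedTripleIri {

    // Lowercase hex SHA-1 digest of the UTF-8 bytes of the input string.
    static String sha1Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Builds the triple IRI from the string forms of subject, predicate, object.
    static String tripleIri(String subject, String predicate, String object) {
        return "urn:rdf:id:" + sha1Hex(subject) + sha1Hex(predicate) + sha1Hex(object);
    }

    public static void main(String[] args) {
        System.out.println(tripleIri(
                "https://www.w3.org/People/Berners-Lee/card",
                "http://xmlns.com/foaf/0.1/primaryTopic",
                "https://www.w3.org/People/Berners-Lee/card#i"));
    }
}
```

The sketch only covers IRIs. A blank node has no stable string form to hash, which is the case that the later replies in this thread focus on.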
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 15, 2020 at 1:38 PM Pierre-Antoine Champin
>>>>>>>>>>>>
>>>>>>>>>>>> <pierre-antoine.champin@ercim.eu>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 14/10/2020 23:13, Peter F. Patel-Schneider wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let's make the height example even more stark.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height
>>>>>>>>>>>>> "6.0"^^xsd:decimal >> .
>>>>>>>>>>>>>
>>>>>>>>>>>>> does not imply
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height
>>>>>>>>>>>>> "6.00"^^xsd:decimal >> .
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would hope that any Tom, Dick, and Lois can realize that
>>>>>>>>>>>>> these two literals
>>>>>>>>>>>>> are the same.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see your point, but this is really a matter of deciding
>>>>>>>>>>>>> where you put the boundary...
>>>>>>>>>>>>>
>>>>>>>>>>>>> So I would still prefer to be radical here and consider any
>>>>>>>>>>>>> lexical difference as potentially significant.
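To make the two possible boundaries concrete, here is a small Java sketch. It is an illustration only: the class and method names are invented, and the value comparison is limited to decimals via java.math.BigDecimal. Term-level comparison looks at the lexical form and datatype IRI exactly as written, while value-level comparison looks at the number denoted.

```java
import java.math.BigDecimal;

// Hypothetical helper contrasting term identity with value equality.
class LexicalVsValue {
    static final String XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal";

    // Term-level: lexical form and datatype IRI must match character by character.
    static boolean sameTerm(String lex1, String datatype1, String lex2, String datatype2) {
        return lex1.equals(lex2) && datatype1.equals(datatype2);
    }

    // Value-level: compare the numbers denoted by the two lexical forms.
    static boolean sameDecimalValue(String lex1, String lex2) {
        return new BigDecimal(lex1).compareTo(new BigDecimal(lex2)) == 0;
    }

    public static void main(String[] args) {
        // prints false: "6.0" and "6.00" are different lexical forms
        System.out.println(sameTerm("6.0", XSD_DECIMAL, "6.00", XSD_DECIMAL));
        // prints true: both denote the number six
        System.out.println(sameDecimalValue("6.0", "6.00"));
    }
}
```

Under the "radical" reading, embedded triples would be compared with something like sameTerm, so the two :believes statements stay distinct even though the heights are numerically equal.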
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you want to stick to literals that have to be supported
>>>>>>>>>>>>> in RDF
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-US >> .
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> does not imply
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-us >> .
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are "Clark"@en-US and "Clark"@en-us really different
>>>>>>>>>>>>> literals, for the abstract syntax??
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would have thought they are the same (and so the
>>>>>>>>>>>>> implication above would hold).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Reading the spec again, I realize that things are not so
>>>>>>>>>>>>> clear: "Lexical representations of language tags MAY be
>>>>>>>>>>>>> converted to lower case", and then Literal term equality
>>>>>>>>>>>>> requires that language tags "compare equal, character by
>>>>>>>>>>>>> character". So these 2 literals MAY be considered equal, and
>>>>>>>>>>>>> the implication MAY hold... :-/ Add to this that BCP47
>>>>>>>>>>>>> explicitly states that language tags are case insensitive...
>>>>>>>>>>>>> I'd say that we are in a gray area here.
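The gray area can be shown with two comparison strategies for language tags. The following Java sketch is an assumption-laden illustration (the names are invented): one method compares tags character by character, the other lowercases both tags first, which is the normalization the spec says MAY be applied.

```java
import java.util.Locale;

// Hypothetical helper showing why "en-US" vs "en-us" depends on normalization.
class LanguageTagComparison {

    // Strict reading: tags must be identical character by character.
    static boolean equalCharByChar(String tag1, String tag2) {
        return tag1.equals(tag2);
    }

    // Lenient reading: lowercase both tags before comparing them.
    static boolean equalAfterLowercasing(String tag1, String tag2) {
        return tag1.toLowerCase(Locale.ROOT).equals(tag2.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(equalCharByChar("en-US", "en-us"));       // prints false
        System.out.println(equalAfterLowercasing("en-US", "en-us")); // prints true
    }
}
```

Whether the implication between the two :believes statements holds then depends on whether the implementation normalized the tags when it parsed the data.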
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> peter
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/14/20 4:45 PM, Doerthe Arndt wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dear Peter,
>>>>>>>>>>>>>
>>>>>>>>>>>>> you are right with both observations. The question is
>>>>>>>>>>>>> whether we want that
>>>>>>>>>>>>> behavior or not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In
>>>>>>>>>>>>> https://w3c.github.io/rdf-star/
>>>>>>>>>>>>> there is a section on referential opacity.
>>>>>>>>>>>>> The main claim there is that triples are referentially
>>>>>>>>>>>>> opaque.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> But embedded triples are much weaker than just being
>>>>>>>>>>>>> referentially opaque.  To
>>>>>>>>>>>>> see this consider the following RDF* graph under the RDF*
>>>>>>>>>>>>> version of RDF
>>>>>>>>>>>>> entailment recognizing xsd:decimal and xsd:integer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:decimal
>>>>>>>>>>>>> >> .
>>>>>>>>>>>>>
>>>>>>>>>>>>> In this semantics "6"^^xsd:decimal means the same as
>>>>>>>>>>>>> "6"^^xsd:integer so one
>>>>>>>>>>>>> would expect that
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:integer
>>>>>>>>>>>>> >> .
>>>>>>>>>>>>>
>>>>>>>>>>>>> is RDF*-entailed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But it is not.  There are two reasons for this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> First, there is no requirement that satisfying
>>>>>>>>>>>>> interpretations for the first
>>>>>>>>>>>>> graph map << :clarkKent :height "6"^^xsd:integer >> to
>>>>>>>>>>>>> anything and if a
>>>>>>>>>>>>> satisfying interpretation does map the triple there is no
>>>>>>>>>>>>> requirement that its
>>>>>>>>>>>>> ITEXT mapping gives the triple its correct meaning.  (The
>>>>>>>>>>>>> value of ITEXT for
>>>>>>>>>>>>> the triple could have the real number pi as its third
>>>>>>>>>>>>> element.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Second, "6"^^xsd:integer is a different node from
>>>>>>>>>>>>> "6"^^xsd:decimal. So even if
>>>>>>>>>>>>> the interpretation treats the second embedded triple nicely,
>>>>>>>>>>>>> and thus gives it
>>>>>>>>>>>>> the same meaning as the first embedded triple, they are
>>>>>>>>>>>>> still two different
>>>>>>>>>>>>> triples and :loisLane can believe one but not the other.  So
>>>>>>>>>>>>> very little of
>>>>>>>>>>>>> the semantics of RDF gets into embedded triples.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We wanted different representations to be treated
>>>>>>>>>>>>> differently even if they have the same meaning. The reason
>>>>>>>>>>>>> for that is that we expected that
>>>>>>>>>>>>> we expected that
>>>>>>>>>>>>> RDF* would also be used to make statements about triples as
>>>>>>>>>>>>> they were
>>>>>>>>>>>>> stated, for example to be able to explain the reasoning
>>>>>>>>>>>>> performed on the
>>>>>>>>>>>>> triples but also for simple provenance. In these cases there
>>>>>>>>>>>>> should be a
>>>>>>>>>>>>> difference between
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:decimal
>>>>>>>>>>>>> >> .
>>>>>>>>>>>>>
>>>>>>>>>>>>> and
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:integer >> .
>>>>>>>>>>>>>
>>>>>>>>>>>>> since we still talk about different representations.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Each triple is, in effect, its own context.  So, in an RDFS
>>>>>>>>>>>>> version of RDF*,
>>>>>>>>>>>>> even if :loisLane believes several triples that should imply
>>>>>>>>>>>>> another, they
>>>>>>>>>>>>> generally don't.  For example:
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent rdf:type :man >> .
>>>>>>>>>>>>> :loisLane :believes << :man rdfs:subClassOf :human >> .
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does not imply
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent rdf:type :human >> .
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So embedded triples are incredibly weak in RDF*.   Making
>>>>>>>>>>>>> them useful will
>>>>>>>>>>>>> likely require quite a bit of work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here, "useful" depends again on your intended use. We wanted
>>>>>>>>>>>>> to have a
>>>>>>>>>>>>> rather weak semantics which allows users with more complex
>>>>>>>>>>>>> use cases to add
>>>>>>>>>>>>> their semantics. It is easier to make the semantics more
>>>>>>>>>>>>> complex by adding
>>>>>>>>>>>>> extensions than to ignore certain parts. I for example
>>>>>>>>>>>>> remember that Jos De
>>>>>>>>>>>>> Roo announced some time ago that his EYE reasoner supports
>>>>>>>>>>>>> rules on RDF*. Of
>>>>>>>>>>>>> course that alone would not allow you to cover all cases,
>>>>>>>>>>>>> but it could be
>>>>>>>>>>>>> very helpful in practice.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the other hand, there are some unusual inferences that
>>>>>>>>>>>>> can be made in
>>>>>>>>>>>>> RDF*.  In an RDF* version of RDFS++ it is possible to state
>>>>>>>>>>>>> that two triples
>>>>>>>>>>>>> are the same.   The graph
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :superman :can :fly >>.
>>>>>>>>>>>>> << :superman :can :fly >> owl:sameAs
>>>>>>>>>>>>> << :clarkKent :can :fly >> .
>>>>>>>>>>>>> is consistent here and implies
>>>>>>>>>>>>>
>>>>>>>>>>>>> :superman owl:sameAs :clarkKent .
>>>>>>>>>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This last case is an interesting one. We indeed wanted the
>>>>>>>>>>>>> triple
>>>>>>>>>>>>>
>>>>>>>>>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>>>>>>>>>>>>>
>>>>>>>>>>>>> to be a consequence of your statements. The question is
>>>>>>>>>>>>> whether
>>>>>>>>>>>>>
>>>>>>>>>>>>> :superman owl:sameAs :clarkKent .
>>>>>>>>>>>>>
>>>>>>>>>>>>> should follow (it does indeed follow, just as you describe).
>>>>>>>>>>>>> We made the
>>>>>>>>>>>>> semantics of embedded triples the way it is to be able to
>>>>>>>>>>>>> deal with blank
>>>>>>>>>>>>> nodes. Here, I can't give a concrete answer whether (at
>>>>>>>>>>>>> least to my
>>>>>>>>>>>>> understanding) it should be that way. I will think about it
>>>>>>>>>>>>> (and read
>>>>>>>>>>>>> Pierre-Antoine's thoughts in the meantime, which just
>>>>>>>>>>>>> arrived as well) and
>>>>>>>>>>>>> come back to you.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>> Doerthe
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>
Received on Tuesday, 27 October 2020 15:53:16 UTC