Re: weakness of embedded triples from Holger Knublauch on 2020-10-21 (public-rdf-star@w3.org from October 2020)

From: Holger Knublauch <holger@topquadrant.com>
Date: Thu, 22 Oct 2020 07:26:29 +1000
To: public-rdf-star@w3.org
Message-ID: <0f66ad86-54e3-8b85-3c42-d4aab8f92c1d@topquadrant.com>
Is the ability to reify triples with blank nodes even a relevant 
requirement? It feels like a feature to support for symmetry reasons 
only, not real use cases.

Holger


On 21/10/2020 11:22 pm, Pierre-Antoine Champin wrote:
> On 21/10/2020 12:51, thomas lörtsch wrote:
>>> (...)
>>> So making skolemization the norm would be disruptive with respect to the existing semantics.
>> RDF 1.1 Concepts and Abstract Syntax, Section 3.5 [0] says:
>>
>> "Blank nodes do not have identifiers in the RDF abstract syntax. The blank node identifiers introduced by some concrete syntaxes have only local scope and are purely an artifact of the serialization.
>> In situations where stronger identification is needed, systems MAY systematically replace some or all of the blank nodes in an RDF graph with IRIs. [...] This transformation does not appreciably change the meaning of an RDF graph [...]. "
>>
>> Where is the disruption?
> We are talking about replacing a MAY with a MUST. Granted, this is not a
> Copernican revolution, but this is a change nonetheless. And the
> semantics we propose aims to align as closely as possible with the
> existing RDF 1.1 Semantics, so I don't consider it as "disruptive" either.
>
> But let's be more concrete, and get back to the original question by
> Martynas:
>
>> Does RDF* need new semantics at all?
> followed by a proposal to replace embedded triples with specifically
> coined IRIs. I argued that this would be a problem for embedded triples
> containing blank nodes, and this is where Skolemization came up.
>
> Consider the following RDF graphs:
>
> G1:
>
>      :alice :knows _:x.
>      << _:x :name "Bob" >> :statedBy :alice.
>
> G2:
>
>      :alice :knows _:y.
>      << _:y :name "Bob" >> :statedBy :alice.
>
> I hope that we agree that they should be considered equivalent. However,
> if you skolemize them AND replace embedded triples with specific IRIs,
> you get something like:
>
> G3':
>
>      :alice :knows sk:1234.
>      triple:sk:1234_:name_Bob :statedBy :alice.
>
> G4':
>
>      :alice :knows sk:5678.
>      triple:sk:5678_:name_Bob :statedBy :alice.
>
> Now, determining that these graphs are equivalent becomes non-trivial.
> Not only do you need to find a mapping between Skolem IRIs (which are
> easy to spot), but you need to "propagate" this mapping inside triple:
> IRIs. Actually, Martynas was proposing to generate the triple: IRIs
> using a hash function, so that makes the propagation even harder, if not
> impossible.
>
> So yes, I believe we need a new semantics, because embedded triples have
> an internal structure that we need to take into account.
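The propagation problem described above can be sketched in Python. This is only an illustration under stated assumptions: terms are plain strings, embedded triples are nested tuples, and `flatten` is a hypothetical stand-in for a hash-based triple IRI scheme (not the exact one proposed in this thread). It checks one given renaming rather than searching for a graph isomorphism.

```python
import hashlib

def rename(term, mapping):
    """Apply a renaming to a term, recursing into embedded triples (tuples)."""
    if isinstance(term, tuple):
        return tuple(rename(t, mapping) for t in term)
    return mapping.get(term, term)

def rename_graph(graph, mapping):
    return {rename(triple, mapping) for triple in graph}

def flatten(term):
    """Replace an embedded triple with an opaque hashed IRI (hypothetical scheme)."""
    if isinstance(term, tuple):
        text = " ".join(flatten(t) for t in term)
        return "triple:" + hashlib.sha1(text.encode("utf-8")).hexdigest()
    return term

def flatten_graph(graph):
    return {tuple(flatten(t) for t in triple) for triple in graph}

g1 = {(":alice", ":knows", "_:x"),
      (("_:x", ":name", '"Bob"'), ":statedBy", ":alice")}
g2 = {(":alice", ":knows", "_:y"),
      (("_:y", ":name", '"Bob"'), ":statedBy", ":alice")}

# Structured embedded triples: renaming _:x to _:y reaches inside << >>.
structured_match = rename_graph(g1, {"_:x": "_:y"}) == g2

# Skolemize, then flatten embedded triples into hashed IRIs.
g1_flat = flatten_graph(rename_graph(g1, {"_:x": "sk:1234"}))
g2_flat = flatten_graph(rename_graph(g2, {"_:y": "sk:5678"}))

# The Skolem mapping can no longer be propagated into the hashed IRIs.
flat_match = rename_graph(g1_flat, {"sk:1234": "sk:5678"}) == g2_flat

print(structured_match, flat_match)
```

With the structure kept, the renaming makes the two graphs identical; once the embedded triple is hashed, the Skolem IRI inside it is no longer visible to the renaming.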
>
>    best
>
>> Thomas
>>
>>
>> [0] https://www.w3.org/TR/rdf11-concepts/
>>
>>> Also, note that the semantics' goal is not to prescribe a particular implementation method; it is to ensure that different implementations remain interoperable.
>>>    pa
>>>
>>>> On Mon, Oct 19, 2020 at 11:07 AM Pavel Klinov
>>>> <pavel@stardog.com>
>>>>   wrote:
>>>>
>>>>> Yeah right. We have a mechanism in place to avoid using the same Skolem constant for bnodes with the same lexical form occurring in multiple RDF datasets (eg. when loading multiple files) but that's pretty much it. IIRC it's called something like "standardising apart" in one of the RDF docs.
>>>>>
>>>>> Cheers,
>>>>> Pavel
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 19, 2020 at 10:54 AM Pierre-Antoine Champin
>>>>> <pierre-antoine.champin@ercim.eu>
>>>>>   wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> Holger, Pavel: I assume that blank nodes are internally skolemized, so indeed, internally, you only have IRIs and literals. Correct?
>>>>>>
>>>>>> On 19/10/2020 10:28, Holger Knublauch wrote:
>>>>>>
>>>>>> Similar situation here at TopQuadrant, see
>>>>>>
>>>>>>
>>>>>> http://datashapes.org/reification.html#uriReification
>>>>>>
>>>>>>
>>>>>> Holger
>>>>>>
>>>>>>
>>>>>> On 10/19/2020 6:24 PM, Pavel Klinov wrote:
>>>>>>
>>>>>> This is roughly how Stardog supports RDF* and so far we find it sufficient in the enterprise context. It's pretty easily understood by users familiar with edge properties in the property graph data model, which is one of the most important factors for us.
>>>>>>
>>>>>> Cheers,
>>>>>> Pavel
>>>>>>
>>>>>> On Sat, Oct 17, 2020 at 9:54 PM Martynas Jusevičius
>>>>>> <martynas@atomgraph.com>
>>>>>>   wrote:
>>>>>>
>>>>>>> Does RDF* need new semantics at all? Couldn't it be a syntax-level
>>>>>>> convention for unique triple IDs?
>>>>>>>
>>>>>>> E.g. <<s>, <p>, <o>> being syntactic sugar for
>>>>>>> uri(concat("urn:rdf:id:", hash(str(<s>)), hash(str(<p>)),
>>>>>>> hash(str(<o>)))).
>>>>>>>
>>>>>>> For example, the triple
>>>>>>>
>>>>>>> <
>>>>>>> <https://www.w3.org/People/Berners-Lee/card>
>>>>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>>>>>>> <https://www.w3.org/People/Berners-Lee/card#i>
>>>>>>> >
>>>>>>> gives
>>>>>>>
>>>>>>> URI(CONCAT("urn:rdf:id:",
>>>>>>> SHA1(STR(
>>>>>>> <https://www.w3.org/People/Berners-Lee/card>
>>>>>>> )),
>>>>>>> SHA1(STR(
>>>>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>>>>>>> )),
>>>>>>> SHA1(STR(
>>>>>>> <https://www.w3.org/People/Berners-Lee/card#i>
>>>>>>> ))))
>>>>>>>
>>>>>>> gives
>>>>>>>
>>>>>>> <urn:rdf:id:63874e34ff5f326e67e888f6818f72d5033ecb343cadd8c2120281d72cefce4481485c937b6a95a656beaa67c13db29f3d7be801328b7c9125976c5f>
>>>>>>>
>>>>>>> which essentially would become the "5th element", in addition to quads.
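The convention proposed above could be sketched in Python roughly as follows. The function name `triple_id` is hypothetical, and hashing the UTF-8 bytes of each IRI string is an assumption; the message does not say which encoding or exact concatenation was used, so the output is not claimed to match the IRI quoted above.

```python
import hashlib

def triple_id(s: str, p: str, o: str) -> str:
    """Build a urn:rdf:id: IRI from the SHA-1 hex digests of s, p and o."""
    # Assumption: each term is hashed as the UTF-8 encoding of its IRI string.
    digests = (hashlib.sha1(term.encode("utf-8")).hexdigest() for term in (s, p, o))
    return "urn:rdf:id:" + "".join(digests)

card = "https://www.w3.org/People/Berners-Lee/card"
iri = triple_id(card, "http://xmlns.com/foaf/0.1/primaryTopic", card + "#i")
print(iri)
```

Each SHA-1 digest is 40 hexadecimal characters, so the identifier carries 120 hex characters after the prefix, and the same triple always yields the same IRI.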
>>>>>>>
>>>>>>> On Thu, Oct 15, 2020 at 1:38 PM Pierre-Antoine Champin
>>>>>>>
>>>>>>> <pierre-antoine.champin@ercim.eu>
>>>>>>>   wrote:
>>>>>>>
>>>>>>>> On 14/10/2020 23:13, Peter F. Patel-Schneider wrote:
>>>>>>>>
>>>>>>>> Let's make the height example even more stark.
>>>>>>>>
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent :height "6.0"^^xsd:decimal >> .
>>>>>>>>
>>>>>>>>
>>>>>>>> does not imply
>>>>>>>>
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent :height "6.00"^^xsd:decimal >> .
>>>>>>>>
>>>>>>>>
>>>>>>>> I would hope that any Tom, Dick, and Lois can realize that these two literals
>>>>>>>> are the same.
>>>>>>>>
>>>>>>>> I see your point, but this is really a matter of deciding where you put the boundary...
>>>>>>>>
>>>>>>>> So I would still prefer to be radical here and consider any lexical difference as potentially significant.
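The boundary in question, lexical form versus value, can be illustrated with a small Python sketch. Literals are modelled here as hypothetical (lexical form, datatype) pairs, and Python's `Decimal` and `int` stand in for the xsd:decimal and xsd:integer value spaces; that is an approximation, not a full XSD implementation.

```python
from decimal import Decimal

# Hypothetical parsers from lexical form to value, keyed by datatype.
PARSERS = {"xsd:decimal": Decimal, "xsd:integer": int}

def same_term(a, b):
    """Syntactic equality: same lexical form and same datatype."""
    return a == b

def same_value(a, b):
    """Value equality: compare the parsed values of both literals."""
    return PARSERS[a[1]](a[0]) == PARSERS[b[1]](b[0])

six_0 = ("6.0", "xsd:decimal")
six_00 = ("6.00", "xsd:decimal")
six_int = ("6", "xsd:integer")
six_dec = ("6", "xsd:decimal")

print(same_term(six_0, six_00), same_value(six_0, six_00))
print(same_term(six_int, six_dec), same_value(six_int, six_dec))
```

In both pairs the terms differ while the values coincide, which is exactly the distinction a referentially opaque embedded triple preserves.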
>>>>>>>>
>>>>>>>> If you want to stick to literals that have to be supported in RDF
>>>>>>>>
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-US >> .
>>>>>>>>
>>>>>>>>
>>>>>>>> does not imply
>>>>>>>>
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-us >> .
>>>>>>>>
>>>>>>>> Are "Clark"@en-US and "Clark"@en-us really different literals, for the abstract syntax??
>>>>>>>>
>>>>>>>> I would have thought they are the same (and so the implication above would hold).
>>>>>>>>
>>>>>>>> Reading the spec again, I realize that things are not so clear: "Lexical representations of language tags MAY be converted to lower case", and then Literal term equality requires that language tags "compare equal, character by character". So these 2 literals MAY be considered equal, and the implication MAY hold... :-/ Add to this that BCP47 explicitly states that language tags are case insensitive... I'd say that we are in a gray area here.
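The two readings of the spec discussed above can be contrasted in a short Python sketch. The `LangLiteral` type and both comparison functions are hypothetical names introduced for illustration only.

```python
from typing import NamedTuple

class LangLiteral(NamedTuple):
    lexical: str
    lang: str

def equal_strict(a: LangLiteral, b: LangLiteral) -> bool:
    """Term equality with language tags compared character by character."""
    return a == b

def equal_normalized(a: LangLiteral, b: LangLiteral) -> bool:
    """Term equality after lower-casing language tags, as the spec permits."""
    return a.lexical == b.lexical and a.lang.lower() == b.lang.lower()

us_upper = LangLiteral("Clark", "en-US")
us_lower = LangLiteral("Clark", "en-us")
print(equal_strict(us_upper, us_lower), equal_normalized(us_upper, us_lower))
```

An implementation that lower-cases tags on input treats the two literals as one term, while one that stores tags verbatim does not, so the implication holds in the first and fails in the second.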
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> peter
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/14/20 4:45 PM, Doerthe Arndt wrote:
>>>>>>>>
>>>>>>>> Dear Peter,
>>>>>>>>
>>>>>>>> you are right with both observations. The question is whether we want that
>>>>>>>> behavior or not.
>>>>>>>>
>>>>>>>> In
>>>>>>>> https://w3c.github.io/rdf-star/
>>>>>>>>   there is a section on referential opacity.
>>>>>>>> The main claim there is that triples are referentially opaque.
>>>>>>>>
>>>>>>>>
>>>>>>>> But embedded triples are much weaker than just being referentially opaque.  To
>>>>>>>> see this consider the following RDF* graph under the RDF* version of RDF
>>>>>>>> entailment recognizing xsd:decimal and xsd:integer.
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:decimal >> .
>>>>>>>>
>>>>>>>> In this semantics "6"^^xsd:decimal means the same as "6"^^xsd:integer so one
>>>>>>>> would expect that
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:integer >> .
>>>>>>>>
>>>>>>>> is RDF*-entailed.
>>>>>>>>
>>>>>>>> But it is not.  There are two reasons for this.
>>>>>>>>
>>>>>>>> First, there is no requirement that satisfying interpretations for the first
>>>>>>>> graph map < :clarkKent :height "6"^^xsd:integer > to anything and if a
>>>>>>>> satisfying interpretation does map the triple there is no requirement that its
>>>>>>>> ITEXT mapping gives the triple its correct meaning.  (The value of ITEXT for
>>>>>>>> the triple could have the real number pi as its third element.)
>>>>>>>>
>>>>>>>> Second, "6"^^xsd:integer is a different node from "6"^^xsd:decimal. So even if
>>>>>>>> the interpretation treats the second embedded triple nicely, and thus gives it
>>>>>>>> the same meaning as the first embedded triple, they are still two different
>>>>>>>> triples and :loisLane can believe one but not the other.  So very little of
>>>>>>>> the semantics of RDF gets into embedded triples.
>>>>>>>>
>>>>>>>> We wanted different representations to be treated differently even
>>>>>>>> if they have the same meaning. The reason for that is that we expected that
>>>>>>>> RDF* would also be used to make statements about triples as they were
>>>>>>>> stated, for example to be able to explain the reasoning performed on the
>>>>>>>> triples but also for simple provenance. In these cases there should be a
>>>>>>>> difference between
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:decimal >> .
>>>>>>>>
>>>>>>>> and
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent :height "6"^^xsd:integer >> .
>>>>>>>>
>>>>>>>> since we still talk about different representations.
>>>>>>>>
>>>>>>>> Each triple is, in effect, its own context.  So, in an RDFS version of RDF*,
>>>>>>>> even if :loisLane believes several triples that should imply another, they
>>>>>>>> generally don't.  For example:
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent rdf:type :man >> .
>>>>>>>> :loisLane :believes << :man rdfs:subClassOf :human >> .
>>>>>>>>
>>>>>>>> Does not imply
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent rdf:type :human >> .
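Why the subclass inference does not reach inside embedded triples can be sketched in Python. This is a toy model under stated assumptions: the function `rdfs9` implements only the single RDFS subclass rule over asserted triples, and embedded triples are modelled as nested tuples that the rule treats as opaque terms.

```python
def rdfs9(graph):
    """One pass of rule rdfs9: (x type C), (C subClassOf D) => (x type D)."""
    inferred = set()
    for (x, p1, c) in graph:
        if p1 != "rdf:type":
            continue
        for (c2, p2, d) in graph:
            if p2 == "rdfs:subClassOf" and c2 == c:
                inferred.add((x, "rdf:type", d))
    return inferred

# The two triples asserted directly.
asserted = {
    (":clarkKent", "rdf:type", ":man"),
    (":man", "rdfs:subClassOf", ":human"),
}

# The same two triples, only embedded as objects of :believes.
believed = {
    (":loisLane", ":believes", (":clarkKent", "rdf:type", ":man")),
    (":loisLane", ":believes", (":man", "rdfs:subClassOf", ":human")),
}

print(rdfs9(asserted))
print(rdfs9(believed))
```

The rule fires on the asserted graph but finds nothing to match in the second one, because the only asserted predicate there is :believes. Getting the belief about :human would need an extra rule that looks inside the embedded triples.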
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> So embedded triples are incredibly weak in RDF*.   Making them useful will
>>>>>>>> likely require quite a bit of work.
>>>>>>>>
>>>>>>>> Here, "useful" depends again on your intended use. We wanted to have a
>>>>>>>> rather weak semantics which allows users with more complex use cases to add
>>>>>>>> their semantics. It is easier to make the semantics more complex by adding
>>>>>>>> extensions than to ignore certain parts. I for example remember that Jos De
>>>>>>>> Roo announced some time ago that his EYE reasoner supports rules on RDF*. Of
>>>>>>>> course that alone would not allow you to cover all cases, but it could be
>>>>>>>> very helpful in practice.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On the other hand, there are some unusual inferences that can be made in
>>>>>>>> RDF*.  In an RDF* version of RDFS++ it is possible to state that two triples
>>>>>>>> are the same.   The graph
>>>>>>>>
>>>>>>>> :loisLane :believes << :superman :can :fly >>.
>>>>>>>> << :superman :can :fly >> owl:sameAs << :clarkKent :can :fly >> .
>>>>>>>>
>>>>>>>> is consistent here and implies
>>>>>>>>
>>>>>>>> :superman owl:sameAs :clarkKent .
>>>>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>>>>>>>>
>>>>>>>> This last case is an interesting one. We indeed wanted the triple
>>>>>>>>
>>>>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>>>>>>>>
>>>>>>>> to be a consequence of your statements. The question is whether
>>>>>>>>
>>>>>>>> :superman owl:sameAs :clarkKent .
>>>>>>>>
>>>>>>>> should follow (it does indeed follow, just as you describe). We made the
>>>>>>>> semantics of embedded triples the way it is to be able to deal with blank
>>>>>>>> nodes. Here, I can't give a concrete answer whether (at least to my
>>>>>>>> understanding) it should be that way. I will think about it (and read
>>>>>>>> Pierre-Antoine's thoughts in the meantime, which just arrived as well) and
>>>>>>>> come back to you.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Doerthe
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
Received on Wednesday, 21 October 2020 21:26:47 UTC