Re: weakness of embedded triples from Pierre-Antoine Champin on 2020-10-27 (public-rdf-star@w3.org from October 2020)

From: Pierre-Antoine Champin <pierre-antoine.champin@ercim.eu>
Date: Tue, 27 Oct 2020 17:07:15 +0100
To: Pavel Klinov <pavel@stardog.com>, Holger Knublauch <holger@topquadrant.com>
Cc: public-rdf-star@w3.org
Message-ID: <d454fed2-9629-48e8-9279-e5352927b3f1@ercim.eu>
On 24/10/2020 09:20, Pavel Klinov wrote:
>
> On Fri, Oct 23, 2020 at 12:49 AM Holger Knublauch
> <holger@topquadrant.com> wrote:
>  
>
>     [snip]
>
>  
>
>     >>   and I’d like to have that. However I seem to have a
>     fundamental problem understanding the semantic problem (nobody is
>     surprised here). Taking a statement that is asserted and annotated
>     in RDF*, in a currently en vogue syntax:
>     >>
>     >>     :a  :b  :c  [[ :d  :e ]] .
>     >>
>     >> What internal structure are you referring to that has to be
>     taken into account? The statement is stated, and annotated, and
>     IIUC that’s about it. Ensuring that the annotation refers to this
>     specific statement is a syntactic problem and a statement
>     identifier composed of subject, predicate and object and encoded
>     in an IRI is a syntactic solution to that problem. Paraphrasing
>     your example above:
>     >>
>     >>     :a  :b  :c .
>     >>     :triple:a:b:c  :d  :e .
>     >>
>     >> What "internal structure" has changed here? In what ways could
>     this syntax convey a different meaning than the one above?
>     > Again, that is fine when no blank nodes are involved. But if you
>     replace
>     > :c with _:x, you would get something like:
>     >
>     >      :a :b _:x.
>     >      :triple:a:b:_:x :d :e.
>     >
>     > Then, RDF semantics says you can replace _:x with _:y without
>     changing
>     > the meaning of the graph. This renaming only impacts the first
>     triple;
>     > from the point of view of standard RDF semantics, the second triple
>     > contains 3 IRIs, so it is kept unchanged by the renaming. So under
>     > standard RDF semantics, we can replace the graph above with:
>     >
>     >      :a :b _:y.
>     >      :triple:a:b:_:x :d :e.
>     >
>     > and we have lost the connection between the asserted triple and the
>     > annotated triple. That's why we need to have an extended
>     semantics for RDF*.
>
>     Not necessarily. I still think this can be solved by simply declaring
>     reification on bnode triples to be unsupported.
>
>     Yes there are theoretically some scenarios where this might be
>     useful,
>     but I'd rather say "if you want to use RDF*, use IRIs and no bnodes"
>     than having to extend the very core model of RDF just for this corner
>     case. There are already similar constraints in place in the RDF
>     world,
>     e.g. a reified statement cannot appear as predicate, and literals
>     cannot
>     be subjects. Life goes on, people get used to these limitations.
>     Relying
>     on bnodes for identification is a bad practice anyway, and they
>     already
>     don't work across graph boundaries.
>
>
> I imagine it's going to be difficult to agree on a single approach
> here which would make everyone happy

"happy", probably not. A good negotiation is one that makes everyone
equally dissatisfied ;-)

Jokes apart, I really hope that we can reach some form of consensus.
This whole effort is meant to prevent the various implementations of RDF*
from diverging into an archipelago of non-interoperable systems...

> but it's also important to let vendors support a useful subset of
> RDF*/SPARQL* without bothering their users/customers too much about
> bnodes. Maybe we can figure out some subsets of RDF*, e.g. akin to the
> OWL 2 Profiles,

The concept of "profiles" may be appealing, but I would be concerned
again about interoperability...

> s.t. the simple RDF* interpretation along the lines that Martynas
> described would be compatible with a more complex and comprehensive
> one as long as bnodes do not appear in annotated triples (as Holger
> suggested).

I would find it odd to forbid something like << _:x :p :o >> in Turtle*,
while << ?x :p :o >> would still be allowed in SPARQL*...
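
To make the blank-node problem quoted above concrete, here is a minimal
Python sketch (not any vendor's implementation). It assumes Martynas's
hash-based encoding, applied to blank node labels as if they were IRIs;
the names triple_iri, rename_bnode and is_linked, and the
http://example.org/ IRIs, are made up for illustration.

```python
import hashlib


def triple_iri(s: str, p: str, o: str) -> str:
    """Encode a triple as an IRI: 'urn:rdf:id:' + SHA-1 hex digest of each term."""
    digests = (hashlib.sha1(t.encode("utf-8")).hexdigest() for t in (s, p, o))
    return "urn:rdf:id:" + "".join(digests)


def rename_bnode(graph, old, new):
    """Rename a blank node label wherever it occurs as a *term* of a triple.

    IRIs are opaque to RDF semantics, so a label hidden inside an encoded
    IRI is left untouched.
    """
    return {tuple(new if t == old else t for t in triple) for triple in graph}


def is_linked(graph):
    """True if some subject IRI is the encoding of a triple asserted in the graph."""
    ids = {triple_iri(*t) for t in graph}
    return any(s in ids for (s, _, _) in graph)


# Hypothetical IRIs standing in for :a, :b, :d and :e.
A, B, D, E = (f"http://example.org/{n}" for n in "abde")

graph = {
    (A, B, "_:x"),                    # :a :b _:x .
    (triple_iri(A, B, "_:x"), D, E),  # <encoded triple> :d :e .
}
renamed = rename_bnode(graph, "_:x", "_:y")

print(is_linked(graph))    # True
print(is_linked(renamed))  # False
```

Renaming _:x to _:y yields an equivalent graph under standard RDF
semantics, yet the annotation's subject IRI still encodes _:x, so the
link between the asserted triple and its annotation is lost.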

>  We at Stardog would then support the simple thing and possibly
> consider the complex one based on demand (this is generally how we
> tend to do things).
>
> Cheers,
> Pavel
>  
>
>
>     Holger
>
>
>     >
>     > (more precisely: that's why the trick of encoding embedded
>     triples into
>     > IRIs does not work. There might be a smarter encoding of RDF*
>     into RDF,
>     > which would allow us to rely on the standard semantics, but I
>     seriously
>     > doubt it)
>     >
>     >> Thomas
>     >>
>     >>
>     >> [0] Aidan Hogan, 2017, Canonical Forms for Isomorphic and
>     Equivalent RDF Graphs: Algorithms for Leaning and Labelling Blank
>     Nodes
>     >>
>     >>
>     >>>    best
>     >>>
>     >>>> Thomas
>     >>>>
>     >>>>
>     >>>> [0] https://www.w3.org/TR/rdf11-concepts/
>     >>>>
>     >>>>> Also, note that the semantics' goal is not to prescribe a
>     particular implementation method; it is to ensure that different
>     implementations remain interoperable.
>     >>>>>   pa
>     >>>>>
>     >>>>>> On Mon, Oct 19, 2020 at 11:07 AM Pavel Klinov
>     >>>>>> <pavel@stardog.com>
>     >>>>>> wrote:
>     >>>>>>
>     >>>>>>> Yeah right. We have a mechanism in place to avoid using
>     the same Skolem constant for bnodes with the same lexical form
>     occurring in multiple RDF datasets (eg. when loading multiple
>     files) but that's pretty much it. IIRC it's called something like
>     "standardising apart" in one of the RDF docs.
>     >>>>>>>
>     >>>>>>> Cheers,
>     >>>>>>> Pavel
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>
>     >>>>>>> On Mon, Oct 19, 2020 at 10:54 AM Pierre-Antoine Champin
>     >>>>>>> <pierre-antoine.champin@ercim.eu>
>     >>>>>>> wrote:
>     >>>>>>>
>     >>>>>>>> Dear all,
>     >>>>>>>>
>     >>>>>>>> Holger, Pavel: I assume that blank nodes are internally
>     skolemized, so indeed, internally, you only have IRIs and
>     literals. Correct?
>     >>>>>>>>
>     >>>>>>>> On 19/10/2020 10:28, Holger Knublauch wrote:
>     >>>>>>>>
>     >>>>>>>> Similar situation here at TopQuadrant, see
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> http://datashapes.org/reification.html#uriReification
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> Holger
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> On 10/19/2020 6:24 PM, Pavel Klinov wrote:
>     >>>>>>>>
>     >>>>>>>> This is roughly how Stardog supports RDF* and so far we
>     find it sufficient in the enterprise context. It's pretty easily
>     understood by users familiar with edge properties in the property
>     graph data model, which is one of the most important factors for us.
>     >>>>>>>>
>     >>>>>>>> Cheers,
>     >>>>>>>> Pavel
>     >>>>>>>>
>     >>>>>>>> On Sat, Oct 17, 2020 at 9:54 PM Martynas Jusevičius
>     >>>>>>>> <martynas@atomgraph.com>
>     >>>>>>>> wrote:
>     >>>>>>>>
>     >>>>>>>>> Does RDF* need new semantics at all? Couldn't it be a
>     syntax-level
>     >>>>>>>>> convention for unique triple IDs?
>     >>>>>>>>>
>     >>>>>>>>> E.g. <<s>, <p>, <o>> being syntactic sugar for
>     >>>>>>>>> uri(concat("urn:rdf:id:", hash(str(<s>)), hash(str(<p>)),
>     >>>>>>>>> hash(str(<o>)))).
>     >>>>>>>>>
>     >>>>>>>>> For example, the triple
>     >>>>>>>>>
>     >>>>>>>>> <
>     >>>>>>>>> <https://www.w3.org/People/Berners-Lee/card>
>     >>>>>>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>     >>>>>>>>> <https://www.w3.org/People/Berners-Lee/card#i>
>     >>>>>>>>> gives
>     >>>>>>>>>
>     >>>>>>>>> URI(CONCAT("urn:rdf:id:",
>     >>>>>>>>> SHA1(STR(
>     >>>>>>>>> <https://www.w3.org/People/Berners-Lee/card>
>     >>>>>>>>> )),
>     >>>>>>>>> SHA1(STR(
>     >>>>>>>>> <http://xmlns.com/foaf/0.1/primaryTopic>
>     >>>>>>>>> )),
>     >>>>>>>>> SHA1(STR(
>     >>>>>>>>> <https://www.w3.org/People/Berners-Lee/card#i>
>     >>>>>>>>> ))))
>     >>>>>>>>>
>     >>>>>>>>> gives
>     >>>>>>>>>
>     >>>>>>>>>
>     <urn:rdf:id:63874e34ff5f326e67e888f6818f72d5033ecb343cadd8c2120281d72cefce4481485c937b6a95a656beaa67c13db29f3d7be801328b7c9125976c5f>
>     >>>>>>>>>
>     >>>>>>>>> which essentially would become the "5th element", in
>     addition to quads.
>     >>>>>>>>>
>     >>>>>>>>> On Thu, Oct 15, 2020 at 1:38 PM Pierre-Antoine Champin
>     >>>>>>>>>
>     >>>>>>>>> <pierre-antoine.champin@ercim.eu>
>     >>>>>>>>> wrote:
>     >>>>>>>>>
>     >>>>>>>>>> On 14/10/2020 23:13, Peter F. Patel-Schneider wrote:
>     >>>>>>>>>>
>     >>>>>>>>>> Let's make the height example even more stark.
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent :height
>     "6.0"^^xsd:decimal >> .
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> does not imply
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent :height
>     "6.00"^^xsd:decimal >> .
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> I would hope that any Tom, Dick, and Lois can realize
>     that these two literals
>     >>>>>>>>>> are the same.
>     >>>>>>>>>>
>     >>>>>>>>>> I see your point, but this is really a matter of
>     deciding where you put the boundary...
>     >>>>>>>>>>
>     >>>>>>>>>> So I would still prefer to be radical here and consider
>     any lexical difference as potentially significant.
>     >>>>>>>>>>
>     >>>>>>>>>> If you want to stick to literals that have to be
>     supported in RDF
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-US >> .
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> does not imply
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent :name "Clark"@en-us >> .
>     >>>>>>>>>>
>     >>>>>>>>>> Are "Clark"@en-US and "Clark"@en-us really different
>     literals, for the abstract syntax??
>     >>>>>>>>>>
>     >>>>>>>>>> I would have thought they are the same (and so the
>     implication above would hold).
>     >>>>>>>>>>
>     >>>>>>>>>> Reading the spec again, I realize that things are not
>     so clear: "Lexical representations of language tags MAY be
>     converted to lower case", and then Literal term equality requires
>     that language tags "compare equal, character by character". So
>     these 2 literals MAY be considered equal, and the implication MAY
>     hold... :-/ Add to this that BCP47 explicitly states that language
>     tags are case insensitive... I'd say that we are in a gray area here.
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> peter
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> On 10/14/20 4:45 PM, Doerthe Arndt wrote:
>     >>>>>>>>>>
>     >>>>>>>>>> Dear Peter,
>     >>>>>>>>>>
>     >>>>>>>>>> you are right with both observations. The question is
>     whether we want that
>     >>>>>>>>>> behavior or not.
>     >>>>>>>>>>
>     >>>>>>>>>> In
>     >>>>>>>>>> https://w3c.github.io/rdf-star/
>     >>>>>>>>>> there is a section on referential opacity.
>     >>>>>>>>>> The main claim there is that triples are referentially
>     opaque.
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> But embedded triples are much weaker than just being
>     referentially opaque.  To
>     >>>>>>>>>> see this consider the following RDF* graph under the
>     RDF* version of RDF
>     >>>>>>>>>> entailment recognizing xsd:decimal and xsd:integer.
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent :height
>     "6"^^xsd:decimal >> .
>     >>>>>>>>>>
>     >>>>>>>>>> In this semantics "6"^^xsd:decimal means the same as
>     "6"^^xsd:integer so one
>     >>>>>>>>>> would expect that
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent :height
>     "6"^^xsd:integer >> .
>     >>>>>>>>>>
>     >>>>>>>>>> is RDF*-entailed.
>     >>>>>>>>>>
>     >>>>>>>>>> But it is not.  There are two reasons for this.
>     >>>>>>>>>>
>     >>>>>>>>>> First, there is no requirement that satisfying
>     interpretations for the first
>     >>>>>>>>>> graph map < :clarkKent :height "6"^^xsd:integer > to
>     anything and if a
>     >>>>>>>>>> satisfying interpretation does map the triple there is
>     no requirement that its
>     >>>>>>>>>> ITEXT mapping gives the triple its correct meaning. 
>     (The value of ITEXT for
>     >>>>>>>>>> the triple could have the real number pi as its third
>     element.)
>     >>>>>>>>>>
>     >>>>>>>>>> Second, "6"^^xsd:integer is a different node from
>     "6"^^xsd:decimal. So even if
>     >>>>>>>>>> the interpretation treats the second embedded triple
>     nicely, and thus gives it
>     >>>>>>>>>> the same meaning as the first embedded triple, they are
>     still two different
>     >>>>>>>>>> triples and :loisLane can believe one but not the
>     other.  So very little of
>     >>>>>>>>>> the semantics of RDF gets into embedded triples.
>     >>>>>>>>>>
>     >>>>>>>>>> We wanted different representations to be
>     treated differently
>     >>>>>>>>>> even if they have the same meaning. The reason for that is
>     that we expected that
>     >>>>>>>>>> RDF* would also be used to make statements about
>     triples as they were
>     >>>>>>>>>> stated, for example to be able to explain the reasoning
>     performed on the
>     >>>>>>>>>> triples but also for simple provenance. In these cases
>     there should be a
>     >>>>>>>>>> difference between
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent :height
>     "6"^^xsd:decimal >> .
>     >>>>>>>>>>
>     >>>>>>>>>> and
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent :height
>     "6"^^xsd:integer >>
>     >>>>>>>>>>
>     >>>>>>>>>> since we still talk about different representations.
>     >>>>>>>>>>
>     >>>>>>>>>> Each triple is, in effect, its own context.  So, in an
>     RDFS version of RDF*,
>     >>>>>>>>>> even if :loisLane believes several triples that should
>     imply another, they
>     >>>>>>>>>> generally don't.  For example:
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent rdf:type :man >> .
>     >>>>>>>>>> :loisLane :believes << :man rdfs:subClassOf :human >> .
>     >>>>>>>>>>
>     >>>>>>>>>> Does not imply
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent rdf:type :human >> .
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> So embedded triples are incredibly weak in RDF*. 
>      Making them useful will
>     >>>>>>>>>> likely require quite a bit of work.
>     >>>>>>>>>>
>     >>>>>>>>>> Here, "useful" depends again on your intended use. We
>     wanted to have a
>     >>>>>>>>>> rather weak semantics which allows users with more
>     complex use cases to add
>     >>>>>>>>>> their semantics. It is easier to make the semantics
>     more complex by adding
>     >>>>>>>>>> extensions than to ignore certain parts. I for example
>     remember that Jos De
>     >>>>>>>>>> Roo announced some time ago that his EYE reasoner
>     supports rules on RDF*. Of
>     >>>>>>>>>> course that alone would not allow you to cover all
>     cases, but it could be
>     >>>>>>>>>> very helpful in practice.
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>> On the other hand, there are some unusual inferences
>     that can be made in
>     >>>>>>>>>> RDF*.  In an RDF* version of RDFS++ it is possible to
>     state that two triples
>     >>>>>>>>>> are the same.   The graph
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :superman :can :fly >>.
>     >>>>>>>>>> << :superman :can :fly >> owl:sameAs << :clarkKent :can
>     :fly >> .
>     >>>>>>>>>>
>     >>>>>>>>>> is consistent here and implies
>     >>>>>>>>>>
>     >>>>>>>>>> :superman owl:sameAs :clarkKent .
>     >>>>>>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>     >>>>>>>>>>
>     >>>>>>>>>> This last case is an interesting one. We indeed wanted
>     the triple
>     >>>>>>>>>>
>     >>>>>>>>>> :loisLane :believes << :clarkKent :can :fly >>.
>     >>>>>>>>>>
>     >>>>>>>>>> to be a consequence of your statements. The question is
>     whether
>     >>>>>>>>>>
>     >>>>>>>>>> :superman owl:sameAs :clarkKent .
>     >>>>>>>>>>
>     >>>>>>>>>> should follow (it does indeed follow, just as you
>     describe). We made the
>     >>>>>>>>>> semantics of embedded triples the way it is to be able
>     to deal with blank
>     >>>>>>>>>> nodes. Here, I can't give a concrete answer whether (at
>     least to my
>     >>>>>>>>>> understanding) it should be that way. I will think
>     about it (and read
>     >>>>>>>>>> Pierre-Antoine's thoughts in the meantime, which just
>     arrived as well) and
>     >>>>>>>>>> come back to you.
>     >>>>>>>>>>
>     >>>>>>>>>> Kind regards,
>     >>>>>>>>>> Doerthe
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>
>     >>>>>>>>>>
>
Received on Tuesday, 27 October 2020 16:07:23 UTC