Re: RDF* vs RDF vs named graphs from thomas lörtsch on 2020-12-19 (public-rdf-star@w3.org from December 2020)

From: thomas lörtsch <tl@rat.io>
Date: Sat, 19 Dec 2020 21:53:22 +0100
To: Olaf Hartig <olaf.hartig@liu.se>
Cc: public-rdf-star@w3.org
Message-Id: <F1086BCA-CAC9-41DA-9C4B-B1ACE915A085@rat.io>
Thinking out loud:

rdf:value as it is provides only limited leverage. Although one might simply ignore those subtleties just like users of RDF* will inevitably use embedded triples to annotate not only the whole triple but also parts (or, against all warnings, occurrences) of it, we should aim for a more comprehensive design.

However the potential is there. Take 
- a hypothetical nary:mainValue,
- generalized RDF
- a SPARQL extension that finds nodes in positions to which they are related through nary:mainValue and
- some sugar coating on the surface. 
Like in Property Graphs this would allow annotating subject, predicate and object separately. The predicate annotation would refer to the whole triple.

Say Alice loved Bob when she was 18.
    :Alice :loves :Bob .
    << :Alice :loves :Bob >> :age 18 .
doesn’t capture that with sufficient precision. Even if we assume some semantic wiggle room we just can’t be sure who was 18: Alice, Bob, or both? Or is the triple from 2002?

However
    _:a :loves :Bob .
    _:a nary:mainValue :Alice ;
        :age 18 .
does. The surface syntax should ensure that all tokens of _:a are replaced with :Alice and that the age property is somehow attached in a visually unambiguous way. Querying for :Alice - e.g. ':Alice ?b ?c' - should also return ':Alice :loves :Bob'.
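
A minimal sketch of the query behaviour described above, in Python rather than SPARQL. It assumes triples are plain (s, p, o) tuples and that None acts as a wildcard; nary:mainValue is the hypothetical property from this mail, and the helper names resolve and match are made up for illustration only.

```python
MAIN_VALUE = "nary:mainValue"  # hypothetical property, not part of any standard


def resolve(graph, node):
    """Return the main value a node stands for, or the node itself."""
    for s, p, o in graph:
        if s == node and p == MAIN_VALUE:
            return o
    return node


def match(graph, pattern):
    """Match an (s, p, o) pattern against the graph; None is a wildcard.

    Terms are compared after resolution through nary:mainValue, and
    matching triples are returned with their terms resolved.
    """
    results = []
    for triple in graph:
        if triple[1] == MAIN_VALUE:
            continue  # bookkeeping triples stay hidden from the surface
        resolved = tuple(resolve(graph, term) for term in triple)
        if all(want is None or want == got
               for want, got in zip(pattern, resolved)):
            results.append(resolved)
    return results


graph = [
    ("_:a", ":loves", ":Bob"),
    ("_:a", MAIN_VALUE, ":Alice"),
    ("_:a", ":age", 18),
]

# Corresponds to the query ':Alice ?b ?c'
hits = match(graph, (":Alice", None, None))
```

With this naive resolution the query returns (':Alice', ':loves', ':Bob') but also (':Alice', ':age', 18), because nothing in the data marks the age triple as secondary; keeping such annotations apart would be the job of the surface layer.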

In case we also know that Bob’s age at the time was 21 we get
    _:a :loves _:b .
    _:a nary:mainValue :Alice ;
        :age 18 .
    _:b nary:mainValue :Bob ;
        :age 21 .

And if we also know that this relationship was mostly romantic we get
    _:a _:p _:b .
    _:a nary:mainValue :Alice ;
        :age 18 .
    _:b nary:mainValue :Bob ;
        :age 21 .
    _:p nary:mainValue :loves ;
        :inclination :romantic.
In RDF* that would be
    :Alice :loves :Bob .
    << :Alice :loves :Bob >> :age 18 ;
                             :inclination :romantic.
so, while the problem with :age remains unresolved, this last annotation would be supported pretty well (although, if one gets picky, it’s not the triple that is romantically inclined but the relationship, but such levels of scrutiny seldom produce useful results).
A blank node in predicate position requires loosening syntactic restrictions in RDF. But what is that compared to the new node type that RDF* requires?

This is quite flexible, expressive and captures the Property Graphs usecase pretty well. But obviously also very ugly. It has to be supported by a surface layer that replaces the blank node with the nary:mainValue wherever possible, attaching secondary values appropriately. The query layer also has to be extended to transparently support queries for nodes in positions to which they are only attached through a nary:mainValue relation. But again, how would this effort in SPARQL compare to that required to support RDF*?

Like RDF* this is no meta modelling solution but contrary to RDF* it also can’t be mistaken for one. It stays strictly within the simplistic, flat world of main street RDF. We can't get into any problems with differentiating unique triples from  occurrences. We don’t need warnings of misunderstandings that people will ignore anyways.

One thing is easier with RDF*: adding properties to a triple that has already been written down. Nothing needs to be changed then in RDF*, just more triples (containing embedded triples) will be added. With the approach above however - say, if we only learn later of Bob’s age - :Bob would have to be replaced by _:b in the original triple. This is a bit worrying.
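
The rewrite step this requires can be sketched in Python. This is only an illustration under stated assumptions: triples are plain (s, p, o) tuples, nary:mainValue is the hypothetical property from this mail, and the helper name annotate_position is invented here.

```python
MAIN_VALUE = "nary:mainValue"  # hypothetical property, not part of any standard


def annotate_position(graph, triple, position, proxy, annotations):
    """Swap the term at `position` (0, 1 or 2) of `triple` for `proxy`
    and attach the given (predicate, object) annotations to the proxy.

    Returns a new list; the existing triple has to be rewritten, which
    is exactly the cost discussed above.
    """
    if triple not in graph:
        raise ValueError("triple not in graph")
    term = triple[position]
    rewritten = triple[:position] + (proxy,) + triple[position + 1:]
    updated = [rewritten if t == triple else t for t in graph]
    updated.append((proxy, MAIN_VALUE, term))
    updated.extend((proxy, p, o) for p, o in annotations)
    return updated


graph = [
    ("_:a", ":loves", ":Bob"),
    ("_:a", MAIN_VALUE, ":Alice"),
    ("_:a", ":age", 18),
]

# Later we learn Bob's age: the object of the original triple must change.
updated = annotate_position(
    graph, ("_:a", ":loves", ":Bob"), 2, "_:b", [(":age", 21)]
)
```

In RDF* the equivalent step would only append triples and leave the existing ones untouched.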

Embedded triples would be much better suited to encode occurrences - something the approach above should rather not be used for.

As said in the beginning: I’m just thinking out loud here. I’m still adapting to the new world where RDF* is not suitable to meta-modelling any more but only syntactic sugar for n-ary relations (of a very specific kind, if one follows Olaf’s strict interpretation, but who will, in practice) and try to figure out the repercussions.

Thomas


> On 19. Dec 2020, at 16:59, Olaf Hartig <olaf.hartig@liu.se> wrote:
> 
> Hi Thomas,
> 
> rdf:value is not useful here. Its intended use is for "describing structured 
> values" [1] as in the following example (adapted from [1]):
> 
> :product35 :weight [
>                           rdf:value 2.4 ;
>                           :unit :kilograms ] .
> 
> Notice that the :weight triple in this example has a blank node in its object 
> position. So, we may use a blank node label, say _:x, and rewrite the example 
> to the following snippet of Turtle, which is semantically equivalent with the 
> one above.
> 
> :product35  :weight  _:x .
> _:x  rdf:value  2.4 .
> _:x  :unit  :kilograms .
> 
> Now it should become clear that :unit is a property of the object of the 
> :weight triple rather than being a property/annotation of the :weight triple.
> 
> Best,
> Olaf
> 
> 
> [1] https://www.w3.org/TR/rdf-schema/#ch_value
> 
> 
> On Friday 18 December 2020 at 22:07:13 CET thomas lörtsch wrote:
>>> On 18. Dec 2020, at 15:45, Pierre-Antoine Champin
>>> <pierre-antoine.champin@ercim.eu> wrote:
>>> 
>>> Thomas,
>>> 
>>> I apologize if I sounded disrespectful and patronizing. That was not at
>>> all my intention.
>> Apologies accepted, of course, but it was not an apology I was after but a
>> discussion. However it seems that a better place to continue this
>> discussion might be your PR (https://github.com/w3c/rdf-star/pull/69) and
>> I’ll wait until you have incorporated your insights from today’s call.
>> 
>> OTOH maybe this ship has sailed and it’s just me who’s not getting it. At
>> least your PR already seems to  provide a pretty good intuition of what one
>> gets from RDF* and what not. That’s not too bad either (although of course,
>> still, a missed opportunity, a deplorable lack of ambition, a shame! ;-)
>> 
>> However, PR aside, I wonder if you have any thoughts about the rdf:value
>> idea. When RDF* seemed to be about occurrences the reification approach
>> made sense. Now that it clearly is about unique types why not drop all the
>> complexity that comes with triples as node types, unasserted assertions etc
>> and settle for a simple shortcut to the only mildly involved compositional
>> construct that rdf:value represents? It would cover the Property Graph
>> usecase pretty well and would stay very true to the simple, compositional
>> core and flat world default semantics of RDF.
>>> As for the position I was trying to defend, I don't consider it as "my
>>> semantics". I sincerely believe that this position is shared by a number
>>> of people on the list -- as I am sure you do about the position you are
>>> defending.
>> That was a shortcut expression and my apologies if that was offending.
>> 
>> Thomas
>> 
>>>  best
>>> 
>>> On 18/12/2020 12:16, thomas lörtsch wrote:
>>>> Pierre-Antoine,
>>>> 
>>>> 
>>>> you’re completely missing the point.
>>>> 
>>>>> On 17. Dec 2020, at 14:54, Pierre-Antoine Champin
>>>>> <pierre-antoine.champin@ercim.eu> wrote:
>>>>> 
>>>>> Peter,
>>>> 
>>>>> in issue #64 (https://github.com/w3c/rdf-star/issues/64) you wrote:
>>>> From this sentence
>>>> 
>>>>>> central examples have fatal flaws if embedded triples are unique
>>>> 
>>>> you only take the first half and then go on to show how the
>>>> "regrettable", "unfortunate" examples can be saved to fit your
>>>> semantics. You introduce new blank nodes and indirections as if the
>>>> authors didn’t know what they are doing and had to be taught basic RDF
>>>> modelling skills. You even replace well-established properties by
>>>> something you invented ad-hoc with a considerably different meaning. All
>>>> the examples rather obviously understand embedded triples as occurrences
>>>> but you consequently treat them as wrongly modelling unique triples.
>>>> 
>>>> Notwithstanding the lack of respect and the patronizing attitude, what
>>>> you actually show is that the semantics you propose don’t cover those
>>>> usecases. Which is the point that has repeatedly been made. Calling that
>>>> a possible misuse of RDF* is, well, an interesting perspective. My fear
>>>> is that the world will not bend to your semantics and that the ensuing
>>>> muddle will not profit anybody.
>>>> 
>>>> This is not to say that there is no case for unique triples: there is and
>>>> not long ago I was overly focused on the annotation usecase that
>>>> understands them as occurrences. But both usecases are valid, widely
>>>> used and advertized as to be solved by RDF*. Property graphs can do both
>>>> without saying as they have no semantics, but in RDF they have to be
>>>> catered for.
>>>> 
>>>> And one last thing: if you insist on only covering the unique triples
>>>> reading then you should drop everything reification related. Drop SA
>>>> mode and drop the new node type because what you are really doing is
>>>> defining semantic sugar for n-ary relations, and those are covered by
>>>> rdf:value.
>>>> 
>>>> :a :b :c {| :d :e |}
>>>> 
>>>> is then syntactic sugar for
>>>> 
>>>> :a :b [
>>>>  rdf:value :c ;
>>>>  :d :e
>>>> ]
>>>> 
>>>> That has a clear unique triple semantics, true to the flat world ideal of
>>>> RDF. It would spare us a whole lot of trouble and avoid any confusion.
>>>> It would not cover the annotation usecase and anything that requires a
>>>> bit more complexity than the simplistic base of RDF but since you’re not
>>>> planning to actually support that anyways, why not at least be honest
>>>> about it!
>>>> 
>>>> Of course it would be a wasted opportunity but since you and Olaf seem to
>>>> be so heavily inclined…
>>>> 
>>>> 
>>>> Thomas
>>>> 
>>>>> As you previously made a list of such flawed examples (thanks for that),
>>>>> I'll try to explain why I think that these examples are, though
>>>>> imperfect, not fatally flawed.
>>>>> 
>>>>> On 03/12/2020 00:47, Peter F. Patel-Schneider wrote:
>>>>>> I certainly agree with Thomas that examples used throughout the RDF*
>>>>>> documents and discussions are ill-supported by the various formal
>>>>>> definitions underlying RDF*.
>>>>>> 
>>>>>> We see
>>>>>> 
>>>>>> :bob foaf:name "Bob" .
>>>>>> 
>>>>>> <<:bob foaf:age 23>>
>>>>>>  dct:creator <http://example.com/crawlers#c1> ;
>>>>>>  dct:source <http://example.net/listing.html> .
>>>>>> 
>>>>>> 
>>>>>> in
>>>>>> http://ceur-ws.org/Vol-1912/paper12.pdf
>>>>> 
>>>>> Assuming that the <<...>> notation represents unique triples, this
>>>>> example conveys the following information: 1) bob is 23, 2) the fact
>>>>> that bob is 23 was asserted by #c1, and 3) the fact that bob is 23 was
>>>>> found in listing.html . It is tempting to infer that it was #c1 who
>>>>> found this information in that document, but that's not what the
>>>>> example is saying. This can be regretted, but that does not make the
>>>>> example useless or wrong...
>>>>> 
>>>>> It is not fatally flawed, because IF someone wanted to convey richer
>>>>> information such that "#c1 found this triple in that document", this
>>>>> would be possible by introducing an additional node, representing the
>>>>> occurrence of the triple in the document.
>>>>> 
>>>>> That being said, I agree that the example is imperfect because:
>>>>> 
>>>>> * it can easily lead to the over-interpretation I mentioned above, and
>>>>> 
>>>>> * the choice of the dct:creator predicate is arguable (nobody "creates"
>>>>> a triple, it is an abstract mathematical construct that "exists",
>>>>> regardless of who asserts it or not).
>>>>> 
>>>>>> <<:painting :height 32.1>>
>>>>>> 
>>>>>>  :unit :cm;
>>>>>>  :measurementTechnique :laserScanning;
>>>>>>  :measuredOn "2020-02-11"^^xsd:date.
>>>>> 
>>>>> Granted, this example is very misleading (or misled).
>>>>> ":measurementTechnique" can hardly be argued to be a property of the triple
>>>>> (more of the measurement that led to asserting this triple). This should have
>>>>> looked more like:
>>>>>  <<:painting :height 32.1>>
>>>>>    :unit :cm;
>>>>>    :measurement [
>>>>>        :technique :laserScanning;
>>>>>        :when "2020-02-11"^^xsd:date
>>>>>    ].
>>>>> 
>>>>> This revised example shows, I believe, that <<...>> denoting unique
>>>>> triples is not an obstacle to solving this use case.
>>>>> 
>>>>>> <<:man :hasSpouse :woman>>
>>>>>>  :source :TheNationalEnquirer;
>>>>>>  :webpage <http://nationalenquirer.com/news/2020-02-12>;
>>>>>>  :retrieved "2020-02-13"^^xsd:dateTime.
>>>>>> 
>>>>>> in
>>>>>> https://graphdb.ontotext.com/documentation/9.2/free/devhub/rdf-sparql-star.html
>>>>> 
>>>>> Again, this example is very misleading, because clearly the intention is
>>>>> to convey the information that "this triple was retrieved from the
>>>>> given page on a given date" (and not "... from the given page, but also
>>>>> on a given date"). If we were to represent two distinct retrievals, we
>>>>> would lose the link between source and date.
>>>>> 
>>>>> However, I still believe it is possible to convey this information using
>>>>> *unique triples*, either with an intermediate node representing the retrieval:
>>>>>  <<:man :hasSpouse :woman>> :occurence [
>>>>>    :source :TheNationalEnquirer;
>>>>>    :webpage <http://nationalenquirer.com/news/2020-02-12>;
>>>>>    :retrieved "2020-02-13"^^xsd:dateTime
>>>>>  ].
>>>>> 
>>>>> or possibly with deeply nested triples
>>>>> 
>>>>>  <<:man :hasSpouse :woman>>
>>>>>    :retrievedFrom <http://nationalenquirer.com/news/2020-02-12>
>>>>>    {| :on "2020-02-13"^^xsd:dateTime |}.
>>>>> 
>>>>>  <http://nationalenquirer.com/news/2020-02-12>
>>>>>    dct:creator :TheNationalEnquirer.
>>>>> 
>>>>>> <<:Bess_Schrader :employedBy :Enterprise_Knowledge . >> :dateAdded "2020-05-22" .
>>>>>> <<:Bess_Schrader :employedBy :Enterprise_Knowledge . >> :addedBy :user_bscrader .
>>>>>> 
>>>>>> in
>>>>>> https://enterprise-knowledge.com/rdf-what-is-it-and-why-do-i-need-it/
>>>>> 
>>>>> This example is interesting because in the code, the author used two
>>>>> different occurrences of the <<...>> notation, but in the accompanying
>>>>> figure, a single :employedBy arc is annotated by the two properties
>>>>> :dateAdded and :addedBy.
>>>>> 
>>>>> I think it demonstrates that the author (and, I would venture to
>>>>> extrapolate, many people starting with RDF*) did not really think about
>>>>> the type/token distinction, or the subtle problems that may arise when
>>>>> they are mixed up. I think the problem is more that one, rather than
>>>>> "everyone assumes that embedded triples represent occurrences".
>>>>> 
>>>>>> <<?c a rdfs:Class>> dct:source ?src ;
>>>>>> 
>>>>>>    prov:wasDerivedFrom <<?c a owl:Class>> .
>>>>> 
>>>>> For me, this example is really similar to the first one: it states "T1
>>>>> appears in src, and T1 can be derived from T2". Both assertions are
>>>>> true independently of each other, and can be considered true of the
>>>>> triples themselves, rather than occurrences thereof.
>>>>> 
>>>>> As in the very first example, the chosen predicate (here,
>>>>> prov:wasDerivedFrom) is not the best choice (PROV is about artifacts,
>>>>> not abstract things like triples), and I propose to change it for a more
>>>>> neutral predicate (:canBeDerivedFrom).
>>>>> 
>>>>>> :loisLane :believes << :superman :can :fly >>.
>>>>> 
>>>>> I don't see the problem here. Of course, one could argue that [my belief
>>>>> that superman can fly] and [lois' belief that superman can fly] are
>>>>> different things, but I could just as well argue that Lois and I
>>>>> believe *the same thing*. Maybe if the predicate was named
>>>>> :believesThat would it be clearer that this example uses the second
>>>>> option?
>>>>> 
>>>>>> in
>>>>>> https://w3c.github.io/rdf-star/rdf-star-cg-spec.html
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> What should be concluded from this?  Just about the most charitable
>>>>>> conclusion is that RDF* is unsuitable for its claimed use.
>>>>> 
>>>>> I don't think that is very charitable ;-), nor really fair. At least
>>>>> that's what I tried to show above.
>>>>> 
>>>>> What I conclude, though, is that RDF* is easily misused, and that the CG
>>>>> report should include material to help people avoid these pitfalls. I'll
>>>>> make a PR to that effect.
>>>>> 
>>>>>  best
>>>>>> 
>>>>>> So what is RDF* good for?  I am concerned about this.
>>>>>> 
>>>>>> 
>>>>>> peter
> 
> 
>
Received on Saturday, 19 December 2020 20:53:42 UTC