Re: RDF* and grouping relation properties from thomas lörtsch on 2020-06-24 (public-rdf-star@w3.org from June 2020)

From: thomas lörtsch <tl@rat.io>
Date: Wed, 24 Jun 2020 17:59:34 +0200
To: Patrick J Hayes <phayes@ihmc.us>
Cc: Tim Finin <finin@umbc.edu>, public-rdf-star@w3.org
Message-Id: <9253EEE7-C1DC-4245-9D3F-263D616D5C7D@rat.io>
> On 21. Jun 2020, at 23:01, Patrick J Hayes <phayes@ihmc.us> wrote:
> 
>> On Jun 21, 2020, at 5:14 AM, thomas lörtsch <tl@rat.io> wrote:
>> 
>>> On 18. Jun 2020, at 17:44, Tim Finin <finin@umbc.edu> wrote:
>>> 
>>> While experimenting with RDF* I realized one issue: for some relations, we may have several properties that should be treated as a group.  For example, the provenance of a relation extracted from the text of a web page might include a link to the page and the date retrieved.
>>> 
>>> Using the following two RDF* expressions merges the four properties so that we can no longer determine which :source and :retrieved values go together.
>>> 
>>> << :man :hasSpouse :woman >>
>>>    :source <http://foo.com/>;
>>>    :retrieved "2020-06-17"^^xsd:date .
>>> << :man :hasSpouse :woman >>
>>>    :source <http://bar.com/>;
>>>    :retrieved "2020-01-01"^^xsd:date .
>>> 
>>> Using a traditional RDF reification approach maintains the pairing.
>>> 
>>> :man2 :hasSpouse :woman2 .
>>> [ ]  a rdf:Statement ;
>>>     rdf:subject :man2 ;
>>>     rdf:predicate :hasSpouse ;
>>>     rdf:object :woman2 ;
>>>    :source <http://foo.com/> ;
>>>    :retrieved "2020-06-17"^^xsd:date .
>>> [ ] a rdf:Statement ;
>>>    rdf:subject :man2 ;
>>>    rdf:predicate :hasSpouse ;
>>>    rdf:object :woman2 ;
>>>   :source <http://bar.com/>;
>>>   :retrieved "2020-01-01"^^xsd:date .
>> 
>> In my understanding of the RDF Standard Reification semantics your two blank nodes are owl:sameAs as the reification quad refers to the abstract triple type, not any concrete token.
> 
> Correct. But bear in mind that none of the semantic guidance in the RDF specs for reification is normative. RDF reification has /no/ normative semantics. 
> 
> The reason for this is that before the RDF WG had even convened, there were already several conflicting views about what it should mean, and I was not able to come up with one semantic story which could accommodate them all. I sketched a possible semantics which seemed to me to be the only one that could fully make sense. TimBL (and others) did not like this for the reason you mention, that it does not allow for things like provenance statements, but as there was not at the time any alternative, it was included - better than nothing?

They do serve one purpose quite well: introducing triples as abstract types without actually stating them, because statements that are only present through the quadlet will never accidentally pop up in any unsuspecting inquiry. For that specific purpose I think it’s really good to have them and I’m glad they never got deprecated. They can also serve as the baseline triple construct on which more compact approaches - like RDF* - can be grounded. It’s hard to see how a triple based solution could be any more compact. For any real meta modelling, however, they are much too involved syntactically. 
Besides syntactic sugar they also lack a grounding in real world graphs - be it documents, Named Graphs or whatever - to address actual occurrences. All in all, that’s a problem that doesn’t seem to be that hard to solve. I’ve read a lot of papers about practical approaches to contextualization, annotation etc. and I’ve found very little about addressing a statement in a graph - that is, a URI that combines a graph with a triple. But it’s not that hard to do either. However, I came to the (as always provisional) conclusion that statement reification is the wrong way to go - see below.

> - but labelled as non-normative. 

And thanks for the historic background. I really wish some historian would start to write up the history of RDF. E.g. Dan Brickley’s mails on semantic-web@w3.org about a year ago were so thrilling :) I assume it would help (and would have helped) discussions about RDF if such considerations were more accessible to the greater public. When trying to make sense of all this stuff, one doesn’t normally want to wade through email archives from decades ago to get a better understanding. 

>> That contradicts the intuition of the provenance statements (:source and :retrieved) but that’s just how things are right now in RDF. There is no sound meta modelling in RDF.
> 
> The problem with a semantics which /would/ support provenance is that it would have needed to introduce a semantic notion of a single published instance of a triple, and no such notion had ever been considered. It would have introduced a large and not well-understood extension to the basic semantic framework of RDF, one that was rejected by the RDF WG as being too large a change to RDF. One can get a sense of what is needed by the subsequent work by Jeremy Carroll, Christian Bizer, Patrick Stickler and myself on the semantics of named graphs. (If anyone decides to try to read this, notice that the first thing it does is to distinguish between a named graph and an RDF graph.) 

I did read it. The decisive point to me seems to be that, the way the paper defines named graphs, they can be properly addressed and therefore attributed. A graph name is not a function of the graph (the abstract graph one could say, to align terminology with the discussion about statement reification) but names an occurrence of a graph. I love that because it’s the key to sound meta modelling.
However, and that puzzles me and makes me unsure whether I missed something, this is rather easy to implement. As I see it the problem can be condensed down to the following example. SPARQL can soundly address a graph because the FROM syntax unambiguously uses a URI to refer to a graph. Say I sloppily named a graph with triples that I created in Paris as 
 <paris.fr> { :a :b :c . :d :e :f }
SPARQL can query this graph alright but adding an attribute like 
 <paris.fr> :createdBy :me
is obviously problematic. To address and/or attribute such a graph outside of SPARQL one might wish for a special, yet to be defined syntax like e.g.
 <rdf:graph?paris.fr>
That would be handy and solve the problem but would require updates to the whole Semantic Web machinery. However, some small vocabulary like 
 :myParisGraph rdf:denotesGraphAt <paris.fr>
would do as well (albeit being a little tedious).
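
A minimal sketch of why the indirection helps, in Python rather than any RDF syntax. Everything in it is an assumption made for illustration: the dict-based toy dataset, the helper names `graphs_described_by` and `attributes_of`, and the term `rdf:denotesGraphAt`, which is only the hypothetical vocabulary suggested above and not part of any RDF specification.

```python
# Toy dataset: graph name -> set of (subject, predicate, object) triples.
# Plain strings stand in for IRIs; this is not a real RDF library.
named_graphs = {
    "<paris.fr>": {(":a", ":b", ":c"), (":d", ":e", ":f")},
}

# Statements made outside the named graph (the default graph).
default_graph = {
    # Ambiguous: is <paris.fr> the graph, a web site, or the city?
    ("<paris.fr>", ":createdBy", ":me"),
    # Explicit: :myParisGraph is declared to denote the graph at <paris.fr>.
    (":myParisGraph", "rdf:denotesGraphAt", "<paris.fr>"),
    (":myParisGraph", ":createdBy", ":me"),
}

DENOTES = "rdf:denotesGraphAt"  # hypothetical term, not a standard one


def graphs_described_by(term, statements, graphs):
    """Return the triple sets of the graphs that `term` explicitly denotes."""
    return [
        graphs[o]
        for (s, p, o) in statements
        if s == term and p == DENOTES and o in graphs
    ]


def attributes_of(term, statements):
    """Return (predicate, object) pairs stated about `term`, minus the denotation link."""
    return {(p, o) for (s, p, o) in statements if s == term and p != DENOTES}


# Only the explicitly declared term is known to talk about the graph.
explicit = graphs_described_by(":myParisGraph", default_graph, named_graphs)
ambiguous = graphs_described_by("<paris.fr>", default_graph, named_graphs)
print(len(explicit), len(ambiguous))
print(attributes_of(":myParisGraph", default_graph))
```

The point of the sketch is only that a consumer can tell, from the data alone, that :createdBy on :myParisGraph is about the graph, whereas the same attribute on <paris.fr> leaves the referent open.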

As I see it that’s all that’s needed. The rest is vocabularies to express more specific naming semantics, negotiate syndication mechanisms etc. Essentially the second half of the Named Graph paper is all about what could be done with the right vocabulary IFF sound graph denotation is ensured. 
Did I get that right?


One critique of the paper: I would prefer a nested construct to the namedGraph(name,graph) pair the authors chose, as it would better capture what’s going on and how naming works in practice. First there is the abstract graph: it exists if we can think it up (I do plagiarize N3’s "All lists exist" here, which at first I didn’t like but now…): 
 abstractGraph
Then the abstract graph gets instantiated as someone writes it down: 
 instanceOf(abstractGraph)
Then the instance/occurrence/graphToken gets named:
 nameOf(instanceOf(abstractGraph))
Or maybe it doesn’t get a name... naming is actually optional in real life, proper denotational naming is even more optional (take indexicals for example, and all the ways meaning gets encoded in URIs). That is just social practice, it’s the way people use language to get communication done with the least effort. I read with great interest Harry Halpin’s description of your debate with TimBL about identification semantics - denotation vs indication - and this is a similar problem. Being specific is important for sound formalization but tedious for humans. So we as engineers have to come up with ways to provide sound defaults and also ways to be specific _on demand_. The Semantic Web is quite bad at the latter, but I think that can be fixed, even in a backwards compatible way. I’m however not sure I already fully understand what it takes for identification to be sound.
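
The nesting above can be sketched roughly as follows. This is an illustration under my own assumptions, not the paper's formalization: the class names `AbstractGraph` and `GraphInstance` and the helpers `instance_of` and `named` are invented here, and plain strings stand in for IRIs.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class AbstractGraph:
    """A graph as a pure set of triples: same triples, same graph."""
    triples: FrozenSet[Triple]


@dataclass(eq=False)  # eq=False: instances compare by identity, not content
class GraphInstance:
    """One concrete occurrence (token) of an abstract graph."""
    of: AbstractGraph
    name: Optional[str] = None  # naming is optional
    annotations: dict = field(default_factory=dict)


def instance_of(graph: AbstractGraph) -> GraphInstance:
    """Someone writes the abstract graph down: a new occurrence comes into being."""
    return GraphInstance(of=graph)


def named(instance: GraphInstance, name: str) -> GraphInstance:
    """Attach a name to an occurrence and return that same occurrence."""
    instance.name = name
    return instance


abstract = AbstractGraph(frozenset({(":a", ":b", ":c"), (":d", ":e", ":f")}))
same_abstract = AbstractGraph(frozenset({(":d", ":e", ":f"), (":a", ":b", ":c")}))

paris = named(instance_of(abstract), "<paris.fr>")
other = instance_of(same_abstract)  # an unnamed occurrence of the same graph
paris.annotations[":createdBy"] = ":me"

print(abstract == same_abstract)  # content equality at the abstract level
print(paris == other)             # occurrences stay distinct
print(other.annotations)          # the attribute did not leak to the other occurrence
```

Attributes such as :createdBy attach to the occurrence, so two occurrences of the same abstract graph never have their annotations merged, and an occurrence without a name remains a perfectly good thing to talk about inside the program.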


>> It would of course be possible to define subclasses of rdf:Statement, like rdf:TripleType and rdf:TripleToken, define a context (e.g. a Named Graph) in which a triple becomes an actual token, and then reify such tokens. Only then the above provenance statements would have sound semantics.
> 
> Right.
> 
>> 
>> To the best of my knowledge RDF* binds its reification semantics back to RDF Standard Reification. The RDF* <<…>> construct is syntactic sugar for the RDF Standard Reification quadlet - nothing more, nothing less [0]. So merging those provenance statements is indeed the right thing to do. One might even argue that RDF* has a slight advantage over RDF Standard Reification as it represents the semantics more faithfully. However not what anybody would expect from the solution to reification in RDF.
> 
> IMO, there is nothing useful that RDF* provides that could not be done better using the named graph vocabulary, with a solid semantics which also includes such things as notions of web publication, web speech acts and signed warrants. 

RDF* could be a good replacement for the reification quadlet with the "abstract triple semantics" as outlined above. That’s of course not what people wish for. I agree that Named Graphs are better suited but there are some problems too, e.g. the perceptions that "graphs are only for grouping" and that provenance etc. is naturally best handled at the triple level. However, apart from perceptions, if you look at the wide range of problems that have been covered under the labels of "context" or "annotation", and if you acknowledge that all those problems and perspectives are perfectly valid on their own, then it becomes clear that Named Graphs, applied as the one and only meta modelling technique, will indeed get very fractured, or fine-grained - not the database administrator’s one-dimensional grouping mechanism that they are in many places today. They will need some extra machinery to manage them, like inheritance and hierarchical structures, specialized indexes, eager materialization of the most prominent aspects etc. 

But basically there is just no fundamental distinction between a graph which contains only one triple and a graph containing two or more triples. Likewise there is no attribute that always comes per triple and would never profit from grouping triples. Consequently there is no need for two parallel structures for single triples and groups of triples, like RDF* and Named Graphs. Olaf claimed last year that they are orthogonal but I’m increasingly sure that that’s precisely wrong.
On the contrary: introducing a triple centric meta modelling mechanism like RDF* would make everything even harder as there would be no way to know beforehand if a certain attribute belongs to the triple (e.g. encoded as an RDF* snippet) or to some group of triples (e.g. a Named Graph). So every meta attribute would have to be modeled and/or queried twice - per triple and per graph. Which is why RDF* might really be rather a problem than a solution. 
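
A rough sketch of that "queried twice" cost, again in plain Python with toy data. The dict layout, the graph name `:g1`, the extra triple `:man :livesIn :paris` and the function `lookup` are all made up for illustration; only the :source and :retrieved values are borrowed from Tim's example.

```python
# Two parallel places where a meta attribute might live.
triple_annotations = {  # RDF*-style: attributes attached to a single triple
    (":man", ":hasSpouse", ":woman"): {":source": "<http://foo.com/>"},
}
graph_members = {  # Named-Graph-style: triples grouped under a name
    ":g1": {(":man", ":hasSpouse", ":woman"), (":man", ":livesIn", ":paris")},
}
graph_annotations = {
    ":g1": {":retrieved": "2020-06-17"},
}


def lookup(triple, attribute):
    """Collect values for `attribute` from both the triple level and the graph level."""
    values = []
    # First lookup: per-triple annotations.
    if attribute in triple_annotations.get(triple, {}):
        values.append(triple_annotations[triple][attribute])
    # Second lookup: annotations on every graph that contains the triple.
    for name, members in graph_members.items():
        if triple in members and attribute in graph_annotations.get(name, {}):
            values.append(graph_annotations[name][attribute])
    return values


spouse = (":man", ":hasSpouse", ":woman")
print(lookup(spouse, ":source"))     # found at the triple level
print(lookup(spouse, ":retrieved"))  # found only at the graph level
```

A client that checked only one of the two structures would miss one of the two attributes. If singleton graphs were used for per-triple attributes, the first branch of `lookup` would simply disappear.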

Basing all meta modelling on RDF* instead of Named Graphs however doesn’t scale well because of the verbosity of RDF*. It would probably perform much worse than a solution based on Named Graphs - even if we end up with quite a few singleton graphs/named triples because of fragmentation induced by a multitude of annotation dimensions.

Thomas


> Pat
> 
>> 
>> Thomas
>> 
>> 
>> [0] Well, not exactly, as there are those modes too - SA and PG - which may implicitly add the cited triple to the graph or not. That’s a proposal and I’m not sure if implementations support mode switching or which mode(s) they support. 
>> 
>> 
>> 
>>> A possible solution when using RDF* is to encapsulate associated properties as a blank node entity, as in the following
>>> 
>>> :man3 :hasSpouse :woman3 .
>>> << :man3 :hasSpouse :woman3 >>
>>>    :provenance [ :source <http://foo.com/>;
>>>                           :retrieved "2020-06-17"^^xsd:date ] .
>>> << :man3 :hasSpouse :woman3 >>
>>>    :provenance [ :source <http://bar.com/>;
>>>                           :retrieved "2020-01-01"^^xsd:date ] .
>>> 
>>> However, this approach seems to violate the normal key/value pattern of property graph properties, which could be a compatibility issue.
>> 
>>> 
>>> 
>>> --
>>> Tim Finin,  Willard and Lillian Hackerman Chair in Engineering,  Computer Science and
>>> Electrical Engineering, U. Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore MD
>>> 21250. http://umbc.edu/~finin, finin@umbc.edu, tfinin@gmail.com, mobile:410-499-3522
>
Received on Wednesday, 24 June 2020 15:59:58 UTC