Re: Some questions on RDF 1.1 Reification Semantics from thomas lörtsch on 2018-09-07 (semantic-web@w3.org from September 2018)

From: thomas lörtsch <tl@rat.io>
Date: Fri, 7 Sep 2018 12:09:02 +0200
To: semantic-web@w3.org
Cc: Pat Hayes <phayes@ihmc.us>
Message-Id: <5F00B074-295A-4C21-8CDC-18B748C3CB49@rat.io>
> On 31. Aug 2018, at 09:50, Pat Hayes <phayes@ihmc.us> wrote:
> 
> On 7/15/18 2:54 PM, thomas lörtsch wrote:
> 
> Over a month ago! Apologies for my late response.

Thanks for your response! As I said I’m not a native speaker and some subtleties are lost on me. So I assumed that "Best wishes" might mean something like "and now please leave me alone with that ancient gruft". Glad to be wrong :-)

>> Thanks a lot for the thorough explanation which really cleared up things for me (some further remarks inline below).
>> With those questions out of the way I’m now able to get to my main issue which is about the relation between the token and its reification. 
> 
> The key issue, which is not restricted to reification, is how to specify the relationship between an IRI and what it is supposed to denote. In a nutshell: how do things on the Web get their names?
> 
> Sorry about the verbosity...
> 
>> Let's take the example from the spec, the triple token
>>     ex:a ex:b ex:c .
>> and its reification
>>     ex:graph1 rdf:type rdf:Statement .
>>     ex:graph1 rdf:subject ex:a .
>>     ex:graph1 rdf:predicate ex:b .
>>     ex:graph1 rdf:object ex:c .
>> I understand (and intuitively agree) that
>>     ex:graph1
>> doesn’t entail the same consequences as the triple it reifies,
>>     ex:a ex:b ex:c .
>> since ex:graph1, although correctly and completely described above, in itself doesn’t actually state
>>     ex:a ex:b ex:c .
>> It just describes that statement, or even more precisely: such a statement.
> 
> What does the IRI 'ex:graph1' denote? This isn't specified anywhere, and indeed you seem to be slightly muddled about this yourself, above, because your reified description assumes it denotes the triple token, but your question uses it to refer to the reification graph itself.

Slightly muddled, yes. Issues involving semantics have that effect on me all the time.

>> I would now have hoped that the following graph was a semantically sound way to make an assertion about the provenance of the triple token of interest:
>>     ex:a ex:b ex:c .
>>     ex:graph1 rdf:type rdf:Statement .
>>     ex:graph1 rdf:subject ex:a .
>>     ex:graph1 rdf:predicate ex:b .
>>     ex:graph1 rdf:object ex:c .
>>     ex:graph1 ex:prov ex:rdf11mt .
>> as the triple token and its reification sit side by side in the same graph and so intuitively the denotation of ex:graph1 seems pretty definitive.
> 
> Sorry, not definitive enough :-)  And I don't think we would want this rule in any case, that mere syntactic adjacency forces an IRI to refer to a triple. What if there were other triples in the graph? How would we know which one was called 'ex:graph1'?  
> At the very least, we would want some kind of explicit graph labeling convention so we would have something like
> 
> ex:graph1 :{ ex:a ex:b ex:c . }
> 
> (see the literature on 'named graphs' for a better idea, but note that this is all post-2004.)

Regardless of the availability of Named Graphs it would be useful to be able to state stuff like:

 ex:PersonA ex:states ex:graph1 .
 ex:PersonB ex:endorses ex:graph1 .
 ex:PersonC ex:contradicts ex:graph1 .

For this to work it would be enough if ex:graph1 referred to the abstract triple, and implicitly also to all of its occurrences in any piece of RDF in the world. The reference wouldn’t need to be able to denote one specific token instance.

Probably the following would not work so well as it refers to a specific token of a statement in a specific situation:

 ex:graph1 ex:context ex:heated_dispute .

This would need a mechanism to address specific tokens (or groups of tokens) like Named Graphs.
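
To make the difference concrete, here is a minimal sketch in Python rather than RDF. Everything in it (the `Triple` and `Token` classes, `tokens_of`, the example.org document names) is made up for illustration and is not part of any spec; it only shows that a statement about the abstract triple reaches every occurrence, while a context statement has to single out one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """An abstract triple: identified purely by its three terms."""
    subject: str
    predicate: str
    object: str


@dataclass(frozen=True)
class Token:
    """One occurrence of a triple somewhere (hypothetical model)."""
    triple: Triple
    document: str  # where the occurrence was found
    line: int      # position inside that document


def tokens_of(triple, tokens):
    """Return every known occurrence of the given abstract triple."""
    return [t for t in tokens if t.triple == triple]


abc = Triple("ex:a", "ex:b", "ex:c")
tokens = [
    Token(abc, "http://example.org/doc1.nt", 1),
    Token(abc, "http://example.org/doc2.nt", 7),
    Token(Triple("ex:a", "ex:b", "ex:d"), "http://example.org/doc2.nt", 8),
]

# An ex:endorses-style statement about the abstract triple
# implicitly covers all of its occurrences.
endorsed = tokens_of(abc, tokens)

# An ex:context-style statement has to pick exactly one occurrence.
heated_dispute = tokens[1]
```

Two `Triple` values with the same terms compare equal, so there is only ever one abstract triple, whereas the two `Token` values for it stay distinct.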

>> The spec however seems to say that this is not the case as e.g.
>>     ex:a ex:b ex:c .
>>     ex:graph1 rdf:type rdf:Statement .
>>     ex:graph1 rdf:subject ex:a .
>>     ex:graph1 rdf:predicate ex:b .
>>     ex:graph1 rdf:object ex:c .
>>     ex:graph1 ex:prov ex:rdf11mt .
>>     ex:graph2 rdf:type rdf:Statement .
>>     ex:graph2 rdf:subject ex:a .
>>     ex:graph2 rdf:predicate ex:b .
>>     ex:graph2 rdf:object ex:c .
>> wouldn’t entail
>>     ex:graph2 ex:prov ex:rdf11mt .
> 
> Correct, it would not.
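
A small sketch of why nothing carries over, again in plain Python with triples as tuples. It assumes simple entailment only, and it relies on the fact that for graphs without blank nodes simple entailment comes down to the subset relation. The helper names `reify` and `simply_entails` are mine, not from any library.

```python
def reify(node, triple):
    """Build the four standard reification triples for `triple`."""
    s, p, o = triple
    return {
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    }


def simply_entails(graph, conclusion):
    """Simple entailment between ground graphs (no blank nodes):
    the conclusion must already be contained in the graph."""
    return conclusion <= graph


abc = ("ex:a", "ex:b", "ex:c")

graph = {abc} | reify("ex:graph1", abc) | reify("ex:graph2", abc)
graph.add(("ex:graph1", "ex:prov", "ex:rdf11mt"))

# The provenance statement that was actually made.
stated = simply_entails(graph, {("ex:graph1", "ex:prov", "ex:rdf11mt")})

# The same statement about the second reification node.
transferred = simply_entails(graph, {("ex:graph2", "ex:prov", "ex:rdf11mt")})

# A reification on its own does not assert the triple it describes.
reification_asserts_triple = simply_entails(reify("ex:graph1", abc), {abc})
```

Here `stated` is true while `transferred` and `reification_asserts_triple` are false: nothing in the graph says that ex:graph1 and ex:graph2 denote the same thing, even though both describe a triple with the same subject, predicate and object.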
> 
>> Or am I jumping to conclusions here?
>> To rephrase the question: the spec says "The subject of a reification is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax, rather than a triple considered as an abstract object.", adding some use cases like provenance of triples etc. However it also says: "suppose that IRI ex:graph1 is used to identify this graph. Exactly how this identification is achieved is external to the RDF model". That seems to leave every use of reification semantically unspecified, even very clear cases like "ex:graph1 ex:prov ex:rdf11mt ." above?!
> 
> Yes.

Okay! Well, at least that much is clear. 

> The entire RDF spec simply does not deal with the question of how an IRI can become recognized to be the name of something, ie to denote that thing in all interpretations. I tried to get this issue raised during the 2004 WG discussions, but at that time the dominant view was that this was a non-issue, because the HTTP protocol meant that every IRI was clearly an 'identifier' of something: to figure out what an IRI means, you just throw it at the Web and see what comes back. Later discussions revealed the obvious fact that many IRIs identified something like a DBpedia page but were intended to denote something completely different, such as a German city or a distant galaxy, and this realization (which was extremely slow in coming to some very influential people) eventually gave rise to what came to be called the http-range-14 issue and its accompanying notion of "information resources". You can find a lot about that by googling various email archives with that phrase. But note, nothing in this entire discussion, which probably took up many person-years, actually came up with a formal convention for, or method of, attaching IRI names to things.
> 
> Another place that the issue (of how to attach IRI names to things) came up was how to semantically describe the meanings of datatype names such as xsd:integer. In the 2004 specs, the semantics of these is described using a thing called a 'datatype mapping' built into the RDF interpretation. In the later RDF1.1 specs, we changed this to recognize that some datatype IRIs, like the XML schema datatypes, had achieved the status of universally recognized Web names, with fixed meanings ultimately specified socially, by their universal use, rather than by any RDF semantic rule. Just as "Paris" denotes the capital of France by social use rather than by any kind of internet standardization decree, so does "xsd:integer" denote the XML schema datatype standard. This is a paradigm case of what we mean by "external to the RDF model".
> 
> The fact is, nobody has come up with a single naming convention for attaching names - IRIs - to RDF graphs or RDF triples which has become an accepted standard. I helped draft such a convention (named graphs) after the 2004 specs were published, and many implementors provided some way to do this (eg quads) which began to be used before the RDF1.1 WG was chartered. Perhaps unfortunately, however, these 'names' were used for a wide variety of purposes, many of them having nothing to do with naming, and by the time the second WG was chartered, we were obliged to respect these established uses (mis-uses?) which had ignored the semantics provided by the original paper. As a result, graphs may now have 'names' ie labels, which are allowed to NOT denote the graph they label, so the RDF 1.1 specs have explicit warnings that implementors should not assume that graph names actually name the graphs that they name. So we have made zero, or possibly negative, progress on this central issue.

Maybe one should have specified a reserved namespace or an algorithm that generates the name like e.g. an MD5 hash. As soon as you offer a way to freely name something the name will get overloaded with other concerns, like labeling, provenance tracking etc. That’s just the way things go.
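
Just as a sketch of what I mean by an algorithm that generates the name: the following derives an IRI from the content of the triple itself, so nobody gets to choose the name and overload it. The `urn:example:triple:` namespace and the function `triple_name` are hypothetical, SHA-256 stands in for the MD5 mentioned above, and the sketch only handles triples whose three terms are IRIs (no literals, no blank nodes).

```python
import hashlib

# Hypothetical reserved namespace for generated triple names.
TRIPLE_NS = "urn:example:triple:"


def triple_name(subject, predicate, obj):
    """Derive a name for an abstract triple from its three IRIs.

    The triple is written as a single N-Triples-like line and hashed,
    so equal triples always get the same name.
    """
    line = f"<{subject}> <{predicate}> <{obj}> ."
    digest = hashlib.sha256(line.encode("utf-8")).hexdigest()
    return TRIPLE_NS + digest


name1 = triple_name("http://example.org/a", "http://example.org/b", "http://example.org/c")
name2 = triple_name("http://example.org/a", "http://example.org/b", "http://example.org/c")
other = triple_name("http://example.org/a", "http://example.org/b", "http://example.org/d")
```

Note that a name generated this way identifies the abstract triple, not any particular token of it: every occurrence of the same triple anywhere ends up with the same name.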

The RDF 1.1 WG archives contain a proposal to explicitly specify that some graph name is indeed not a label but a name denoting the graph. It’s a pity that not even that basic mechanism made it into the spec.
I do understand the approach that some in the WG took to not standardize stuff that isn’t already in use with wide uptake. OTOH this results in a chicken-and-egg problem as proposals like Named Graphs tend to get rejected because they are not standardized (but maybe the heated WG discussions burned the ground for Named Graphs specifically). Also, and contrary to my rather pro-early-standardization stance, I wonder why no solution to reification and graph naming has become a de-facto standard since the WG punted on the issue. Maybe that’s a sign that the demand isn’t as big as I think it is? But see below...

>> If — as I fear — every use of reification is semantically unspecified then why is the spec so reluctant, leaving the very use case of reification in a semantic limbo? After all in the case of triple tokens we’re not faced with the problem of graph labeling/naming that plagued the work on named graphs.
> 
> But we are. How does one name a triple, if one cannot even name a graph?

One might start with solving the triple naming problem and then tackle graph naming from there.

> By the way, you may be wondering why I went along with simply reiterating the 2004 wording about reification a decade later, given all the intervening discussion and arguments about this issue. There are several answers: the text is non-normative; as far as I can tell, it is still in fact correct; I was tired of arguing. But most importantly, it really does not matter, since in the entire intervening decade, I had not, and indeed still have not, seen a single useful application of RDF reification.

I think it could be pretty useful:

* Concise reification could ease the modelling of n-ary relations and get rid of some blank nodes. The quite popular Property Graphs formalism heavily uses reification. It’s a style of modelling where a "primary" relation is annotated with additional facts, facets, details, contexts etc. Many people find that very intuitive. Contrary to n-ary relations in RDF the central relation isn’t lost in some blank node agglomeration.

* A paper on converting Wikidata to RDF [0] shows how immensely helpful a sound and concise reification mechanism could be. Like a few other papers on the topic it shows how Named Graphs would be the most intuitive solution but how people resort to rather unwieldy fluents and n-ary relations because they want to keep OWL reasoning and therefore need to stay inside the realm of standardized model theoretic semantics.

* Fluents, N-ary relations, Singleton Properties etc have been investigated as reification techniques and although they are considered inferior to Named Graphs especially in terms of intuitiveness, the latter are dismissed for having no semantics (forgetting that that is only the case since SPARQL name-squatted the term) or not being tailored to the triple use case, or both. The discussion seems to go in circles and it feels like the RDF 1.1 WG burned the ground for Named Graphs more than it cleared things up. But this is also an area where I sometimes think "Come on, people, Named Graphs DO have semantics - why don’t you just use them and stop waiting for the W3C to rubber stamp them?".

* RDF*, syntactic sugar for RDF standard reification, although based on the same (lack of) semantics, got a best poster award at the last ISWC. I interpret that as a sign of interest and demand.

* The Open Knowledge Network (OKN) seeks some context mechanism to drive a massive open source Knowledge Graph [1]. In Knowledge Graphs the distinction between the sole triple and the group of triples often dissipates. The LOD needs to handle and administer masses of data, hence the focus on graphs, but Knowledge Graphs combine huge amounts of data with fine-grained KR and meta modelling. I come from Topic Maps and I initially despised RDF as some sort of eCommerce enabling technology with a very flat earth-ish perspective on knowledge representation. I’m quite content to finally see a massive use case for such fine-grained modelling and meta modelling emerge.

So, IMO there is definitely use, need and even desperation for sound triple reification. Whether graph naming and contextualization really should be based on reification is of course another question.

> The entire idea of using RDF to describe RDF seems to me to be a rat-hole which is best avoided altogether.

Isn’t KR one huge rat-hole ;-) But the problems are not all technical or with the semantics. I waded through the RDF 1.1 WG archives and they contain some cool stuff like your RDFC proposal or Sandro’s idea how to distinguish names from labels. Those proposals still totally make sense. The discussions must have been strenuous but they did clarify the problem space. Maybe next time... :-)

By the way: I like the clarity of how IKL handles reification. From Sowa’s slides I take it that the standardization effort didn’t receive the funding required to finish, but was there any follow-up work based on IKL? Any implementation even?

> Best wishes (and thanks for the penetrating questions.)

Thanks, and thanks for relentlessly mining answers from rat holes!
Thomas

> 
> Pat
> 
>>> On 13. Jul 2018, at 23:42, Pat Hayes <phayes@ihmc.us> wrote:
>>> 
>>> On 7/13/18 12:49 PM, thomas lörtsch wrote:
>>>> I’m trying to understand what the RDF 1.1 Semantics Recommendation says about reification (*) but I’m having particular difficulties keeping up with the different kinds of triples it describes.
>>> 
>>> I will do my best to explain, but I should perhaps say up front that very few uses of reification have paid close attention to what the specs say about it. So this is more about what the WG intended, than about any actual reality.
>>>> At one point quite early in Appendix D.1 the Recommendation says:
>>>> "Reification is not a form of quotation. Rather, the reification describes the relationship between a token of a triple and the resources that the triple refers to."
>>>> I’m not a native speaker so some subtleties are lost on me.
>>> 
>>> Rest assured that your grasp of the subtleties is better than that of most native speakers.
>>> 
>>>> My best guess is that "token" here is meant as in type-token-distinction as at a later point the spec refers to "a particular instance or token of a triple".
>>>> 
>>> Correct.
>>> 
>>>> However if the spec refers to token as in type-token then why is the reification not describing the type but the token?
>>> 
>>> Because the IRI which identifies the reified triple has to be interpreted in this way, in most of the (actual and potential) uses of reification that were being contemplated when the spec was being written. For example, the referent of this IRI was intended to be something that could be stored in a file and transmitted from place to place, was asserted by someone, had a provenance, etc.. In other words, it must be some piece of an actual concrete ('surface' or 'interchange') syntax, such as RDF-XML or TURTLE or N-triples.
>> Okay. I think what threw me off most was when the spec says it can’t define something but then speaks about it all the same. The spec says the identification of a token by a reification is not defined (so, in my words, the relation is brittle at best). My attempts to interpret the text then assumed that it wouldn’t speak about that use case any further. But it does. Because it’s an important use case etc.
>>>> Or would the spec, because it is (I guess) referring to unstated triples here,
>>> 
>>> The intention was to be neutral as to their statedness or otherwise.
>> Aha! And I operated the whole time under the impression that honouring this distinction was the reason for all those contortions.
>>>> rather speak about (non-existing) instances than about their type? And if some triple with a specific subject/predicate/object is foremost a type how does that fit with the set semantics that there can be only one instance of that type?
>>> 
>>> ? The semantics isn't relevant here. This is really purely an issue in syntax. (Or are you referring to the 'abstract' syntax, in which an RDF graph is a set?
>> yes
>>> If so, see below.)
>>> There can of course be several instances of a triple (in different graphs, in any concrete syntax for RDF).
>>> 
>>>> In my intuition it doesn’t. The dictionary also offers "symbol" and "representation" which can mean type or instance, so that doesn’t help either.
>>>> Shortly thereafter:
>>>> "Reifications can be written with a blank node as subject, or with an IRI subject which does not identify any concrete realization of a triple, in both of which cases they simply assert the existence of the described triple."
>>>> What is a "concrete realization"?
>>> 
>>> Yes, that is awkward. I meant simply a token, in some surface syntax.
>>> 
>>>> The next sentence more specifically speaks of "a concrete realization of an RDF triple, such as a document in a surface syntax" but does that exclude triples in databases, or only unstated triples?
>>> 
>>> No, it does not exclude them. They are after all represented in some syntactic form.
>> Just to disclose my intention: I’m pounding on this issue so much because my real interest is not provenance of some graph "realization" but meta modelling, attributed graphs and the like. That does of course happen mostly within some database and isn't overly concerned with serializations to documents. But that will be the topic of another mail.
>>>> In the next sentence:
>>>> "The subject of a reification is intended to refer to a concrete realization of an RDF triple, such as a document in a surface syntax, rather than a triple considered as an abstract object."
>>>> What is a triple as an "abstract object": a triple that merely exists (in the sense that it has never actually been stated)? Or a triple that sits in a database as bits and bytes but not "concretely realized"? Or both? Or anything but a triple that has been "concretely realized"?
>>> 
>>> OK, I will confess that there is some conceptual confusion in the very heart of the RDF spec, and you have located it. The basic issue is the "abstract syntax" in which an RDF graph is defined to be a SET of triples. We took this path (which is highly unusual when defining languages, as you may know)
>>  Lamentably I don’t know much more than what I learned through reading the specs and going through the RDF 1.1 WG archives. I’m taking reading recommendations.
>>> in an attempt to have a cake and eat it. We wanted to describe RDF 'abstractly' so that there could be many different surface forms, including (when this was written, in 2004 – this text is copied directly from the original RDF 1.0 specification) surface forms that had not yet been invented, but we wanted to keep the specification as simple as we could, and in particular wanted to avoid an elaborate algebraic terminology of distinguishing 'abstract syntax' from 'surface syntax' and having to define a category-theoretic apparatus of mappings between them. For most of the development this has been reasonably effective, although it did cause us some grief regarding how to handle blank nodes; but the fact is, it is something of a conceptual muddle. What we SHOULD have done is something like what is described in my invited lecture here
>>> https://www.slideshare.net/PatHayes/rdf-redux
>>> but this only occurred to me later, when it was too late to adjust the spec to conform to it. (The 2014 WG that created RDF 1.1 was prohibited by its very charter – and over my strenuous objections – from making such far-reaching changes to the underlying RDF structure.)
>> Yes, I read your mails about the process in the WG archives and I experienced those strong worries about derailing uptake of RDF and the LOD effort by making more than the most modest changes to RDF first hand at the time. I hope that with the advent of Knowledge Graphs, Property Graphs, Attributed Graphs etc the increasing demand for and use of meta modelling techniques will bring another chance to standardize this properly in RDF.
>>> So, to return to actual RDF reification: the intention is that the subject node of an RDF reification always refers to a token – a piece of concrete syntax in some surface form of RDF, whether actually physically realized or not – and not to a set-theoretic or mathematical abstraction.
>>> 
>>>> To sum up, there seem to exist:
>>>> - abstract triples
>>>> - concretely realized triples
>>>> - simply existing triples
>>>> - reified/described (but not quoted) tokens (of triples)
>>> 
>>> I would make a simpler distinction:
>>> 
>>> 1. abstract triples
>>> 2. tokens (of triples) in some surface syntax for RDF, for example a line of three IRIs followed by a dot, in N_triples. I would make this category as inclusive as possible, including such things as rows of a table if that table is understood to encode RDF.
>>> 
>>> The second category can, if one wishes, be further subdivided into tokens which are physically instantiated at some time, versus those which are not: but a similar distinction can be made within any class of tokens, so this is nothing special to RDF.
>>> 
>>> The RDF spec is almost entirely concerned with 1, and deliberately avoids the topic of 2, but reification has to be understood to refer to 2, hence the awkwardness you have noticed.
>>> 
>>>> Which of them has actually been asserted somewhere, somehow? Does it make a difference how "concretely" it has been "realized" - in a database, serialized to a turtle document, etc?
>>> 
>>> No.
>>> 
>>>> Why does reification refer to a token, not the type?
>>> 
>>> Because nobody felt any need to be able to talk about triple types, but there was a strongly felt need to be able to talk about triple tokens. And we had to make a call one way or the other, as it would have been totally confusing to have left this open.
>> Okay, and I always assumed that _this_ can’t be the answer to my question as the connection between token and reification is so brittle (leading to my questions above on that topic).
>>>> What is that token exactly?
>>>> 
>>>> It might be more useful if the spec differentiated just between things that can be asserted and things that actually have been asserted.
>>> 
>>> That might be interesting, but it is orthogonal to the distinction we were trying to draw.
>>>> Then some triple with a specific subject/predicate/object exists only once as something assertable, but it can be asserted many times (and each time may have an identifier, provenance etc).
>>> 
>>> Yes, exactly. If we presume that assertion must involve an actual piece of surface syntax being asserted, then this is exactly the distinction between abstract and concrete that we were trying to explicate.
>> Cool!
>>>> But back to one last question:
>>>> "[…] asserting a triple does not automatically imply that any triple tokens exist in the universe being described by the triple. For example, the triple might be part of an ontology describing animals, which could be satisfied by an interpretation in which the universe contained only animals, and in which a reification of it was therefore false."
>>>> That doesn’t look like anything to me…
>>>> Does this suggest that a triple token is the real world realization of whatever the triple is referring to? I hope not, but I’m lost here anyway.
>>> 
>>> Perhaps this example wasn't very helpful.
>>> 
>>> Let us agree that a reification of a triple, when asserted, says that a token of that triple exists, but it does not assert that the triple being described is true: it does not assert the triple that it describes. What this paragraph is trying to say is that the dual is also the case: that if you were to assert a triple, that does not in itself also assert that a potential reification of that triple is true. This might seem counter-intuitive: after all, the asserted triple does exist, or you couldn't have asserted it. But the point is that the RDF graph containing the triple might be describing an ontologically limited 'world' which need not itself contain triples as entities. Put another way: the universe described by an RDF graph is not required, by the RDF specification, to contain the triples of that graph itself.
>> I see - luckily that seems to be quite a corner case.
>> Thomas
>>> I hope that makes sense; but if it doesn't, I don't think much will turn on whether one follows it or not :-)
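
For what it is worth, the animals example can be played through with a toy model checker. This is only a sketch under simplifying assumptions: it uses simple interpretations (a universe `IR`, a set of properties `IP`, extensions `IEXT`, and an IRI mapping `IS`) and ignores the additional conditions that RDF-interpretations place on the rdf: vocabulary. The animals, the property `ex:chases`, and the choice to let every rdf: IRI denote an arbitrary animal are all invented for the illustration.

```python
from itertools import product


def is_true(graph, IR, IP, IEXT, IS):
    """Check a graph against a simple interpretation.

    The graph is true if some assignment of its blank nodes to
    members of the universe IR makes every triple true.
    """
    bnodes = sorted({t for s, _, o in graph for t in (s, o) if t.startswith("_:")})

    def denote(term, assignment):
        return assignment[term] if term.startswith("_:") else IS[term]

    for values in product(sorted(IR), repeat=len(bnodes)):
        assignment = dict(zip(bnodes, values))
        if all(
            IS[p] in IP
            and (denote(s, assignment), denote(o, assignment)) in IEXT[IS[p]]
            for s, p, o in graph
        ):
            return True
    return False


# A universe that contains only animals.
IR = {"rex", "felix"}
IP = {"chases"}
IEXT = {"chases": {("rex", "felix")}}
IS = {
    "ex:rex": "rex",
    "ex:felix": "felix",
    "ex:chases": "chases",
    # Every IRI has to denote something; here the rdf: terms denote
    # an animal, which is not a property of this world.
    "rdf:type": "rex",
    "rdf:Statement": "rex",
    "rdf:subject": "rex",
    "rdf:predicate": "rex",
    "rdf:object": "rex",
}

asserted = {("ex:rex", "ex:chases", "ex:felix")}
reification = {
    ("_:s", "rdf:type", "rdf:Statement"),
    ("_:s", "rdf:subject", "ex:rex"),
    ("_:s", "rdf:predicate", "ex:chases"),
    ("_:s", "rdf:object", "ex:felix"),
}

triple_holds = is_true(asserted, IR, IP, IEXT, IS)
reification_holds = is_true(reification, IR, IP, IEXT, IS)
```

In this world `triple_holds` is true and `reification_holds` is false: the asserted triple is satisfied, but there is nothing in the universe that could play the role of a statement with a subject, predicate and object.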
>>> 
>>> Best wishes
>>> 
>>> Pat Hayes
>>> 
>>> 
>>>> Best,
>>>> Thomas Lörtsch
>>>> (*) not because I want to use it but because I want to precisely understand its semantics (or lack thereof)
>>> 
>>> -- 
>>> -----------------------------------
>>> call or text to 850 291 0667
>>> www.ihmc.us/groups/phayes/
>>> www.facebook.com/the.pat.hayes
>>> 
>>> 
> 


[0] Hernández, Daniel, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What works well with Wikidata?" SSWS@ISWC 1457 (2015): 32-47.
[1] http://wiki.knoesis.org/index.php/CKG2018
Received on Friday, 7 September 2018 10:09:32 UTC