Re: ISSUE-30: How does SPARQL's notion of RDF dataset relate our notion of multiple graphs? from Pat Hayes on 2011-04-19 (public-rdf-wg@w3.org from April 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 18 Apr 2011 23:32:58 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Ivan Herman <ivan@w3.org>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <577FEF84-8219-44B2-A4C1-68E9B711338E@ihmc.us>

On Apr 18, 2011, at 5:06 PM, Richard Cyganiak wrote:

> Hi Pat,
> 
> I'll push back. I understand the value of the RDF Semantics document in that it defines the valid entailments of an RDF graph. I still do not understand what role it has beyond that, and how it is relevant to operations that do not involve entailment.
> 
> (Just to keep the underlying question in mind: We disagree on the question whether the RDF Semantics document needs to mention aspects of RDF that do not involve entailment. I think it doesn't have to, and shouldn't. You disagree.)

Well, I'm not sure what this thesis of yours even *means*, so I don't know whether I agree or not. What you are saying *seems* to be simply based on a failure to grasp the basic ideas of semantics, frankly. Sorry to be blunt, but that's the way it seems. I get the impression that you don't know squat about semantics, find documents which refer to it hard to read and understand, and would prefer to not feel that you are missing something important by the handy device of writing this hard stuff out of the picture. Which would be fine if there were a viable alternative; but there really is not. You can try to establish semantics by appealing to inference rules, but that just kicks the issue down the road: what determines the semantics of those rules? Eventually, you have to cash this out in some model theory, because there isn't any other theory of truth available. 

OK, others have suggested presenting RDF semantics in a more axiomatic, rule-based way. I have myself (see the L-base proposal.) But now, will your audience of pragmatic hackers whose eyes glaze over when they try to read the RDF semantics document fare better when they are asked to read first-order logic? When the spec has to require about one year of postgraduate education in mathematical logic before you can even understand the words it uses?

Your focus on entailment misses the point. Entailment is a byproduct of the more fundamental idea of truth in an interpretation, which is the core of the model-theoretic semantic idea. (BTW, entailment is *defined* this way. P entails Q *means* that Q must be true when P is true; and to make sense of this, you have to ask what it means to be true...)
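The definition of entailment in the parenthesis above can be made concrete with a small Python sketch. It rests on loud assumptions: graphs are ground (no blank nodes), and each toy interpretation is represented only by the set of triples it makes true, a Herbrand-style simplification. The names `true_in` and `find_counterexample` are illustrative, not taken from any RDF library.

```python
from typing import FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
# Toy assumption: an interpretation is represented only by the set of ground
# triples it makes true (a Herbrand-style simplification).
Interp = FrozenSet[Triple]

def true_in(graph: Iterable[Triple], interp: Interp) -> bool:
    # A graph is true in an interpretation iff every triple in it is true.
    return all(t in interp for t in graph)

def find_counterexample(p: Iterable[Triple], q: Iterable[Triple],
                        interps: Iterable[Interp]) -> Optional[Interp]:
    # P entails Q iff NO interpretation makes P true and Q false.
    # This only searches the candidate interpretations it is given.
    p, q = list(p), list(q)
    for i in interps:
        if true_in(p, i) and not true_in(q, i):
            return i
    return None
```

Finding a counterexample refutes an entailment; failing to find one among a handful of candidates proves nothing, because the definition quantifies over every interpretation. Closing that gap is the job of the model theory, not of a test harness.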

> 
> On 17 Apr 2011, at 00:10, Pat Hayes wrote:
>>> My understanding is that the RDF Model Theory exists to define which inferences are valid, given an RDF graph. What other purpose does it serve?
>> 
>> It defines what RDF means.
> 
> No, it doesn't, and I think this is not a very honest statement. If the MT defined what the RDF language means, then it would have to define what an utterance in that language means. The meaning of RDF is in all the weird and wonderful technological, social and economic processes and conventions that establish, more or less reliably, the referents of URIs.

That is central, indeed, and the model theory does not do that, indeed. But it starts from there and runs the rest of the way. An interpretation is defined by exactly this mapping, of URIs to their referents. But then you have to explain how this reference mapping determines the truth-value of triples and graphs. I know this is pretty simple, and RDF model theory is very simple (as model theories go), but it is not entirely trivial - it requires that little trick of using a REF mapping, to handle the fact that RDF allows a name to occur in both relational and argument position - and when you add in datatypes and ill-formed literals and all the other stuff, it turns out to be not only not trivial but in some cases quite tricky. And more to the point, it settles issues which need to be settled, but which were not even thought of until we forced ourselves to write a model theory and stick to it. If we had just written out rules that seemed kind of intuitive and neat, we would simply not have noticed these issues, and they would be causing interoperability problems as we speak. (Some engines would normalize ill-formed literals to a standard form, others would issue an error, others would declare they are not in rdfs:Literal, etc. etc. OWL would not know if RDFS was extensional or not, etc.) We did leave a few loose ends (the equivalence of plain and xsd:string literals is one glaring example) and in *every* case where we did, these troubles have appeared and have needed to be cleaned up. Which is one reason why this very WG exists, in fact. 
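The two-step structure described here (names to referents, then referents to truth) can be sketched in Python. This is a toy under stated assumptions: every name, referent, and extension below is hypothetical; `IS` and `IEXT` borrow the notation of the 2004 RDF Semantics document; and blank nodes, literals, and datatypes are ignored. The point it shows is that one name can sit in both relational and argument position without clashing, because it denotes a single thing that merely *has* an extension.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy interpretation. IS maps each name to a referent;
# IEXT gives each property referent an extension (a set of subject/object pairs).
IS: Dict[str, str] = {
    "ex:alice": "Alice",
    "ex:bob": "Bob",
    "ex:knows": "knowing",
    "ex:acquaintedWith": "knowing",  # a second name with the same referent
    "rdf:type": "type-of",
    "rdf:Property": "class of properties",
}
IEXT: Dict[str, Set[Tuple[str, str]]] = {
    "knowing": {("Alice", "Bob")},
    "type-of": {("knowing", "class of properties")},
}

def triple_true(s: str, p: str, o: str) -> bool:
    """A ground triple is true iff the referents of s and o, as a pair, lie in
    the extension of the referent of p. Each name contributes only through IS;
    its spelling plays no other role."""
    return (IS[s], IS[o]) in IEXT.get(IS[p], set())

# ex:knows used as a relation...
assert triple_true("ex:alice", "ex:knows", "ex:bob")
# ...and as an argument, with no clash.
assert triple_true("ex:knows", "rdf:type", "rdf:Property")
# Another name for the same referent cannot change the truth value.
assert triple_true("ex:alice", "ex:acquaintedWith", "ex:bob")
assert not triple_true("ex:bob", "ex:knows", "ex:alice")
```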

> Those processes are not described in the Model Theory (or, for the most part, in any other W3C Recommendation), to the contrary, the MT explicitly punts on most of them. The MT provides some icing on top of that magnificent mess.

One way to put it, yes. But "punt" is not a fair comment. Model theory is not a theory of reference; it is about how the meanings of larger expressions are built up from their subexpressions. 

> 
>> Or to be achingly precise, it puts constraints upon what RDF can possibly mean.
> 
> I guess that's a fair characterization.
> 
>> To give just two examples, it implies that the truth of an RDF triple cannot depend upon the form of a URI (other than by this form changing what the URI denotes) and it specifies that any URI must be interpreted as referring to the same entity every place it occurs.
> 
> The first of those I don't understand.

Well, it says that the way a URI in a triple contributes to the truth of the triple is through the *referent* of the URI, not the form of the URI itself. So when we use a URI in a triple, we aren't talking about the URI. See the current email discussion with Ivan for the relevance of this.

> The second I don't think is quite true, as the spec is only concerned with *single RDF graphs*, and AFAIK specifies nothing regarding the interpretation of URIs in different graphs.

The interpretation mappings are defined on a vocabulary, not on a graph. The basic idea is a set of names are assigned meaning (referents) by the interpretation, and then the semantics takes over and determines the truth-values of all graphs *written using the vocabulary*. The document regularly refers to sets of graphs, for example in its definition of entailment. 

(BTW, I would like to change the definition of interpretation mappings so that they always apply to *all* URIs, so all interpretations are 'global'. This would remove the need to constantly talk about vocabularies, which is a relic from logic textbooks that we really don't need for the Web. It also has a few knock-on simplifications for the more exotic semantic stuff in OWL. I simply had not thought this through when writing the document.)

>> These constraints on meaning apply to any RDF processing, not just to entailment checking. SPARQL for example satisfies semantic conditions which are related to the RDF semantics. 
> 
> Can you give me an example of such a semantic condition satisfied by SPARQL that is not covered by entailment?

I was thinking of entailment, indeed. SPARQL behavior changes when different entailment regimes are invoked. But entailment is an essentially semantic notion. 

>> And I insist that this - the semantics of the triples - is not something that can be ignored while conforming to the RDF specs.
> 
> You say that to conform to the RDF specs, one must not ignore the semantics of the triple. What does this mean, ignoring the semantics of the triple? How can I tell whether I'm ignoring the semantics of a triple or not?

Well, take Ivan's proposal as an example. He wanted to be able to write

<g> rdf:type G-box
and also
<g> rdf:tags <h>

but the first <g> refers to the box, and the second <g> refers to the URI itself. At which point, the RDF semantics will tell you that you can't do both of these. 
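The constraint at work here is that a single interpretation assigns each name exactly one referent. A minimal Python sketch, assuming each intended reading of a triple can be summarized as a dictionary from names to descriptive referent strings; the function name and the readings are hypothetical, and `rdf:tags` is Ivan's proposed term, not part of the RDF vocabulary. The sketch does not reason about boxes or tags at all; it only checks the single-referent constraint.

```python
from typing import Dict, Iterable

def single_reference_mapping(readings: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge the referents that each triple's intended reading assigns to names.

    One interpretation gives each name exactly one referent, so if two readings
    send the same name to different things, no single interpretation can
    satisfy both. Referents here are just descriptive strings (toy assumption).
    """
    ref: Dict[str, str] = {}
    for reading in readings:
        for name, referent in reading.items():
            previous = ref.setdefault(name, referent)
            if previous != referent:
                raise ValueError(f"{name} cannot denote both {previous!r} and {referent!r}")
    return ref

if __name__ == "__main__":
    type_reading = {"<g>": "the graph box g"}        # <g> rdf:type G-box
    tags_reading = {"<g>": "the URI string '<g>'"}   # <g> rdf:tags <h>
    try:
        single_reference_mapping([type_reading, tags_reading])
    except ValueError as err:
        print(err)
```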

> Does the Model Theory give rise to any test cases or conformance criteria?

The test cases are phrased in terms of entailments and non-entailments, in order to be 'cases'.  But we, the WG who figured out these test cases, did that job in almost all cases by thinking about the model theory. You pretty much have to in order to get the entailments right. 

> This is an honest question.
> 
>> Of course, the specs can be ignored, and no doubt often are. But our job is to write the specs, so we are rather obliged to take them seriously.
> 
> Rest assured that I am taking the specs seriously. But I am also doing my best to take the users of RDF seriously. It is my belief that if they often ignore the specs, then we should be open to the possibility that something is wrong with the specs, and maybe it can be fixed.

I agree. 

>> Well, if that is your view, then by all means let us as a WG declare that RDF has no normative model theory, and is simply a meaningless notation.
> 
> XML and JSON and CSV and the relational model don't have normative model theories. Does that make them “meaningless notation”?

In the case of XML and JSON, yes. (I do not know enough about CSV to respond.) In case you find this answer ridiculous, let me ask you: if I give you some JSON which is claimed to describe some real-world data, and all you know about it is that it is JSON, how will you know what the JSON is actually saying?

The relational model does have a model theory, courtesy of Codd. Logic programming systems also have model theories. 

>> I will be happy to go along with this, which might surprise you. But we should not give our notation a normatively defined semantics and ALSO say that this semantics should be ignored in practice.
> 
> I pointed out that it *is* often being ignored in practice. I am not saying that it *should* be ignored in practice.
> 
> I am, however, saying that we should have agreement about the role that the Model Theory plays in the Semantic Web project. I do understand that the MT allows us to derive the entailment rules. I do understand that it allows us to do semantic extensions in a principled and formally correct way. I do understand that it serves as the underpinning of OWL (whether they like it or not). And from this understanding of the role of MT I do not see how it follows that not mentioning RDF datasets in the MT would be a problem.

If we are very careful to avoid making unwarranted assumptions about meaning, it might well not. (I did vote for the decision, you may recall.) But it is perilously easy to slip into making those assumptions. I would claim, it is almost impossible to avoid doing so. The document in the Wiki itself slips over the semantic edge, as I have already pointed out. 

Basically, it is in the long run easier and simpler to keep all this machinery semantically coherent. As soon as we start splitting up RDF into bits and pieces that are semantically unrelated to one another, we will start getting clashes between these disparate pieces. It is just bad methodology to deliberately insert semantic breaks into a notation whose main, arguably only, purpose is to be semantically coherent. 

>> I suggest, in all seriousness, that you put this forward as a WG issue: propose that RDF be declared to have no normative semantics at all. At the very least, the resulting debate might get some issues out into the open air. 
> 
> Look, Pat. I like having normative entailment rules written down. I like having normative axiomatic triples. I like having normative text about the treatment of blank nodes. I like having normative text that explains how datatypes work. All of these things give rise to conformance criteria and can be written down in test cases and lead to observable behaviour in software implementations and validatable criteria in published data. But I don't believe that writing down all these nice things in a certain mathematical notation imbues magical properties on RDF and constitutes the difference between RDF being “meaningful” and “meaningless”. As I see it, RDF is a data model like any other, but it comes with quite a sweet set of inference machinery that happens to be written down in a manner that is rather quirky and impenetrable (for the intended target audience, which should be implementers of RDF-based systems and authors of RDF data).

First, and most importantly, RDF does *not* come with inference machinery, sweet or otherwise. The RDF specs quite deliberately do *not* specify *any* inference machinery. (Why not? First, because there are a whole lot of different ways to build such machinery, which might need to be optimized for different purposes, and the spec should not impose one decision. Forward entailment rules is one idea. Tableau-style consistency checking is another. Logic programming-style rules are yet another. With luck, someone might invent a new one that works better on billions of triples. What these have in common is exactly that they are valid with respect to the normative model theory, and that is *all*; and that is all that they *need* to have in common. Second, because no amount of machinery specification is going to provide the kind of semantic account that is necessary to define such things as semantic extensions to RDF. And finally, because all that hypothetical machinery would itself need to be given a semantics, eventually.)
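Forward entailment rules, one of the strategies listed above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: triples are plain string tuples, and only two RDFS rules are implemented (those the 2004 RDF Semantics names rdfs9 and rdfs11), so it is nowhere near a complete RDFS reasoner. What makes these rules acceptable is that each is valid with respect to the model theory; nothing about the loop that applies them carries that guarantee.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def forward_chain(graph: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining to a fixpoint using just two RDFS rules."""
    closure = set(graph)
    while True:
        new: Set[Triple] = set()
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        for c, d in subclass_pairs:
            for x, p, y in closure:
                # rdfs9: x rdf:type c, c rdfs:subClassOf d  =>  x rdf:type d
                if p == RDF_TYPE and y == c:
                    new.add((x, RDF_TYPE, d))
                # rdfs11: c subClassOf d, d subClassOf y  =>  c subClassOf y
                if p == SUBCLASS and x == d:
                    new.add((c, SUBCLASS, y))
        if new <= closure:  # nothing new was derived: fixpoint reached
            return closure
        closure |= new
```

A tableau-style or logic-programming engine is free to reach valid conclusions by an entirely different route; the spec constrains what may be concluded, not how it is computed.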

When you say "how datatypes work", what exactly do you mean? Datatypes aren't code, they don't *do* anything. Their relevance to RDF is in what they mean, not what they "do". If you like, you can cash this out in terms of entailments, but that is at best an awkward way to do it; and I will buy you or anyone else a good steak dinner if you can come up with a complete set of entailments and non-entailments without considering the model theoretic notion of truth. 
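Read as what it means, a datatype contributes a lexical-to-value mapping: a typed literal denotes the value its lexical form maps to. A toy Python sketch for two XML Schema datatypes, assuming simplified lexical rules (optional sign plus ASCII digits for xsd:integer; true, false, 1, 0 for xsd:boolean). The functions merely tabulate that mapping; they are not a claim that datatypes "do" anything. The `L2V` table and helper names are hypothetical. Ill-formed lexical forms map to `None` here, which is a simplification: under datatype interpretations, the 2004 RDF Semantics instead has such a literal denote something outside the set of literal values.

```python
import re
from typing import Callable, Dict, Optional, Tuple, Union

Value = Union[int, bool]

def integer_l2v(lex: str) -> Optional[int]:
    # Simplified xsd:integer lexical space: optional sign, then ASCII digits.
    # (Python's int() alone would also accept "1_000" or padded whitespace.)
    if re.fullmatch(r"[+-]?[0-9]+", lex) is None:
        return None  # ill-formed lexical form
    return int(lex)

def boolean_l2v(lex: str) -> Optional[bool]:
    # xsd:boolean lexical space: true, false, 1, 0.
    return {"true": True, "1": True, "false": False, "0": False}.get(lex)

# Hypothetical table from datatype name to its lexical-to-value mapping.
L2V: Dict[str, Callable[[str], Optional[Value]]] = {
    "xsd:integer": integer_l2v,
    "xsd:boolean": boolean_l2v,
}

def literal_value(lex: str, datatype: str) -> Optional[Value]:
    return L2V[datatype](lex)

def same_value(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    va, vb = literal_value(*a), literal_value(*b)
    # Python treats True == 1, but the XSD integer and boolean value spaces
    # are disjoint, so compare the Python types as well as the values.
    return va is not None and type(va) is type(vb) and va == vb
```

So "010"^^xsd:integer and "10"^^xsd:integer denote the same number, while "ten"^^xsd:integer is ill-formed; both facts are about meaning, not about any processing step.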

The mathematical notation doesn't imbue anything. (I did my damnedest to avoid mathematical notation when writing the RDF document, by the way: some of my colleagues were horrified by the lack of greek letters.) The actual mathematics (not the notation) imbues only a useful degree of precision. (This mathematics really is pretty simple: sets and mappings, one level deep. If someone can't follow that, then they probably cannot understand RDF itself. The XML schema specs are *way* more complicated than the entire RDF spec document suite. Can you tell me what a 'facet' is in one tweet?) And the 'magic' is only that it deals directly with the central idea of truth. I will claim that any semantic account that is worth having must somehow do this, and that without this the notation really is meaningless. To publish some data in RDF is to *claim that the RDF is true*. If the spec does not say what this claim means, then yes, publication of RDF is meaningless. 

Now, in order to avoid starting a philosophical war, I would agree that we need to find ways to convey the essence of RDF to a wider audience. If model-theoretic ideas really are as hard to grasp as you claim, then perhaps we can find some other way to present the ideas to that audience. But I still think that we need to have a model theory as a normative reference. It is just too useful to do without. The pedagogic task needs to be kept distinct from the foundational role of a truth-based semantics. 

Pat

> 
> Best,
> Richard

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 19 April 2011 04:33:31 UTC