Re: RDF Semantics - Intuitive summary needs to be scoped to interpretations (ISSUE-149) from Pat Hayes on 2013-10-20 (www-archive@w3.org from October 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 20 Oct 2013 03:31:30 -0500
To: David Booth <david@dbooth.org>
Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, www-archive <www-archive@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Ivan Herman <ivan@w3.org>, Sandro Hawke <sandro@w3.org>
Message-Id: <3522827B-101A-4830-8A42-13F4083F0819@ihmc.us>
David, greetings.

Most of what you write in this message is completely uncontroversial and I would entirely agree with it. Rather than respond point by point, let me try to summarize.

1. People who publish RDF (or indeed any other) content may have different ideas about what IRIs mean, and the readers or users of this data may also have different ideas about what the IRIs mean. Call this "mismatch". 
2. Even when the publishers and users of RDF share a common understanding of what IRIs mean, the actual RDF will not be enough to formally pin down this mutual understanding, so that the RDF (considered in isolation from other possible sources of meaning) will be satisfied by 'nonstandard' interpretations which do not conform to this shared mutual understanding. Call this "underdetermination". 
3. In some cases, the difference referred to in (1) may be so great that different pieces of published content are mutually inconsistent. Let me call this "divergence". 
4. It is also possible that two publishers of RDF content might have perfectly aligned notions of what all the IRIs mean, but simply disagree concerning the facts. Call this "disagreement".
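
To make "underdetermination" in (2) concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions, not the normative RDF semantics: the two-element universe, the names ex:Everest and ex:Y1953, and the single implicit property are all hypothetical, and literals, blank nodes and multiple properties are ignored. It enumerates every toy interpretation and counts those that satisfy a one-triple graph.

```python
from itertools import combinations, product

# Hypothetical toy setting: a two-element universe and a single property.
UNIVERSE = ("a", "b")
NAMES = ("ex:Everest", "ex:Y1953")
# One triple, written as a (subject, object) pair for the single property.
GRAPH = [("ex:Everest", "ex:Y1953")]


def all_extensions(universe):
    """Yield every possible extension (set of pairs) of the property."""
    pairs = list(product(universe, repeat=2))
    for size in range(len(pairs) + 1):
        for subset in combinations(pairs, size):
            yield frozenset(subset)


def all_interpretations(universe, names):
    """Yield (denotation, extension) for every toy interpretation."""
    for referents in product(universe, repeat=len(names)):
        denotation = dict(zip(names, referents))
        for extension in all_extensions(universe):
            yield denotation, extension


def satisfies(interpretation, graph):
    """A graph is true when every triple's denoted pair is in the extension."""
    denotation, extension = interpretation
    return all((denotation[s], denotation[o]) in extension for s, o in graph)


total = sum(1 for _ in all_interpretations(UNIVERSE, NAMES))
models = [i for i in all_interpretations(UNIVERSE, NAMES) if satisfies(i, GRAPH)]
print(len(models), "of", total, "toy interpretations satisfy the graph")
```

Even in this tiny setting, half of the toy interpretations satisfy the graph, including ones in which both names denote the same thing. Nothing in the triple itself rules those out.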

I have deliberately avoided the word "ambiguity", because it is ambiguous. You and I agreed long ago that RDF – probably all data on the Web – is inherently ambiguous in the strict sense that it does not pin down a unique satisfying interpretation, ie it is underdetermined. We agreed that some of the TAG publications on "uniqueness of identification" were conceptually faulty in the way they were worded, since they seem to suggest that this unachievable goal is necessary to Web operation. Underdetermination is indeed inevitable. But "ambiguity" can be taken to imply mismatch, and this is *not* inevitable. And even a mismatch does not inevitably lead to divergence, or to any detectable inconsistencies between different usages of an IRI. 

Divergence and disagreement are formally indistinguishable: they both give rise to contradictions. For example, 
Alice publishes 
   Everest was first climbed in 1953
Bob publishes
   Everest was first climbed in 1954
and with enough extra stuff about uniqueness of dates of first climbs, we can derive a formal contradiction, let us suppose. Now, 
it might be that Bob is using "Everest" to refer to K2, in which case we have divergence; or he might just be wrong about the date Hillary and Tenzing made their historic climb, in which case we have a disagreement. In the first case, both Alice and Bob have their facts straight, but they are struggling over the referent of a name; in the second case, Alice is right and Bob is wrong, but at least they both know what they are talking about. Model-theoretic semantics isn't able to usefully distinguish these two cases: all it can tell us is that the things that Alice and Bob actually publish are (with some extra assumptions) mutually inconsistent, for some reason. It does not tell us what the reason is. 
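
The Alice and Bob example can be sketched as a brute-force satisfiability check in Python. This is an illustration under stated assumptions, not the normative semantics: the three-element universe, the names ex:Everest, ex:Y1953 and ex:Y1954, and the two side conditions standing in for the "extra stuff" (a first climb has a unique date, and the two year names denote different things) are all hypothetical.

```python
from itertools import combinations, product

# Hypothetical toy setting: three entities and one "first climbed in" property.
UNIVERSE = ("x", "y", "z")
NAMES = ("ex:Everest", "ex:Y1953", "ex:Y1954")

ALICE = [("ex:Everest", "ex:Y1953")]  # Everest was first climbed in 1953
BOB = [("ex:Everest", "ex:Y1954")]    # Everest was first climbed in 1954


def all_extensions(universe):
    """Yield every possible extension (set of pairs) of the property."""
    pairs = list(product(universe, repeat=2))
    for size in range(len(pairs) + 1):
        for subset in combinations(pairs, size):
            yield frozenset(subset)


def is_functional(extension):
    """Assumed side condition: each subject has at most one first-climb date."""
    subjects = [s for s, _ in extension]
    return len(subjects) == len(set(subjects))


def satisfiable(graph):
    """True if some toy interpretation satisfies the graph and the side conditions."""
    for referents in product(UNIVERSE, repeat=len(NAMES)):
        denotation = dict(zip(NAMES, referents))
        # Assumed side condition: the two year names denote different things.
        if denotation["ex:Y1953"] == denotation["ex:Y1954"]:
            continue
        for extension in all_extensions(UNIVERSE):
            if is_functional(extension) and all(
                (denotation[s], denotation[o]) in extension for s, o in graph
            ):
                return True
    return False


print("Alice alone:", satisfiable(ALICE))
print("Bob alone:", satisfiable(BOB))
print("Alice and Bob merged:", satisfiable(ALICE + BOB))
```

Each graph is satisfiable on its own, and the merge is not. The check has no input that records whether Bob meant K2 or simply got the year wrong, so it reports the same inconsistency in either case.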

So, to sum up: published RDF content typically (perhaps always) has many satisfying interpretations, ie it underdetermines its intended meaning. Also, RDF from multiple sources may be mutually inconsistent, ie be such that no interpretation satisfies it all. There can be several reasons for this, including divergence of intended meanings of IRIs and simple factual disagreements. But note that when an inconsistency is detectable between what Alice and Bob publish, then *something* is not right about that mutual publication. Either they disagree about the facts of the matter, or they disagree about what IRIs denote, or they have mutually incompatible ways of describing the world. I do not mean to imply that one of them is wrong and the other right (though that may be likely), only that they do actually in some way clash in what they are saying. As a consumer of their data, I would be obliged to choose between them, to make decisions about what to accept and what to reject. 


The intuitive picture (not part of the normative semantics document, but intended to be understood by readers) is that the actual world being described by RDF data is itself one of the interpretations, and that the bare word "truth" – as when we might say, yes it is *true* that Everest was first climbed in 1953 – refers to this real world, but uses the same recursive analysis of how truth is determined from a bare interpretation mapping – the same "truth conditions". Such a picture is an integral part of how to relate the model theory to other semantic conditions on RDF, such as those arising from connections between RDF data and natural language texts or images. But as I say, this is not part of the normative RDF semantics, which is solely concerned with defining entailment relationships between RDF graphs. 

OK so far? Because all of this is how the RDF semantics views the world of RDF Web publication. I have used the terms 'satisfy', 'interpretation' and 'inconsistent' here exactly as they are defined in the formal semantics.

Now, you seem to want to insist that there is something else, some other way to use the formal semantic machinery, which somehow goes beyond or provides some kind of alternative to this picture. Can you say what it is, without using meaningless rhetoric such as "single-interpretation assumption" or "agnostic" ? What is this "other valid way" to think about the RDF semantics?

Pat



On Oct 19, 2013, at 10:56 PM, David Booth wrote:

> Hi Pat,
> 
> On 10/10/2013 02:05 AM, Pat Hayes wrote:
>> [ . . . ]
>> But, as I say, I now think that this idea, of trying to connect the
>> formal notions to an intuition, was probably a mistake in this
>> document and went against the spirit of a WG decision.
> 
> I don't know what was the WG decision, but formal specifications of any significant size almost always benefit from helpful informal guidance that gives insight about how they are intended to work.  This is analogous to the role of good comments in code when writing software. But I'm okay with it being deleted if you want.
> 
>> 
>> Pat
>> 
>> <<Other in-line responses, below, are part of our continuing, um,
>> debate, and are aside from discussions of the RDF documents.>>
>> 
>>>> On Oct 4, 2013, at 10:51 AM, Peter Patel-Schneider wrote:
>>>> 
>>>>> In my opinion the divergence boils down to Pat believing that
>>>>> this informative section should be more informal and David
>>>>> believing that it has to be more formal.
>>> 
>>> I don't exactly think it has to be more formal, but just that: (a)
>>> it needs to mention interpretations, because that concept is so
>>> central to the formal semantics; and (b) the statement about the
>>> conditions under which a graph is true *needs* to be scoped to an
>>> interpretation to make any sense at all.
>> 
>> That is exactly what it should *not* be, in order to convey the point
>> it was intended to be conveying.
> 
> The point to which you allude appears to reflect a particular intuition, but apparently I don't agree that that is the only valid intuition that is supported by the mathematics.  More on this below.
> 
>> 
>>> If one talks about a graph being true, without mentioning an
>>> interpretation, IMO the most sensible way to understand such a
>>> statement is to take it as meaning that the graph is *satisfiable*
>> 
>> No, that is not the right way to understand it. Truth and
>> satisfiability are not the same thing at all. (That pigs can fly, is
>> satisfiable.) To say that a graph (or any other assertion or
>> sentence) is true, is to say that when it is interpreted *in the
>> actual world*, its truth-value is true.
> 
> There are multiple problems I have with that last sentence.  First of all, AFAICT the formal semantics makes no claim whatsoever about the real world: the semantics leaves it up to the user to choose interpretations.  Second, the phrase "*the* actual world" betrays an assumption that there exists only *one* valid interpretation -- that "single-interpretation assumption", as I've been calling it -- whereas AFAICT the formal semantics makes no such assumption.
> 
>> That is the pre-theoretic,
>> intuitive, notion. Someone says something, you figure out *what* they
>> are saying, and you judge whether it - what they are saying - is
>> true. Nothing in that account mentions interpretations. It does
>> mention, implicitly, the truth conditions (section 5) and we could
>> say that it *presumes* an interpretation that the speaker and hearer
>> have in common.
> 
> Ah, now we're starting to get closer to the heart of the issue.  I'll come back to this below.
> 
>> And that is where the naïveté of this naive account
>> is displayed, of course, that implicit assumption of a common
>> interpretation; because when we have the kind of distancing between
>> publisher and reader that is inevitable on the semantic web, and
>> communicate using IRIs which have no assumed common background of
>> linguistic meaning, we cannot presume this common shared
>> interpretation, this "common ground"
>> (http://semantics.uchicago.edu/kennedy/classes/f07/pragmatics/stalnaker02.pdf,
>> or
>> http://plato.stanford.edu/entries/discourse-representation-theory/.)
> 
> Agreed.
> 
>> So this is where the interpretation idea comes in, because we have
>> to, as it were, survey the possible things you might mean when you
>> publish some RDF. We don't know what world you are talking in, so we
>> have to consider all *possible* worlds. Which is what interpretations
>> are (the thin, pale shadows of formalizations of).
> 
> Yes, excellent so far.
> 
>> 
>> Long - very long - story short, the analysis of real linguistic
>> communication - including Web communication - between cognitive
>> agents (people, mostly) involves model-theoretic ideas, but it also
>> involves a *lot* more. RDF, indeed the entire semantic web, is a tiny
>> part of this larger picture, and can be fitted into it in one small
>> corner. But in order to be useful, it does need to be fitted into it
>> accurately.
>> 
>>> : that there *exists* an interpretation under which the graph is
>>> true, and hence we can take the graph as being true. (Conversely,
>>> if the graph is not satisfiable then we cannot take it as being
>>> true.)  OTOH, such a statement could be taken to mean that the
>>> graph is true **in some unspecified interpretation**
>> 
>> The one that is presumed when we talk (pre-theoretically) about what
>> people are referring to when they say "Everest" (for example), and
>> when we make judgements of the truth or otherwise of their utterances
>> in the actual, real, world we are all talking about. Yes, exactly.
> 
> First of all, I really like this explicit distinction between the pre-theoretic or real world notion of truth, and the truth value that is assigned to an RDF graph by the formulas in the formal semantics.  That helps the discussion.
> 
> With that on the table, although real world truth should be a *goal* -- just as one resource per URI should be a goal -- I don't believe that it is the right criterion for making engineering decisions in the semantic web world.  Rather, *usefulness* is the more relevant criterion by which we should evaluate our engineering trade-offs when designing the semantic web.  This will take more explanation, so I'll attempt to provide that.  But with respect to interpretations, this translates into the notion that a more agnostic view toward interpretations should be taken, rather than making the single-interpretation assumption that always attempts to understand every RDF utterance in terms of a single notion of pre-theoretic real world truth.
> 
> Now to attempt to explain.  First of all, note that there is nothing whatsoever in the mathematics that limits us to a single interpretation: the mathematics works perfectly fine without modification whether we eventually talk about one or more than one interpretation.   I've pointed out several times that it is perfectly possible to have two interpretations I1 and I2 and two graphs G1 and G2, such that I1(G1)=true and I2(G2)=true (in the non-pre-theoretic sense), whether or not these graphs share some of the same URIs.  "So, what of it?" you may ask.  I'll get to that.
> 
> The second point to observe is that different graph authors have different interpretations in mind when they write their graphs.  This can be either conscious or unconscious.  Although I agree that there is a single notion of pre-theoretic truth in the real world, different people have different -- and sometimes *very* different -- ideas of what that single truth is.  Correspondingly, they also make different assumptions about the resource to which a given URI maps, within those interpretations.  Again you may object and assert that if they are making different assumptions then one or more of them should be considered wrong.  But again, as I've tried to point out, such a requirement is not generally *possible* to obey.
> 
> Assuming that a URI owner has the right to say what resource his/her URI denotes (as described in the Web Architecture), there are several reasons why different well-intentioned URI users may make different assumptions about the identity of a URI's resource:
> 
> 1. The URI owner may not know or may not understand a particular resource distinction that matters to some user of that URI.  We cannot expect every URI owner to be omniscient about his/her URI's resource.
> 
> 2. The URI owner may not care about a particular distinction.  We cannot
> expect the URI owner to have the same concerns as all users of the URI.
> 
> 3. The URI owner may *intend* the URI definition to be ambiguous to some
> degree, so that the URI can be used in a wider variety of ways.
> 
> 4. The URI owner may not be reachable to clarify a particular point of ambiguity.
> 
> 5. The URI owner may want to keep the resource definition simple,
> without cluttering it up with distinctions that 99% of the
> URI's target users would not care about.  Complexity has a cost.
> 
> 6. The URI owner may not wish to expend the resources necessary
> to figure out what finer distinctions might be made.
> 
> 7. When a URI definition is provided in a machine processable form such as an RDF graph -- and that of course is the point of the Semantic Web -- it is generally not possible to make that definition unambiguous.
> 
> So the reality is that different authors *do* make different assumptions about the resource denoted by a particular URI.  This is very neatly captured by the notion that different authors have different sets of intended interpretations in mind when they write their graphs.  In other words, when an author writes an RDF graph, the author's intended meaning of that graph does *not* generally boil down to a *single* interpretation, but an ambiguous *set* of interpretations, all of which are licensed interpretations falling within the author's intent.
> 
> This leads to the question of what exactly are the author's intended interpretations for a given graph.  That of course may be hard to know -- just as it may be hard to know what *single* interpretation the author intended if one assumes that the author only intended a single interpretation.  But given that the author could (in principle at least) if desired supply whatever constraints he/she chooses as triples within the graph, a reasonable assumption is that the intended interpretations are the satisfying interpretations of the ontological closure of that graph.  (By ontological closure I mean the union of the graph with the transitive closure of the URI definitions for the URIs within the graph.)  This makes for a very "what you see is what you get" notion of the intended interpretations, and I will note that it has the further advantages of: (a) removing nearly all "then a miracle occurs" steps
> http://blog.stackoverflow.com/wp-content/uploads/then-a-miracle-occurs-cartoon.png
> in the determination of interpretations; and (b) being completely aligned with the intent of the Semantic Web of facilitating machine processing.
> 
> In other words, to my mind the notion of interpretations provided by the RDF Semantics aligns very well with: (a) the inescapable ambiguity of resource identity; and (b) the fact that people do *not* have the same view of the world, nor do their software applications have the same view of the world.
> 
>> 
>>> .  But that would be a very bad way to write
>> 
>> Try telling that to linguists. Or to literary theorists, or
>> historians, or philosophers of language, or indeed pretty much anyone
>> who uses language professionally. Not only is this not a bad way to
>> write, it's the ONLY way to write if we are trying to anchor model
>> theory in an intuitive description of how communication actually
>> happens.
> 
> I can't comment on literary theory or such, but to my mind, in formal semantics, variables should *always* be bound.
> 
>> Except, calling the actual world "unspecified" seems a
>> little strange.
> 
> Amusing.  :)  But I don't view the actual world as being very relevant to the semantics, perhaps because I have a different intuitive view of the semantics than you do, as I tried to explain above.
> 
>> 
>>> , because the interpretation under which the graph is true would be
>>> an implicit unbound variable, which as we all know is a big no-no.
>> 
>> It is implicit, yes, but I don't know what kind of assumptions you
>> are appealing to by calling this a big no-no. Contexts are usually
>> implicit, right?
> 
> Yes, but we try hard to make them explicit, especially in formal specs.
> 
>> 
>>> Instead, the problem can be easily solved by adding "under a given
>>> interpretation" to the sentence.  (Of course, the notion of an
>>> interpretation should first be explained.  But that is a different
>>> omission that should be addressed anyway.)
>>> 
>>> And regarding this:
>>> http://lists.w3.org/Archives/Public/public-rdf-wg/2013Oct/0079.html
>>> 
>>> 
>>> [[
>>> I know, from extensive off-line email discussions with David, that
>>> he does not properly understand the intuitive foundations of
>>> semantics in any case, so I am not inclined to accept his rather
>>> condescending advice. ]] (Wow, you're calling *me* condescending,
>>> after repeatedly telling me to "go read a book"???)  That's both:
>>> (a) quite a projection; and (b) *really* unfair and unhelpful.
>>> Fortunately I'm thick skinned and I have a good sense of humor.
>>> :)
>> 
>> Well, you weren't meant to read that, obviously. But my dear fellow,
>> *have* you read the books, in fact?
> 
> I've read what I could find on the web on model theory, but not books. The best resource I've found has been the Stanford Encyclopedia of Philosophy, which I like a lot:
> http://plato.stanford.edu/
> For example, here is their entry on model theory, which corresponds beautifully with your explanation:
> http://plato.stanford.edu/entries/model-theory/
> Incidentally, I wrote to the author of that particular article for some minor clarification, and he was quite nice and confirmed a particular point of understanding about interpretations.  If there are other references on the web that you'd suggest, I would certainly be interested in looking at them.  But thus far, all that I have read has confirmed the understanding that I initially got from your writings, which I've found most informative, BTW.
> 
>> Is it really condescending for me
>> to suggest that you might want to read up something a little more
>> extensive than a few paragraphs that I wrote about RDF,
> 
> I certainly have done so, and read it quite carefully too.
> 
>> before
>> claiming that you have discovered a new way to understand model
>> theory, or setting out to correct my misunderstanding of it,
> 
> NEVER have I made any such claim.
> 
>> or
>> telling me that my perspective is too limited? I don't mean to pull
>> rank on you here, but I have been studying this stuff now, as well as
>> teaching it, for about 40 years. For a few years, I invented new
>> model theories for a living. God knows there are a lot of things I
>> don't fully understand, but model-theoretic semantics is one topic I
>> really do have pretty thoroughly grokked.
> 
> Okay, stop right there.  Clearly you have grossly misunderstood my intent, as I have never once questioned your understanding of model theory, nor have I made any claims of discovering a new way to understand model theory or any other such grand claims.  All I have done is try to point out that, **based on the mathematics given**, there is another valid way to think about the RDF Semantics.  Furthermore, AFAICT it is a *useful* way to think about the RDF Semantics, as it helps explain real world use of RDF in a way that is not explained under the single-interpretation assumption.  It is not in any way intended to extend model theory or make any grand claims about any new discoveries.  It is just a simple and straightforward way to use the semantic formulas defined by the RDF Semantics that perhaps uses a slightly different intuition of what they mean.  Formulas are formulas and can be viewed in different ways.  Although many people may think intuitively of E=mc^2 as meaning that matter can be converted to energy, it can also be just as well taken to mean that energy can be converted to matter.
> 
> I hope this helps to clarify my intent.
> 
> Thanks,
> David
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 home
40 South Alcaniz St.            (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile (preferred)
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Sunday, 20 October 2013 08:32:05 UTC