Re: New issue - Meaning of URIs in RDF documents from Tim Berners-Lee on 2003-07-18 (www-tag@w3.org from July 2003)

From: Tim Berners-Lee <timbl@w3.org>
Date: Thu, 17 Jul 2003 23:34:55 -0400
To: pat hayes <phayes@ihmc.us>
Cc: www-tag@w3.org
Message-Id: <CCD886F9-B8D0-11D7-B920-000393914268@w3.org>
>> 1. " each URI
>> identify one thing ("Resource": concept, etc)."
>
> Exactly what is meant by "identify" here is not exactly clear, but if 
> this means something  close to what it usually means then it is simply 
> untenable to claim that all names identify one thing.
>

> I am making the claim only for RDF statements in a global context, in 
> for example an email sent between two people who don't know each other 
> but both have access to the web.
>
>
> So am I: and I insist that this stipulation of identifying one thing 
> isn't sensible or even desirable. Well, at least, unless that word 
> "identify" means something different from "refer to" or "name" or 
> "denote" .

I think "denote" probably matches.  I will try to use "denote".

>   What might indeed be true is that in many circumstances, a URI 
> somehow provides access to information which is sufficient to enable 
> someone or something to uniquely identify a particular thing (that the 
> representation accessed via that URI is in some sense about), but even 
> there the thing identified might vary between contexts (such as when 
> we use someone's email address to refer to the person) without harm.

This depends on what you mean by "contexts".  If you mean that I can 
send one person an email saying (in RDF)
   <http://example.com/foo.rdf#bar>  pantone:color  "blue426" .
and it can mean one thing, and I can send it to another person and it 
can mean something else,
then we do not have a system of communication which has any properties at 
all.

> This kind of ambiguity resolved by context is at the very basis of 
> human communication: it works in human life,

Yes, with natural language and poetry


>  it works on the Web,

Yes, when the genre is natural language and poetry, not mathematics,

>  it will work on the semantic Web.

No.  We are defining the semantic web NOT to work like natural language,
but to work like mathematics.

And it does not work in math.

Suppose I give you two facts, that x=1 and that x=0. Not a problem, if
one can assume the x denotes something different in the two cases.
But very hard to build any logic at all.
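
The x=1, x=0 point can be made concrete with a minimal Python sketch. It assumes a toy model in which a "fact" is just a (symbol, value) pair; the function names are hypothetical and not part of any semantic web tool. With one denotation per symbol the two facts clash; with a separate denotation per context nothing clashes, but facts from different contexts can no longer be combined.

```python
def consistent_under_single_denotation(facts):
    """Return True if every symbol is assigned at most one value.

    `facts` is an iterable of (symbol, value) pairs, a toy stand-in
    for equations such as x=1.
    """
    seen = {}
    for symbol, value in facts:
        if symbol in seen and seen[symbol] != value:
            return False  # same symbol, two different values: contradiction
        seen[symbol] = value
    return True


def consistent_per_context(facts_with_context):
    """Same check, but a symbol may denote something different per context.

    `facts_with_context` is an iterable of (context, symbol, value) triples.
    """
    by_context = {}
    for context, symbol, value in facts_with_context:
        by_context.setdefault(context, []).append((symbol, value))
    return all(
        consistent_under_single_denotation(facts)
        for facts in by_context.values()
    )


global_facts = [("x", 1), ("x", 0)]
print(consistent_under_single_denotation(global_facts))  # False

contextual_facts = [("message-1", "x", 1), ("message-2", "x", 0)]
print(consistent_per_context(contextual_facts))  # True
```

The second check never fails across contexts, which is exactly why it supports no inference that spans two messages.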

> Why do you want to try to legislate it out of existence?

Any system of mathematics has to be able to use symbols to denote things
in the universe of discourse.  You as a philosopher can perhaps handle
a mathematics in which symbols denote whatever anyone likes at any 
point,
but I as an engineer find it less useful.

>  You will not be able to, any more than you will be able to stop 
> people falling in love.

Ah, but people have stopped falling in love. Look up ... one by one the 
stars are going out. ;-)
But seriously...

> All that your 'ideal design' will accomplish is to make the 
> architectural pronouncements of the W3C more and more out of line with 
> the way that the Web is actually being used by real people.

People are not using the semantic web now.
There is not very much global math on the web.
People use document identifiers as though they (in some sense) will from
week to week denote (in some sense) the same thing, so people are very
used to having a global space of identifiers.

> Take your example of person A emailing person B, who A does not know. 
> What is actually going on here, described precisely, is surely that A 
> knows that 'B@Bsplace.org' is a character string which when used in a 
> certain way will  (by some occult technical means about which A need 
> know very little) act as an address, so that email sent to that 
> address will arrive in the inbox of, and likely be read by, someone 
> called 'B'.  I phrase it thus because it might be potentially 
> misleading to just say 'read by B' since that could be understood as 
> saying that A knows the referent of that name 'B'; but we are assuming 
> that A doesn't. So what A knows is in fact an existential: that a 
> person called 'B' *exists* who will get the email.  Since A knows that 
> the email exists and is unique - A has direct acquaintance with the 
> email, having written it - this is enough for A to know that there is 
> a single person out there who will get the email.  But it is still 
> misleading to say that the email address "identifies" B: if that 
> really were true, then A could find out who B was just by looking at 
> the email address.

This is rather tangential.  My example was of people emailing each 
other, and the content of the email having the same semantics to A as 
to B.  You discuss who is denoted by "B@Bsplace.org", an email address. 
The email address (or we could talk of the related URI, 
"mailto:B@Bsplace.org") in the semantic web denotes, formally, 
something often referred to as an "RFC822 mailbox", which is a 
conceptual thing to or from which mail may be sent, among other 
uses.  There is a relationship, one of whose URIs is
http://www.w3.org/2000/10/swap/pim/contact#mailbox
which relates a social entity (for example, a person) to one of these 
mailbox things (as a mailto: resource). People often use the 
approximation that contact:mailbox is inverse functional,
allowing them to determine that two people are the same person because 
they have the same
contact:mailbox, but that does not mean that, in the formal system we 
are building to represent all this, "mailto:B@Bsplace.org" 
denotes the person.
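
As a rough illustration of the inverse-functional approximation, the Python sketch below groups person nodes that share a mailbox. It is a toy under stated assumptions: the function name `merge_by_mailbox`, the blank-node labels, and the address other@example.org are all hypothetical, and real reasoners handle this quite differently. Note that the mailto: URI is only ever a key linking people together; it never appears as a member of a group, mirroring the point that the URI denotes the mailbox and not the person.

```python
from collections import defaultdict


def merge_by_mailbox(mailbox_pairs):
    """Group person nodes that share a mailbox.

    `mailbox_pairs` is a list of (person_node, mailbox_uri) pairs, a toy
    stand-in for contact:mailbox statements. The property is treated as
    inverse functional, which is an approximation.
    Returns a list of disjoint sets of person nodes inferred to be the
    same entity.
    """
    owners = defaultdict(set)
    for person, mailbox in mailbox_pairs:
        owners[mailbox].add(person)

    groups = []  # invariant: the sets in `groups` are pairwise disjoint
    for people in owners.values():
        merged = set(people)
        untouched = []
        for group in groups:
            if group & merged:
                merged |= group  # shares a person: same entity
            else:
                untouched.append(group)
        groups = untouched + [merged]
    return groups


pairs = [
    ("_:p1", "mailto:B@Bsplace.org"),
    ("_:p2", "mailto:B@Bsplace.org"),
    ("_:p3", "mailto:other@example.org"),  # hypothetical address
]
for group in merge_by_mailbox(pairs):
    print(sorted(group))
```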


> And I am describing, if you like, a perfect platonic design, to which 
> we can aspire, though social and engineering factors limit our ability 
> to implement it perfectly.
>
> Allowing - no, admitting the existence of - referential ambiguity is 
> not an imperfection: it is a basic property of communications of 
> belief using language, one that is recognized and even described quite 
> well (to a first approximation) by the model theory that you dismiss.

I do not dismiss model theory, I just pointed out earlier that your 
questioning of the use of "identifies" rather than "denotes" was asking 
me to use MT terms rather than other English terms.

Now the model theory I have seen only describes the semantics of the 
OWL terms, in explaining how the statements that Fido is a dog and a 
dog is a subclass of animal constrain the possible interpretations.   
And this is done so as to work on any valid interpretation.  I have not 
seen (but I may have missed it) the bit where, when the English in a 
schema describes what the individual ex:fido is, interpretations 
are further constrained to those in which "ex:fido" actually denotes 
the actual dog we all know and love as Fido.

>> Like with all technical specs, the fact of imperfect adherence in 
>> some cases does not detract from the importance of having made the 
>> perfect idealistic design which has provable properties. One deals 
>> with deviations from the perfect in a form of perturbation theory.
>
>
> We seem to be at cross purposes. Im not saying that the 'unique 
> identification' condition is an unattainable ideal: Im saying that it 
> doesn't make sense, that it isn't true, and that it could not possibly 
> be true. Im saying that it is *crazy*.

Well, you have used "silly" and "crazy", but in the context of your 
statement they clearly denote the characteristics of being well thought 
out and essential to the architecture of the semantic web, respectively.

>
>  Existing W3C standards already provide counterexamples: what single 
> thing is identified by the URI reference  
> http://www.w3.org/2000/01/rdf-schema#Class? This is supposed to 
> *denote* the class of all RDFS classes; but that is not a single 
> well-defined notion, by the very nature of formal semantics: it varies 
> from interpretation to interpretation.

An interpretation is a mapping from names to things.
What I am saying is that, if there are two interpretations, and the 
things denoted by that URI in those two interpretations are 
demonstrably different, then it is reasonable to go back and ask the 
owner of the URI which one is denoted.  The authority may decline to 
reply of course, but if it thinks and thinks and comes back with an 
answer, then that answer is added to the common information which we 
share, and one of the interpretations has to be dropped.
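
The idea that an answer from the URI owner eliminates interpretations can be sketched in a few lines of Python. This is a deliberately crude model, not model theory proper: an interpretation is assumed to be a plain dict from names to things, a piece of shared information is assumed to be a (name, thing) pair, and the function name is hypothetical.

```python
def surviving_interpretations(interpretations, shared_facts):
    """Keep only the interpretations that agree with every shared fact.

    An interpretation is modelled as a dict mapping names to things.
    A shared fact is modelled as a (name, thing) pair recording what
    the owner of the name says it denotes.
    """
    return [
        interp
        for interp in interpretations
        if all(interp.get(name) == thing for name, thing in shared_facts)
    ]


candidates = [
    {"ex:fido": "the dog"},
    {"ex:fido": "the cat"},
]

# Before the owner answers, both interpretations are still possible.
print(len(surviving_interpretations(candidates, [])))  # 2

# Once the owner's answer joins the shared information, one is dropped.
print(surviving_interpretations(candidates, [("ex:fido", "the dog")]))
```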

> And there is the problem that MT systems consider all possible 
> interpretations of the data, in any possible worlds.
>
> That is not a PROBLEM; it is how semantics works. When you communicate 
> something to me, you send me some language (or more generally some 
> representations). I have to try to interpret this language and make of 
> it what I can.  But you cannot POSSIBLY send me a single 
> interpretation: interpretations are not the kind of thing that can get 
> communicated. Only language gets communicated.

Indeed, to communicate what something denotes one would need magic.
Like telling a robot - you want to know what "hot" is? this is hot. And 
stimulating its temperature sensor.
Communication doesn't allow any terms which everyone understands. 
Everything communicated is only a message, and the receiver can only 
sense the message and never know what it means. Nothing has
fundamental meaning, a message will just have certain effects on 
certain agents, and agents will change their internal stored state as a 
result of them.

>  So yes, OF COURSE there are many possible interpretations of what you 
> say, even when I have used all my resources of interpretation. This 
> isn't a problem of the theory, it is a FACT ABOUT COMMUNICATION which 
> the theory recognizes and tries - in admittedly a crude way, but we 
> have to start somewhere - to deal with and come to terms with.

This theory of communication does indeed address how communication works.  
However, different theories are used at different scales, and different 
stages in the analysis.

[When we analyze how an electron behaves, we use quantum mechanics.  We 
discover that the position and momentum of the electron cannot be known 
at the same time.   This is just a fact about matter, which the theory 
recognizes and tries - in an admittedly crude way, but we have to start 
somewhere - to deal with and come to terms with.  ....  We realize that 
application of that theory in great detail will allow us to make a wave 
equation for an apple, and we figure out that (though it is too 
complicated a job to do in practice) in any reasonable approximation, 
when considering 10^23 particles, the result is that an apple has, to 
all intents and purposes,  a given position and momentum at any time.  
It isn't that quantum mechanics doesn't hold for apples. It just isn't 
worth doing it when using real apples.  As we take a bite, the theorist 
jumps up and down warning that it could jump sideways at any moment. 
The engineer takes the bite.]

So let it be for the semantic web.  Many agents have communicated at 
great length over what the URI daml:TransitiveProperty denotes.  During 
this process, the people involved considered many interpretations.  Not 
a ridiculous number, as few in the working group considered 
interpretations in which daml:TransitiveProperty denoted the dog we all 
love and know as Fido.  But the process which you describe in capitals 
above took place.  Drafts were written.  Textbooks written ages ago 
and read by many were quoted.  By the end of the process, after 
axiomatic semantics had been written up and reviewed, and a model 
theory had been written (in English), people went away and wrote 
programs which treated daml:TransitiveProperty in a particular way.  
People found that when one program generated a statement about 
something being a transitive property, the other program did good 
things.   Now, no one can say that the people writing those two programs 
had the same interpretation of the spec, and really in theory shared a 
common thing as that denoted by the URI.  But for that and several 
other URIs, the proof was in the eating. The programs worked. And will 
work, for lots of other people in the future.

It is as though that bit of magic has happened.  When you and I write 
an ontology for marsupials, we don't worry about differences in what we 
mean by "subclass".  We only worry about what we mean by "duck-billed 
platypus".  When we have finished our ontology of marsupials, and a 
thousand experts have pored over it and written commentaries on it, 
then millions of school kids will happily refer to the class of 
marsupials using our URI.   The arguments will have been done.  From 
the standpoint of the school kid, the class is  a well-defined concept, 
where linguistic processes have long since tended to an asymptote, and 
any misunderstandings can be dealt with


>  If you take the  case of an identifier for  pat hayes 
> <phayes@ihmc.us>, for example, the non-logician would consider that it 
> identified one person and get on with their lives
>
>
> The logician can say that also: it is the assertion that a single 
> person exists who has that name. But (1) that is not the same as 
> saying that the name - all by itself - "identifies" a single person 
> (or, well, maybe it is: but if so, then other things said about URIs 
> and resources are wrong) and (2) in fact, they don't assume that and 
> get on with their lives. Sometimes they assume it indicates a person, 
> sometimes a mailbox, sometimes a computer: it depends on the context.

With strings, yes, not with URIs.

There are two reasons you are being confused.

1) Sloppiness.  Human beings refer to things through the values of 
properties all the time  ("ask fancy pants what he asked 411 for"), and 
figure out what people mean, in English but not in math.

2) Confusion with times when the design is to specifically and 
unambiguously use a name of one thing to indirectly point to something 
else.   You give an email address of a person who is going to attend a 
conference.   The email mail box isn't going to attend the conference, 
and everyone knows that there is unambiguous traversal of a 
"contact:mailbox" arc involved.   People are confused because a 
namespace (whatever that is)  is indicated by giving the URI of a 
(maybe notional) namespace document which corresponds to that 
namespace. As the namespace is kinda abstract, and only the namespace 
document can be measured, this doesn't really matter.


> Which is fine, let me quickly add, provided some 
> bull-in-the-china-shop authority doesnt keep insisting that all URIs 
> must by fiat always identify a single resource. Then we get 
> interminable arguments and discussions about what 'the' resource is in 
> this very case, and the people who are insisting on this doctrine so 
> firmly tend to be the ones who get exasperated earliest and tell us 
> that it doesnt really matter what the "resource" actually *is*; 
> apparently missing the irony of the fact that the only reason we are 
> having this argument is because of this insane ruling that they are so 
> insistent upon not budging from. Grrrr.

Grr indeed!  To what extent must we settle what the resource is?
That is probably the question which divides our positions.
I would say that when we have more than one candidate and these 
candidates are incompatible, so that ambiguity would lead to 
inconsistency, then we must settle it.

Example 1.

A dog bounds into the room. Tim says, "Here, Fido!" to the dog, and 
says "Pat, meet my dog, Fido" to Pat. Tim plays with the dog. Tim asks 
Pat, "Pat, would you please take Fido for a walk?"
Pat takes the dog for a walk.  The name seems to have been 
unambiguously associated with the same dog in both their minds.

Example 2.

Two dogs bound into the room. Tim says, "Here, Fido!" to the first dog, 
and says "Pat, meet my dog, Fido" to Pat. Tim plays with the second 
dog. Tim asks Pat, "Pat, would you please take Fido for a walk?"
Pat has to ask which dog is Fido.  The name was not unambiguously 
associated with the same dog in both their minds.  Pat had to fix that 
before he could continue the conversation.

Example 3.

A dog and a cat bound into the room. Tim says, "Here, Fido!" to the 
dog, and says "Pat, meet my dog, Fido" to Pat. Tim plays with the 
dog. Tim asks Pat, "Pat, would you please take Fido for a walk?"
Pat takes the cat for a walk.  "For me, in this context, 'Fido' denotes 
the cat.", he says as he leaves.

Which scenario is insane?  The first two?


[...]
Pat:
>> That is, as we add information about it, that information should not 
>> be inconsistent.
>
>
> Right. MT helps you there by providing a crisp notion of consistency. 
> It also gives you an important insight: if you know enough to uniquely 
> identify the referent of a name, then *any* further information is 
> either redundant or inconsistent. Basically, this follows from the 
> observation that the only proper subset of a singleton set is the 
> empty set.
>
> You can think of it denoting different things in different systems, 
> but how are those things "different" apart from the fact they are in 
> different systems?
>
> Well, how are they the same? That is, what gives us a licence to claim 
> that rdfs:Class and owl:Class, for example, are the same class? (In 
> fact, there is a good reason in this case to say they are not the 
> same.)

They are not the same because owl:Class is a member of rdfs:Class but 
not of owl:Class, n'est-ce pas?

For that reason I would say that it would be broken to use the same URI 
for the two classes.

> Maybe you have a more direct acquaintance with abstractions like the 
> class of all classes than I do, but I sure wouldn't know how to decide 
> things like this in general, and I *know* that no computable decision 
> procedure could decide it for me.

But no one asked for a computable decision procedure.
The corners of math can throw up lots of tricky things which make 
people nervous, but fortunately the bulk of semantic web traffic will 
be in terms of things like date, totalamountinusdollars, financial 
institution identifier, etc.

>>  We say every owl:class is an rdfs:Class. That allows us to deduce 
>> things about some classes. Suppose we make other assertions about 
>> rdfs:classes, is it allowable for us to be able to make a 
>> contradiction? I would say not.  Currently, different logical systems 
>> can deduce different things, but the important point is that they are 
>> talking about the same thing when they use the same URI.
>
>
> You need to be careful what you mean by 'same thing'. Sure, if 
> reasoner A uses 'rdfs:Class' and reasoner B uses the same URI, then 
> they ought to both be using the name in the same way, so that they can 
> communicate.     . . . . . . [12]

Yes indeed.  I guess that is what I wanted you to say all along. For 
all B.

>  Nobody is disagreeing with that.  But that is not the same as saying 
> that there must be a single thing that this URI is naming.  Analogy: 
> if we hold hands then we are walking the same way.  But that does not 
> mean there is only one way we can possibly walk. I think you mean the 
> former, but you are saying the latter.

So you accept that everyone must treat the identifier in the same way, 
that perhaps we could say that for two people it must identify the same 
thing, but not that there is one thing which it identifies?

Can we not show that the two conditions are the same?  Suppose there 
was not a single thing denoted by the URI.  Then there must be two 
distinct things denoted by the URI.  Those things, to be distinct, must 
be such that if A uses one and B another as the referent of the URI 
in a message between them, A and B behave inappropriately and so the 
system is broken.

We have honed this distinction down to a fraction of a hair's width now.
> [...]

> Perhaps 'identify' doesn't mean 'denote' or 'refer to'. What does it 
> mean, then? Note that if we were to say that 'identify' means MORE 
> than simply 'denote' or 'refer to' - if, say, it also has a 
> connotation that the URI can be somehow used to retrieve some 
> information about the referent - then the claim would become even more 
> false.
>
>
> When one retrieves a document, one gets information which its 
> publisher says, and one can believe or not.  But using a term does 
> (modulo social things such as fraud and engineering things such as 
> broken cables) commit you to the term owner's definition of it, and 
> the document they publish at its URI is taken by design to be 
> information deemed shared by those using the term.  That's the 
> contract.
>
>
> Im happy with that contract, though with a slight hair-tingle at the 
> use of the word 'definition'. But nothing in there says that URIs must 
> uniquely identify resources: in fact, you didn't even use the words 
> "resource" or "identify" , which I am very happy to see were also 
> missing from Tim Bray's down-to-the-wire summary of the essential core 
> of things.
>



> [....]

>>> First, OWL is more than an RDF vocabulary: it is an RDF vocabulary 
>>> with a particular semantics applied to it.
>>
>> Like every RDF vocabulary.  What is interesting about OWL is that for 
>> some of the vocabulary the properties of the Properties can be 
>> defined in math.  But basically OWL isn't any different from the 
>> calendar event vocabulary.  The only reason that an RDF calendar 
>> event has meaning is the semantics of that vocabulary.
>
> As you know, I disagree profoundly with you on this issue. The 
> semantics of a calendar event described in RDF is given by the RDF 
> vocabulary. It is axiomatized in RDF. You can write as much as you 
> like about it and what you think it ought to mean: all that is merely 
> commentary and does not change the meaning *of the RDF* one iota.  
> That follows from the RDF specs themselves.

I don't follow. Imagining schemas and specs  where appropriate, what 
does

[]  rdf:type  cal:Event;
    cal:dtstart "2003-07-31T12:00:00Z";
    cal:end "2003-07-31T13:00:00Z";
    cal:participant  [ contact:mailbox <mailto:phayes@ihmc.us> ].

mean, and why?


>> (For you, this may seem perturbing or to say that the logic itself, 
>> the thing you tend to define first, is actually only defined in the 
>> data language. But it works.
>
> No, it doesn't, which is why I insist on my point.

Does too.
__________________________________________

At this point I am horrified to find myself only a fraction of the way 
through the email conversation, so I will hit send and keep the rest 
for another day

Tim.
Received on Thursday, 17 July 2003 23:34:55 UTC