RE: designating datatypes

Patrick, your recent emails on this topic have made me think we have 
a major disconnect, so rather than respond point-by-point, I'm 
addressing the central issue first.

I gather that you feel that RDF treats URirefs as names, in the sense 
that each URIref is the name of some thing, and that different 
interpretations, although perhaps they differ in what facts are true 
about those things, treat the names themselves are 'rigid', ie a 
given name denotes the same thing in every interpretation. (The term 
'rigid' for this idea is common, so I will use it from now on, OK?)

This is quite wrong. There is no basic assumption of rigidity in RDF, 
and indeed there isn't in most assertional languages. It is sometimes 
claimed that natural language names (certainly *proper* names, such 
as "Patrick Hayes" and "Delhi") are, or should be, rigid; but even 
this assumption raises a host of problems when one considers how 
language is actually used.

Consider what it that you learn when you hear something new in 
natural language. I will tell you some facts about a guy I know. 
Imagine you get them one at a time, and ask yourself at what point 
you know who this guy actually is.

1. His name is 'Joe'.
2. He is a professor at University of Illinois.
3. His basement workshop is a joy to behold.
4. He is a neuro guy in the psychology department.
5. He does experiments with cats which involve doing delicate surgery 
on their eyeballs using loops of fine gold wire.
7. His cats love him but he keeps them permanently hungry
8. His surname is unusual and rare in the US phone directory
9. His surname is 'Malpeli'.

Probably, 1+2+9 would be enough for you to locate him using Google. 
With a bit more work you might be able to do it just using 1+2+4+8, 
or even just the first three, depending on how good the department's 
webpage is. But if you just had, say, 1 through  7, the chances are 
that although you would know quite a lot about his guy, what you know 
could be accurately paraphrased as an existential: you know some guy 
*exists* whose name is "Joe", etc.. But you aren't able to access the 
actual guy himself. What this means is that *there are some 
interpretations of what you know* in which the description *doesn't* 
pick out my friend Joe, but in which the name "Joe" denotes someone 
(or even something) else. That is the model-theory way of saying that 
something isn't known, or that some information isn't available: 
there exist interpretations in which it is false. In other words: it 
could be false, for all you know. Its being false is consistent with 
what you know.

Consider what it would mean to say that some name, say "Joe Smith" 
referred to the same thing in all interpretations. It would mean that 
it would be *logically impossible* for this name to be misunderstood: 
anyone (or any RDF reasoning engine) should be able to figure out, by 
pure logical thought from first principles, who it was that "Joe 
Malpeli" referred to, just from the name alone. All names would be 
tautologies. This is like the old magical idea of things having a 
'true name' which, when known, gives the knower magical powers over 
the denotee. This is a real problem in Earthsea, but here in reality 
things don't work like that.

Now, there is a case to be made out for 'proper names', which of 
course are primarily used to identify people and things. When you 
read my first sentence, you probably figured out immediately that 
"Joe" wasn't some arbitrary string, but was part of a proper name; 
and once you had the whole name, you know how to go about finding the 
referent of it, if you really need to find him. Unless you actually 
go to Illinois, or give Joe a phone call, you will still only really 
find out more *about* him, in fact, even if you have his website and 
his SS number and so on. But proper names are special because they 
are 'grounded', ie sufficiently uniquely attached to their intended 
referents (by social conventions, etc.) that they provide a quick and 
reliable way to identify referents. SS numbers are even better (in 
the USA) and phone numbers and email addresses and IPAs and all the 
other machinery provide many other examples of 'grounded' symbols 
which we rely on. But notice that this grounding relies on something 
external to the model theory: it has to, in fact. Even if you knew 
that "Joe" was a proper name, that wouldn't in itself allow you to 
draw any more valid inferences about Joe.

The reason why we have to do something special for datatypes is that 
we are *assuming* that the datatypes are provided by some external 
source, but also assuming that *we* will use a URIref to refer to the 
particular datatype. Now, who gets to decide which datatype a given 
URIref refers to? Obviously, the external source, right? So in this 
case we are using the uriref as a proper name, in effect; but URIrefs 
aren't proper names; so all I want to do is to say explicitly that in 
this case, in datatyped interpretations, we *will* treat these 
URIrefs as rigid in this sense: we will insist that the provided name 
of the datatype will continue to be a name of that datatype when its 
used in RDF. We have been making this assumption all along, but 
implicitly; all I want to do is make it formal and precise.

.....


>
>I'm presuming that the MT would say
>
>   I(uriref(x)) = x

Well, yes, it could; but for a normal interpretation there wouldn't 
be any point in saying this since it follows from the definition of 
'x' in those cases. That is, we start with the urirefs themselves and 
we interpret them using I, so the 'x' in your equation *is* 
I(<someuriref>), and then uriref(I(<someuriref>)) has to be 
<someuriref>, so the condition is vacuous. But datatypes are 
different precisely because they are defined outside RDF, by some 
external authority and that external authority gets to give them a 
name. (If they didn't, how could either of us refer to the datatype?) 
So how do we say, as part of our spec, that interpretations must 
honor the naming done by the external authority? Now this equation 
ISN'T vacuous:

I(name(x)) = x

It says that OUR interpretation (I) will agree with YOUR name-choice 
for the datatype (name(x)); we - RDF - agree to not re-interpret your 
name for your datatype.

>
>How is a URIref different from a name? Aren't the names of RDF URIrefs?

Yes, they are.

....
>
>>  >Sorry, but I consider the proposed change to violate the very
>>  >basis of using URIs to *denote* resources
>>
>>  No, really, it doesn't. It just says that whoever invents the
>>  datatype also gets to say that some URI denotes (refers to) it, and
>>  we agree to go along with that.
>
>But that is *already* provided for! That's the way RDF works. Someone
>says "the URI x denotes the resource y" and if that someone is the
>"owner" of both the URI and the resource, then that URI is an
>'official' URI for that resource.

Er....no, that isn't how it works. At least, maybe it does in 
practice, but that isn't sanctioned by anything in the semantics 
(except good old 'social meaning', of course). Which is exactly why 
we need to actually say something extra to make it work this way in 
the case where we want the semantics to 'formally' require this.

BTW, this is exactly the issue Ive been talking about as this larger 
issue that needs to be looked at in a broader context than just RDF. 
Right now there are no established general-purpose rules for giving 
things names on the Web. None. There are retrieval protocols (HTTP) 
and there are URNs, but there is no general account of how a URI 
reference actually gets to have a referent.

>We don't need to posit that URI
>as being *part of* the resource to infer that it is an 'official' URI.
>
>There is no need for this change, and I consider the change to be
>a mistake regardless. That's two reasons not to make it.

Well, there is a need to refer to the name of a datatype. I'll just 
assert that as editor. And I really don't think that this is a 
change. It will not affect anything that any other document says 
about datatypes, not change any test case or entailment.

....
>
>Naming a resource is completely orthogonal to its nature.
>
>There is not, nor ever should be, any requirement for the naming
>of a resource by a URI to affect the nature of that resource.

It is simply a fact that my name is "Patrick John Hayes". Knowing 
this about me is often very handy; and if you want to refer to me in 
text, its vital. Whether or not my 'nature' requires me to 'include' 
my name as 'part' of me is irrelevant to the fact that the best way 
to *refer to* me is often to use my name; and, more to the point 
here, to use my name to refer to something else is likely to cause 
trouble when communicating.

>This seems such a fundamental principle that I'm boggled we are
>even discussing this proposed change as reasonable!
>
>>  >How do you know how "en"^^xsd:lang denotes the English language?!
>>
>>  Because, presumably, some standards body has published a list of
>>  language tags and defined "en" to mean English, and we agree to go
>>  along with that.
>
>EXACTLY!
>
>The mechanisms for associating a given URIref with a particular
>resource are *bootstrapping* mechanisms outside the scope of RDF,
>or any layer based on RDF (including OWL).

Right, but all Im wanting us to do is to say that we will conform to 
them. RDF does not get to say what name(d) actually is for some 
datatype d; but its not unreasonable to require, when someone tells 
us about some new datatype, that they tell us what name to use to 
refer to it with. If they don't, we will have to invent our own 
private name; and then how can they tell when we are referring to it?

>We might be able to infer some of those mechanisms based on e.g.
>examination of individual URIs (violating the opaqueness of those
>URIs) but exactly *how* a given URIref is defined as denoting a
>given resource is outside the scope of RDF.
>
>URIrefs are atomic. They just *are*. And they denote what they denote.

They only denote *in a particular interpretation*. In a different one 
they might denote something else. If we want to lock down the 
denoting, then we need to add a semantic condition to do that, which 
is what the proposed equation does.

>How they denote what they denote is not visible to the RDF layer.
>
>Come one. This is pretty fundamental stuff, no?

Indeed.

....
>
>The name is the URIref, no?
>
>I just don't see why the MT needs some other name but the URIref.

It doesn't: I mean the URIref. We can call it the 'datatype URIref'. 
I actually say its a URIref in any case.

>
>I think your foot has fallen through the floor of the atomic
>foundation of RDF ;-)
>
>URIrefs are the only names you have, and the only names you need.
>
>Whether a given datatype (or any resource) has an 'official' URIref
>to denote it should not affect the MT. If a URIref denotes a given
>datatype, then that is the datatype you have. And the URIref denoting
>it is the only name you need -- whether or not it is an 'official'
>name.
>
>All D-interpretations dealing with the same datatype will have
>the same interpretation. And whether a given datatype has one or
>more names, the D-interpretation should be based on the datatype,
>not the name denoting the datatype.

BUt that doesn't work, which is what Peter noticed. The point being 
that an interpretation (lust like a datatype, in fact) is a mapping 
FROM a lexical space - urirefs - to a semantic space. Sometimes we 
need to refer to this mapping itself. If we can only talk about the 
semantic space, we can't make the urirefs mean what they are supposed 
to mean.

>
>What you seem to be suggesting really worries me.
>
>If I have ten different URIs denoting some person 'Bob', I would
>expect that asking about the ex:father of Bob to have a consistent
>interpretation, regardless of which URI I used to denote Bob in
>the given query.

Right.

>And the MT shouldn't care about whether I used
>an 'official' URI to denote Bob. All it should care about is that
>I denoted Bob

But how does it know that? You denoted Bob in all interpretations in 
which your URI - the one you happen to be using - denotes Bob. The 
only ways to ensure this are either to say so much, using that 
uriref, that Bob is the only logically possible denotee (this is 
usually impossible), or else to rely on some external 'grounding' 
principles which attach fixed meanings to enough of your vocabulary 
(say, to SS numbers or webpage addresses) so that you can use these 
to pin down your intended denotation. All Im saying is that in the 
case of datatypes, we allow the external authority to give us a 
pre-grounded uriref, and we will formally require that this uriref 
will indeed always be interpreted in this way. Then we and them can 
communicate and both be confident that these urirefs, at least, will 
have fixed denotations.

>and Bob has a certain property ex:father. And of
>course, likewise the 'FATHER' property could itself be denoted
>by numerous URIs, and again, all that matters is that I am dealing
>with a person resource Bob and a property resource FATHER and
>interpretations are based on those resources, not the URIrefs
>denoting them.
>
>Right?
>
>>  >You seem to be suggesting that what
>>  >a given URI might denote can differ between different
>>  D-interpretations.
>>
>>  Right, it can unless we say it can't. That's what I want to say.
>
>Well, try to say it without having to embed the URIref as part
>of the Datatype. It seems to me that if this is a problem with
>D-intepretations, then the same problem must exist for all
>interpretations.

It does, in a sense, but the D-interpretations are the only cases 
where we explicitly refer to an external naming authority, so the 
only cases where this issue arises in the spec.

>
>>  >I thought one key quality of RDF was that the denotation of a given
>>  >URI never changes. It always means the same thing.
>>
>>  Not at all: it can vary from interpretation to interpretation.
>
>Eh? What? I hope you're talking about some theoretical mathematical
>possibility and not the reality of RDF.

Nope, reality.

>In RDF, a given URI always denotes the same thing.

No, really, it does not. Absolutely, provably, not. For example, any 
consistent RDF graph can be  satisfied by a Herbrand interpretation 
in which urirefs denote themselves, just like literals.

>You could have
>different Model Theories that had those URIs denoting different
>things, sure, but I'm presuming that the official RDF MT is supposed
>to ensure that in a valid RDF interpretation, a given URI always
>denotes the same thing.

Yes, *in a particular interpretation*. But if I send you an RDF 
graph, I don't send you an interpretation. You just get the syntax: 
you gotta interpret it yourself.

>  > >And if the
>>  >creators of a datatype give it a certain URI, then that's final.
>>  >That URI always denotes that specific datatype, whatever
>>  that datatype
>>  >is. Period. End of story.
>>
>>  No, which is why we need to say this explicitly.
>
>Then it needs to be sated explicitly for all resources, not just
>Datatypes.

Well, but for most resources it would be vacuous, since they don't 
arrive as 'names' of anything in particular.

>
>Surely there is some foundational statement in the MT that expresses
>the constraint that in RDF interpretations, a given URI always denotes
>the same thing.

In each interpretation it does. It does not however between 
interpretations. If you look at a uriref in isolation, RDF has no way 
to know what it is supposed to refer to.

>???
>
>If not, then I would consider that a bug in the MT.

If its a bug, its a bug in the way the world is put together. Pray 
for guidance, but don't blame me.

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola              			(850)202 4440   fax
FL 32501           				(850)291 0667    cell
phayes@ai.uwf.edu	          http://www.coginst.uwf.edu/~phayes
s.pam@ai.uwf.edu   for spam

Received on Saturday, 1 March 2003 17:16:23 UTC