Re: URx Questions from Patrick Stickler on 2002-01-23 (www-archive@w3.org from January 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 23 Jan 2002 10:26:47 +0200
To: ext Mark Baker <distobj@acm.org>
Message-ID: <B8744067.C19D%patrick.stickler@nokia.com>
Should this discussion be moved back on-list? It's very
relevant, and I'm sure others would be interested.



On 2002-01-23 0:44, "ext Mark Baker" <distobj@acm.org> wrote:

>> Though I probably will need to provide a more lengthy explanation...
> 
> Oh my, I was hoping for a couple of sound bites.

Up front, I want to give warning and apology if any of
my responses seem too curt, blunt, or generally insulting.

In the interest of clarity and avoiding rambling on so
as to say things in a politically correct or diplomatic
fashion, I've chosen to just say what I think -- and all
is said with all due respect and courtesy, honestly.

> 
> But I see that you're using the classical view of URI space.

I have stated repeatedly, and explicitly, that I subscribe
to the classical view.

> The W3C
> has been fighting for a long time to change this view, and got at
> least part of the way there with the "uri-clarification" note.

The "clarification" does not clarify much of anything other
than what the classical and contemporary views are. It does
not IMO mandate the contemporary view. It also lacks any clear
discussion of what either view really means for software
engineers building URI aware applications -- apart from a single
mention of the word 'formal', the significance of which must
be guessed at.

Furthermore, I find that far more of the people I encounter
subscribe to the classical view, particularly those working on
semantic web applications and software engineers building web
applications that use URIs extensively, who want/need a logical
and formal taxonomy of URI classes.
 
> I see it this way (which is also the way that Tim Berners-Lee,
> Dan Connolly, and Roy Fielding see it - the three people most
> responsible for the Web as we know it);

With all due respect to all three individuals, and many others,
I think that they are missing something fundamentally important.
(Even the gods can be wrong now and then)

My impression (which may be incorrect) is that they and
others in their "camp" subscribe very strongly to a philosophy
epitomized by Perl, where "things are what you use them to be".
I consider that to be hacking, not engineering, and
completely unsuitable for many application areas (e.g. eCommerce,
authority of knowledge, digital rights, etc.), even given
the chaotic and dynamic nature of the Web. Don't get me wrong,
I like Perl and use it a lot, and there are many cases where
hacking is appropriate -- but also many cases where it is not.

I see the "everything is a URI and its meaning is how I use it"
notion as just an extension of the Perl scalar datatype view, which
again I find to be poor engineering. It's a useful notion
for hacking and for one-off applications, but not as a
principle of software architecture, particularly where
data integrity is important and ambiguity is to be minimized
(e.g. the Semantic Web).

It appears that those who subscribe to the "contemporary" view
also tend to hold this "things are what I use them for" view.

And, BTW, I find the term "contemporary view" to be a *highly*
politically loaded and offensive term equating to "if you don't
agree with the modern, contemporary view, you're behind the times
and your views are passe."

I see the "contemporary view" as a transient fad that
will pass, leaving the classical view to continue on
its merry way towards a global semantic web.

> Names are strings that identify something.  "Mark Baker" is my
> name.  URIs are a subset of all possible names, with a specific
> structure, e.g. "foo://bar-com/baz".  Every URI is a name because
> it identifies something, and I can associate meaning with it
> independent of any further interpretation of that URI.

A formal taxonomy of URI Classes with consistent semantics is
an issue of scalability and economy. Sure, you can enumerate all
knowledge about every individual URI, but if a large portion of
the knowledge about many URIs intersects in a functionally
significant way, it is good engineering to capture that intersection,
and that's what URI schemes are for, and also what URI Classes are
for.

Otherwise, let's just use UUIDs which are globally and temporally
unique, and then add whatever semantics we want about them; which
seems to me to represent the distillation of the contemporary
view. After all, now with the DDDS architecture, we can just
create the DNS entries to map any arbitrary string to an IP
address -- so why bother even with 'http:' or other URI schemes?

Just create your names using UUIDs, describe them as you like,
and for those that denote web resources, use DDDS to map them to
some address. Eh?

After all, a name is just an opaque identifier that "is" what
you say it is.

> For example,
> I can tell you that "http://www.markbaker.ca/James/" identifies my
> son.  Nobody need ever invoke a GET on that URI in order to associate
> that name with the meaning I gave it.

And how then does a software application know that it denotes
a non-digital resource and thus, a retrieval error is in fact
"correct" and to be expected rather than an indication of some
access problem?!

I never argued that URLs couldn't be (mis)used to denote non-digital
resources, only that a formal taxonomy of URI classes based on
resolution criteria (direct, indirect, none) is extremely useful
for applications -- particularly SW applications which are using
URIs to infer things about the universe.
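As a rough illustration of why such a taxonomy helps an application, here is a minimal Python sketch. The scheme-to-class table and the function name are hypothetical, chosen only for this example; in particular, treating 'auth:' as non-resolvable is an assumption.

```python
from enum import Enum
from urllib.parse import urlsplit


class Resolution(Enum):
    DIRECT = "direct"      # the URI itself locates the resource (a URL)
    INDIRECT = "indirect"  # some resolver maps the URI to a location (a URN)
    NONE = "none"          # the URI denotes something that cannot be retrieved


# Assumed assignments, for illustration only.
SCHEME_RESOLUTION = {
    "http": Resolution.DIRECT,
    "urn": Resolution.INDIRECT,
    "auth": Resolution.NONE,
}


def failure_is_expected(uri: str) -> bool:
    """Return True if failing to retrieve `uri` is normal, not an access problem."""
    scheme = urlsplit(uri).scheme
    return SCHEME_RESOLUTION.get(scheme) is Resolution.NONE
```

With a table like this, an agent can tell from the scheme alone that a failed retrieval of an 'http:' URI signals a real problem, whereas it would never attempt retrieval of an 'auth:' URI in the first place.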

Furthermore, are the proponents of the contemporary view completely
*blind* to the confusion that exists in the larger masses of web
users regarding 'http:' and other URLs that don't resolve because
they don't denote digital resources (e.g. XML Namespaces, vocabulary
terms, etc. etc.) as well as the confusion between vocabulary URLs
and schema URLs and the total incompatibility of such an approach
with multiple schemas using the same vocabulary?! It appears so.
 
> I can do this for more than just markbaker.ca URIs.  You and I can
> have a conversation about http://www.ibm.com without invoking GET.

No. You can only have a conversation about the web resource accessible
at http://www.ibm.com, which is neither the URI 'http://www.ibm.com'
nor the company 'IBM Inc.'. You can achieve this fundamental
distinction by using URI schemes that embody the key semantics:

   http://www.ibm.com      = a web resource
   uri:http://www.ibm.com  = the URI for a web resource
   auth://ibm.com          = a (semantic) web authority/entity

Now, and only now, can we actually discuss these three things
in a clear and consistent manner.

E.g. (apologies to IBM and the IETF for the use of their
      trademarks in the following examples, as well as to
      all persons actually named John Doe ;-)


<rdf:Description rdf:about="http://www.ibm.com">
   <dc:title>Welcome to IBM</dc:title>
   <dc:creator rdf:resource="auth://john.doe@ibm.com"/>
   <dc:publisher rdf:resource="auth://ibm.com"/>
</rdf:Description>

<rdf:Description rdf:about="auth://ibm.com">
   <dc:title>International Business Machines Inc.</dc:title>
</rdf:Description>

<rdf:Description rdf:about="auth://john.doe@ibm.com">
   <person:name>John Doe</person:name>
   <person:email rdf:resource="mailto:john.doe@ibm.com"/>
</rdf:Description>

<rdf:Description rdf:about="uri:http://www.ibm.com">
   <rdf:type rdf:resource="http:"/>
</rdf:Description>

<rdf:Description rdf:about="http:">
   <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-Taxonomy/URI/URL"/>
</rdf:Description>

<rdf:Description rdf:about="uri:auth://ibm.com">
   <rdf:type rdf:resource="auth:"/>
</rdf:Description>

<rdf:Description rdf:about="auth:">
   <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-Taxonomy/URI/URP/URV"/>
</rdf:Description>


Otherwise, your knowledge would be highly ambiguous and
for all practical purposes useless. E.g.


<rdf:Description rdf:about="http://www.ibm.com">
   <dc:title>Welcome to IBM</dc:title>
   <dc:creator rdf:resource="mailto:john.doe@ibm.com"/>
   <dc:publisher rdf:resource="http://www.ibm.com"/>
   <!-- Is the publisher its own publisher or just the
        publisher of the web page, and is John Doe
        the creator of the web page or of IBM or both? -->
</rdf:Description>

<rdf:Description rdf:about="http://www.ibm.com">
   <dc:title>International Business Machines Inc.</dc:title>
   <!-- Is this the title of the web page or IBM, or both?
        Does the web page and/or IBM have two titles? -->
</rdf:Description>

<rdf:Description rdf:about="mailto:john.doe@ibm.com">
   <person:name>John Doe</person:name>
   <person:email rdf:resource="mailto:john.doe@ibm.com"/>
   <!-- Is John Doe the name of the email address or a person, and
        does the email address have an email address that is itself? -->
</rdf:Description>

<rdf:Description rdf:about="http://www.ibm.com">
   <rdf:type rdf:resource="http:"/>
   <!-- Is this the type of the web page or of the URI of
        the web page, or of IBM? -->
</rdf:Description>

<rdf:Description rdf:about="http:">
   <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-Taxonomy/URI/URL"/>
   <!-- fortunately, this is unambiguous, at least in this example -->
</rdf:Description>

<rdf:Description rdf:about="auth://ibm.com">
   <rdf:type rdf:resource="auth:"/>
   <!-- Again, is this the type of the URI or of IBM? -->
</rdf:Description>

<rdf:Description rdf:about="auth:">
   <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-Taxonomy/URI/URP/URV"/>
   <!-- also unambiguous, at least in this example -->
</rdf:Description>


I assume the numerous ambiguities and circularities in
these second examples are clear, and also clearly demonstrate
the critical need for distinctive URIs.
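To make the ambiguity concrete for a machine, here is a small Python sketch that reduces the dc:title statements from both sets of examples to (subject, predicate, object) triples. The helper names are hypothetical, and flagging any subject with more than one title is only a crude heuristic for this illustration, not a rule of RDF.

```python
from collections import defaultdict

DC_TITLE = "dc:title"

# Titles asserted with distinct URIs for the web page and the company.
distinct = [
    ("http://www.ibm.com", DC_TITLE, "Welcome to IBM"),
    ("auth://ibm.com", DC_TITLE, "International Business Machines Inc."),
]

# The same titles when one 'http:' URI stands for both.
conflated = [
    ("http://www.ibm.com", DC_TITLE, "Welcome to IBM"),
    ("http://www.ibm.com", DC_TITLE, "International Business Machines Inc."),
]


def titles_by_subject(triples):
    """Group dc:title values by the subject they are asserted about."""
    titles = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == DC_TITLE:
            titles[subject].add(obj)
    return dict(titles)


def ambiguous_subjects(triples):
    """Return the subjects that end up carrying more than one title."""
    grouped = titles_by_subject(triples)
    return sorted(s for s, found in grouped.items() if len(found) > 1)
```

Calling ambiguous_subjects(distinct) returns an empty list, while ambiguous_subjects(conflated) returns the single 'http:' URI: the program has no way to decide which title belongs to the page and which to the company.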

Now, playing devil's advocate to my own arguments, I will
concede that one could have different 'http:' URLs to
capture the distinctions provided by my separate URI schemes,
but there still remains the problem that in such a scenario
URLs would be used for non-digital, non-accessible resources,
and thus, the fair and reasonable expectation by both a human
and an application that a URL provides access to a web
resource is violated.

Again, how is an application (or person) supposed to know
that a failure to resolve is intended/expected rather than
due to some actual problem accessing a web resource?
 
> Is "foo://www-markbaker-ca/James/" an URL or an URN?  You don't know,

You *could* know, if you said something like

<rdf:Description rdf:about="foo:">
   <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-Taxonomy/URI/URN"/>
</rdf:Description>

Now, every SW agent can *know* that every instance of the 'foo:' URI
scheme is in fact a URN, and by the defined qualities of a URN, it
denotes a web resource which is accessible indirectly by that URI,
and can then look for a definition within its operational context
for how such URIs are to be resolved (which protocol or agency).
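A minimal Python sketch of that inference follows. The subclass table mirrors the rdfs:subClassOf statements used in this message; the root class "voc://ietf.org/URI-Taxonomy/URI" and all function names are assumptions made for the example.

```python
URI_ROOT = "voc://ietf.org/URI-Taxonomy/URI"  # assumed root class
URL = URI_ROOT + "/URL"
URN = URI_ROOT + "/URN"

# Each scheme is treated as a class; values are the direct superclass.
SUBCLASS_OF = {
    "foo:": URN,
    "http:": URL,
    URN: URI_ROOT,
    URL: URI_ROOT,
}


def scheme_class(uri):
    """Return the scheme of `uri` with its trailing colon, or None."""
    scheme, sep, _ = uri.partition(":")
    return scheme + ":" if sep else None


def superclasses(cls):
    """Walk the subClassOf chain upward from `cls`."""
    found = []
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in found:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_a(uri, cls):
    """True if the scheme of `uri` is a (transitive) subclass of `cls`."""
    start = scheme_class(uri)
    return start is not None and cls in superclasses(start)
```

Given only the statement about 'foo:', is_a("foo://www-markbaker-ca/James/", URN) holds for every URI in that scheme without any per-URI description.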

Of course, since your son actually *isn't* a web resource, that
resolution will fail (unless we move to the future or an alternate
parallel dimension where you can beam folks on demand from wherever
they are ;-)

> but if I say *that* identifies my son, then that's the important
> thing.

It's one of the important things, but not the only important thing.

It is essential to keep in mind that the Semantic Web is *not*
for humans! It is for stupid machines that can't think, and need
explicit, well defined, formal symbol systems to do tricks with
bits.

*You* may understand that URI to denote your son. And some other
human may be able to discern from its mnemonic characteristics
that it likely denotes a human, but a computer just sees bits.

> Now, if after I've asserted that, I define a mapping that says;
> - replace "foo:" with "http:"
> - do 's/-/./g' on the authority
> 
> Is it an URN (using your definition of URN, not the contemporary
> one) or an URL now?

Obviously, since 'http:' is a URL scheme, you have now created
a URL, but that is a *different* URI from the first.
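The mapping described in the quoted text can be sketched in Python as follows; the function name is hypothetical, and the sketch assumes the 'foo:' URI has the generic "scheme://authority/path" shape.

```python
from urllib.parse import urlsplit, urlunsplit


def foo_to_http(uri: str) -> str:
    """Apply the quoted mapping: 'foo:' becomes 'http:', '-' becomes '.' in the authority."""
    parts = urlsplit(uri)
    if parts.scheme != "foo":
        raise ValueError("expected a 'foo:' URI, got: " + uri)
    authority = parts.netloc.replace("-", ".")
    return urlunsplit(("http", authority, parts.path, parts.query, parts.fragment))


original = "foo://www-markbaker-ca/James/"
mapped = foo_to_http(original)
```

The result is a new string, so any system that compares URIs character by character treats the input and the output as two separate identifiers.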

> Hopefully you see where I'm going with this.

Actually, no, unless you are suggesting that "www-markbaker-ca/James/"
is a globally unique identifier in its own right and that the URI
scheme prefix simply has to do with the method of interpretation,
such that 'http://www-markbaker-ca/James/' and
'foo://www-markbaker-ca/James/' denote the same thing but merely
represent different methods of interaction/access/reference.

I'm going to presume that that is not what you mean, as
that is contrary to the very basis of URI uniqueness.

> "Identifiers" are the
> important thing.  An identifier is a name or a locator in context.
> In the context of resolving an identifier, it's always a locator.
> In the context of talking about it, it's always a name.

My above examples show that this view is a fallacy.

We must be able to talk about the identifier, as well as what
is identified, and a given identifier can only identify one thing
in every context, not different things in different contexts.

The presently widespread view that e.g. 'http://www.ibm.com' can denote
both a web page and the company, or that 'mailto:john.doe@ibm.com'
can denote both a person and an email address is just dead wrong,
and unfortunately, it seems that this is a common view held by
those who subscribe to the "contemporary view".

As I've said before, it may very well be that the *Web* can limp
along with the contemporary view, but the *Semantic Web* cannot.
 
>> Does that help clarify my understanding of URL, URN, etc.?
> 
> Yes, thanks.  Very "traditional". 8-)

The "founding fathers" got it right in the first place. The
contemporary view is a false detour. We need to get back
on the main road.

Cheers,

Patrick


--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Wednesday, 23 January 2002 03:25:56 UTC