more on URI equivalence from Larry Masinter on 2002-10-20 (www-tag@w3.org from October 2002)

From: Larry Masinter <LMM@acm.org>
Date: Sun, 20 Oct 2002 10:55:38 -0700
To: <www-tag@w3.org>
Message-ID: <000e01c27861$e66a9260$6ace8642@masinter>
# I hope TAG is paying attention.  Larry explains clearly why it is folly
# to say "HTTP URLs work fine as identifiers".  The fact is, *all* URIs
# are broke as identifiers, and HTTP bears a good part of the
# responsibility. 

I suppose I should be happy that Joshua says he agrees with me,
but I'm not sure I agree with his short characterization of
my position.  I don't think it is folly to say that
'HTTP URLs work fine as identifiers', since I think they work fine
as identifiers for resources you access by using the HTTP
protocol; there are other things that they don't work as well
for (shining shoes, dessert topping, floor wax).

I don't think I agree with the statements listed after "The fact is".
I don't think *all* URIs are 'broke as identifiers', nor that
HTTP bears any particular responsibility.

If I were to lay responsibility anywhere, it would be on the
various URI specifications, which tried not to limit the scope
of applicability of URIs but gave no guidance on how the
extension to semantic identifiers could be accomplished.

# Already the W3C has a massive schism in the way URIs
# (*all* URIs, not just http) are used as identifiers in
# two of the most widely-deployed technologies (http and XML).

I think the 'schism' alluded to isn't particularly between
http and XML, since the use of a URI as a network-object-identifier
applies as well to any URI used in HTML (whether http:, ftp:,
gopher:, file:, or cid:), and the use of a URI as a
string-equality-identifier applies to any URI used in
an 'xmlns' attribute, whether the URI scheme is http:, urn:,
or dav:.

Is it a "schism", or is it just two different applications
that we have to keep straight?

# I really wish someone would pay attention to this and DO SOMETHING ABOUT
# IT rather than waste time with the "HTTP URLs can identify anything"
# politicians.

Consensus-making is a political process, and W3C is a
consensus organization.

With a little help, HTTP URLs can identify anything. The help
they need is some bit of context that makes it clear that the
URL is being used as a string-equality-identifier rather than
a network-object-identifier. "xmlns=" is one kind of context;
'urn:tdb:2002:' (http://larry.masinter.net/duri.html) is another.
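
A minimal Python sketch of the idea that a URI's role comes from its context rather than its scheme. The helper `identifier_role` and its rules are hypothetical: namespace declarations and the tdb-style prefix quoted above are treated as string-equality contexts, and everything else is assumed to name a network object. `TDB_PREFIX` is copied from the text and treated as an opaque marker; its real syntax is defined in the linked draft.

```python
# Marker copied from the text; treated as an opaque prefix (assumption).
TDB_PREFIX = "urn:tdb:2002:"


def identifier_role(context: str, uri: str) -> str:
    """Classify how a URI is used, based only on its context (hypothetical helper).

    The scheme of `uri` is deliberately ignored: the context alone decides.
    """
    # Namespace declarations ("xmlns" or "xmlns:prefix") compare names as strings.
    if context == "xmlns" or context.startswith("xmlns:"):
        return "string-equality"
    # A tdb-style wrapper signals that the URI is not naming a network object.
    if uri.startswith(TDB_PREFIX):
        return "string-equality"
    # Otherwise (e.g. an HTML href), assume the URI locates a network object.
    return "network-object"


print(identifier_role("xmlns", "DAV:"))                           # string-equality
print(identifier_role("href", "ftp://ftp.example.org/pub/"))      # network-object
print(identifier_role("about", TDB_PREFIX + "http://www.w3.org")) # string-equality
```

The same `http:` string can therefore land in either category; only its surroundings change the answer.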

In natural language, the context isn't explicit
(http://lists.w3.org/Archives/Public/www-tag/2002Aug/0355.html).

How about 'with a little help, HTTP URLs can identify anything, but
by themselves, they only identify network resources'? Other pieces
of context (like 'tdb', 'xmlns=', or topic maps) provide the additional
help that lets a system do what comes naturally in natural language:
transfer meaning, so that the "indicator" is used as shorthand for
the thing indicated. In the Semantic Web, though, the context needs
to be explicit, so that we can make separate assertions about the
W3C, the W3C's web server, and the home page on the W3C's web server,
even though any of those things may be indicated (by context) when
someone writes 'http://www.w3.org' in free text.
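
Making the context explicit can be sketched as keying every assertion on a (context, URI) pair instead of on the URI alone. In this Python sketch, `Subject`, `add_assertion` and the context labels are hypothetical, and the asserted values are illustrative placeholders.

```python
from typing import Dict, NamedTuple


class Subject(NamedTuple):
    """A URI plus explicit context saying which thing it indicates.

    The labels used below ("organization", "web-server", "home-page")
    are hypothetical, not taken from any Semantic Web vocabulary.
    """
    indicates: str
    uri: str


Store = Dict[Subject, Dict[str, str]]


def add_assertion(store: Store, subject: Subject, prop: str, value: str) -> None:
    # Each distinct (context, URI) pair gets its own set of properties.
    store.setdefault(subject, {})[prop] = value


store: Store = {}
uri = "http://www.w3.org"
add_assertion(store, Subject("organization", uri), "name", "World Wide Web Consortium")
add_assertion(store, Subject("web-server", uri), "port", "80")
add_assertion(store, Subject("home-page", uri), "media-type", "text/html")

# Same URI string, three separate subjects: no assertion overwrites another.
print(len(store))  # 3
```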

# A) The standards must offer coherent, consistent, unified, and
# completely unambiguous guidance regarding how to compare any two URIs
# for equivalence.  

We have ample evidence that different applications need
different equivalence relationships. Any 'coherent,
consistent, unified and completely unambiguous guidance'
would have to allow for the possibility that different
applications use different algorithms. It's also important to
note that there are at least two syntactic spaces (URIs and IRIs).
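
To illustrate that different applications can reasonably pick different equivalence relations, here is a small Python sketch with two hypothetical comparators that disagree about the same pair of URIs: one compares strings exactly, the other first decodes percent-escaped unreserved characters (using the unreserved set later codified in RFC 3986). Neither is offered as the right answer; both are simply coherent.

```python
import re

# Unreserved characters, as RFC 3986 later codified them.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def normalize_percent_encoding(uri: str) -> str:
    """Decode %XX escapes of unreserved characters; uppercase the remaining hex."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, uri)


# Two of many possible equivalence relations an application might choose.
COMPARATORS = {
    "exact-string": lambda a, b: a == b,
    "percent-normalized": lambda a, b: (
        normalize_percent_encoding(a) == normalize_percent_encoding(b)
    ),
}

a, b = "http://example.org/%7euser", "http://example.org/~user"
for name, equal in COMPARATORS.items():
    print(name, equal(a, b))
# exact-string False
# percent-normalized True
```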

I suppose this is an issue for the revision of RFC 2396, which
could expand on how to compare URIs.

#  a) No reliance on "convention" for things like UTF-8 encoding,
# etc.  There are way too many different and often contradictory
# conventions available for edge cases like internationalization.

I don't think we should ever rely on "convention" without "specification".
The W3C I18N group's schedule for completing the IRI draft is
aggressive, and I'm sure they could use some help. The draft is at
http://www.w3.org/International/Group/iri-edit/draft-duerst-iri.txt;
please send comments to www-i18n-comments@w3.org.
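
Specifying the encoding rather than relying on convention means an IRI maps to exactly one URI. The mapping the IRI work later standardized (RFC 3987, section 3.1) encodes characters as UTF-8 and percent-escapes the resulting octets; the Python sketch below approximates it with `urllib.parse.quote`. `iri_to_uri` and `SAFE` are assumptions for illustration: the sketch percent-escapes host names instead of offering the IDNA alternative, and it also escapes ASCII characters such as space, which are not allowed in IRIs in the first place.

```python
from urllib.parse import quote

# Characters left untouched: the URI reserved set plus "%", so the IRI's
# existing syntax and escapes survive. Letters, digits and "-._~" are
# never escaped by quote().
SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI: UTF-8 encode, then percent-escape other characters.

    Simplified sketch: host names are escaped like any other component.
    """
    return quote(iri, safe=SAFE, encoding="utf-8")


print(iri_to_uri("http://example.org/r\u00e9sum\u00e9"))
# http://example.org/r%C3%A9sum%C3%A9
```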

#  b) Absolutely no allowance for specs to compare differently than
# other specs.  If HTTP identifies two things to be the same, XMLNS had
# better as well.

I don't think this is a good idea. The problem is that "xmlns"
needs a stable algorithm that can be implemented and be guaranteed
to apply for all time, while different URI schemes may well
define other equivalence relationships that apply.  Of course,
for the purpose of the HTTP protocol, http://www.w3.org and
http://WWW.W3.ORG and http://wWw.W3.oRg:80 are equivalent. But
there's no way to code all of the current and potential future
equivalence relationships into an implementation of the
xmlns algorithm. 

I think it might be a good idea to try to arrange that
there can never be two namespaces, one named http://www.w3.org
and another named http://wWw.W3.oRg:80/ , so that the
difference in equivalence relationships is moot, but
I don't quite know how to arrange that.
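
The contrast between the two algorithms can be sketched in Python. `xmlns_equal` compares namespace names as plain strings, as Namespaces in XML does; `http_normalize` applies only a few HTTP-specific rules (lowercase scheme and host, drop the default port 80, empty path becomes "/") and is not a complete normalizer. `colliding_namespaces` is a hypothetical check for declared namespace names that differ as strings but coincide for HTTP; it shows one partial way to keep the difference moot within a single document, not a general solution.

```python
from itertools import combinations
from typing import List, Tuple
from urllib.parse import urlsplit, urlunsplit


def xmlns_equal(a: str, b: str) -> bool:
    # Namespace names are compared as plain strings, character by character.
    return a == b


def http_normalize(uri: str) -> str:
    """Normalize an http URI: lowercase scheme and host, drop port 80, "" -> "/".

    Sketch only: userinfo is dropped and no other normalizations are applied.
    """
    parts = urlsplit(uri)
    host = parts.hostname or ""   # urlsplit already lowercases the hostname
    port = parts.port             # None when no port is given
    netloc = host if port in (None, 80) else f"{host}:{port}"
    return urlunsplit((parts.scheme.lower(), netloc, parts.path or "/",
                       parts.query, parts.fragment))


def colliding_namespaces(names: List[str]) -> List[Tuple[str, str]]:
    """Pairs of namespace names that differ as strings but not for HTTP."""
    return [
        (a, b) for a, b in combinations(names, 2)
        if not xmlns_equal(a, b) and http_normalize(a) == http_normalize(b)
    ]


variants = ["http://www.w3.org", "http://WWW.W3.ORG", "http://wWw.W3.oRg:80/"]
print({http_normalize(v) for v in variants})  # {'http://www.w3.org/'}
print(xmlns_equal(variants[0], variants[2]))   # False
print(len(colliding_namespaces(variants)))     # 3
```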

#  c) No "guessing" -- if an implementer does not have unambiguous
# and exact rules for comparing any two particular URIs, for God's sake
# don't allow the implementer to report an equivalence one way or the
# other.  "Don't know" is the only legitimate answer.

Each application needs to define its equivalence relationship, but
sometimes variance is legitimate. For example,
it's up to the web server at 'example.org' to decide whether
  http://example.org/CASEEXAMPLE and http://example.org/caseExample
are equivalent. A client that actually knows (by some side channel)
that the server it's talking to is case-insensitive might legitimately
infer that the two are equivalent, while another client might conclude
that they're not. But in most of the web, a client isn't required
to know about equivalence at all.
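
A sketch of that situation in Python. `compare` is hypothetical, and its `case_insensitive_hosts` parameter stands in for whatever a client has learned through a side channel. The function never answers "not equivalent", since nothing in two differing strings proves that the server treats them as different resources; a client that reads UNKNOWN as "not equivalent" behaves like the second client described above.

```python
from enum import Enum
from typing import AbstractSet
from urllib.parse import urlsplit


class Verdict(Enum):
    EQUIVALENT = "equivalent"
    UNKNOWN = "unknown"  # only the server could say


def compare(a: str, b: str,
            case_insensitive_hosts: AbstractSet[str] = frozenset()) -> Verdict:
    """Compare two http URIs, using optional out-of-band knowledge of servers.

    `case_insensitive_hosts` is a hypothetical parameter, not part of any protocol.
    """
    if a == b:
        return Verdict.EQUIVALENT
    pa, pb = urlsplit(a), urlsplit(b)
    same_server = (pa.scheme, pa.hostname, pa.port) == (pb.scheme, pb.hostname, pb.port)
    if (same_server and pa.hostname in case_insensitive_hosts
            and pa.path.lower() == pb.path.lower()
            and (pa.query, pa.fragment) == (pb.query, pb.fragment)):
        return Verdict.EQUIVALENT
    # Differing strings, no knowledge of the server: can't prove either way.
    return Verdict.UNKNOWN


x, y = "http://example.org/CASEEXAMPLE", "http://example.org/caseExample"
print(compare(x, y).value)                                          # unknown
print(compare(x, y, case_insensitive_hosts={"example.org"}).value)  # equivalent
```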

#  d) Relative URIs -- this goes with "c".  Everywhere that
# implementations or specs permit comparison of relative URIs without
# explicitly knowing the base and having explicit and universal rules
# regarding universalizing the relative URI, the spec should prohibit
# reporting any comparison.

Are there any specs that allow comparison of relative URIs without
explicitly knowing the base? I think that indeed would be a bad idea.

On the other hand, I do think that RFC 2396 is clear that there
is only one explicit, universal algorithm for combining a
base and a relative URI to create an absolute URI.
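
Comparing relative references safely means resolving both against an explicit, absolute base first. This Python sketch uses `urllib.parse.urljoin`, which implements the resolution algorithm as revised in RFC 3986 (the successor to RFC 2396) rather than RFC 2396 itself; `same_after_resolution` is a hypothetical helper that refuses to compare without a base.

```python
from urllib.parse import urljoin, urlsplit


def same_after_resolution(base: str, ref_a: str, ref_b: str) -> bool:
    """Compare two URI references only after resolving both against one base."""
    # Refuse to compare without an absolute base: relative references
    # alone don't determine an absolute URI.
    if not urlsplit(base).scheme:
        raise ValueError("an absolute base URI is required")
    return urljoin(base, ref_a) == urljoin(base, ref_b)


print(urljoin("http://example.org/a/b", "../c"))                      # http://example.org/c
print(same_after_resolution("http://example.org/a/b", "../c", "/c"))  # True
```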

# B) The standards must offer coherent, consistent, unified, and
# completely unambiguous guidance regarding how to compare any two qnames
# for equivalence.

The limitations of 'unambiguous guidance' for URI equivalence
might not apply to qnames, because qnames might not allow
all URIs and might supply enough context.
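
Qnames carry their context in the in-scope prefix declarations, so comparing two qnames can reduce to comparing expanded names, that is (namespace name, local part) pairs, with the namespace name compared as a string. A minimal Python sketch; `expand_qname` and `qnames_equal` are hypothetical helpers, and unprefixed names are treated as element names (taking the default namespace).

```python
from typing import Mapping, Tuple


def expand_qname(qname: str, prefixes: Mapping[str, str]) -> Tuple[str, str]:
    """Expand "prefix:local" to (namespace name, local part).

    An unprefixed name uses the default namespace, stored under "" (as for
    element names; unprefixed attributes would take no namespace instead).
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        prefix, local = "", qname
    if prefix not in prefixes:
        raise ValueError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix], local


def qnames_equal(q1: str, map1: Mapping[str, str],
                 q2: str, map2: Mapping[str, str]) -> bool:
    # Prefixes are irrelevant; namespace names compare as plain strings.
    return expand_qname(q1, map1) == expand_qname(q2, map2)


print(qnames_equal("a:title", {"a": "http://example.org/ns"},
                   "b:title", {"b": "http://example.org/ns"}))  # True
print(qnames_equal("a:title", {"a": "http://example.org/ns"},
                   "b:title", {"b": "http://EXAMPLE.org/ns"}))  # False
```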

# Personally, I think this is the most important issue facing WWW and is
# eminently in the realm of what TAG should be addressing.  In fact, I
# would say that TAG could spend a year addressing just these two issues
# and be considered wildly successful.

The TAG charter is at http://www.w3.org/2001/07/19-tag.
The primary purpose of the TAG is to document cross-technology
principles in architectural recommendations, and resolving
issues of architectural impact is secondary. So spending
a whole year on two issues wouldn't seem to be a "wild success",
even if they were actually the most pressing ones facing
web architecture.

Perhaps the TAG might be just as successful shining a light on some
of these issues, since some of them are unsolvable. Cope with
ambiguity, and get on with it.

Separately, I want to argue that http://www.w3.org/TR/webarch/
could be split into separate documents that can progress
and be updated independently. Defining equivalence and settling
identification vs. indication seem necessary to finishing
section 2 ("Identifiers and resources"), but the other sections
could also make progress.

Larry
-- 
http://larry.masinter.net
Received on Sunday, 20 October 2002 13:55:09 UTC