RE: Sticking another fork in the URI issue: equality vs. equivalence

I hope TAG is paying attention.  Larry explains clearly why it is folly
to say "HTTP URLs work fine as identifiers".  The fact is, *all* URIs
are broken as identifiers, and HTTP bears a good part of the
responsibility.

Already the W3C has a massive schism in the way URIs (*all* URIs, not
just http) are used as identifiers in two of the most widely-deployed
technologies (http and XML).

I really wish someone would pay attention to this and DO SOMETHING ABOUT
IT rather than waste time with the "HTTP URLs can identify anything"
politicians.

This problem is an adoption-blocker for data interchange scenarios which
require reliable identification of data entities.

To be as clear as possible, here are the two architectural issues which
I think must be addressed before reliable identification scenarios can
proceed:

A) The standards must offer coherent, consistent, unified, and
completely unambiguous guidance regarding how to compare any two URIs
for equivalence.  Due diligence here includes:
	a) No reliance on "convention" for things like UTF-8 encoding,
etc.  There are way too many different and often contradictory
conventions available for edge cases like internationalization.
	b) Absolutely no allowance for specs to compare differently than
other specs.  If HTTP identifies two things to be the same, XMLNS had
better as well.
	c) No "guessing" -- if an implementer does not have unambiguous
and exact rules for comparing any two particular URIs, for God's sake
don't allow the implementer to report an equivalence one way or the
other.  "Don't know" is the only legitimate answer.
	d) Relative URIs -- this goes with "c".  Wherever implementations
or specs permit comparison of relative URIs without both an explicitly
known base and explicit, universal rules for resolving the relative URI
to absolute form, the spec should prohibit reporting any comparison
result.
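
To make "c" and "d" concrete, here is a rough Python sketch of a
comparison that is allowed to answer "don't know".  The names (Verdict,
compare_uris) and the particular rule set (scheme and host compared
case-insensitively, everything else compared exactly) are my own
assumptions for illustration; no spec defines them.  The point is only
the shape: three possible answers, and no guessing on relative URIs or
encoding edge cases.

```python
from enum import Enum
from urllib.parse import urlsplit


class Verdict(Enum):
    SAME = "same"
    DIFFERENT = "different"
    UNKNOWN = "don't know"


def compare_uris(a: str, b: str) -> Verdict:
    """Three-valued comparison: answer only when the rules are unambiguous."""
    pa, pb = urlsplit(a), urlsplit(b)

    # (d) A relative URI with no known base cannot be compared at all.
    if not pa.scheme or not pb.scheme:
        return Verdict.UNKNOWN

    # Character-for-character identical absolute URIs are always the same.
    if a == b:
        return Verdict.SAME

    # (a)/(c) Escapes, non-ASCII characters and userinfo are edge cases this
    # sketch has no exact rule for, so it refuses to guess.
    if "%" in a or "%" in b or not (a.isascii() and b.isascii()):
        return Verdict.UNKNOWN
    if "@" in pa.netloc or "@" in pb.netloc:
        return Verdict.UNKNOWN

    # Assumed rule: scheme and host are case-insensitive, the rest is exact.
    key_a = (pa.scheme.lower(), pa.netloc.lower(), pa.path, pa.query, pa.fragment)
    key_b = (pb.scheme.lower(), pb.netloc.lower(), pb.path, pb.query, pb.fragment)
    return Verdict.SAME if key_a == key_b else Verdict.DIFFERENT


print(compare_uris("http://www.w3.org", "http://WWW.W3.ORG"))
print(compare_uris("foo/bar", "foo/bar"))
print(compare_uris("http://example.org/%7Euser", "http://example.org/~user"))
```

Under these assumed rules the first call reports the two URIs as the
same and the other two report "don't know", which is the behavior I am
asking the specs to require.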

B) The standards must offer coherent, consistent, unified, and
completely unambiguous guidance regarding how to compare any two qnames
for equivalence.
	a) Depends entirely on proper resolution of above issue.
	b) Again, none of this insanity where different specs do it
differently.  RDF and XMLNS already have different rules here.  That
needs to be fixed.  It is crazy.  I don't care what the resolution is,
but for God's sake it is bad enough we already have different ways to
test equivalence of URIs, let's stop compounding things by fragmenting
qname equivalence comparisons.
	c) Again, specs need to be hardcore about people who report
equivalence in the absence of absolute certainty.
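
The RDF/XMLNS difference in "b" can be shown in a few lines of Python.
This is a sketch of the two rules as I understand them: XMLNS compares
the (namespace name, local part) pair, while RDF concatenates the two
into a single URI and compares that.  The function names and the
example.org URIs are made up for illustration.

```python
def xmlns_equal(q1, q2):
    """Pair rule: namespace name and local part must each match exactly."""
    return q1 == q2


def rdf_equal(q1, q2):
    """Concatenation rule: a qname stands for namespace name + local part."""
    return q1[0] + q1[1] == q2[0] + q2[1]


# Two qnames, each given as (namespace name, local part).
a = ("http://example.org/ab", "c")
b = ("http://example.org/a", "bc")

print(xmlns_equal(a, b))  # False: the pairs differ
print(rdf_equal(a, b))    # True: both concatenate to http://example.org/abc
```

Same two qnames, two different answers, depending on which spec you
happen to be implementing.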

Personally, I think this is the most important issue facing WWW and is
eminently in the realm of what TAG should be addressing.  In fact, I
would say that TAG could spend a year addressing just these two issues
and be considered wildly successful.




> -----Original Message-----
> From: Larry Masinter [mailto:masinter@adobe.com]
> Sent: Saturday, October 12, 2002 3:02 PM
> To: www-tag@w3.org
> Subject: Re: Sticking another fork in the URI issue: equality vs.
> equivalence
> 
> 
> # URIs are strings with fairly strict syntax constraints.  There are two
> # kinds of operations defined on them:
> 
> # 1. You can test them for equality
> # 2. You can attempt to access representations of the resources they
> # identify.
> 
> Statement 1 is too narrowly drawn:
>   1. You can test them for equivalence
> 
> What kind of equivalence you care about depends on
> the context of use. For use as a namespace name,
> the equivalence is 'string equality' (which is,
> after all, the strictest equality you can have
> for something defined as a string of characters).
> 
> For use in context of operations in category '2'
> above, though, the equivalence relationship is
> looser, in that 'http://www.w3.org' and 'http://WWW.W3.ORG'
> are equivalent.
> 
> I think when talking about URIs you're better off
> avoiding talking about 'equality' so that you can
> be clear about which equivalence relationship you
> expect in the context.
> 
> Statement 2 is also too narrowly drawn; it would
> be better to write it as
> 2. You can attempt to interact with the resources they
>   identify.
> 
> That would cover 'mailto' and POST as well as 'http' and GET.
> 
> # Speaking as the implementor of two of the largest web robots ever
> # written and one commercial internet search engine, I am intimately
> # familiar with investing vast numbers of CPU cycles in manipulating
> # and storing and indexing and retrieving and caching and queuing HTTP
> # URIs as names, because that's what they are, names.
> 
> Surely cache implementors use a different equivalence
> algorithm than namespace name implementors, e.g.,
> http://www.w3.org equivalent to http://www.w3.org:80,
> etc.
> 
> Larry
> --
> http://larry.masinter.net

Received on Saturday, 19 October 2002 17:25:40 UTC