- From: Joshua Allen <joshuaa@microsoft.com>
- Date: Sat, 19 Oct 2002 14:25:08 -0700
- To: <www-tag@w3.org>
- Cc: "Larry Masinter" <masinter@adobe.com>
I hope TAG is paying attention. Larry explains clearly why it is folly to say "HTTP URLs work fine as identifiers". The fact is, *all* URIs are broken as identifiers, and HTTP bears a good part of the responsibility. Already the W3C has a massive schism in the way URIs (*all* URIs, not just http) are used as identifiers in two of the most widely deployed technologies (HTTP and XML). I really wish someone would pay attention to this and DO SOMETHING ABOUT IT rather than waste time with the "HTTP URLs can identify anything" politicians.

This problem is an adoption blocker for data interchange scenarios which require reliable identification of data entities. To be as clear as possible, here are the two architectural issues which I think must be addressed before reliable identification scenarios can proceed:

A) The standards must offer coherent, consistent, unified, and completely unambiguous guidance regarding how to compare any two URIs for equivalence. Due diligence here includes:

  a) No reliance on "convention" for things like UTF-8 encoding, etc. There are way too many different and often contradictory conventions available for edge cases like internationalization.
  b) Absolutely no allowance for specs to compare differently than other specs. If HTTP identifies two things as the same, XMLNS had better as well.
  c) No "guessing" -- if an implementer does not have unambiguous and exact rules for comparing two particular URIs, for God's sake don't allow the implementer to report an equivalence one way or the other. "Don't know" is the only legitimate answer.
  d) Relative URIs -- this goes with "c". Wherever implementations or specs permit comparison of relative URIs without explicitly knowing the base, and without explicit and universal rules for resolving the relative URI against that base, the spec should prohibit reporting any comparison.
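Requirements c) and d) can be made concrete with a small sketch. The Python below is a hypothetical illustration, not something any W3C or IETF spec defines: `compare_uris` and `Equivalence` are invented names, and the policy is one assumed reading of the rules above. It refuses to compare relative references, reports "same" only for identical strings, treats a difference in scheme as "different", and answers "don't know" for everything else.

```python
from enum import Enum
from urllib.parse import urlsplit


class Equivalence(Enum):
    SAME = "same"
    DIFFERENT = "different"
    UNKNOWN = "don't know"


def compare_uris(a: str, b: str) -> Equivalence:
    """Hypothetical three-valued URI comparison that never guesses."""
    # urlsplit returns the scheme lowercased ("" for a relative reference).
    scheme_a = urlsplit(a).scheme
    scheme_b = urlsplit(b).scheme
    # Requirement d): no base URI is supplied, so relative references
    # are never compared, not even against an identical string.
    if not scheme_a or not scheme_b:
        return Equivalence.UNKNOWN
    # Identical strings are equivalent under any equivalence relation.
    if a == b:
        return Equivalence.SAME
    # Assumption: a URI's scheme survives any normalization, so differing
    # schemes are the one case where we allow ourselves to say "different".
    if scheme_a != scheme_b:
        return Equivalence.DIFFERENT
    # Requirement c): same scheme, different strings. Whether case, escaping
    # or default ports matter depends on rules that are not unified, so the
    # only honest answer is "don't know".
    return Equivalence.UNKNOWN
```

For example, `compare_uris("http://www.w3.org/", "http://WWW.W3.ORG/")` returns `Equivalence.UNKNOWN` under this policy, because it has no agreed rule for host-name case.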
B) The standards must offer coherent, consistent, unified, and completely unambiguous guidance regarding how to compare any two qnames for equivalence.

  a) This depends entirely on proper resolution of the above issue.
  b) Again, none of this insanity where different specs do it differently. RDF and XMLNS already have different rules here. That needs to be fixed. It is crazy. I don't care what the resolution is, but for God's sake, it is bad enough that we already have different ways to test equivalence of URIs; let's stop compounding things by fragmenting qname equivalence comparisons.
  c) Again, specs need to be hardcore about people who report equivalence in the absence of absolute certainty.

Personally, I think this is the most important issue facing the WWW and is eminently in the realm of what TAG should be addressing. In fact, I would say that TAG could spend a year addressing just these two issues and be considered wildly successful.

> -----Original Message-----
> From: Larry Masinter [mailto:masinter@adobe.com]
> Sent: Saturday, October 12, 2002 3:02 PM
> To: www-tag@w3.org
> Subject: Re: Sticking another fork in the URI issue: equality vs.
> equivalence
>
>
> # URIs are strings with fairly strict syntax constraints. There are two
> # kinds of operations defined on them:
>
> # 1. You can test them for equality
> # 2. You can attempt to access representations of the resources they
> identify.
>
> Statement 1 is too narrowly drawn:
> 1. You can test them for equivalence
>
> What kind of equivalence you care about depends on
> the context of use. For use as a namespace name,
> the equivalence is 'string equality' (which is,
> after all, the strictest equality you can have
> for something defined as a string of characters).
>
> For use in context of operations in category '2'
> above, though, the equivalence relationship is
> looser, in that 'http://www.w3.org' and 'http://WWW.W3.ORG'
> are equivalent.
>
> I think when talking about URIs you're better off
> avoiding talking about 'equality' so that you can
> be clear about which equivalence relationship you
> expect in the context.
>
> Statement 2 is also too narrowly drawn; it would
> be better to write it as
> 2. You can attempt to interact with the resources they
> identify.
>
> That would cover 'mailto' and POST as well as 'http' and GET.
>
> # Speaking as the implementor of two of the largest web robots ever
> # written and one commercial internet search engine, I am intimately
> # familiar with investing vast numbers of CPU cycles in manipulating and
> # storing and indexing and retrieving and caching and queuing HTTP URIs as
> # names, because that's what they are, names.
>
> Surely cache implementors use a different equivalence
> algorithm than namespace name implementors, e.g.,
> http://www.w3.org equivalent to http://www.w3.org:80,
> etc.
>
> Larry
> --
> http://larry.masinter.net
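Larry's two equivalence relations can be contrasted directly. The sketch below is hypothetical: `namespace_equivalent`, `http_access_key` and `access_equivalent` are illustrative names, not any spec's or library's API. The access rule is an assumed, deliberately small set of syntax-based normalizations (lowercase scheme and host, drop the default port 80, treat an empty path as "/"), and it assumes http URIs with a registered host name and no userinfo. It ignores percent-encoding and other schemes, and real caches may apply different rules.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only http's default port is known to this sketch.
DEFAULT_PORTS = {"http": 80}


def namespace_equivalent(a: str, b: str) -> bool:
    """Namespace-name rule: plain character-for-character equality."""
    return a == b


def http_access_key(uri: str) -> str:
    """Hypothetical access key for http URIs.

    Assumes a registered host name (no IPv6 literal) and no userinfo,
    since both would be lost when the netloc is rebuilt below.
    """
    parts = urlsplit(uri)  # scheme comes back lowercased
    host = parts.hostname or ""  # .hostname is already lowercased
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{port}"
    path = parts.path or "/"  # empty path treated as "/"
    # Query and fragment are left exactly as written.
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def access_equivalent(a: str, b: str) -> bool:
    """Looser rule: equal after the normalizations above."""
    return http_access_key(a) == http_access_key(b)
```

With these definitions, 'http://www.w3.org' and 'http://WWW.W3.ORG' are access-equivalent but not namespace-equivalent, which is the kind of split between contexts that requirement A) b) objects to.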
Received on Saturday, 19 October 2002 17:25:40 UTC