- From: Larry Masinter <LMM@acm.org>
- Date: Sun, 20 Oct 2002 10:55:38 -0700
- To: <www-tag@w3.org>
# I hope TAG is paying attention. Larry explains clearly why it is folly
# to say "HTTP URLs work fine as identifiers". The fact is, *all* URIs
# are broke as identifiers, and HTTP bears a good part of the
# responsibility.

I suppose I should be happy that Joshua says he agrees with me, but I'm not sure I agree with his short characterization of my position.

I don't think it is folly to say that 'HTTP URLs work fine as identifiers', since I think they work fine as identifiers for resources you access by using the HTTP protocol. There are other things that they don't work as well for (shining shoes, dessert topping, floor wax).

I don't think I agree with the statements listed as "The fact is,": I don't think *all* URIs are 'broke as identifiers', nor that HTTP bears any particular responsibility. If I were to lay responsibility anywhere, it would be on the attempts in the various URI specifications to not limit the scope of applicability of URIs without giving any guidance on how the extension to semantic identifiers could be accomplished.

# Already the W3C has a massive schism in the way URIs
# (*all* URIs, not just http) are used as identifiers in
# two of the most widely-deployed technologies (http and XML).

I think the 'schism' alluded to isn't particularly between http and XML. The use of a URI as a networked object identifier applies just as well to any URI used in HTML (whether http:, ftp:, gopher:, file:, or cid:), and the use of a URI as a string-equality identifier applies to any URI used in an 'xmlns' term, whether the URI scheme is http:, urn:, or dav:. Is it a "schism", or is it just two different applications that we have to keep straight?

# I really wish someone would pay attention to this and DO SOMETHING ABOUT
# IT rather than waste time with the "HTTP URLs can identify anything"
# politicians.

Consensus-making is a political process, and W3C is a consensus organization. With a little help, HTTP URLs can identify anything.
However, they need a little help: some bit of context that makes it clear that the URL is being used as a string-equality identifier rather than a network-object identifier. "xmlns=" is one kind of context. 'urn:tdb:2002:' (http://larry.masinter.net/duri.html) is another kind of context. In natural language, the context isn't explicit (http://lists.w3.org/Archives/Public/www-tag/2002Aug/0355.html).

How about 'with a little help, HTTP URLs can identify anything, but by themselves, they only identify network resources'? Other pieces of context (like 'tdb' or 'xmlns=' or topic maps) provide the additional help that lets you do (in a system) what is natural to do in natural language: transfer meaning so that the "indicator" is used as a shorthand for the thing indicated. In the Semantic Web, though, the context needs to be explicit, so that we can make separate assertions about the W3C, the W3C's web server, and the home page at the W3C's web server, even though all of those things may be indicated (by context) when someone writes 'http://www.w3.org' in free text.

# A) The standards must offer coherent, consistent, unified, and
# completely unambiguous guidance regarding how to compare any two URIs
# for equivalence.

We have ample evidence that different applications need different equivalence relationships. 'Coherent, consistent, unified and completely unambiguous guidance' would have to include the possibility that different applications might use different algorithms. It's also important to note that there are at least two syntactic spaces (URIs and IRIs). I suppose this is an issue for the revision of RFC 2396: expanding on how to compare URIs.

# a) No reliance on "convention" for things like UTF-8 encoding,
# etc. There are way too many different and often contradictory
# conventions available for edge cases like internationalization.

I don't think we should ever rely on "convention" without "specification".
The W3C-I18N group's schedule for completing the IRI draft is aggressive, and I'm sure they can use some help. The draft is at http://www.w3.org/International/Group/iri-edit/draft-duerst-iri.txt; send comments to www-i18n-comments@w3.org.

# b) Absolutely no allowance for specs to compare differently than
# other specs. If HTTP identifies two things to be the same, XMLNS had
# better as well.

I don't think this is a good idea. The problem is that "xmlns" needs a stable algorithm that can be implemented and guaranteed to apply for all time, while different URI schemes may well define other equivalence relationships. Of course, for the purpose of the HTTP protocol, http://www.w3.org and http://WWW.W3.ORG and http://wWw.W3.oRg:80 are equivalent. But there's no way to code all of the current and potential future equivalence relationships into an implementation of the xmlns algorithm. I think it might be a good idea to try to arrange that there can never be two namespaces, one named http://www.w3.org and another named http://wWw.W3.oRg:80/, so that the difference in equivalence relationships is moot, but I don't quite know how to arrange that.

# c) No "guessing" -- if an implementer does not have unambiguous
# and exact rules for comparing any two particular URIs, for God's sake
# don't allow the implementer to report an equivalence one way or the
# other. "Don't know" is the only legitimate answer.

Each application needs to define its equivalence relationship, but sometimes there really is a legitimate use for variance. For example, it's up to the web server at 'example.org' to decide whether http://example.org/CASEEXAMPLE and http://example.org/caseExample are equivalent. A client that actually knows (by some side channel) that the server it's talking to is case-insensitive might legitimately infer that the two are equivalent, while another client might think that they're not. But in most of the web, it's not actually required to know about equivalence.
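To make the difference between the two equivalence relationships in b) and c) concrete, here is a minimal Python sketch. It assumes that an xmlns-style comparison is plain character-for-character string equality, and that HTTP equivalence follows the URI comparison rules of RFC 2616 section 3.2.3 (scheme and host compared case-insensitively, an omitted port equal to the default port 80, an empty path equal to "/"). The sketch leaves out that section's percent-encoding rule, and the function names are hypothetical, not taken from any spec or library.

```python
from urllib.parse import urlsplit


def xmlns_equal(a: str, b: str) -> bool:
    """xmlns-style comparison (assumed): exact string equality, nothing else."""
    return a == b


def http_key(uri: str) -> tuple:
    """Comparison key for an http: URI, loosely following RFC 2616 sec. 3.2.3.

    Scheme and host are case-insensitive, a missing port means 80, and an
    empty path means "/". Path, query and fragment keep their case: whether
    /CASEEXAMPLE and /caseExample are the same resource is the server's
    business, so this sketch does not guess. Percent-encoding equivalence
    and userinfo are ignored here.
    """
    parts = urlsplit(uri)
    if parts.scheme.lower() != "http":
        raise ValueError(f"not an http URI: {uri!r}")
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else 80
    return ("http", host, port, parts.path or "/", parts.query, parts.fragment)


def http_equivalent(a: str, b: str) -> bool:
    """Equivalence for the purpose of the HTTP protocol (sketch)."""
    return http_key(a) == http_key(b)


if __name__ == "__main__":
    # Equivalent for HTTP, but distinct as namespace names.
    print(http_equivalent("http://www.w3.org", "http://wWw.W3.oRg:80/"))  # True
    print(xmlns_equal("http://www.w3.org", "http://wWw.W3.oRg:80/"))      # False
    # Path case is left to the server, so the sketch reports no equivalence.
    print(http_equivalent("http://example.org/CASEEXAMPLE",
                          "http://example.org/caseExample"))              # False
```

The design choice is deliberately asymmetric: the xmlns function never needs to know anything about URI schemes, which is what makes it stable, while the HTTP function encodes only the rules one scheme's protocol defines and declines to fold path case without knowledge of the particular server.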
# d) Relative URIs -- this goes with "c". Everywhere that
# implementations or specs permit comparison of relative URIs without
# explicitly knowing the base and having explicit and universal rules
# regarding universalizing the relative URI, the spec should prohibit
# reporting any comparison.

Are there any specs that allow comparison of relative URIs without explicitly knowing the base? I think that would indeed be a bad idea. On the other hand, I do think RFC 2396 is clear that there is only one explicit and universal algorithm for combining a base and a relative URI to create an absolute URI.

# B) The standards must offer coherent, consistent, unified, and
# completely unambiguous guidance regarding how to compare any two qnames
# for equivalence.

The limitations of 'unambiguous guidance' for URI equivalence might not apply to qnames, because qnames might not allow all URIs and might supply enough context.

# Personally, I think this is the most important issue facing WWW and is
# eminently in the realm of what TAG should be addressing. In fact, I
# would say that TAG could spend a year addressing just these two issues
# and be considered wildly successful.

The TAG charter is at http://www.w3.org/2001/07/19-tag. The primary purpose of the TAG is to document cross-technology principles in architectural recommendations; resolving issues of architectural impact is secondary. So spending a whole year on two issues wouldn't seem to be a "wild success", even if they were actually the most pressing ones facing web architecture. Perhaps the TAG might be just as successful by shining a light on some of these issues, since some of them are unsolvable: cope with ambiguity, and get on with it.

Separately, I want to argue that http://www.w3.org/TR/webarch/ could be split into separate documents that can progress and be updated independently. Defining equivalence and settling identification vs. indication seems necessary for finishing section 2 ("Identifiers and resources"), but the other sections could make progress in the meantime.

Larry
--
http://larry.masinter.net
Received on Sunday, 20 October 2002 13:55:09 UTC