- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 09 Apr 2003 19:14:53 -0400
- To: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org, Tim Bray <tbray@textuality.com>
- Cc: public-iri@w3.org
At 18:41 03/04/07 -0400, Ian B. Jacobs wrote: >Hello, > >The minutes of the 7 Apr 2003 TAG teleconf are >available as HTML [1] and as text below. >2. Technical > > 1. [15]Architecture document > 2. [16]contentTypeOverride-24 > 3. [17]namespaceDocument-8 > 4. [18]IRIEverywhere-27 > 2.4 IRIEverywhere-27 > > * [34]IRIEverywhere-27 > + Action CL 2003/03/31: Revised position statement on use of > IRIs. See [35]email from CL on next steps. > > [34] http://www.w3.org/2001/tag/open-summary.html#IRIEverywhere-27 > [35] http://lists.w3.org/Archives/Public/www-tag/2003Apr/0033.html > [Ian] > CL: I'd like more discussion this week; deadline 21 April? Also > need to convey to Martin The conveying has been done. > the desirability of seeing > [36]http://www.w3.org/International/iri-edit/ updated to > include iDNS and published as draft 4, and to move to RFC soon. I have read that in the TAG minutes here, a week ago, and before. But I'm wondering how we are supposed to go to RFC if the TAG hasn't made up its mind, and seems to make very slow progress, if not going in circles. > [TBray] > Kanji example in > [37]http://www.tbray.org/ongoing/When/200x/2003/03/31/IRI Very good piece. While we are at it, two comments: - Google wouldn't need the additional parameters in the search URI (&ie=UTF-8&oe=UTF-8). They could just do a check on byte patterns, and recognize the extremely specific UTF-8 byte pattern, and then assume UTF-8 as the default. If we assume that 'ie' stands for input encoding and 'oe' for output encoding, they could also at least assume that they are the same, and do the right thing if only 'ie' is present. - I don't understand "This has the potential to play hell with caches and proxies and webcrawlers and XML namespaces, since you can have the same IRI ending up as a bunch of wildly different looking URIs." The only things that can end up differently is the case of the hex characters after the %. And caches and webcrawlers take that into account. As for XML namespaces, just don't change anything, and you are fine. > [Ian] > TBL: I think there is a fundamental question that has not been > resolved, and is causing a lot of tension.: Whether, > fundamentally, the fundamental string is the URI, or whether we > are dispensing with URIs in favor of Unicode strings instead. > > [Chris] > position A says IRI is a way of talking about URI > position B says IRI is the real thing and URI is a subset that > should go away And the current position is to use A for resolution, and B for identification (namespaces,...). > [Ian] > TB: "Is any URI an IRI?" I think the answer is No; I don't understand that. It has always been the case that any URI is an IRI. I don't see a need to change that. > lots of sloppiness about hexifying. > [Ian] > TB: I think IRIs need to be rigid and deterministic about when > hexifying is used, and that the mapping (to URI) process be > well-characterized. Do you want to say that unnecessarily hexified IRIs should be outlawed? I wouldn't mind to discourage unnecessary hexification, but there is no reason to outlaw it. This is very parallel to URIs, for example, http://www.tbray.org/ongoing/When/200x/200%33/0%33/%331/IRI (replacing every '3' by '%33') isn't a particularly good URI, but it's not outlawed. About 'when to hexify', the current draft is pretty clear: "as late as possible, but not later" (in some more words). About the mapping from IRIs to URIs: That's described in all detail. Some changes are needed re. IDNs, but these are already in the draft as an alternative. If you find any problem with the description, please tell us. > [Ian] > TBL: > Suppose > that identifiers are unicode strings and you don't have to > canonicalize them to compare them. There is ten years of > software using 7-bit fields for this quantity. > [Ian] > TBL: If suddenly you allow namespaces that can differ in 7-bit > systems but not in 8-bit systems. First, this is not fundamentally about 7-bit systems vs. 8-bit systems. There are some cases where URIs get transmitted over 7-bit channels (e.g. in an email labeled as charset=US-ASCII), email these days can handle 8-bit messages, and systems that really only handle 7 bits are fortunately exceptions. Squeezing special information into the 8th bit is rarely done these days. It is fundamentally a question of character repertoire (ASCII vs. Unicode). And it is important to observe that the cases where URIs are used as real *identifiers* (namespaces and RDF), thanks to W3Cs work over the past close to 10 years, there is really no problem with using Unicode as a repertoire. This is not true for resolution, but then for resolution, more equivalences than strict string equivalence have always been used, and we are not proposing to change that. > TB: I'm not sure that CL and I disagree that much. In terms of > URIs, it's a fact that everyone uses string comp in namespace > applications. We have ample evidence that this is prone and > shakey to failure, but people do it anyway. Could you please provide this 'ample evidence that this is prone and shakey to failure'? > [Ian] > TB: I think the anyURI type is underspecified. Last time I > looked, there were very few syntactic restrictions on what you > can put in an anyURI. I don't think that that's a good > solution. There are very few restrictions because there are actually very few restrictions on URI (and IRI) syntax. And there is also explicit text that implementations should not check agressively. This is one way to ensure extensibility and independence of different components, and is in line with the principle that URIs are opaque. > TBL: How about if we tell people to refer to the IRI draft. > Then we explain to people what that means: When you refer to > the IRI spec, you take on that %7e and %7E are the same. Why should IRIs have to fix what URIs didn't manage to keep? Regards, Martin.
Received on Thursday, 10 April 2003 10:07:09 UTC