- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 25 Jul 2002 04:36:54 +0900
- To: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
- Cc: w3c-i18n-ig@w3.org, www-i18n-comments@w3.org
At 18:23 02/07/23 -0400, Ian B. Jacobs wrote:

> 2.6 URIEquivalence-15
>
> 1. Status of URIEquivalence-15. Relation to
>    Character Model of the Web (chapter 4)? See text
>    from TimBL on URI canonicalization and email from
>    Martin in particular.
>
> TB: This is serious. Martin seems to be saying
> "deal with it"

Yes, exactly. Thanks!

> DC: Two reasons:
>
> 1. The only way you can be sure that a consumer
>    will notice that you mean the same thing is
>    that you've spelled it the same way. I think
>    that they're not wrong. Nothing wrong with
>    string compare.
> 2. In general, it's an art to gather that
>    something spelled differently means the same
>    thing.
>
> TB: If we believe that, should there be a
> recommendation that "when you do this, only
> %-escape when you have to, and use lowercase
> letters." Where should that be written?
> DC: Shortest path to target is the I18N WG.
> RFC 2396 applies equally to all URI schemes.
> Generating absolute from relative URI is not
> scheme-specific.
> DO: There are absolutization scheme(s) and
> things like scheme-specific rules (e.g.,
> generating an absolute) and we should take
> this into account when we talk about doing a
> string compare.
> RF: Different issues here. There is a syntax
> mechanism to go from rel URI to abs URI. But
> no scheme-specific semantics on that. There
> are scheme-specific fields (e.g., host name)
> that have equivalence rules. It boils down to
> this: the most efficient way to deal with
> these cases is to require a canonical form and
> compare by bytes.
>
> [DanC]
> There's stuff like http://www.w3.org:80/ and
> http://www.w3.org/ , which are specified, in a
> scheme-specific manner, to mean the same
> thing.
>
> [Ian]
> DO: So, canonicalize according to scheme and
> generic rules, then compare.
> RF: The only entity that does the
> canonicalization is the URI generator; not at
> comparison time. Inefficient to canonicalize
> at compare time.
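The "canonicalize when generating, compare by bytes" approach described in the minutes above could look roughly like the following Python sketch. It is only an illustration: the function name `canonicalize` and the `DEFAULT_PORTS` table are hypothetical, the https entry is an added assumption (the minutes only mention port 80 for http), and userinfo and IPv6 literals are deliberately ignored.

```python
from urllib.parse import urlsplit, urlunsplit

# Scheme-specific default ports. Only http/80 appears in the minutes;
# the https entry is an assumption added for illustration.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(uri: str) -> str:
    """Return a canonical spelling for simple http(s) URIs.

    Sketch only: userinfo and IPv6 host literals are not handled.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        # No scheme-specific rules known here: leave the URI untouched.
        return uri
    host = parts.hostname or ""  # urlsplit returns the host name in lowercase
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS[scheme]:
        netloc = f"{host}:{parts.port}"  # keep only non-default ports
    path = parts.path or "/"  # an empty path is spelled "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


# The generator canonicalizes once; consumers then use plain string compare.
a = canonicalize("http://www.w3.org:80/")
b = canonicalize("http://www.w3.org/")
print(a == b)
```

With this split of responsibilities, the comparison itself stays a byte-for-byte string comparison, and all scheme-specific knowledge lives on the generating side.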
>
> [Ian]
> RF: Making a URI absolute is
> scheme-independent. That's required so we can
> add schemes later on.
> DC: There was a backlash in the XML community
> about saying absolutize.
> TB: That was a different issue.
> DC: I don't understand the difference.
> DO: Namespaces used as identifiers rather than
> for dereferencing. Requiring absolute URIs was
> meant to facilitate authoring.
> TB: I hear people arguing that string
> comparison necessary. I think there needs to
> be a statement of principle to get good
> results:
>
> 1. Don't use %-escape unless you have to.
> 2. Use lowercase when doing so.
>
> TB: Where do we take these suggestions?: (a)
> We have a section on the arch doc on comparing
> URIs or (b) ask I18N WG to deal with this.
> RF: Or add a stronger suggestion to the URI
> spec itself.
> TB: That's a wonderful answer!
> RF: I can add this to the issues list (section
> on URI canonicalization). I can't promise that
> it will be answered there.

I think it belongs in an updated version of the URI spec.
But because it's of particular importance for IRIs, and
because I think the IRI spec will move ahead before the
revision of the URI spec, I have added something in the
editing version of the IRI spec (see
http://www.w3.org/International/Group/iri-edit/
for those who have member access):

>>>>
2) Convert each octet to %hh, where hh is the
   hexadecimal notation of the octet value.
   Note: This is identical to the escaping mechanism
   in Section 2.4.1 of [RFC2396].
   Note: To reduce variability, the hexadecimal notation
   should use lower case letters.
>>>>

This earlier read:

<<<<
2) Convert each octet to %HH, where HH is the
   hexadecimal notation of the octet value.
   Note: This is identical to the escaping mechanism
   in Section 2.4.1 of [RFC2396].
<<<<

Any comments appreciated.
("1. Don't use %-escape unless you have to." is already covered.)

Regards,    Martin.

> DC: I don't think we should punt this
> entirely. For URIs, it's fine to do string
> compare.
> For URI references, it's fine to
> absolutize and then do string compare. That
> works for me.
> SW: I agree with TB that we should have
> something in arch doc. That should be in sync
> with the emerging URI spec.
> DO: How about as little as "there are good
> rules for doing this; go see the URI spec and
> the IRI specs for more info..."
>
> [DanC]
> "Can the same resource have different URIs?
> Does http://WWW.EXAMPLE/ identify the same
> resource as http://www.example/?"
> -- FAQ on URIs
>
> [Ian]
> DC: Is it useful to do a finding in the mean
> time?
> IJ: I hope to harvest from Dan's FAQ.
> TB: I think that if in arch doc, probably
> don't need a finding.
> Action IJ: Harvest from Dan's FAQ for arch
> document.
>
> Resolved: the Arch Doc should mention this issue.
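Two of the points above lend themselves to a short illustration in Python: escaping octets as %hh with lowercase hex digits, and absolutizing URI references before a string compare. This is a hedged sketch, not the text of either spec. The helper names `escape_octets` and `same_reference` are hypothetical, and encoding the characters as UTF-8 and escaping only non-ASCII characters are assumptions, since the step preceding "2)" is not quoted in this message.

```python
from urllib.parse import urljoin


def escape_octets(text: str) -> str:
    """Escape each non-ASCII character as %hh octets with lowercase hex.

    Assumption: characters are first encoded as UTF-8, and ASCII
    characters are left alone ("don't %-escape unless you have to").
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # One %hh triplet per octet, lowercase to reduce variability.
            out.extend(f"%{octet:02x}" for octet in ch.encode("utf-8"))
    return "".join(out)


def same_reference(base: str, ref_a: str, ref_b: str) -> bool:
    """Absolutize both URI references against base, then string-compare."""
    return urljoin(base, ref_a) == urljoin(base, ref_b)


print(escape_octets("http://www.example/r\u00e9sum\u00e9"))
print(same_reference("http://www.example/a/b", "c", "./c"))
```

Because every generator following the lowercase note emits the same spelling for the same octets, the escaped forms stay comparable with a plain string compare.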
Received on Wednesday, 24 July 2002 15:44:16 UTC