- From: Martin Duerst <duerst@w3.org>
- Date: Thu, 25 Jul 2002 04:36:54 +0900
- To: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org
- Cc: w3c-i18n-ig@w3.org, www-i18n-comments@w3.org
At 18:23 02/07/23 -0400, Ian B. Jacobs wrote:
> 2.6 URIEquivalence-15
>
> 1. Status of URIEquivalence-15. Relation to
> Character Model of the Web (chapter 4)? See text
> from TimBL on URI canonicalization and email from
> Martin in particular.
>
> TB: This is serious. Martin seems to be saying
> "deal with it"
Yes, exactly. Thanks!
> DC: Two reasons:
>
> 1. The only way you can be sure that a consumer
> will notice that you mean the same thing is
> that you've spelled it the same way. I think
> that they're not wrong. Nothing wrong with
> string compare.
> 2. In general, it's an art to gather that
> something spelled differently means the same
> thing.
>
> TB: If we believe that, should there be a
> recommendation that "when you do this, only
> %-escape when you have to, and use lowercase
> letters." Where should that be written?
> DC: Shortest path to target is the I18N WG.
> RFC 2396 applies equally to all URI schemes.
> Generating absolute from relative URI is not
> scheme-specific.
> DO: There are absolutization scheme(s) and
> things like scheme-specific rules (e.g.,
> generating an absolute) and we should take
> this into account when we talk about doing a
> string compare.
> RF: Different issues here. There is a syntax
> mechanism to go from rel URI to abs URI. But
> no scheme-specific semantics on that. There
> are scheme-specific fields (e.g., host name)
> that have equivalence rules. It boils down to
> this: the most efficient way to deal with
> these cases is to require a canonical form and
> compare by bytes.
>
> [DanC]
> There's stuff like http://www.w3.org:80/ and
> http://www.w3.org/ , which are specified, in a
> scheme-specific manner, to mean the same
> thing.
>
> [Ian]
> DO: So, canonicalize according to scheme and
> generic rules, then compare.
> RF: The only entity that does the
> canonicalization is the URI generator; not at
> comparison time. Inefficient to canonicalize
> at compare time.
>
> [Ian]
> RF: Making a URI absolute is
> scheme-independent. That's required so we can
> add schemes later on.
> DC: There was a backlash in the XML community
> about saying absolutize.
> TB: That was a different issue.
> DC: I don't understand the difference.
> DO: Namespaces used as identifiers rather than
> for dereferencing. Requiring absolute URIs was
> meant to facilitate authoring.
> TB: I hear people arguing that string
> comparison is necessary. I think there needs to
> be a statement of principle to get good
> results:
>
> 1. Don't use %-escape unless you have to.
> 2. Use lowercase when doing so.
>
> TB: Where do we take these suggestions? (a)
> We have a section on the arch doc on comparing
> URIs or (b) ask I18N WG to deal with this.
> RF: Or add a stronger suggestion to the URI
> spec itself.
> TB: That's a wonderful answer!
> RF: I can add this to the issues list (section
> on URI canonicalization). I can't promise that
> it will be answered there.
I think it belongs in an updated version of the URI spec.
But because it's of particular importance for IRIs, and
because I think the IRI spec will move ahead before the
revision of the URI spec, I have added something in the
editing version of the IRI spec.
(see http://www.w3.org/International/Group/iri-edit/
for those who have member access):
>>>>
2) Convert each octet to %hh, where hh is the hexadecimal
notation of the octet value. Note: This is identical to
the escaping mechanism in Section 2.4.1 of [RFC2396].
Note: To reduce variability, the hexadecimal notation
should use lower case letters.
>>>>
This earlier read:
<<<<
2) Convert each octet to %HH, where HH is the hexadecimal
notation of the octet value. Note: This is identical to
the escaping mechanism in Section 2.4.1 of [RFC2396].
<<<<
Any comments appreciated.
("1. Don't use %-escape unless you have to." is already covered.)
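As an illustration only, the revised step 2 could be sketched in Python roughly as follows. The function names are hypothetical, and the sketch assumes that the octets come from a UTF-8 encoding of each non-ASCII character (the preceding step is not quoted above, so that part is an assumption). ASCII characters are left untouched, which matches "don't use %-escape unless you have to".

```python
def escape_octets(octets: bytes, lowercase: bool = True) -> str:
    """Convert each octet to %hh (or %HH when lowercase is False)."""
    fmt = "%{:02x}" if lowercase else "%{:02X}"
    return "".join(fmt.format(octet) for octet in octets)


def escape_non_ascii(text: str) -> str:
    """Escape only the characters that have to be escaped.

    Assumption: non-ASCII characters are first encoded as UTF-8,
    and every resulting octet is then written as %hh.
    """
    pieces = []
    for ch in text:
        if ord(ch) < 128:
            pieces.append(ch)  # ASCII: copied unchanged
        else:
            pieces.append(escape_octets(ch.encode("utf-8")))
    return "".join(pieces)


if __name__ == "__main__":
    print(escape_non_ascii("http://www.example/r\u00e9sum\u00e9"))
```

With lower case fixed, two generators that escape the same octets produce the same string, so a plain string compare on the result is enough. With %HH allowed as well, the same octets could be spelled in two ways.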
Regards, Martin.
> DC: I don't think we should punt this
> entirely. For URIs, it's fine to do string
> compare. For URI references, it's fine to
> absolutize and then do string compare. That
> works for me.
> SW: I agree with TB that we should have
> something in arch doc. That should be in sync
> with the emerging URI spec.
> DO: How about as little as "there are good
> rules for doing this; go see the URI spec and
> the IRI specs for more info..."
>
> [DanC]
> "Can the same resource have different URIs?
> Does http://WWW.EXAMPLE/ identify the same
> resource as http://www.example/?"
> -- FAQ on URIs
>
> [Ian]
> DC: Is it useful to do a finding in the mean
> time?
> IJ: I hope to harvest from Dan's FAQ.
> TB: I think that if in arch doc, probably
> don't need a finding.
> Action IJ: Harvest from Dan's FAQ for arch
> document.
>
> Resolved: the Arch Doc should mention this issue.
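A minimal sketch of the "absolutize, then string compare" approach that DC describes, using urljoin from Python's standard urllib.parse module for the scheme-independent relative-to-absolute step. The helper names are hypothetical, and no scheme-specific canonicalization is attempted, so spellings such as http://WWW.EXAMPLE/ or http://www.example:80/ are deliberately reported as different from http://www.example/.

```python
from urllib.parse import urljoin


def same_uri(uri_a: str, uri_b: str) -> bool:
    """Compare two absolute URIs by plain string comparison."""
    return uri_a == uri_b


def same_uri_reference(base: str, ref_a: str, ref_b: str) -> bool:
    """Absolutize both references against the base, then string compare."""
    return same_uri(urljoin(base, ref_a), urljoin(base, ref_b))


if __name__ == "__main__":
    base = "http://www.example/a/b"
    # Different relative spellings that resolve to the same absolute URI.
    print(same_uri_reference(base, "../c", "/c"))
    # Spellings that only scheme-specific rules would treat as equivalent.
    print(same_uri("http://WWW.EXAMPLE/", "http://www.example/"))
    print(same_uri("http://www.example:80/", "http://www.example/"))
```

This is why the minutes put the burden on the URI generator: the comparison itself stays a cheap string compare, and anything it fails to match has to be avoided by spelling the URI consistently in the first place.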
Received on Wednesday, 24 July 2002 15:44:16 UTC