
Re: [iri] #44: Reference Unicode TR 46, and if yes, how?

From: Mark Davis ☕ <mark@macchiato.com>
Date: Thu, 10 Nov 2011 23:16:14 -0800
Message-ID: <CAJ2xs_HEqW6tei2qputabT9TRZ4=wktdxBcjkJoH1XzQsZgFZw@mail.gmail.com>
To: Chris Weber <chris@lookout.net>
Cc: public-iri@w3.org
It really depends on what IRIs are being used for, and on what degree
of backwards compatibility is needed. Restricting IRIs to IDNA2008 would
break compatibility with many, if not most, existing implementations. For
example, an IRI can currently be of the form "http://ÖBB.at"
(http://xn--bb-eka.at), and programs (IE, Chrome, ..., search engines, etc.)
expect that to be valid.
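
To make the compatibility point concrete, here is a minimal Python sketch.
It assumes Python's built-in "idna" codec, which implements IDNA2003
(RFC 3490) and is used here only as a stand-in for the lenient mapping that
browsers apply. The helper names idna2003_host and is_stable_label are
hypothetical, and the stability check is a rough approximation of a single
IDNA2008 rule (the "Unstable" category of RFC 5892), not a full IDNA2008
validator.

```python
import unicodedata


def idna2003_host(host: str) -> str:
    """Map and encode a host name with the stdlib "idna" codec (IDNA2003)."""
    return host.encode("idna").decode("ascii")


def is_stable_label(label: str) -> bool:
    """Approximate one IDNA2008 rule: the label must already equal
    NFKC(casefold(NFKC(label))). Uppercase letters fail this check."""
    nfkc = unicodedata.normalize("NFKC", label)
    folded = unicodedata.normalize("NFKC", nfkc.casefold())
    return label == folded


# "\u00d6" is LATIN CAPITAL LETTER O WITH DIAERESIS, so this is "ÖBB.at".
host = "\u00d6BB.at"

# The lenient IDNA2003 mapping lowercases the label before encoding it.
print(idna2003_host(host))

# Under the approximate IDNA2008 check, the uppercase label is rejected
# as written, while the already-lowercase "at" label passes.
print([is_stable_label(label) for label in host.split(".")])
```

Under these assumptions, the mapped form and the lowercase form "öbb.at"
encode to the same ASCII host, which is the behaviour existing programs
rely on, while a protocol that admits only the strict form would have to
reject the uppercase spelling.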

If, as John said, an IRI is *only* used to express a canonical form, so
that it is OK for "http://ÖBB.at" (http://xn--bb-eka.at) and
"https://mail.google.com//mail" and so on to be illegal IRIs, then it would
be fine* to restrict IRIs to IDNA2008.

(* Mostly fine. Unfortunately, IDNA2008 does not guarantee backwards
compatibility across different versions of Unicode. Luckily, so far only
one character that used to be valid under IDNA2008 is no longer valid, and
that character is fairly obscure.)

Mark
*— Il meglio è l’inimico del bene —*
[https://plus.google.com/114199149796022210033]



On Thu, Nov 10, 2011 at 21:25, Chris Weber <chris@lookout.net> wrote:

> On 11/9/2011 3:26 PM, Peter Saint-Andre wrote:
>
>> On 11/9/11 4:14 PM, John C Klensin wrote:
>>
>>> Peter,
>>>
>>> Let me say that a little more strongly.  URIs and IRIs need to
>>> be in some sort of reduced canonical form or basically all hope
>>> of comparing them (including for caching purposes) without some
>>> rather complicated algorithm disappears.  To the extent to which
>>> they are a good idea at all, mapping procedures like UTR 46 and
>>> RFC 5895 are useful for providing users with more convenience
>>> and flexibility.  But, to the extent to which URIs and IRIs are
>>> going to be used between systems, used to identify cached
>>> content, etc., they just don't belong in them.   Worse, neither
>>> UTR 46 nor RFC 5895 (especially the former) are general-purpose
>>> mapping/ equivalence routines.  They are specific to IDNA and,
>>> to a considerable measure, motivated by a desire to smooth out
>>> IDNA2003 ->  IDNA2008 transition.
>>>
>>
>> <hat type='individual'/>
>>
>> You're preaching to the choir. :)
>>
>> I see no reason to reference either UTR 46 or RFC 5895 in 3987bis, but
>> other WG participants might disagree.
>>
>> Peter
>>
>
> It sounds like you both agree, and after reading through the original
> thread started by Julian
> <http://lists.w3.org/Archives/Public/public-iri/2010Sep/0010.html>,
> it seems this was originally a question for the Section 3.4 Mapping
> ireg-name, which has since been corrected.   The topic of canonicalization
> has been moved along with IRI comparison to draft-ietf-iri-comparison.
>
> Best regards,
> Chris Weber
>
>
Received on Friday, 11 November 2011 07:16:53 GMT
