Re: IDNA reference (Issue #16) from Martin J. Dürst on 2010-10-17 (public-iri@w3.org from October 2010)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Sun, 17 Oct 2010 12:58:55 +0900
To: John C Klensin <john-ietf@jck.com>
CC: Wil Tan <wil@dready.org>, Julian Reschke <julian.reschke@gmx.de>, public-iri@w3.org
Message-ID: <4CBA747F.9040003@it.aoyama.ac.jp>
I have applied the patch that I put out around the end of September, 
because I have received positive feedback from John, and no negative 
feedback. I have committed this to subversion as
http://trac.tools.ietf.org/wg/iri/trac/changeset/20/.

I will produce a -02 draft based on the small but steady changes that 
have accumulated since draft -01. Any comments appreciated, as always.

Regards,   Martin.

On 2010/09/30 22:31, John C Klensin wrote:
>
>
> --On Tuesday, September 28, 2010 20:27 +0900 "Martin J.
> Dürst" <duerst@it.aoyama.ac.jp> wrote:
>
>>> Actually, 5890/91/92/93 and arguably the still unpublished RFC
>>> 5895.  RFC 5894 is not normative, but contains the
>>> explanations that might be more useful to some people as well
>>> as a discussion of the transition issues.
>>
>> I have added (at first unused) references to RFC 5890 and
>> 5891. I have referenced RFC 5890 here. I think it should be
>> obvious to the reader of that document that they have to look
>> at the others, too. I don't think we want to have a whole list
>> of RFCs every time there is something about IDNA, but of
>> course if there is something specific regarding one of the
>> other documents (e.g. bidi,...), we'll also add a direct
>> reference to that.
>
> FWIW, this works for me.
>
>>>>> What's the right reference for ToASCII now?
>>>
>>>> The closest thing would be sections 5.1 to 5.5 of RFC 5891,
>>
>> Again, we are not looking for the actual operation, but for
>> the validity check it provides. I think that therefore
>> U-Labels at
>> http://tools.ietf.org/html/rfc5890#section-2.3.2.1
>> are the right point to reference.
>
> Ok.   I suggested the operation only because ToASCII was used,
> it is definitely an operation, and the question was for a
> reference for ToASCII.
>
>>>> but simply referencing them will lead to incompatibility
>>>> (e.g. producing different A-labels from the IDNA2003
>>>> version.)
>>
>> Does it produce different A-labels? My understanding is that
>> it produces either the same A-label or no A-label, with the
>> very specific exceptions of the ς (final sigma) and ß
>> (sharp-s) only.
>
> That is correct unless we made a very serious mistake somewhere.
> In some sense, the difference between IDNA2003 and IDNA2008 is
> that the number of strings that can be processed to produce what
> we now call A-labels has decreased significantly.  But that is
> consistent with your "no A-label" case above.
>
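To make the sharp-s case concrete, here is a minimal Python sketch. It relies on Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490). The helper idna2008_style_alabel is a hypothetical name introduced only for this illustration: it prefixes the Punycode form of a label and performs none of the validity checks that RFC 5891 requires, so it is not an IDNA2008 implementation.

```python
# Python's built-in "idna" codec follows IDNA2003 (RFC 3490): nameprep
# case-folds U+00DF (sharp-s) to "ss" before any encoding takes place.
IDNA2003_RESULT = "straße".encode("idna")


def idna2008_style_alabel(u_label: str) -> bytes:
    """Hypothetical helper: Punycode-encode a label that is assumed to be
    a valid U-label already. No IDNA2008 validity checks are performed."""
    return b"xn--" + u_label.encode("punycode")


# Under IDNA2008, U+00DF is PVALID and is kept, so the label gets its own A-label.
IDNA2008_RESULT = idna2008_style_alabel("straße")

print(IDNA2003_RESULT)  # b'strasse'
print(IDNA2008_RESULT)  # b'xn--strae-oqa'
```

The same input therefore resolves as "strasse" under IDNA2003 but as a separate name under IDNA2008, which is the kind of exception discussed above.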
>> ...
>>>> http://unicode.org/reports/tr46/ details a good transition
>>>> strategy, but I wonder how one could work that into iri-bis.
>>>
>>> TR46 (which is not yet a stable reference since the text is
>>> still under review and may change yet again), details a
>>> transition strategy.  But it is one that does not have IETF
>>> consensus, partially because it posits a much slower
>>> transition to allow for circumstances that are either very
>>> low frequency or that represented abuses even under
>>> pre-IDNA2008 standards and best practices.   Let's not make
>>> things more confusing by trying to reference it as if it were
>>> the only reasonable approach to the situation.
>>
>> See Michel's mail for some details. I think we have to look
>> into whether and how we can use TR46 for describing additional
>> normalization at least in the normalization section (some
>> applications such as spiders prefer to normalize as
>> aggressively as possible to reduce the possibility of fetching
>> the same thing twice).
>
> Well, RFC 5895 certainly permits normalizing as aggressively as
> one likes, it is just quite deliberately not normative.  The
> difficulty is that, once one moves beyond canonical
> normalization (NFC or NFD), and becomes more aggressive, one
> starts running into edge cases in which some names that users
> and registrants believe are different become the same,
> effectively making one of them completely inaccessible.  The two
> cases that the IDNABIS WG quite deliberately created (final
> sigma and sharp-s) after long debate are examples of this, but
> so are the notorious dotless-i problem, a number of Han
> characters that are safe to map away except when they are used
> in personal names (the latter are not PVALID today, but it is
> easy to imagine a strong case being made in the future for
> reclassifying them), the Arabic and Farsi Yeh character, a
> number of characters that represent numerals, and so on.
>
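The collision risk described above can be sketched in Python with the standard unicodedata module. The function name aggressive and the choice of NFKC plus case folding are assumptions made for this illustration only; they are not the mapping defined by RFC 5895 or TR46.

```python
import unicodedata


def nfc(label: str) -> str:
    """Canonical normalization only (NFC)."""
    return unicodedata.normalize("NFC", label)


def aggressive(label: str) -> str:
    """Hypothetical 'aggressive' mapping for illustration: compatibility
    normalization (NFKC) combined with full Unicode case folding."""
    folded = unicodedata.normalize("NFKC", label).casefold()
    return unicodedata.normalize("NFKC", folded)


# Pairs that users and registrants may regard as different names.
pairs = [
    ("straße", "strasse"),  # sharp-s versus "ss"
    ("βόλος", "βόλοσ"),     # final sigma versus ordinary sigma
]

for a, b in pairs:
    # NFC keeps the two labels distinct; the aggressive mapping merges them.
    print(a, b, nfc(a) == nfc(b), aggressive(a) == aggressive(b))
```

With canonical normalization both pairs stay distinct, while the more aggressive mapping folds each pair into a single string, so one of the two names can no longer be reached.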
> For the IRI spec to assume or require aggressive mapping that
> goes well beyond the very conservative assumptions of RFC 5895
> (you will recall that much of the relevant descriptive text was
> moved into what is now RFC 5894) risks creating disconnects and
> inappropriate restrictions on user and registrant behavior and
> on the future evolution of IDNA.
>
>      john
>
>
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Sunday, 17 October 2010 03:59:44 UTC