Re: IDNA reference (Issue #16) from Martin J. Dürst on 2010-09-28 (public-iri@w3.org from September 2010)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Tue, 28 Sep 2010 20:27:19 +0900
To: John C Klensin <john-ietf@jck.com>
CC: Wil Tan <wil@dready.org>, Julian Reschke <julian.reschke@gmx.de>, public-iri@w3.org
Message-ID: <4CA1D117.5080103@it.aoyama.ac.jp>

On 2010/09/08 0:06, John C Klensin wrote:
>
>
> --On Tuesday, September 07, 2010 10:14 PM +1000 Wil Tan
> <wil@dready.org>  wrote:
>
>> On Tue, Sep 7, 2010 at 7:41 PM, Julian Reschke
>> <julian.reschke@gmx.de>wrote:
>>
>>> Hi,
>>>
>>> <http://tools.ietf.org/html/draft-ietf-iri-3987bis-01#section-3.4>:
>>>
>>>    Replace the ireg-name part of the IRI by the part converted
>>>    using the ToASCII operation specified in Section 4.1 of
>>>    [RFC3490] on each dot-separated label, and by using U+002E
>>>    (FULL STOP) as a label separator, with the flag
>>>    UseSTD3ASCIIRules set to FALSE, and with the flag
>>>    AllowUnassigned set to FALSE.  The ToASCII operation may
>>>    fail, but this would mean that the IRI cannot be resolved.
>>>    In such cases, if the domain name conversion fails, then
>>>    the entire IRI conversion fails.  Processors that have no
>>>    mechanism for signalling a failure MAY instead substitute
>>>    an otherwise invalid host name, although such processing
>>>    SHOULD be avoided.
>>>
>>> In August, RFC 3490 has been obsoleted by RFC 5890/91.
>
> And there is no "ToASCII" operation any more, so any such
> sentence will need rewriting, not just an updated reference.
> "Producing A-labels" (below) is better terminology.

Producing A-labels is better terminology if the actual operation is
required. But what the above text requires is validation, not conversion.
I think this is covered by saying that the relevant IDN has to be a U-label.

This is done in the patch (patch16a.txt) that I have attached to issue 
#16 and to this mail. Everybody, please check/comment.

> Actually, 5890/91/92/93 and arguably the still unpublished RFC
> 5895.  RFC 5894 is not normative, but contains the explanations
> that might be more useful to some people as well as a discussion
> of the transition issues.

I have added (at first unused) references to RFC 5890 and 5891. I have 
referenced RFC 5890 here. I think it should be obvious to the reader of 
that document that they have to look at the others, too. I don't think 
we want to have a whole list of RFCs every time there is something about 
IDNA, but of course if there is something specific regarding one of the 
other documents (e.g. bidi,...), we'll also add a direct reference to that.

>>> What's the right reference for ToASCII now?
>
>> The closest thing would be sections 5.1 to 5.5 of RFC 5891,

Again, we are not looking for the actual operation, but for the validity
check it provides. I therefore think that U-labels, as defined at
http://tools.ietf.org/html/rfc5890#section-2.3.2.1
are the right point to reference.
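
To make the difference between validation and conversion concrete, here is a
minimal sketch in Python. The function name passes_basic_u_label_checks is
hypothetical, and the checks are only necessary conditions taken from the
U-label definition (non-ASCII content, NFC, round-trip through Punycode,
length of the resulting A-label). It does not consult the code point tables
of RFC 5892 or the bidi and contextual rules, so passing it does not make a
label a valid U-label.

```python
import unicodedata


def passes_basic_u_label_checks(label: str) -> bool:
    """Necessary, NOT sufficient, conditions for a U-label (RFC 5890, 2.3.2.1).

    Assumption: this deliberately omits the RFC 5892 code point rules and
    the bidi/contextual rules, which a real validator must also apply.
    """
    # A U-label contains at least one non-ASCII character.
    if not label or label.isascii():
        return False
    # A U-label must be in Unicode Normalization Form C.
    if unicodedata.normalize("NFC", label) != label:
        return False
    # It must be convertible to an A-label ("xn--" + Punycode) ...
    try:
        a_label = b"xn--" + label.encode("punycode")
    except UnicodeError:
        return False
    # ... that fits within the 63-octet DNS label limit ...
    if len(a_label) > 63:
        return False
    # ... and converts back to exactly the same string.
    return a_label[4:].decode("punycode") == label


precomposed = "b\u00fccher"   # "bücher" with U+00FC
decomposed = "bu\u0308cher"   # "u" followed by COMBINING DIAERESIS
print(passes_basic_u_label_checks(precomposed))
print(passes_basic_u_label_checks(decomposed))
print(passes_basic_u_label_checks("example"))
```

Note that nothing here produces output for the resolver. The function only
answers yes or no, which is the kind of check the quoted text needs.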

>> but simply referencing them will lead to incompatibility (e.g.
>> producing different A-labels from the IDNA2003 version.)

Does it produce different A-labels? My understanding is that it produces 
either the same A-label or no A-label, with the very specific exceptions 
of the ς (final sigma) and ß (sharp-s) only.
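
The two exceptions can be illustrated with a short Python sketch. It assumes
Python's standard "idna" codec, which implements IDNA2003 (RFC 3490) and so
applies the Nameprep mappings. The helper unmapped_a_label is hypothetical:
it Punycode-encodes the label as given, which approximates what IDNA2008
does with these two characters (no mapping), but it skips every IDNA2008
validity check.

```python
def idna2003_a_label(label: str) -> bytes:
    """ToASCII as in IDNA2003, via the standard-library 'idna' codec."""
    return label.encode("idna")


def unmapped_a_label(label: str) -> bytes:
    """Hypothetical helper: Punycode-encode the label with no mapping.

    Assumption: the label is already acceptable under IDNA2008; no
    validity checks from RFC 5891/5892 are performed here.
    """
    return b"xn--" + label.encode("punycode")


sharp_s_label = "stra\u00dfe"   # "straße"
final_sigma = "\u03c2"          # ς
plain_sigma = "\u03c3"          # σ

strasse_2003 = idna2003_a_label(sharp_s_label)
strasse_unmapped = unmapped_a_label(sharp_s_label)
sigma_2003_final = idna2003_a_label(final_sigma)
sigma_2003_plain = idna2003_a_label(plain_sigma)

# IDNA2003 maps sharp-s to "ss", so the result is a plain ASCII label.
print(strasse_2003)  # b'strasse'
# Without the mapping, sharp-s survives and a different label results.
print(strasse_unmapped != strasse_2003)
# IDNA2003 maps final sigma to ordinary sigma, so both give the same label.
print(sigma_2003_final == sigma_2003_plain)
```

For every other label, the expectation stated above is that the two
procedures either agree or the newer one yields no A-label at all.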

> We have been recommending saying something like "... IDNA as
> described in RFC 5890 and the companion documents to which it
> points".

That seems like a very good phrase to use.

>> http://unicode.org/reports/tr46/ details a good transition
>> strategy, but I wonder how one could work that into iri-bis.
>
> TR46 (which is not yet a stable reference since the text is
> still under review and may change yet again), details a
> transition strategy.  But it is one that does not have IETF
> consensus, partially because it posits a much slower transition
> to allow for circumstances that are either very low frequency or
> that represented abuses even under pre-IDNA2008 standards and
> best practices.   Let's not make things more confusing by trying
> to reference it as if it were the only reasonable approach to
> the situation.

See Michel's mail for some details. I think we have to look into whether 
and how we can use TR46 for describing additional normalization at least 
in the normalization section (some applications such as spiders prefer 
to normalize as aggressively as possible to reduce the possibility of 
fetching the same thing twice).

Regards,    Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Attachments

text/plain attachment: patch16a.txt
Received on Tuesday, 28 September 2010 11:29:24 UTC