[Bug 15489] IDN email addresses should be converted to Punycode before validating them

https://www.w3.org/Bugs/Public/show_bug.cgi?id=15489

Norbert Lindenberg <w3-bugs@norbertlindenberg.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |public-i18n-core@w3.org

--- Comment #15 from Norbert Lindenberg <w3-bugs@norbertlindenberg.com> 2012-05-14 23:26:58 UTC ---
I don't agree with the statement "IDN is only a rendering-level/UI-level
feature", and think that internationalized domain names should be allowed in
email addresses in the value attribute of <input> elements.

IDNA (Internationalizing Domain Names in Applications; note the "A" for
"Applications") was designed to let applications use full Unicode in domain
names, while providing a mapping to an ASCII form for use with older protocols
that aren't IDNA-aware (e.g., DNS and SMTP).

Applications generally benefit from using the plain Unicode form of strings
wherever possible. Older protocols and file formats require a variety of
ASCII-based transformations of Unicode - e.g., the string "中国" might show up as
"xn--fiqs8s" (an IDNA A-label), "%E4%B8%AD%E5%9B%BD" (URL percent-encoding),
"\u4E2D\u56FD" (JavaScript escapes), or "&#20013;&#22269;" (HTML numeric
character references). Keeping these around and storing them in databases tends
to cause problems: searching and sorting don't work properly because comparison
functions don't know that "xn--fiqs8s" and "%E4%B8%AD%E5%9B%BD" mean the same
thing, and duplicate or missing decoding later on can lead to mojibake. To
maintain sanity, applications are
better off converting text to plain Unicode when they receive it, and
converting it to the appropriate ASCII-based transformations only when passing
it on to a service that doesn't support Unicode (such as addresses for SMTP).
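The following minimal Python sketch makes the "decode on input" advice concrete. It uses only the standard library. It produces the four ASCII forms of "中国" listed above and shows that, compared raw, they are four different strings. It then decodes each form back to the same Unicode string. The names `encoded_forms`, `decoders` and `decode_on_input` are illustrative only. Python's json module emits lowercase hex escapes (\u4e2d\u56fd), which are equivalent to the uppercase form quoted above.

```python
import html
import json
from urllib.parse import quote, unquote

TEXT = "中国"

# Four ASCII-only encodings of the same string, one per protocol/format.
encoded_forms = {
    "idna": TEXT.encode("idna").decode("ascii"),      # DNS host name (A-label)
    "percent": quote(TEXT),                           # URL percent-encoding of UTF-8
    "json": json.dumps(TEXT)[1:-1],                   # \uXXXX escapes, quotes stripped
    "ncr": TEXT.encode("ascii", "xmlcharrefreplace").decode("ascii"),  # HTML numeric refs
}

# One decoder per form, mapping it back to plain Unicode.
decoders = {
    "idna": lambda s: s.encode("ascii").decode("idna"),
    "percent": unquote,
    "json": lambda s: json.loads(f'"{s}"'),
    "ncr": html.unescape,
}


def decode_on_input(kind: str, value: str) -> str:
    """Convert a received ASCII form to plain Unicode before storing it."""
    return decoders[kind](value)


if __name__ == "__main__":
    # Compared raw, the four forms are four distinct strings...
    print(len(set(encoded_forms.values())))
    # ...but after decoding on input they all collapse to one Unicode string.
    print({decode_on_input(k, v) for k, v in encoded_forms.items()})
```

Storing only the decoded value means a database comparison sees one string instead of four.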

The question here then is whether the email address in the value attribute of
the <input> element with type=email should be part of the Unicode-aware
application world, or part of the dumb ASCII-only protocol world. In a similar
situation, it's already been decided that the URLs in the href attribute of the
<a> and <link> elements, as well as the src attributes of the <script> and
<img> elements, can be IRIs and thus include internationalized domain name
labels.

I don't see why the same shouldn't be allowed for the value attribute of the
<input> element with type=email.

As a consequence, user agents *must* then convert email addresses that contain
IDN labels to the equivalent ASCII form before validating them against the
spec's ASCII-only definition of a valid email address.
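The following minimal Python sketch shows the convert-then-validate order. It rests on these assumptions:

- Only the domain part is converted, since IDNA does not cover the local part.
- The pattern is the "valid e-mail address" regular expression as it appears in the current WHATWG HTML spec.
- The conversion uses Python's built-in "idna" codec. That codec implements IDNA 2003 (RFC 3490) rather than the IDNA 2008 rules of RFC 5890, so some labels may be mapped or rejected differently than an IDNA 2008 implementation would.

The helper names `to_ascii_email` and `is_valid_email` are hypothetical.

```python
import re

# "Valid e-mail address" pattern as given in the WHATWG HTML spec (ASCII only).
VALID_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def to_ascii_email(address: str) -> str:
    """Convert the domain part of an address to its ASCII (A-label) form.

    Hypothetical helper. Uses Python's "idna" codec, i.e. IDNA 2003
    (RFC 3490), not IDNA 2008. The local part is left untouched.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no '@'")
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


def is_valid_email(address: str) -> bool:
    """Convert IDN labels first, then validate against the ASCII-only pattern."""
    try:
        ascii_address = to_ascii_email(address)
    except (ValueError, UnicodeError):
        # Missing '@', or a domain label that IDNA 2003 cannot convert.
        return False
    # A non-ASCII local part still fails here, since IDNA only covers domains.
    return VALID_EMAIL.fullmatch(ascii_address) is not None
```

With this order, "user@bücher.example" is checked as "user@xn--bcher-kva.example" and passes. "josé@example.com" is still rejected, because the local part is outside IDNA's scope.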

Note also that the spec's use of the word "punycode" is wrong. Punycode is
only one of several functions used in converting a U-label to an A-label; see
http://tools.ietf.org/html/rfc5890#section-2.3.4
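Python's standard library shows the distinction. It implements IDNA 2003, so this is only an approximation. RFC 5890 is part of IDNA 2008, which drops Nameprep in favor of per-code-point validity rules, but it still applies Punycode and the "xn--" prefix as separate steps. Applying the raw Punycode codec to "Bücher" yields "Bcher-kva", with no case mapping and no prefix.

The sketch below runs the IDNA 2003 steps explicitly and arrives at the same A-label as the complete "idna" codec. Its function name is hypothetical. It uses the documented `encodings.idna.nameprep` helper, and it omits the other checks that ToASCII performs, such as length limits and the ASCII-only short-circuit.

```python
import encodings.idna


def u_label_to_a_label_steps(u_label: str) -> tuple[str, str, str]:
    """Show intermediate results of IDNA 2003 ToASCII for one label.

    Hypothetical helper. Omits ToASCII's other checks (ASCII short-circuit,
    STD3 rules, existing "xn--" prefix, 63-octet length limit).
    """
    # Step 1: Nameprep (RFC 3491) maps case, applies NFKC, rejects prohibited code points.
    prepped = encodings.idna.nameprep(u_label)
    # Step 2: Punycode (RFC 3492) encodes the prepared label as ASCII.
    punycoded = prepped.encode("punycode").decode("ascii")
    # Step 3: prepend the ACE prefix to form the A-label.
    a_label = "xn--" + punycoded
    return prepped, punycoded, a_label


if __name__ == "__main__":
    label = "Bücher"
    print("raw Punycode:", label.encode("punycode").decode("ascii"))
    print("IDNA steps:  ", u_label_to_a_label_steps(label))
    print("idna codec:  ", label.encode("idna").decode("ascii"))
```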

Received on Monday, 14 May 2012 23:27:01 UTC