Re: [i18n-activity] input type=email change proposals (#607) from klensin via GitHub on 2018-12-07 (public-i18n-archive@w3.org from October to December 2018)

From: klensin via GitHub <sysbot+gh@w3.org>
Date: Fri, 07 Dec 2018 00:40:59 +0000
To: public-i18n-archive@w3.org
Message-ID: <issue_comment.created-445082339-1544143258-sysbot+gh@w3.org>

Richard, I hope what I'm about to say has been clear from what I have said on the calls or in previous discussions of related topics, but, just to get a comment on your "problems i see with the current spec text" points 2 and 3 into the file...

The specs produced by the IETF EAI WG (formally called, as those documents specify, "SMTPUTF8" because "EAI" is just the name of a now-closed WG) are extremely explicit that transformation of a local-part (the left side of the "@") to an all-ASCII form is not only not a requirement but prohibited except as part of the final delivery process. There is not only no need to convert to an internal ASCII form, there is no way to do so. Because many other aspects of mail addresses, headers, and handling interact with having non-ASCII local parts, an email origination and delivery path either supports SMTPUTF8 entirely or it doesn't. And, if it doesn't, and with the understanding that this isn't anything that can be tested lexically, the mail won't go through.
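As a rough illustration of the rule above, here is a minimal Python sketch of how software handling an address might treat the local part: pass it through unchanged, and refuse rather than transcode when the path does not support SMTPUTF8. The function names and the `next_hop_offers_smtputf8` flag are hypothetical; since path support cannot be determined lexically, the flag is assumed to come from somewhere else (for example, what the next-hop server advertises).

```python
from __future__ import annotations


def split_address(address: str) -> tuple[str, str]:
    """Split at the last "@"; the local part is returned unchanged."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a mailbox address: {address!r}")
    return local, domain


def needs_smtputf8(address: str) -> bool:
    """True if the local part contains any non-ASCII character."""
    local, _domain = split_address(address)
    return not local.isascii()


def local_part_for_transport(address: str, next_hop_offers_smtputf8: bool) -> str:
    """Return the local part exactly as given, or fail.

    Hypothetical policy: never attempt an ASCII-compatible encoding of the
    local part. If it is non-ASCII and the next hop lacks SMTPUTF8, give up.
    """
    local, _domain = split_address(address)
    if needs_smtputf8(address) and not next_hop_offers_smtputf8:
        raise ValueError("non-ASCII local part requires SMTPUTF8; refusing to transcode")
    return local
```

The only decision the sketch makes is "send as-is" or "do not send"; there is deliberately no third branch that rewrites the local part.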

The barrier to any sort of ASCII-compatible encoding of the local part is that the mail transport protocol, SMTP, has been extremely flexible about the local part since its first stable version as RFC 821 in 1982. At least one reason for the flexibility is that, since nearly the dawn of the ARPANET, email has been used to transport information other than interpersonal messages for humans, and local-parts, as well as subject lines, have been used to carry instructions or metadata. Local parts that encapsulate the addresses of completely different mail systems on the other side of gateways pose similar problems. So, for example, some email systems are quite sure they know what a "%" means and it has to do with message routing, not hexadecimal encoding of inconvenient characters. Similarly, a hyphen or two, slashes, colons, etc., are as likely, or more likely, to be pieces of a command line for some system as an indication of a special encoding. The delivery system can interpret those characters any way it likes but there is no general way for an originating or relaying system to guess accurately at what the delivery system will do.
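To make the "%" point concrete, here is a small Python sketch of what happens when an intermediary guesses that "%" introduces a hexadecimal escape. The address is made up for illustration: its local part uses "%" as a routing separator, and the two characters after the "%" happen to be valid hex digits. The helper name `guess_percent_decoding` is hypothetical; `urllib.parse.unquote` is the standard-library percent-decoder.

```python
from urllib.parse import unquote

# Made-up address: "%" here separates a user from a host behind a gateway,
# it is not an escape. "de" just happens to look like two hex digits.
ROUTED_ADDRESS = "user%dept.example@relay.example"


def guess_percent_decoding(address: str) -> str:
    """What a relay would produce if it assumed "%XX" meant a hex escape."""
    return unquote(address)


decoded = guess_percent_decoding(ROUTED_ADDRESS)

# The guess silently rewrites the local part, so the delivery system would
# no longer see the routing information it was expecting.
print(repr(ROUTED_ADDRESS))
print(repr(decoded))
print("unchanged" if decoded == ROUTED_ADDRESS else "corrupted by decoding")
```

Nothing in the string itself says which reading is right, which is why only the delivery system is in a position to interpret it.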

So the answer to your question about whether it "is actually necessary, or indeed appropriate, to transform the left-hand side to ascii" is "neither necessary nor appropriate".

john

--
GitHub Notification of comment by klensin
Please view or discuss this issue at https://github.com/w3c/i18n-activity/issues/607#issuecomment-445082339 using your GitHub account

Received on Friday, 7 December 2018 00:41:00 UTC