[i18n-activity] input type=email change proposals

r12a has just created a new issue for https://github.com/w3c/i18n-activity:

== input type=email change proposals ==
4.10.5.1.5 E-mail state (type=email)
https://html.spec.whatwg.org/#e-mail-state-(type=email)

The W3C HTML 5.3 spec has already made changes to the scope of type=email forms. For history on those discussions see
- https://github.com/w3c/html/pull/1163#event-1446916121
- https://github.com/w3c/html/issues/845

This also relates to a bug at 
https://www.w3.org/Bugs/Public/show_bug.cgi?id=15489

I'm raising this issue to gather thoughts about what changes we should propose to the WhatWG for their version of the spec, which doesn't include the W3C changes.

ICANN is asking for the restriction on Unicode in email to be removed, because there are users out there who have functioning email addresses which use Unicode on either or both sides of the @ sign. See a [white paper by them](https://uasg.tech/wp-content/uploads/2017/04/Unleashing-the-Power-of-All-Domains-White-Paper.pdf).  I believe they are also concerned that content developers may use type=email forms for id entry, where ids are EAI addresses.

Currently the WhatWG spec limits the internal representation of emails input using type=email to ASCII only.  Note that the clue to this being the internal representation only is the link on the word 'value', which points to a part of the spec that says:

> "A control's value is its internal state. As such, it might not match the user's current input." (https://html.spec.whatwg.org/#concept-fe-value)

As i understand the WhatWG spec, the user can type anything into such a field and the browser should use punycode to convert non-ASCII characters, on both sides of the @ sign, to ascii for storage and transmission.  The spec also says "Constraint validation: While the user interface is representing input that the user agent cannot convert to punycode, the control is suffering from bad input."
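
To make the conversion described above concrete, here is a minimal sketch of what converting only the domain side to punycode might look like. The function name `domain_to_punycode` and the example addresses are made up for illustration, and Python's built-in `idna` codec (which follows the older IDNA 2003 rules) merely stands in for whatever algorithm a browser actually uses.

```python
def domain_to_punycode(address: str) -> str:
    """Convert only the domain part of an address to its ASCII (xn--) form."""
    # Split on the last "@" so the local part is left exactly as typed.
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("no @ sign in address")
    # Assumption: the standard-library "idna" codec (IDNA 2003) is used here
    # only because it ships with Python, not because any spec requires it.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(domain_to_punycode("abc@bücher.example"))   # abc@xn--bcher-kva.example
print(domain_to_punycode("josé@bücher.example"))  # josé@xn--bcher-kva.example
```

Note that in the second call the left side is still non-ASCII after the conversion, so the result would not be an ASCII-only value.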

The problems i see with the current spec text are as follows:

1. punycode is only a relevant transformation for the domain name, not for Unicode text on the left side of the @. I think this needs to be clarified in the spec.

2. Furthermore, conversion of the left side is not mentioned, although some transformation is apparently required in order to convert Unicode characters to ascii internally. (Given that the spec specifically mentions a punycode transformation for the IDN (which is useful because it is a standardised approach), it seems to me that it would be equally useful to specify the transformation to be applied to the left side for conversion to ascii (eg. percent-encoded utf-8), if ascii is actually needed.)

3. I still have a question in my mind about whether it is actually necessary, or indeed appropriate, to transform the left-hand side to ascii.  I don't know enough about email addresses to answer that question.

4. It seems that in general browsers are not following the spec, since they are not behaving as expected if the user types email addresses containing Unicode into the form field.  During TPAC i created some small tests[1] that show browsers preventing users from actually using Unicode in email addresses for type=email fields.  Presumably, one of two things should be done in that case: (a) change the spec to match browser behaviour, or (b) raise bugs against the browsers to get them to conform to the spec. The former approach would take us in the opposite direction from what ICANN wants.

5. In the bugzilla bug linked to above, people such as John Klensin are arguing that the browser shouldn't concern itself with converting the form entry anyway, since email systems do that.

6. Others in the bugzilla thread suggest that there should be different types of form, ie. type=email that accepts EAI, and a type=ascii-only that people can use if they have a particular reason for limiting to ascii.

7. I assume that if a user types an internationalized email address in a field that is looking for an id, rather than sending email, then conversion to punycode or any other escaped form is not appropriate either.  Perhaps you will say that the developer shouldn't have used input type=email in this case.  If so, ...

8. ... i would argue that the scope of use for this form field type really needs to be made much clearer in the spec, so that developers are clearer about when and when not to use it. 
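
Point 2 above mentions percent-encoded utf-8 as one possible ascii form for the left side. Purely as an illustration of what that could look like (no spec defines this; the function name `escape_local_part` and the sample input are hypothetical), a sketch might be:

```python
def escape_local_part(local: str) -> str:
    """Percent-encode the UTF-8 bytes of non-ASCII characters only."""
    out = []
    for ch in local:
        if ch.isascii():
            # ASCII characters are passed through untouched.
            out.append(ch)
        else:
            # Each UTF-8 byte of a non-ASCII character becomes %XX.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(escape_local_part("josé"))  # jos%C3%A9
```

This sketch is not reversible if the original left side already contains a literal "%", which is one reason any such transformation would need to be specified rather than left to each implementation.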


I'd like to see the spec updated to take into account the relevant points above, but regardless of any of those changes, i'd also like the spec to carry a (probably informative) description of when type=email should and should not be used (and if the expectation is that content developers should use vanilla input forms for certain things, advice to that effect).

It would probably also be useful for ICANN to express their use cases as part of the discussion.


-----------------
[1] Tests:

- https://w3c.github.io/i18n-tests/quick-tests/email-forms/email-forms-000.html
- https://w3c.github.io/i18n-tests/quick-tests/email-forms/email-forms-001.html
- https://w3c.github.io/i18n-tests/quick-tests/email-forms/email-forms-002.html

An address like ascii@ascii.com works as expected in the 4 major browsers.

The address abc@सम्यूर्ण.com is blocked on Chrome, Safari, and Edge, but does work on Firefox.

The address सम्यूर्ण@सम्यूर्ण.com is blocked on Firefox, Chrome, Safari, and Edge.

Note that Chrome produces an error message that specifically points to non-ASCII characters being unacceptable in email addresses. 
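
For comparison, here is a rough sketch that checks the three test addresses against an ASCII-only pattern. The pattern is written from memory of the "valid e-mail address" definition in the WhatWG spec, so treat its exact form as an assumption and check it against the spec text. It says nothing about how any particular browser is implemented.

```python
import re

# Assumption: modelled on the spec's "valid e-mail address" pattern (ASCII only).
ASCII_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def is_ascii_valid(address: str) -> bool:
    """True if the whole address matches the ASCII-only pattern."""
    return ASCII_EMAIL.fullmatch(address) is not None


for address in ("ascii@ascii.com", "abc@सम्यूर्ण.com", "सम्यूर्ण@सम्यूर्ण.com"):
    print(address, is_ascii_valid(address))
```

Only the first address matches as typed. The other two would need some conversion to ascii before they could match, which is the step the tests above are probing.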


---
**WHEN CREATING A NEW ISSUE DO SO ABOVE THIS PARAGRAPH, REPLACING THE PROMPTS, BUT LEAVE THIS PARAGRAPH INTACT AS WELL AS THE TEXT BELOW IT** When this issue is raised in the github/bugzilla/mail of the WG that owns the spec, use the text above this para as the basis for that comment. Then edit this issue to remove this paragraph and ALL THE TEXT ABOVE IT. Replace the text 'link_to_issue_raised' below with a link to the place you raised the issue, but leave the remaining text below this para unaltered.

**This is a tracker issue.** Only discuss things here if they are i18n WG internal meta-discussions about the issue. **Contribute to the actual discussion at the following link:**


§ link_to_issue_raised



Please view or discuss this issue at https://github.com/w3c/i18n-activity/issues/607 using your GitHub account

Received on Thursday, 8 November 2018 12:59:51 UTC