[whatwg] Comments on the definition of a valid e-mail address from Peter Kasting on 2009-08-24 (public-whatwg-archive@w3.org from August 2009)

From: Peter Kasting <pkasting@google.com>
Date: Sun, 23 Aug 2009 19:23:49 -0700
Message-ID: <d62cf1d10908231923l443083e4n503c094d4bc5d681@mail.gmail.com>
Thanks very much for this analysis!

On Sun, Aug 23, 2009 at 12:41 PM, Aryeh Gregor <Simetrical+w3c at gmail.com> wrote:

> Inspection showed that the overwhelming majority of the failures were
> due to the presence of excess whitespace, often a single trailing
> space, or a space inserted before or after the @ sign.  When I
> adjusted the regex to ignore those failures, I got a smaller list, 202
> (about 0.007% of the total):


I think telling user agents to strip leading and trailing whitespace is a
good idea.  I'm not as sure about stripping whitespace in the middle.
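
To illustrate the distinction, here is a minimal Python sketch. The helper names `strip_outer_whitespace` and `strip_all_whitespace` are hypothetical and do not come from the spec or any UA. Trimming the ends leaves the typed address intact, while removing interior whitespace rewrites it into something the user may not have meant.

```python
def strip_outer_whitespace(value: str) -> str:
    """Remove only leading and trailing whitespace (hypothetical helper)."""
    return value.strip()


def strip_all_whitespace(value: str) -> str:
    """Remove every whitespace character, including interior ones (hypothetical helper)."""
    return "".join(value.split())


if __name__ == "__main__":
    # A trailing or leading space is removed without touching the address itself.
    print(repr(strip_outer_whitespace(" user@example.com ")))
    # A space before the @ sign survives outer stripping, so validation would still fail.
    print(repr(strip_outer_whitespace("user @example.com")))
    # Stripping everything silently changes what the user typed.
    print(repr(strip_all_whitespace("user @example.com")))
```

Whether a UA should go as far as the second helper is exactly the open question; the sketch only shows what each choice would do to the input.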

> Some of these were clearly wrong, and shouldn't have been confirmed to
> begin with.  Some even didn't have an @ sign, so probably were
> submitted in some window when we did no validation at all (and I have
> no idea how they got confirmed).  Of the ones that possibly work,


You said there were 202 rows total in this group.  How many of those 202 are
"ones that possibly work"?

I ask because if it is significantly fewer than 202, then the failure rate
(if we strip whitespace) is noticeably less than 0.007% of your sample.  I
am not as firmly on the side of "never reject anything conceivably valid",
probably because I think there's more of a chance of type=email obsoleting
silly JS-based validators if we do it right.

> So why not have the spec say that in the case of e-mail addresses, the
> browser may warn the user, but should permit them to submit the
> address anyway?  If the user is willing to override the warning, then
> it's likely that they personally know that the e-mail address works,
> e.g., because they use it.


I don't think this is a very valuable option because I don't think a UA can
make good UX out of it (I speak as a member of the Chromium team who works
on UX).

> Alternatively, you could just loosen the restrictions even further,
> and only ban input that doesn't contain an @ sign.  (Or that doesn't
> match ^[^@]+@[^@]+\.[^@]+$, or whatever.)


One notable datum missing from your otherwise useful analysis is how many
_invalid_ email addresses not allowed by the current definition would be
allowed by this.  I suspect the number is large.  I would be willing to
trade a tiny number (<0.007%?) of false negatives to avoid a large number of
false positives, especially since I suspect that if the check were weakened
this far authors would be more likely to continue with their (currently
lousy) hand-written validators.
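
As a rough illustration of the concern, the quoted pattern can be tried against a few strings in Python. The sample inputs and the helper name `loose_accepts` are made up for this sketch, and it measures nothing about real-world data; it only shows that the pattern accepts some strings that do not look like usable addresses.

```python
import re

# The loosened pattern quoted above: something, an @, something, a dot, something.
LOOSE_PATTERN = re.compile(r"^[^@]+@[^@]+\.[^@]+$")


def loose_accepts(value: str) -> bool:
    """Return True if the loosened pattern matches (hypothetical helper)."""
    return LOOSE_PATTERN.match(value) is not None


if __name__ == "__main__":
    samples = [
        "user@example.com",        # ordinary address: accepted
        "not an address@x.y",      # spaces in the local part: still accepted
        "user@exa mple.c om",      # spaces in the domain: still accepted
        "userexample.com",         # no @ sign: rejected
        "a@b@c.com",               # two @ signs: rejected
        "user@localhost",          # no dot after the @: rejected
    ]
    for sample in samples:
        print(f"{sample!r}: {loose_accepts(sample)}")
```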


> Or just don't ban anything
> at all, like with type=tel.


I don't support this.

> I think this input type has only been implemented in Opera so far.


I am mentoring a student who is writing a patch for this in WebKit as we
speak -- we were just discussing the implementation yesterday and I believe
he hopes to have it out for review tomorrow.

> I don't think there are serious interoperability concerns
> with changing it at this point,


I agree, it would be fairly easy to switch if the validation algorithm were
changed.

PK
Received on Sunday, 23 August 2009 19:23:49 UTC