- From: Aryeh Gregor <Simetrical+w3c@gmail.com>
- Date: Sun, 23 Aug 2009 23:12:00 -0400
On Sun, Aug 23, 2009 at 10:23 PM, Peter Kasting<pkasting at google.com> wrote:
> I think telling user agents to strip leading and trailing whitespace is a
> good idea.  I'm not as sure about stripping whitespace in the middle.

It seems like (some?) mail agents will do that already, if it's around a dot or the @.  If I do echo 'test' | mail ' simetrical @ gmail . com ', I get the mail just fine, with the whitespace stripped from the To: header.

> You said there were 202 rows total in this group.  How many of those 202 are
> "ones that possibly work"?

I count 9 out of the 202 as missing an @ sign.  The other 193 look like a more or less sensible address could be extracted from them.  Note that this isn't a fair sample of what users actually entered, since in theory any address without an @ sign should have been rejected on the server side for the past few years (but all other addresses were allowed).

I should also reiterate that this isn't necessarily a representative sample.  In particular, I wouldn't be surprised if some types of invalidity (like use of non-ASCII characters, if that even slightly works -- I haven't tested) were common in particular non-English-speaking subsets of the Internet.

> I ask because if it is significantly less than 202, then the failure rate
> (if we strip whitespace) is noticeably less than 0.007% of your sample.  I
> am not as firmly on the side of "never reject anything conceivably valid",
> probably because I think there's more of a chance of type=email obsoleting
> silly JS-based validators if we do it right.

I can definitely see the value in that.  On the other hand, if you're one of the people with a weird e-mail address, it would be a pain.  I know one of the people whose local part ends in a ., as I mentioned.  I've been in that position myself with +-addressing.
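
To make the two whitespace-stripping options above concrete, here is a minimal sketch in Python. The function names are hypothetical and the behavior is only an approximation of what some mail agents appear to do; nothing here is taken from the HTML 5 draft or from any mail tool's source.

```python
import re


def strip_outer(addr: str) -> str:
    """Strip only leading and trailing whitespace."""
    return addr.strip()


def strip_around_separators(addr: str) -> str:
    """Strip outer whitespace, plus whitespace next to a dot or the @.

    Assumption: whitespace elsewhere in the address is left alone,
    since it is less clear what the user meant there.
    """
    return re.sub(r"\s*([.@])\s*", r"\1", addr.strip())
```

With this sketch, strip_around_separators(' simetrical @ gmail . com ') gives 'simetrical@gmail.com', while strip_outer leaves the inner spaces in place.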
It would be great if we had some sane standard for what e-mail addresses actually worked, but I'm not sure it's a great idea for HTML 5 to effectively mandate that a subset of addresses are invalid unless we can get all the people writing e-mail-related tools to go along.  (Which we can't.)  *Some* people are being issued and are using these invalid addresses, whether we like it or not.

> One notable datum missing from your otherwise useful analysis is how many
> _invalid_ email addresses not allowed by the current definition would be
> allowed by this.  I suspect the number is large.  I would be willing to
> trade a tiny number (<0.007%?) of false negatives to avoid a large number of
> false positives, especially since I suspect that if the check were weakened
> this far authors would be more likely to continue with their (currently
> lousy) hand-written validators.

One problem is that apparently some addresses are effectively usable even though all the standards say they're wrong.  As I say, one of the addresses was like <foo. at example.com>, with a trailing . in the local part.  It's prohibited by the RFC and the GNU "mail" utility rejects it, but the user with that address confirmed that he used it just fine for a long time, and he received mail that I sent to that address with Gmail.  Someone else I talked with about it found that two mail servers he tested supported addresses like <"quoted string"@example.com>, but a third didn't.

So it looks to me like there *is* no clear distinction between what's usable as an e-mail address and what's not, in practice.  Some stuff that the RFCs prohibit mostly works, and some stuff that they allow doesn't reliably work.  Given that, the only reliable way to tell whether an e-mail address is usable in practice is to just try it.  HTML 5 can't possibly distinguish between a working address and a non-working address if it depends on what specific mail software the parties happen to be using.
So given that either false negatives or false positives will necessarily occur, you either lock out some users or you permit some gibberish.  If the only reason to be strict is to encourage authors to drop extremely broken JS checks in favor of slightly broken in-browser checks, that doesn't strike me as very compelling, to be honest.  (Especially since I don't think it will necessarily work.)  The only other reason I can think of is to help users avoid typos, but that's something that overridable warnings are suited to, not outright prohibitions.

> I don't think this is a very valuable option because I don't
> think a UA can make good UX out of it (I speak as a member
> of the Chromium team who works on UX).

What would the problem be here from a UX perspective?  I can see problems from other perspectives, like how this creates a whole new category of not-quite-valid input values that would have to be specially treated in the spec.
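
As a rough illustration of the "overridable warnings, not outright prohibitions" idea above, a check could separate one hard failure from a list of soft warnings that the user may dismiss. Everything in this Python sketch is an assumption made for illustration: the CheckResult type, the check_email name, the choice to hard-reject only a missing @ sign, and the particular warnings are not from the HTML 5 draft or any user agent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CheckResult:
    hard_error: Optional[str] = None  # blocks submission
    warnings: List[str] = field(default_factory=list)  # user may override


def check_email(addr: str) -> CheckResult:
    """Hypothetical lenient check: reject only when no @ is present."""
    addr = addr.strip()
    if "@" not in addr:
        return CheckResult(hard_error="missing @ sign")

    result = CheckResult()
    # Split on the last @ so a quoted local part containing @ stays intact.
    local, _, domain = addr.rpartition("@")
    if local.endswith("."):
        result.warnings.append("local part ends in a dot")
    if local.startswith('"'):
        result.warnings.append("quoted local part")
    if not addr.isascii():
        result.warnings.append("non-ASCII characters")
    if "." not in domain:
        result.warnings.append("domain has no dot")
    return result
```

Under these assumptions, an address like foo.@example.com produces a warning about the trailing dot rather than a rejection, so a user whose address really looks like that can still submit it.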
Received on Sunday, 23 August 2009 20:12:00 UTC