- From: Smylers <Smylers@stripey.com>
- Date: Mon, 24 Aug 2009 09:36:12 +0100
Aryeh Gregor writes:

> Historically, MediaWiki has mostly just required that an @ symbol be
> present in the address. Originally we used a simplistic regex,

It's relatively well known that a simple regex can't be used to match
e-mail addresses (and not match things that aren't!); Jeffrey Friedl's
'Mastering Regular Expressions' (O'Reilly) included a pattern for this
over a decade ago, but it is exceedingly long:

  http://groups.google.co.uk/group/comp.lang.perl.misc/msg/603ba6fc642a3124
  http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

> ... but when users complained, we looked into the RFCs and decided it
> was too complicated to bother with validation beyond checking for an @
> sign.

It's too complicated for most developers to roll their own validation,
but there are standard libraries available which get it right.

> ... I decided to do some research on how many users' addresses would
> be invalidated [by HTML 5's validation] ...
>
> 1) Addresses in the form "foo <bar@baz.example>", or similar. These
> mostly match RFC 5322's name-addr production instead of addr-spec

Forms on websites capturing users' e-mail addresses typically want just
the address part, prompting for the human-readable name in a separate
box, so I think HTML 5's <input type=email> not allowing the above is
helpful.

> 2) Addresses with dots in incorrect places, in either the local part
> or the domain name part. For instance, multiple consecutive dots, or
> leading/trailing dots. These don't match RFC 5322 at all AFAICT, but
> I asked one of the users with an invalid address of the form
> <foo.@example.com>, and he said it worked fine for him. GNU mail gave
> a syntax error when I tried to send mail to that address, but Gmail
> sent it without complaint, and the user received it successfully.

There may actually be several categories of oddly placed dots. While
the address in the form you give above works, it may be, say, that
those with repeated dots in the hostname part don't.
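The dot categories can at least be told apart mechanically. What follows is a minimal Python sketch, assuming only RFC 5322's unquoted dot-atom form (quoted local parts, comments and domain literals are deliberately ignored). The names classify and dot_problems are hypothetical, and this is not how Email::Valid or any browser actually performs its check. For comparison it also runs each address through the permissive ^[^@]+@[^@]+\.[^@]+$ pattern suggested later in this message:

```python
import re

# RFC 5322 "atext": characters allowed in an unquoted atom.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# RFC 5322 dot-atom: atoms joined by single dots, so no leading, trailing
# or doubled dots. Quoted strings, comments and domain literals are NOT
# handled in this sketch.
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")

# The permissive "@ plus a later dot" pattern floated in the thread.
PERMISSIVE = re.compile(r"^[^@]+@[^@]+\.[^@]+$")


def dot_problems(part: str) -> list[str]:
    """Name each kind of oddly placed dot in one half of an address."""
    problems = []
    if part.startswith("."):
        problems.append("leading dot")
    if part.endswith("."):
        problems.append("trailing dot")
    if ".." in part:
        problems.append("consecutive dots")
    return problems


def classify(address: str) -> dict:
    """Hypothetical helper: compare the strict and permissive checks."""
    # Split on the last @, since a dot-atom domain cannot contain one.
    local, sep, domain = address.rpartition("@")
    if not sep:
        return {"address": address, "error": "no @"}
    return {
        "address": address,
        "dot_atom_ok": bool(DOT_ATOM.fullmatch(local))
        and bool(DOT_ATOM.fullmatch(domain)),
        "permissive_ok": bool(PERMISSIVE.match(address)),
        "local_dots": dot_problems(local),
        "domain_dots": dot_problems(domain),
    }


if __name__ == "__main__":
    for addr in [
        "foo@example.com",
        "foo.@example.com",
        ".foo@example.com",
        "foo..bar@example.com",
        "foo@example..com",
        "foo <bar@baz.example>",
    ]:
        print(classify(addr))
```

For "foo.@example.com", ".foo@example.com", "foo..bar@example.com" and "foo@example..com" the dot-atom check fails while the permissive pattern passes. The permissive pattern also accepts the name-addr form "foo <bar@baz.example>", so it separates neither kind of address from a plain addr-spec.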
On the specific case of a . immediately before the @, I've seen that
before: this Perl library module extends an RFC-compliant module to
allow just that; its author admits ".@" breaks the RFCs but claims such
breakage is useful in the real world, specifically when dealing with
e-mail addresses for Japanese mobile phones:

  http://search.cpan.org/perldoc?Email::Valid::Loose

That somebody has found this to be a sufficiently widespread problem
with standard Perl e-mail address validation to write and upload a
module which 'fixes' this (and just that; it makes no other changes)
suggests that people will find HTML 5's <input type=email> to be
problematic in precisely the same way.

> There were other types of addresses that didn't meet HTML 5's
> specification after whitespace was stripped, but none with more than a
> single-digit number of addresses occurring in the sample of three
> million or so that I looked at.

So it may actually be that there isn't a general problem here of lots
of real-world e-mail addresses which work but don't comply with the
RFCs; it may simply be the one case of ".@"? There isn't a plethora of
Email::Valid extensions which relax various different criteria; just
the one which allows ".@".

> Alternatively, you could just loosen the restrictions even further,
> and only ban input that doesn't contain an @ sign. (Or that doesn't
> match ^[^@]+@[^@]+\.[^@]+$, or whatever.) Or just don't ban anything
> at all, like with type=tel. type=email differs from most of the other
> types with validity constraints (like month, number, etc.) in that the
> difference between valid and invalid values is a purely pragmatic
> question (what will actually work?) that the user can often answer
> better than the application. It doesn't seem like a good idea for the
> standard to tell users that the e-mail addresses they've actually been
> using are invalid.

Users often mis-type e-mail addresses. It seems useful to be able to
trap as many typos as possible.
Many authors obviously believe this, given how many employ JavaScript
validators. If HTML 5 were overly permissive about <input type=email>,
then it's likely such authors would continue to use homegrown
JavaScript solutions, which slightly defeats the purpose of HTML 5
introducing <input type=email>.

Smylers
Received on Monday, 24 August 2009 01:36:12 UTC