- From: Smylers <Smylers@stripey.com>
- Date: Mon, 24 Aug 2009 15:42:53 +0100
Aryeh Gregor writes:

> On Mon, Aug 24, 2009 at 4:36 AM, Smylers<Smylers at stripey.com> wrote:
>
> > It's too complicated for most developers to roll their own
> > validation, but there are standard libraries available which get it
> > right.
>
> Standard libraries available for all major languages?

I'd be surprised if they weren't.

> As far as I can tell from a quick search, the PHP standard library
> contains no e-mail validation routines before 5.2.0

Sorry, I meant there is a "library" (meaning additional to the core
language) available in a "standard" place (wherever that language's
libraries are typically found); I wasn't intending to claim that "the"
"standard library" of functionality which is part of a language's core
distribution would include it.

For PHP I Googled "email validation Pear" and found the following as
the top hit. I haven't tried it, but it claims to comply with RFC 822,
and I'd have more faith in it than the average home-rolled attempt:

  http://pear.php.net/package/Validate/

> > Forms on websites capturing users' e-mail addresses typically want
> > just the address part, prompting for the human-readable name in a
> > separate box, so I think HTML 5's <input type=email> not allowing
> > the above is helpful.
>
> It might be more helpful if they stripped the part outside the angle
> brackets, but I agree that it's reasonable to just reject these.

Good point. And that's largely a UI matter: either way the web server
doesn't receive a value with the outside clutter in it.

> The breakdown of the 202 is as follows.

Thanks for providing this.

> * Single trailing dot in domain part: 100 (prohibited by RFC but
>   plausibly deliverable)

Yup. If it is deliverable then surely it's an alias to the same address
without the trailing dot, in which case a browser could choose to remove
it.

> * Single trailing dot in local part: 40 (prohibited by RFC but
>   plausibly deliverable)

Discussed previously. This seems to be the problematic category.

> * Valid address in angle brackets (with other junk around it): 21
>   (permitted by RFC, kind of, and plausibly deliverable)

Discussed above.

> * Multiple consecutive dots: 20 (prohibited by RFC but plausibly
>   deliverable)

If you mean the ".."s are in the local part then yes, it sounds likely
that would get delivered, and a quick non-exhaustive trial seemed to
show this can work.

(If they're in the hostname then I'd be amazed if it's deliverable, but
surely it'd be to the same address that's reached by replacing sequences
of dots with a single dot.)

> * No @: 9 (unlikely to be deliverable)

Indeed.

> * Comment: 3 (permitted by RFC and plausibly deliverable)

Equivalent to the angle bracket case above -- the address without the
comment could be extracted.

> * Miscellaneous: 9 (one containing [NO]@[SPAM], two with trailing >,
>   one in "quotes", one with single leading dot in local part, two with
>   single leading comma in local part, one with leading ": ", one with
>   leading "\")

They don't sound deliverable, or if they are, they would also be
deliverable with the superfluous punctuation stripped. And I'm not sure
single cases are worth fretting about.

If HTML 5 validation rejected one of the above it seems very likely the
user would be able to provide an alternative address (or alternatively
punctuated address) which is valid.

> > So it may actually be that there isn't a general problem here of
> > lots of real-world e-mail addresses which work but don't comply with
> > the RFCs; it may simply be the one case of ".@"?
>
> No, that was just the example I chose because I knew that person
> personally, and so was able to confirm that the address actually
> worked.

There are two categories of input which could be a working e-mail
address yet violate the RFCs:

1 A valid e-mail address with extra 'stuff' in it or surrounding it
  (spaces, comments, trailing punctuation characters, etc). As you
  suggested, browsers can clean up the user's input, so what servers
  receive is a valid e-mail address.

2 A working e-mail address which contains something the RFCs say it
  shouldn't but needs that in order to function; attempting to clean it
  up would transform it to a different e-mail address, which possibly
  delivers somewhere different from the original.

Analysis of your detailed breakdown suggests the only addresses in
category 2 are those with dots in odd places in the local part.

So it may be the only change required to allow all working real-world
e-mail addresses is a willful violation that permits dots anywhere in
the local part (even immediately after another . or before the @).

That change would appear to cover the cases in your data, but others may
have data which shows there are additional cases.

Smylers
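
To make the two categories concrete, here is a rough Python sketch, not a proposal for spec text. The function names `clean_address` and `is_plausible` are made up for illustration, the comment handling ignores nested comments, and the hostname check is deliberately simplified; a real implementation would want a proper parser. `clean_address` covers category 1 (clutter that can be stripped without changing where mail goes), and `is_plausible` applies the relaxed rule for category 2 by accepting dots anywhere in the local part.

```python
import re

# Characters permitted in an unquoted local part, plus "." anywhere.
# Allowing "." in any position is the willful violation discussed above.
_LOCAL = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~.-]+")

# Simplified hostname label (assumption: letters, digits, inner hyphens).
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def clean_address(raw: str) -> str:
    """Category 1: strip clutter that should not affect delivery."""
    text = raw.strip()

    # 'Name <addr>' or 'junk <addr> junk' -> keep only what is in brackets.
    match = re.search(r"<([^<>]*)>", text)
    if match:
        text = match.group(1).strip()

    # Drop simple (non-nested) parenthesised comments.
    text = re.sub(r"\([^()]*\)", "", text).strip()

    if "@" not in text:
        return text

    local, _, domain = text.rpartition("@")
    # In the domain, collapse runs of dots and remove a trailing dot.
    # The local part is left alone: changing it could change the recipient.
    domain = re.sub(r"\.{2,}", ".", domain).rstrip(".")
    return f"{local}@{domain}"


def is_plausible(address: str) -> bool:
    """Category 2: validate, permitting dots anywhere in the local part."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        return False
    if not _LOCAL.fullmatch(local):
        return False
    return all(_LABEL.fullmatch(label) for label in domain.split("."))
```

A browser following this approach would run something like `clean_address` on the user's input first and then validate the result, so that the server only ever sees the cleaned-up value.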
Received on Monday, 24 August 2009 07:42:53 UTC