Re: "International" email addresses [I18N-ACTION-374]

Addison, I18N group,

Many thanks for the discussion so far, and for creating an issue for this.

To add to the discussion, I would like to point out the several dimensions  
to this issue which have been exposed:

1. Syntax, Static Semantics, Dynamic Semantics

To draw an analogy with programming languages, there are several  
properties of an identifier that can be validated:
Things that can be checked at compile time:
    1. Syntax: Can this thing be an identifier?
    2. Static semantics: Has it been declared? (etc)
Things that can be checked at run-time:
    1. Does it have a value? (etc)

With respect to validating email addresses, there are several comparable
properties:
    1. Syntax: Could this string imaginably be a valid email address  
(regardless of specific details for instance for particular zones, or  
available TLDs).
    2. Static Semantics: Is this string an allowable email address, taking  
into account current rules for zones, which TLDs there are, etc.
    3. Dynamic semantics: Does the domain really exist? Does the email  
address really work?

There is another dimension too, that XML Schema distinguishes as "lexical  
space" and "value space"[1]:
    1. Lexical space: in this case, what the user thinks of, and types in,  
as a valid email address.
    2. Value space: in this case the email address as it might go over the  
wire, which may include Punycode processing.
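
To make the lexical space / value space distinction concrete, here is a
minimal Python sketch. It assumes (this is not stated above) that only the
domain part is converted to Punycode and the local part is left untouched.
It uses Python's built-in "idna" codec, which implements the older IDNA 2003
rules. The function name to_wire_form is made up for illustration.

```python
def to_wire_form(address: str) -> str:
    """Map a typed-in address (lexical space) to one possible wire form
    (value space). Assumption: only the domain part is Punycode-encoded."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no '@' in address")
    # The stdlib "idna" codec applies IDNA 2003 processing label by label.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(to_wire_form("info@bücher.example"))  # info@xn--bcher-kva.example
```

An address whose domain is already ASCII comes back unchanged, so for such
addresses the two spaces coincide.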

It is noticeable that many answers across the internet to the vexing  
question of what is a valid international email address mix these things  
up in lots of interesting ways, without properly distinguishing them.

In this case, the XForms group is only interested in the Syntax of the
Lexical Space. We are not interested, at the level of processing that we
are now talking about, in whether it is a valid domain, whether the zone
parts follow the rules for that zone, or whether the email address really
exists. The user may be typing in an address that represents a future
address for a domain that doesn't yet exist, or for a TLD that doesn't yet
exist.

As a result, I still believe that my original message was more or less  
right on this point: a syntactically correct email address is defined by  
RFC 5322 as modified by RFC 6532:

    address: atom-list "@" atom-list.
    atom-list: atom ( "." atom )*
    atom: C+
    C: any character in the world EXCEPT (),.:;<>@[\]

with the added exclusion of control characters in the list for C.
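
For illustration, the grammar above can be checked with a few lines of
Python. This is only a sketch: the function names are invented, and
"control characters" is assumed to mean Unicode general category Cc. It
follows the grammar literally, so it does not reject whitespace, which the
grammar does not mention.

```python
import unicodedata

# Characters excluded from C by the grammar: ( ) , . : ; < > @ [ \ ]
EXCLUDED = set('(),.:;<>@[\\]')


def is_c(ch: str) -> bool:
    # Assumption: "control characters" means general category Cc.
    return ch not in EXCLUDED and unicodedata.category(ch) != "Cc"


def is_atom(s: str) -> bool:
    # atom: C+  (one or more allowed characters)
    return len(s) > 0 and all(is_c(ch) for ch in s)


def is_atom_list(s: str) -> bool:
    # atom-list: atom ( "." atom )*
    return all(is_atom(part) for part in s.split("."))


def is_syntactic_address(s: str) -> bool:
    # address: atom-list "@" atom-list
    local, sep, domain = s.partition("@")
    return bool(sep) and is_atom_list(local) and is_atom_list(domain)
```

A second "@" ends up inside the domain part and is rejected there, because
"@" is not in C. Empty atoms (a leading, trailing, or doubled ".") fail the
C+ rule.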


Best wishes,

Steven Pemberton
For the Forms WG

On Thu, 20 Nov 2014 17:37:23 +0100, Phillips, Addison wrote:

> Dear Steven and XForms,
> Firstly, the WG *very much* welcomes further discussion from any and all  
> on this list: this is how we find stuff out. (Thanks to Anne, JcK,
> Jungshik, and Shawn for contributions so far)
> This is just a note to let you know that the Internationalization WG has  
> taken up a discussion of this topic, which has, obviously, some  
> interesting issues associated with it. We’re aware that, although “EAI”
> (email address internationalization) has been slow to mature and gain
> traction, there are serious efforts from vendors and in various
> countries to bring non-ASCII mail addresses into the mainstream.
> This doesn’t play well with the current description in HTML (cited by  
> Anne) or various other places. As Shawn and John note, a regex  
> description of IDNA is probably impossible. At best, such a regex would
> be an approximation.
> The Internationalization WG is creating a discussion page to capture the  
> issues [1]. We have not had a chance to discuss the issue in greater  
> depth yet, but the WG’s consensus is that this is an interesting
> problem needing further investigation and documentation. Please note
> that, owing to the Thanksgiving holiday in the USA, the
> Internationalization WG is unlikely to make much more of a response for  
> a couple of weeks.
> Regards (for I18N),
> Addison
> [1]
> From: Shawn Steele
> Sent: Wednesday, November 19, 2014 11:37 AM
> To: Jungshik SHIN (신정식)
> Cc: Anne van Kesteren; Steven Pemberton;; Forms  
> WG
> Subject: RE: "International" email addresses
> Validating the IDN part is much more complicated than validating the  
> local part, because you need to know the IDN rules.  Which means it  
> probably isn’t just a “simple” regex.
> So maybe the rule should allow Unicode in the domain part and encourage  
> complete IDN validation as an additional step?
> -Shawn
> From: [] On Behalf Of  
> Jungshik SHIN (신정식)
> Sent: Wednesday, November 19, 2014 10:53 AM
> To: Shawn Steele
> Cc: Anne van Kesteren; Steven Pemberton;; Forms  
> WG
> Subject: Re: "International" email addresses
> deals with it (EAI  
> support in email form validation) although the summary is a bit  
> misleading (it only talks about IDN).
> Jungshik
> On Wed, Nov 19, 2014 at 10:07 AM, Shawn Steele wrote:
>> Updating that to support EAI would be good.
>> -----Original Message-----
>> From: [] On  
>> Behalf Of Anne van Kesteren
>> Sent: Wednesday, November 19, 2014 2:07 AM
>> To: Steven Pemberton
>> Cc:; Forms WG
>> Subject: Re: "International" email addresses
>> On Wed, Nov 19, 2014 at 11:00 AM, Steven Pemberton wrote:
>>> So as far as I can see, an internationalised email address is:
>>>  address: atom-list "@" atom-list.
>>>  atom-list: atom ( "." atom )*
>>>  atom: C+
>>>  C: any character in the world EXCEPT (),.:;<>@[\]
>>> a) Do you agree?
>>> b) It was really hard to find this out. The internet is rife with
>>> people asking and getting bad answers. Please help the internet by
>>> being definitive.
>> I recommend matching HTML's definition:
>> --

Received on Wednesday, 26 November 2014 21:51:08 UTC