Re: ACTION-2130: Summarise the apache email validation

But wait. There's more! Something we have long wanted to support.

https://tools.ietf.org/html/rfc6531
"SMTP Extension for Internationalized Email"
adds international email addresses.

It is a Proposed Standard, not yet a full Internet Standard.

https://tools.ietf.org/html/rfc6531#section-3.3

"The key changes made by this specification include:

    o  The <Mailbox> ABNF rule is imported from RFC 5321 and updated in
       order to support the internationalized email address.  Other
       related rules are imported from RFC 5321, RFC 5234, RFC 5890, and
       RFC 6532, or are extended in this document.

    o  The definition of <sub-domain> is extended to permit both the RFC
       5321 definition and a UTF-8 string in a DNS label that conforms
       with IDNA definitions [RFC5890].

    o  The definition of <atext> is extended to permit both the RFC 5321
       definition and a UTF-8 string.  That string MUST NOT contain any
       of the ASCII graphics or control characters."

An erratum changes that to:

"The definition of <atext> is extended to permit both the RFC 5321
definition and a UTF-8 string. That string MUST NOT contain any
of the Extended ASCII graphics (%d128-255) or control characters."

But anyway, they define it formally:

https://tools.ietf.org/html/rfc6532#section-3.1

    UTF8-non-ascii  =   UTF8-2 / UTF8-3 / UTF8-4

https://tools.ietf.org/html/rfc3629#section-4

    UTF8-2      = %xC2-DF UTF8-tail
    UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                  %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
    UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
                  %xF4 %x80-8F 2( UTF8-tail )
    UTF8-tail   = %x80-BF
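The RFC 3629 rules above translate almost mechanically into a byte-level regular expression. What follows is only a sketch in Python to show the shape of the thing; the names UTF8_NON_ASCII and is_utf8_non_ascii are mine, not from any of the RFCs.

```python
import re

# Each fragment mirrors one ABNF rule from RFC 3629 section 4.
UTF8_TAIL = rb"[\x80-\xBF]"
UTF8_2 = rb"[\xC2-\xDF]" + UTF8_TAIL
UTF8_3 = (
    rb"(?:\xE0[\xA0-\xBF]" + UTF8_TAIL
    + rb"|[\xE1-\xEC]" + UTF8_TAIL + rb"{2}"
    + rb"|\xED[\x80-\x9F]" + UTF8_TAIL
    + rb"|[\xEE-\xEF]" + UTF8_TAIL + rb"{2})"
)
UTF8_4 = (
    rb"(?:\xF0[\x90-\xBF]" + UTF8_TAIL + rb"{2}"
    + rb"|[\xF1-\xF3]" + UTF8_TAIL + rb"{3}"
    + rb"|\xF4[\x80-\x8F]" + UTF8_TAIL + rb"{2})"
)

# UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4 (RFC 6532 section 3.1)
UTF8_NON_ASCII = re.compile(rb"(?:" + UTF8_2 + rb"|" + UTF8_3 + rb"|" + UTF8_4 + rb")")


def is_utf8_non_ascii(data: bytes) -> bool:
    """True if data is exactly one well-formed non-ASCII UTF-8 sequence."""
    return UTF8_NON_ASCII.fullmatch(data) is not None
```

The odd-looking ranges after %xE0, %xED, %xF0 and %xF4 are what rule out overlong encodings, surrogates, and code points above U+10FFFF.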

RFC 6531 section 3.3 then extends the two existing rules:

    atext       =/  UTF8-non-ascii

    sub-domain  =/  U-label
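To get a feel for what the atext extension does to the summary grammar in my earlier message (quoted below), here is a rough Python sketch. It works on decoded strings rather than bytes, where I am assuming UTF8-non-ascii corresponds to any code point from U+0080 upward except the surrogates. It deliberately leaves sub-domain as ASCII only, since U-labels need much more than a character class. The names EMAIL and is_email are made up for illustration.

```python
import re

# atext from RFC 5322, as characters inside a regex character class.
ATEXT_ASCII = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
# Assumption: on decoded text, UTF8-non-ascii is any non-surrogate
# code point from U+0080 to U+10FFFF.
NON_ASCII = r"\u0080-\uD7FF\uE000-\U0010FFFF"

ATOM = "[" + ATEXT_ASCII + NON_ASCII + "]+"
# sub: letdig (ldh* letdig)?   (ASCII only; U-labels are NOT handled here)
SUB = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# email: atom ("." atom)* "@" sub ("." sub)+
EMAIL = re.compile(ATOM + r"(?:\." + ATOM + r")*@" + SUB + r"(?:\." + SUB + r")+")


def is_email(address: str) -> bool:
    """True if address matches the simplified grammar sketched above."""
    return EMAIL.fullmatch(address) is not None
```

With this, an address with a non-ASCII local part and an ordinary domain is accepted, while quoted strings, address literals, and domains without a "." are still rejected, as proposed earlier.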

A U-label is hairy:
https://tools.ietf.org/html/rfc5890#section-2.3.2.1

"A "U-label" is an IDNA-valid string of Unicode characters, in
       Normalization Form C (NFC) and including at least one non-ASCII
       character, expressed in a standard Unicode Encoding Form (such as
       UTF-8).  It is also subject to the constraints about permitted
       characters that are specified in Section 4.2 of the Protocol
       document and the rules in the Sections 2 and 3 of the Tables
       document, the Bidi constraints in that document if it contains any
       character from scripts that are written right to left, and the
       symmetry constraint described immediately below."

https://tools.ietf.org/html/rfc5891 puts the constraints on which
characters are permitted in a U-label, but does that by pointing to

https://tools.ietf.org/html/rfc5892

which is rather horrid because it is a long list of allowed and disallowed  
characters.
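Some of the U-label conditions can be checked cheaply even without the RFC 5892 tables. The Python sketch below tests only a few necessary conditions: NFC, at least one non-ASCII character, and the hyphen and leading-combining-mark restrictions from RFC 5891 section 4.2.3. It does NOT consult the RFC 5892 character lists, the contextual rules, the Bidi rule, or the symmetry constraint, so passing it proves nothing on its own. The function name looks_like_u_label is hypothetical.

```python
import unicodedata


def looks_like_u_label(label: str) -> bool:
    """Necessary (not sufficient) conditions for an IDNA2008 U-label."""
    if not label or "." in label:
        return False                      # a label is one non-empty DNS label
    if label.isascii():
        return False                      # needs at least one non-ASCII character
    if unicodedata.normalize("NFC", label) != label:
        return False                      # must already be in NFC
    if label.startswith("-") or label.endswith("-"):
        return False                      # no leading or trailing hyphen
    if label[2:4] == "--":
        return False                      # no "--" in positions 3 and 4
    if unicodedata.category(label[0]).startswith("M"):
        return False                      # must not start with a combining mark
    # Not checked: RFC 5892 permitted characters, contextual rules, Bidi.
    return True
```

Anything that fails this can be rejected outright; anything that passes still needs the table lookup.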

But I think there is something here we can work with, which I will look
into a bit longer.

Steven





On Wed, 28 Jun 2017 18:10:54 +0200, Steven Pemberton  
<steven.pemberton@cwi.nl> wrote:

> OK, Erik's email led me back to RFC 5321:
> https://tools.ietf.org/html/rfc5321
>
> Somewhere deep in that document, you find the definition for mailbox:
>
>  Mailbox        = Local-part "@" ( Domain / address-literal )
>
> Address literals are for IP addresses. I propose we drop those.
>
>  Domain         = sub-domain *("." sub-domain)
>
> I propose we require at least one "."
>
>  sub-domain     = Let-dig [Ldh-str]
>  Let-dig        = ALPHA / DIGIT
>  Ldh-str        = *( ALPHA / DIGIT / "-" ) Let-dig
>
> So a sub-domain must start and end with a letter or digit, and may  
> contain hyphens.
>
>  Local-part     = Dot-string / Quoted-string
>
> I propose we drop quoted-string.
>
>  Dot-string     = Atom *("."  Atom)
>  Atom           = 1*atext
>
> So a local part consists of one or more atoms separated by ".".
> An atom is a string of 1 or more atexts.
>
> You have to go to RFC 5322 to find the definition of atext:
> https://tools.ietf.org/html/rfc5322
>
>  atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
>                         "!" / "#" /        ;  characters not including
>                         "$" / "%" /        ;  specials.  Used for atoms.
>                         "&" / "'" /
>                         "*" / "+" /
>                         "-" / "/" /
>                         "=" / "?" /
>                         "^" / "_" /
>                         "`" / "{" /
>                         "|" / "}" /
>                         "~"
> I propose we keep those.
>
> So in summary:
>
>    email: atom ("." atom)* "@" sub ("." sub)+
>
>    sub: letdig (ldh* letdig)?
>    letdig: a-z A-Z 0-9
>    ldh: letdig | "-"
>    atom: atext+
>
> Steven
>
> On Wed, 28 Jun 2017 15:15:47 +0200, XForms Users Community Group Issue  
> Tracker <sysbot+tracker@w3.org> wrote:
>
>> ACTION-2130: Summarise the apache email validation
>>
>> https://www.w3.org/2005/06/tracker/xforms/actions/2130
>>
>> Assigned to: Steven Pemberton
>>

Received on Thursday, 29 June 2017 11:24:03 UTC