- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Thu, 29 Jun 2017 13:23:26 +0200
- To: public-xformsusers@w3.org, "XForms Users Community Group Issue Tracker" <sysbot+tracker@w3.org>, "Steven Pemberton" <steven.pemberton@cwi.nl>
But wait. There's more! Something we have long wanted to support.

https://tools.ietf.org/html/rfc6531
"SMTP Extension for Internationalized Email" adds international email addresses. It is still only a Proposed Standard.

https://tools.ietf.org/html/rfc6531#section-3.3

"The key changes made by this specification include:

   o  The <Mailbox> ABNF rule is imported from RFC 5321 and updated in
      order to support the internationalized email address. Other
      related rules are imported from RFC 5321, RFC 5234, RFC 5890, and
      RFC 6532, or are extended in this document.

   o  The definition of <sub-domain> is extended to permit both the
      RFC 5321 definition and a UTF-8 string in a DNS label that
      conforms with IDNA definitions [RFC5890].

   o  The definition of <atext> is extended to permit both the RFC 5321
      definition and a UTF-8 string. That string MUST NOT contain any
      of the ASCII graphics or control characters."

An erratum changes that to:

"The definition of <atext> is extended to permit both the RFC 5321 definition and a UTF-8 string. That string MUST NOT contain any of the Extended ASCII graphics (%d128-255) or control characters."

But anyway, they define it formally:

https://tools.ietf.org/html/rfc6532#section-3.1

   UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4

https://tools.ietf.org/html/rfc3629#section-4

   UTF8-2 = %xC2-DF UTF8-tail
   UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
            %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
   UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
            %xF4 %x80-8F 2( UTF8-tail )
   UTF8-tail = %x80-BF

   atext =/ UTF8-non-ascii
   sub-domain =/ U-label

A U-label is hairy:

https://tools.ietf.org/html/rfc5890#section-2.3.2.1

"A "U-label" is an IDNA-valid string of Unicode characters, in Normalization Form C (NFC) and including at least one non-ASCII character, expressed in a standard Unicode Encoding Form (such as UTF-8).
It is also subject to the constraints about permitted characters that are specified in Section 4.2 of the Protocol document and the rules in the Sections 2 and 3 of the Tables document, the Bidi constraints in that document if it contains any character from scripts that are written right to left, and the symmetry constraint described immediately below."

https://tools.ietf.org/html/rfc5891 puts the constraints on which characters are permitted in a u-label, but does that by pointing to https://tools.ietf.org/html/rfc5892, which is rather horrid because it is a long list of allowed and disallowed characters.

But I think there is something possible to work with, which I will work on a bit longer.

Steven

On Wed, 28 Jun 2017 18:10:54 +0200, Steven Pemberton <steven.pemberton@cwi.nl> wrote:

> OK, Erik's email led me back to RFC 5321:
> https://tools.ietf.org/html/rfc5321
>
> Somewhere deep in that document, you find the definition for mailbox:
>
>    Mailbox = Local-part "@" ( Domain / address-literal )
>
> Address literals are for IP addresses. I propose we drop those.
>
>    Domain = sub-domain *("." sub-domain)
>
> I propose we require at least one "."
>
>    sub-domain = Let-dig [Ldh-str]
>    Let-dig = ALPHA / DIGIT
>    Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig
>
> So a sub-domain must start and end with a letter or digit, and may
> contain hyphens.
>
>    Local-part = Dot-string / Quoted-string
>
> I propose we drop quoted-string.
>
>    Dot-string = Atom *("." Atom)
>    Atom = 1*atext
>
> So a local part consists of one or more atoms separated by ".".
> An atom is a string of 1 or more atexts.
>
> You have to go to RFC 5322 to find the definition of atext:
> https://tools.ietf.org/html/rfc5322
>
>    atext = ALPHA / DIGIT /    ; Printable US-ASCII
>            "!" / "#" /        ; characters not including
>            "$" / "%" /        ; specials. Used for atoms.
>            "&" / "'" /
>            "*" / "+" /
>            "-" / "/" /
>            "=" / "?" /
>            "^" / "_" /
>            "`" / "{" /
>            "|" / "}" /
>            "~"
>
> I propose we keep those.
>
> So in summary:
>
>    email: atom ("." atom)* "@" sub ("." sub)+
>
>    sub: letdig (ldh* letdig)?
>    letdig: a-z A-Z 0-9
>    ldh: letdig | "-"
>    atom: atext+
>
> Steven
>
> On Wed, 28 Jun 2017 15:15:47 +0200, XForms Users Community Group Issue
> Tracker <sysbot+tracker@w3.org> wrote:
>
>> ACTION-2130: Summarise the apache email validation
>>
>> https://www.w3.org/2005/06/tracker/xforms/actions/2130
>>
>> Assigned to: Steven Pemberton
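
For illustration only, the ASCII summary grammar quoted above (email, sub, letdig, ldh, atom) could be written as a regular expression roughly as follows. This is a minimal sketch, not the validation used by any XForms implementation or by Apache; the names ATEXT, LETDIG, LDH, EMAIL_RE and is_valid_email are hypothetical and chosen just for this example. It deliberately leaves out the UTF8-non-ascii and U-label extensions discussed in the message, as well as quoted strings and address literals, which the proposal drops.

```python
import re

# atext as listed in RFC 5322: letters, digits, and the printable specials
# quoted in the thread. The hyphen is escaped inside the character class.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"

# letdig and ldh from the summary grammar.
LETDIG = r"[A-Za-z0-9]"
LDH = r"[A-Za-z0-9-]"

# atom: atext+
ATOM = rf"{ATEXT}+"

# sub: letdig (ldh* letdig)?
# A sub-domain starts and ends with a letter or digit; hyphens only inside.
SUB = rf"{LETDIG}(?:{LDH}*{LETDIG})?"

# email: atom ("." atom)* "@" sub ("." sub)+
# The trailing "+" enforces the proposed "at least one dot" in the domain.
EMAIL_RE = re.compile(rf"{ATOM}(?:\.{ATOM})*@{SUB}(?:\.{SUB})+")


def is_valid_email(value: str) -> bool:
    """Return True if value matches the ASCII summary grammar as a whole."""
    return EMAIL_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("steven.pemberton@cwi.nl", "user@localhost",
                      "a..b@example.com", "user@-example.com"):
        print(candidate, is_valid_email(candidate))
```

Using fullmatch rather than search means the whole string has to satisfy the grammar, so stray leading or trailing characters are rejected.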
Received on Thursday, 29 June 2017 11:24:03 UTC