Re: RFC 2822 email addresses in tag URIs

Tim Kindberg wrote to the URI Interest Group’s mailing list 
(<mailto:uri@w3.org>) on 23 September 2005 in “RFC 2822 email addresses 
in tag URIs” (<mid:43340C98.8020404@hpl.hp.com>, 
<http://www.w3.org/2002/02/mid/43340C98.8020404@hpl.hp.com>):

> I'm inclined to go for [the] simpler approach: take a subset of 
> RFC 2822 email addresses that users could be expected to read & 
> manipulate by hand and brain (following the 'tag' philosophy), and 
> simply %-encode certain of their characters.

Percent-encoding strikes me as a complication that pushes “tag” URIs 
outside their intended range of ease of use. Either drop the 
percent-encoding or reconsider how easy “tag” URIs are meant to be.

> Principle 1: only allow relatively simple, human-legible/tractable email 
> address to be embedded in tags. So only allow printing characters (%20 - 
> %7E). NB only whitespace character is " " (which has to be quoted in 
> RFC2822-land).  No folding, no control characters.
>
> Principle 2: disallow obsolete constructs. 
>
> Principle 3: disallow comments -- no value in a tag but lots of 
> potential for confusion.

I agree with all three principles.

> In addition, the following characters should not appear literally as 
> part of an email address in a tag; they must be %-encoded (ONCE) from 
> the original email address:

In the “tag” specification, that “should” will need to become a 
“must”.

The listed characters must be percent-encoded only where they appear 
in the <local-part> of the e-mail address.

>        gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

Every e-mail address has a “commercial at” (U+0040) that separates the 
<local-part> from the <domain>. Processors must transcribe this 
“commercial at” literally when embedding an e-mail address in a “tag” URI.
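
A minimal sketch of that rule in Python, for illustration only. It 
splits the address at its final “commercial at”, percent-encodes the 
gen-delims characters wherever they occur in the <local-part>, and 
copies the separator and the <domain> unchanged. It also encodes "%" 
itself, which Tim's grammar (quoted below) writes as "%25". The names 
embed_email and MUST_ENCODE are hypothetical, and the character set is 
an assumption drawn only from the excerpt quoted here; Tim's full list 
may well be longer.

```python
# Assumption: only the gen-delims quoted above, plus "%" so that the
# result can be decoded unambiguously. Not the full list from the draft.
GEN_DELIMS = ":/?#[]@"
MUST_ENCODE = set(GEN_DELIMS) | {"%"}


def embed_email(address: str) -> str:
    """Return the form of `address` to embed in a tag URI (sketch only)."""
    # The last "@" separates <local-part> from <domain>; any earlier "@"
    # belongs to the local part and therefore gets encoded.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected <local-part>@<domain>")
    encoded_local = "".join(
        "%{:02X}".format(ord(ch)) if ch in MUST_ENCODE else ch
        for ch in local
    )
    # The separating "commercial at" is transcribed literally.
    return encoded_local + "@" + domain
```

For example, embed_email("who/what@example.org") gives 
"who%2Fwhat@example.org": the solidus is encoded once, and the 
separating “commercial at” is left alone.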

> emailAddress      = tag-local-part "@" DNSname
> tag-local-part    = tag-dot-atom-text / tag-no-fold-quote
> tag-dot-atom-text = 1*tag-atext *("." 1*tag-atext)
> tag-atext         = ALPHA / DIGIT  /
>                       "!"   / "%23" /
>                       "$"   / "%25" /
>                       "&"   / "'"   /
>                       "*"   / "+"   /
>                       "-"   / "%2F" /
>                       "="   / "%3F" /
>                       "%5E" / "_"   /
>                       "%60" / "%7B" /
>                       "%7C" / "%7D" /
>                       "~"
> tag-no-fold-quote = "%22" *(tag-qtext / tag-quoted-pair) "%22"
> tag-quoted-pair   = "%5C"  tag-qptext
> tag-qtext         = tag-atext / "(" /
>                       ")"   /  "%2C" /
>                       "."   /  "%3A" /
>                       ";"   /  "%3C" /
>                       "%3E" /  "%40" /
>                       "%5B" /  "%5D" /
> tag-qptext        = tag-qtext / "%20" / "%5C" / "%22"

I argue for something even simpler. I would eliminate percent-encoding 
and quoted strings, leaving only e-mail addresses that transcribe literally.

emailAddress      = tag-local-part "@" DNSname
tag-local-part    = 1*tag-atext *("." 1*tag-atext)
tag-atext         = ALPHA / DIGIT /
                      "!"  /  "$"  /
                      "&"  /  "'"  /
                      "*"  /  "+"  /
                      "-"  /  "="  /
                      "_"  /  "~"
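
As a rough check of the simplified grammar above, here is a Python 
sketch that accepts only addresses matching it. The function name 
is_literal_email is hypothetical. The DNSname pattern is an assumption 
(dot-separated alphanumeric labels that may contain interior hyphens), 
since this message does not define DNSname.

```python
import re

# tag-atext from the simplified grammar: ALPHA / DIGIT and ten symbols.
ATEXT = r"[A-Za-z0-9!$&'*+\-=_~]"
# tag-local-part = 1*tag-atext *("." 1*tag-atext)
LOCAL = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# Assumed shape of DNSname; not defined in this message.
DNSCOMP = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DNSNAME = rf"{DNSCOMP}(?:\.{DNSCOMP})*"

EMAIL = re.compile(rf"{LOCAL}@{DNSNAME}")


def is_literal_email(address: str) -> bool:
    """True if `address` fits the simplified emailAddress rule (sketch)."""
    return EMAIL.fullmatch(address) is not None
```

Under this sketch, "etan+lists@example.org" is accepted, while 
"who/what@example.org" and "a..b@example.org" are rejected: the first 
contains a character outside tag-atext and the second has an empty 
atom between the dots.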

This essentially returns to Tim’s proposal of 6 July 2005 (“email 
address in a URI”, <mid:42CBAAE0.3060309@hpl.hp.com>,
<http://www.w3.org/mid/42CBAAE0.3060309@hpl.hp.com>). I hope that the 
specification of the “tag” scheme will be stronger for the discussion.

-- 
Etan Wexler.

Received on Thursday, 6 October 2005 05:42:07 UTC