Re: RFC 2822 email addresses in tag URIs from Tim Kindberg on 2005-10-11 (uri@w3.org from October 2005)

From: Tim Kindberg <timothy@hpl.hp.com>
Date: Tue, 11 Oct 2005 14:32:26 +0100
To: Etan Wexler <ewexler@stickdog.com>
CC: URI Interest Group <uri@w3.org>, sandro hawke <sandro@w3.org>
Message-ID: <434BBEEA.1050300@hpl.hp.com>
Thanks Etan.

In a nutshell, there are two positions:

The position I began with ("simplicity") allows in tag URIs a few 
special characters allowed by RFC 2822, but no %-encoding and hence no 
quotes or "quoted pairs", among other things.  This position implies 
that e.g. "Fred Smith's uncle!"@example.com can't be used to mint tags, 
whereas Fred_Smith's_uncle!@example.com can -- without transformation.
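As a rough illustration of the "simplicity" position, here is a minimal Python sketch. It assumes that position matches the restricted grammar Etan proposes in the quoted message (dot-atom local part, a small set of literal specials, no quoting and no %-encoding). DNSname comes from the tag spec and is not defined in this thread, so the domain pattern used here is an assumption. The function name `usable_in_tag` is hypothetical.

```python
import re

# Assumed "simplicity" grammar (as in Etan's proposal quoted below):
#   tag-local-part = 1*tag-atext *("." 1*tag-atext)
#   tag-atext      = ALPHA / DIGIT / "!" / "$" / "&" / "'" / "*" /
#                    "+" / "-" / "=" / "_" / "~"
ATEXT = r"[A-Za-z0-9!$&'*+\-=_~]"
LOCAL_PART = rf"{ATEXT}+(?:\.{ATEXT}+)*"

# DNSname is defined by the tag specification, not here; this label
# pattern (letters, digits, inner hyphens) is an assumption for the sketch.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DNS_NAME = rf"{LABEL}(?:\.{LABEL})*"

EMAIL_RE = re.compile(rf"{LOCAL_PART}@{DNS_NAME}")


def usable_in_tag(address: str) -> bool:
    """Return True if the address can be copied into a tag URI unchanged."""
    return EMAIL_RE.fullmatch(address) is not None


print(usable_in_tag("Fred_Smith's_uncle!@example.com"))    # True
print(usable_in_tag('"Fred Smith\'s uncle!"@example.com'))  # False: quotes, spaces
print(usable_in_tag("a..b@example.com"))                    # False: empty atom
```

Anything that passes this check goes into the tag verbatim. Anything that fails simply cannot be used to mint tags under this position.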

The position I tried after receiving feedback ("inclusivity") allows 
%-encoding.  So "Fred Smith's uncle!"@example.com could be used for tags 
but it becomes something that 99% of humans wouldn't be able to 
formulate correctly unaided: tag:%22Fred%20...
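To show what the "inclusivity" transformation involves, here is a minimal Python sketch. The set of escaped characters is read off the %-encoded grammar quoted further down: the %XX entries in tag-atext, tag-qtext and tag-qptext, plus the %22 quote delimiters. The "@" separating local part and domain stays literal, as Etan suggests. The function names are hypothetical. The sketch only maps characters and does not validate the result against the grammar. For instance, that grammar admits a space inside quotes only as an escaped pair (%5C%20).

```python
# Characters the quoted %-encoding grammar writes as %XX escapes in the
# local part; all other characters it allows are copied literally.
ENCODED = set(' "#%,/:<>?@[\\]^`{|}')


def encode_local_part(local_part: str) -> str:
    """Percent-encode (once) the listed characters of an RFC 2822 local part."""
    return "".join(
        f"%{ord(ch):02X}" if ch in ENCODED else ch for ch in local_part
    )


def encode_address(address: str) -> str:
    """Encode an address for embedding in a tag URI.

    The separating "@" is kept literal. Splitting on the last "@" assumes
    the domain contains none (a quoted local part may contain one).
    """
    local_part, domain = address.rsplit("@", 1)
    return encode_local_part(local_part) + "@" + domain


print(encode_address('"Fred Smith\'s uncle!"@example.com'))
print(encode_address("Fred_Smith's_uncle!@example.com"))
```

The first call yields `%22Fred%20Smith's%20uncle!%22@example.com`, which is the kind of string few people would write correctly by hand. The second address passes through unchanged.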

Maybe my experience is unusual, but I never encounter email addresses 
using the new freedoms allowed by RFC 2822 -- which has been around 
since 2001.  So it's hard to argue strongly for "inclusivity" -- which, 
in addition, (a) turned out to add significant complexity to what is 
otherwise simple syntax, and (b) is only "inclusive" if you can (operate 
a program to) correctly transform your email address.

On balance, I'm inclined to follow my original advice -- and now Etan's 
-- and go for "simplicity".

Sandro, what do you think of all this?

Cheers,

Tim.

Etan Wexler wrote:

> 
> Tim Kindberg wrote to the URI Interest Group’s mailing list 
> (<mailto:uri@w3.org>) on 23 September 2005 in “RFC 2822 email addresses 
> in tag URIs” (<mid:43340C98.8020404@hpl.hp.com>, 
> <http://www.w3.org/2002/02/mid/43340C98.8020404@hpl.hp.com>):
> 
>> I'm inclined to go for [the] simpler approach: take a subset of RFC 
>> 2822 email addresses that users could be expected to read & manipulate 
>> by hand and brain (following the 'tag' philosophy), and simply 
>> %-encode certain of their characters.
> 
> 
> Percent-encoding strikes me as a complication that moves “tag” URIs 
> outside of the target range of ease of use. Reconsider the employment of 
> percent-encoding or reconsider the target range of ease of use.
> 
>> Principle 1: only allow relatively simple, human-legible/tractable 
>> email addresses to be embedded in tags. So only allow printing 
>> characters (%20 - %7E). NB: the only whitespace character is " " (which has 
>> to be quoted in RFC2822-land).  No folding, no control characters.
>>
>> Principle 2: disallow obsolete constructs.
>> Principle 3: disallow comments -- no value in a tag but lots of 
>> potential for confusion.
> 
> 
> All the principles are agreeable.
> 
>> In addition, the following characters should not appear literally as 
>> part of an email address in a tag; they must be %-encoded (ONCE) from 
>> the original email address:
> 
> 
> In the “tag” specification, that word “should” will need to be the word 
> “must”.
> 
> The characters in the list must undergo percent-encoding only if said 
> characters appear in the <local-part> of the e-mail address.
> 
>>        gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
> 
> 
> Every e-mail address has a “commercial at” (U+0040) that separates the 
> <local-part> from the <domain>. Processors must transcribe this 
> “commercial at” literally when embedding an e-mail address in a “tag” URI.
> 
>> emailAddress      = tag-local-part "@" DNSname
>> tag-local-part    = tag-dot-atom-text / tag-no-fold-quote
>> tag-dot-atom-text = 1*tag-atext *("." 1*tag-atext)
>> tag-atext         = ALPHA / DIGIT  /
>>                       "!"   / "%23" /
>>                       "$"   / "%25" /
>>                       "&"   / "'"   /
>>                       "*"   / "+"   /
>>                       "-"   / "%2F" /
>>                       "="   / "%3F" /
>>                       "%5E" / "_"   /
>>                       "%60" / "%7B" /
>>                       "%7C" / "%7D" /
>>                       "~"
>> tag-no-fold-quote = "%22" *(tag-qtext / tag-quoted-pair) "%22"
>> tag-quoted-pair   = "%5C"  tag-qptext
>> tag-qtext         = tag-atext / "(" /
>>                       ")"   /  "%2C" /
>>                       "."   /  "%3A" /
>>                       ";"   /  "%3C" /
>>                       "%3E" /  "%40" /
>>                       "%5B" /  "%5D"
>> tag-qptext        = tag-qtext / "%20" / "%5C" / "%22"
> 
> 
> I argue for something even simpler. I would eliminate percent-encoding 
> and quoted strings, leaving only e-mail addresses that transcribe 
> literally.
> 
> emailAddress      = tag-local-part "@" DNSname
> tag-local-part    = 1*tag-atext *("." 1*tag-atext)
> tag-atext         = ALPHA / DIGIT /
>                      "!"  /  "$"  /
>                      "&"  /  "'"  /
>                      "*"  /  "+"  /
>                      "-"  /  "="  /
>                      "_"  /  "~"
> 
> So I basically return to Tim’s proposal dated 6 July 2005 (“email 
> address in a URI”, <mid:42CBAAE0.3060309@hpl.hp.com>,
> <http://www.w3.org/mid/42CBAAE0.3060309@hpl.hp.com>). The specification 
> of the “tag” scheme will be stronger as a result of the discussion, I hope.
> 

-- 

Tim Kindberg
hewlett-packard laboratories
filton road
stoke gifford
bristol bs34 8qz
uk

purl.org/net/TimKindberg
timothy@hpl.hp.com
voice +44 (0)117 312 9920
fax +44 (0)117 312 8003
Received on Tuesday, 11 October 2005 13:37:44 UTC