Re: email address in a URI from Etan Wexler on 2005-07-12 (uri@w3.org from July 2005)

From: Etan Wexler <ewexler@stickdog.com>
Date: Mon, 11 Jul 2005 23:34:40 -0400
To: URI Interest Group <uri@w3.org>, Tim Kindberg <timothy@hpl.hp.com>, sandro hawke <sandro@w3.org>
Message-ID: <42D33A50.4010907@stickdog.com>

Frank Ellermann wrote to the URI-Interest-Group list <mailto:uri@w3.org> 
on 10 July 2005 in “Re: email address in a URI” 
(<mid:42D13B00.7667@xyzzy.claranet.de>, 
<http://www.w3.org/mid/42D13B00.7667@xyzzy.claranet.de>):

> A quoted-pair for [NO-WS-CTL] in a quoted-string is utter dubious,
> the worst practical case is the quoted-pair "\" SP.

Frank, are you implying that the <NO-WS-CTL> characters are obsolete in 
e-mail addresses? Should RFC 2822 get a revision? Does either answer 
affect what route the “tag” scheme should take?

> [You have] to be very sure that nobody encodes or decodes
> the tags more than once.

What is the experience of the participants in the URI Interest Group? 
Will software authors screw this up? Even with a detailed description of 
the algorithm? If so, is it proper that the “tag” scheme flatly ban the 
use of e-mail addresses with “percent” signs?
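
To make the hazard concrete, here is a minimal sketch of what goes wrong 
when a percent sign is encoded or decoded more than once. It uses 
Python's standard urllib.parse functions only as a stand-in for whatever 
encoder a tag-minting tool might use; the local part "a%41b" is a 
hypothetical example, not one taken from the thread.

```python
from urllib.parse import quote, unquote

# Hypothetical local part containing a literal percent sign.
local_part = "a%41b"

# Correct: encode exactly once, so "%" becomes "%25".
encoded_once = quote(local_part, safe="")

# The bug: encoding the already-encoded form turns "%25" into "%2525".
encoded_twice = quote(encoded_once, safe="")

# Correct: decoding exactly once restores the original local part.
decoded_once = unquote(encoded_once)

# The bug: decoding again reads the literal "%41" as an escape for "A".
decoded_twice = unquote(decoded_once)

print(encoded_once)   # a%2541b
print(encoded_twice)  # a%252541b
print(decoded_once)   # a%41b
print(decoded_twice)  # aAb
```

The second decode silently yields a different address, which is the kind 
of error that a ban on “percent” signs would rule out.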

If Tim Kindberg and Sandro Hawke approve, we could deploy an open-source 
tag-minting service on the “tag”-scheme Web site, preferably on the 
front page (<http://taguri.org/>). The availability of the service, one 
hopes, will prevent end users from minting malformed tags. The 
availability of correct source code, one hopes, will prevent software 
authors from creating software that mints many malformed tags. Tim and 
Sandro, how do you feel about hosting a minting service?

> NO-WS-CTL is utter dubious, no matter what the standards say,
> without "security considerations" I'd stay away from this crap.

Should the “tag” scheme ban the use of e-mail addresses with control 
characters? I fail to see a real security problem with the mere 
representation of control characters in “tag” URIs. A lousy programmer 
could make a security problem out of the situation, but a lousy 
programmer can make a security problem out of any situation.

>>                     "!"   / "%22" / "%23" / "$"   /
>>                     "%25" / "&"   / "%27" / "("   /
> 
> 
> Maybe it's also elegant, but it's not obvious [...].
> 
> Works, but it's no straight forward scheme.  The alternative [...]
> is much longer and a pain, but needs no special explanation.

Then, to me, the question is about the probability of software authors 
screwing it up and about the scale of the screw-up.

> [The “tag” scheme] does not define a path or query[.]

The “tag” scheme has no need to define a path or query, because the 
generic URI syntax already does. RFC 3986, section 3.3, “Path”, states: 
“A path is always defined for a URI”. Section 3.4 identifies the query 
as the portion of a URI between the first question mark and either the 
number sign or the end of the URI.
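
As a rough illustration of that generic rule, the sketch below splits a 
URI that has no authority component into scheme, path, query, and 
fragment. The function name split_generic and the example tag are 
hypothetical; the sketch assumes a syntactically valid URI and does no 
validation.

```python
def split_generic(uri):
    """Split a URI without an authority into (scheme, path, query, fragment).

    Follows the generic rule: the fragment starts at the first "#", and the
    query runs from the first "?" before that point to the "#" or the end.
    """
    before_fragment, hash_mark, fragment = uri.partition("#")
    before_query, question_mark, query = before_fragment.partition("?")
    scheme, _, path = before_query.partition(":")
    return (
        scheme,
        path,
        query if question_mark else None,    # None means "no query present"
        fragment if hash_mark else None,     # None means "no fragment present"
    )


# Hypothetical tag URIs, for illustration only.
print(split_generic("tag:user@example.org,2005:notes/draft?rev=2#top"))
print(split_generic("tag:user@example.org,2005:"))
```

Even for the second example, which contains neither a slash nor a 
question mark, a generic parser still reports a path.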

> I'm not sure about "/", "?", "=" -
> do you propose to reserve [those characters] because it would be too confusing,
> or is this actually necessary ?

The grammar that I proposed implies which characters are reserved and 
which are free. 
That grammar, which was only for the <emailAddress> portion of a tag and 
which left the rest of the tag grammar untouched, did not allow literal 
slashes or question marks in the representation of the local part of an 
e-mail address. The grammar did allow literal “equals” signs in the 
representation of the local part of an e-mail address.
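
A minimal sketch of that behaviour follows. It is not the proposed 
grammar: the set of literal characters is an assumption, limited to the 
ones that this thread shows or mentions as literal ("!", "$", "&", "(", 
and "="), and the function name encode_local_part is hypothetical. 
Python's quote() also leaves letters, digits, and "_", ".", "-", "~" 
unescaped.

```python
from urllib.parse import quote

# ASSUMPTION: only the characters that the thread shows or mentions as
# literal. The full grammar is not reproduced here.
ASSUMED_LITERALS = "!$&(="


def encode_local_part(local_part):
    """Percent-encode a local part, keeping only the assumed literals."""
    return quote(local_part, safe=ASSUMED_LITERALS)


# Slash and question mark are escaped; the equals sign stays literal.
print(encode_local_part("a/b?c=d"))   # a%2Fb%3Fc=d

# Double quote and number sign are escaped, as in the quoted excerpt.
print(encode_local_part('x"y#z'))     # x%22y%23z
```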

Consult the specification of the “tag” scheme for the syntax and 
semantics of the <specific> portion of a tag, bearing in mind what RFC 
3986 mandates for all URIs.

> Be careful with ALPHA, local parts are case sensitive.

The use of the <ALPHA> symbol from RFC 2234 is irrelevant to the case 
sensitivity of e-mail addresses. The <ALPHA> definition includes all 
uppercase and lowercase letters in the ASCII repertoire. Nothing makes a 
construct case-insensitive merely because its definition includes 
<ALPHA>.
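
To illustrate, RFC 2234 defines <ALPHA> by numeric ranges (%x41-5A / 
%x61-7A), so it accepts both cases without equating them. The sketch 
below models <ALPHA> as a regular-expression character class; the names 
ALPHA and is_all_alpha are hypothetical.

```python
import re

# RFC 2234: ALPHA = %x41-5A / %x61-7A, that is, A-Z and a-z.
ALPHA = re.compile(r"[A-Za-z]")


def is_all_alpha(text):
    """Return True if text is non-empty and every character matches ALPHA."""
    return bool(text) and all(ALPHA.fullmatch(ch) for ch in text)


# Both spellings satisfy the grammar...
print(is_all_alpha("Etan"), is_all_alpha("etan"))  # True True

# ...but matching the grammar does not make them the same string.
print("Etan" == "etan")                            # False
```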

-- 
Etan Wexler.
Received on Tuesday, 12 July 2005 03:31:51 UTC