Re: Comments on draft-duerst-mailto-bis-04.txt, please from Michael A. Puls II on 2008-01-31 (uri@w3.org from January 2008)

From: Michael A. Puls II <shadow2531@gmail.com>
Date: Wed, 30 Jan 2008 20:24:52 -0500
To: "Mike Brown" <mike@skew.org>
Cc: uri@w3.org
Message-ID: <6b9c91b20801301724m75dd83bckdb6b9dc994593f16@mail.gmail.com>

On 1/30/08, Mike Brown <mike@skew.org> wrote:
>
> > >A URL for this Internet-Draft is:
> > >http://www.ietf.org/internet-drafts/draft-duerst-mailto-bis-04.txt
>
> I thought I mentioned this before, but don't remember what the outcome was.
> AFAIK, browsers pretty universally treat %40 as equivalent to "@" in mailto
> URIs. Obfuscating email addresses in this way is even recommended as an
> antispam practice to fool automated address harvesters:
>
> http://www.neilgunton.com/doc/spambot_trap (where I first read of it)
> http://www.csarven.ca/hiding-email-addresses
> http://www.rl-digital.com/2006/hide-email-address/
>
> and so on... Google turns up quite a few.
>
> Should the mailto I-D acknowledge this widespread equivalence in
> implementations, even if it's not in keeping with the principle that "@" has a
> reserved purpose and "%40" means something different?

When encoding an hname or hvalue, I encode it like ECMAScript's
encodeURIComponent(), which means you get for example
"mailto:?cc=email%40example.com" and not
"mailto:?cc=email@example.com". For decoding hnames and hvalues, I use
decodeURIComponent() rules, which means %40 or @ in an hvalue would
come out as @.
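
A minimal sketch of the encoding rule I describe above. The helper name
buildMailtoHeaders is made up for illustration and is not from the draft or
from any mail client; only encodeURIComponent() and decodeURIComponent() are
real ECMAScript functions.

```javascript
// Hypothetical helper (illustration only): build the query part of a
// mailto URI from hname/hvalue pairs, encoding each with encodeURIComponent().
function buildMailtoHeaders(headers) {
  const query = Object.entries(headers)
    .map(([hname, hvalue]) =>
      encodeURIComponent(hname) + "=" + encodeURIComponent(hvalue))
    .join("&");
  return "mailto:?" + query;
}

const uri = buildMailtoHeaders({ cc: "email@example.com" });
console.log(uri); // mailto:?cc=email%40example.com

// Decoding yields "@" whether or not the author encoded it.
console.log(decodeURIComponent("email%40example.com")); // email@example.com
console.log(decodeURIComponent("email@example.com"));   // email@example.com
```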

In short, if you should encode the @ in
"http://example.com/mailto.php?cc=email%40example.com", you should do
the same with "mailto:?cc=email%40example.com" (even if mailto URIs
don't support username:password@ stuff). Hnames and hvalues in mailto
URIs are just encoded query string components anyway. Once the mail
client parses the URI and decodes the values, it can then do what it
needs (generate compose field values and/or headers, etc.). The same thing
goes for javascript URIs: javascript:alert(%22%40%22)%3B, for example.
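
Roughly, the "parse the URI, then decode the values" step could look like the
sketch below. parseMailto is a hypothetical function, not any client's actual
code, and it assumes well-formed percent-escapes (decodeURIComponent() throws
a URIError on malformed ones).

```javascript
// Hypothetical consumer-side parser (illustration only): split a mailto URI
// into its "to" part and its hname/hvalue pairs, then decode each piece.
function parseMailto(uri) {
  const body = uri.slice("mailto:".length);
  const q = body.indexOf("?");
  const toPart = q === -1 ? body : body.slice(0, q);
  const queryPart = q === -1 ? "" : body.slice(q + 1);

  const headers = [];
  for (const pair of queryPart.split("&")) {
    if (pair === "") continue; // skip empty segments such as a trailing "&"
    const eq = pair.indexOf("=");
    const hname = eq === -1 ? pair : pair.slice(0, eq);
    const hvalue = eq === -1 ? "" : pair.slice(eq + 1);
    headers.push([decodeURIComponent(hname), decodeURIComponent(hvalue)]);
  }
  return { to: decodeURIComponent(toPart), headers };
}

// Both spellings produce the same decoded header value.
const encodedForm = parseMailto("mailto:?cc=email%40example.com");
const literalForm = parseMailto("mailto:?cc=email@example.com");
console.log(encodedForm.headers[0][1]); // email@example.com
console.log(literalForm.headers[0][1]); // email@example.com

// The javascript URI example decodes the same way.
console.log(decodeURIComponent("alert(%22%40%22)%3B")); // alert("@");
```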

Browsers *may* show %40 as @ to the user in their status bar for a
mailto link, but when it's passed to a mail client, the %40 will be
left intact (unless the browser decides to decode the %40 before
passing it on, since RFC 2368 doesn't say @ is reserved).

So, I think @ in a mailto URI should be reserved and should be encoded
as %40. The client will get @ back after it decodes just like it does
with current reserved characters. Of course, clients should still
produce the same result if the author of the mailto URI doesn't encode
the @.

Now, KMail is a client that doesn't accept mailto URIs. It accepts
separate values as params, and those values must already be decoded. So, in
this case, passing email%40example.com as the To param for KMail might
not come out right. However, that's a problem with KMail, and if you
do:

kfmclient exec "the full, properly-encoded mailto URI"

instead, everything will be taken care of and KMail will get its decoded values.

Section 2.1 of the new draft shouldn't exclude @ from the chars that
need to be percent-encoded. At the least, though, the mailto URIs in
the examples section of the new draft should use %40 instead of @, IMO.
That should show that an author of a mailto URI should use %40, but a
consumer of the URI should handle %40 and @ equally.

I feel the same way about +. An author of a mailto URI should generate
+ as %2B, but a consumer should handle + and %2B the same.
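
A small sketch of that asymmetry for "+", using only the standard ECMAScript
functions. The sample address is made up. Note that decodeURIComponent()
leaves a literal "+" alone (it does not turn it into a space the way
form-style decoding does), which is what makes "+" and "%2B" come out the same.

```javascript
// Author side: encodeURIComponent() produces %2B for "+" (and %40 for "@").
const authored = encodeURIComponent("user+tag@example.com");
console.log(authored); // user%2Btag%40example.com

// Consumer side: the encoded and unencoded spellings decode identically.
const fromEncoded = decodeURIComponent("user%2Btag%40example.com");
const fromLiteral = decodeURIComponent("user+tag@example.com");
console.log(fromEncoded);                 // user+tag@example.com
console.log(fromEncoded === fromLiteral); // true
```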

As for using %40 just to hide @, I think some bots are smarter now and
it won't help much.

-- 
Michael

Received on Thursday, 31 January 2008 01:25:05 UTC