Re: Updating the mailto URI scheme for better I18N from Frank Ellermann on 2005-02-14 (uri@w3.org from February 2005)

From: Frank Ellermann <nobody@xyzzy.claranet.de>
Date: Mon, 14 Feb 2005 13:57:29 +0100
To: uri@w3.org
Message-ID: <4210A039.579E@xyzzy.claranet.de>
Martin Duerst wrote:

> The syntax of 'mailto' URIs from [RFC2368] is extended to
> be compatible with IRIs ([RFC3987]) for better
> internationalization.

Fascinating.  I expected that fixing 2368 would be nearly impossible
for some decades. ;-)  And I didn't know that the IANA URI registry
is incomplete; it doesn't list RfC 2324:

| coffee-url  =  coffee-scheme ":" [ "//" host ]
|                ["/" pot-designator ] ["?" additions-list ]

> the mailto URI scheme also allows setting mail header fields
> and the message body.

I'd like to add some minor points about this in 2368bis, the
old 2368-remark...

| Only the Subject, Keywords, and Body headers are believed to
| be both safe and useful.

...does not exactly reflect common practice.  Cc: is clearer
than mailto:what@ever.example?to=an@other.example constructs.
Maybe add it below the "to" example for "hname":

>     mailto:addr1%2C%20addr2
>     is equivalent to
>     mailto:?to=addr1%2C%20addr2
>     is equivalent to
>     mailto:addr1?to=addr2

    The latter form is NOT RECOMMENDED.  If the desired effect
    is to specify a secondary recipient, mailto:addr1?cc=addr2
    can be used.
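To make the equivalence of these forms concrete, here is a minimal Python sketch of how a mailto interpreter might collect recipients.  The helper `recipients()` is hypothetical and not taken from RFC 2368 or the draft; it merges addresses from the path and from "to" fields, and keeps "cc" separate.

```python
from urllib.parse import unquote


def recipients(uri: str) -> dict:
    """Collect To and Cc addresses from a mailto URI (hypothetical helper).

    Addresses in the path and in "to" header fields are merged, so the
    three equivalent forms quoted above yield the same To list.  Splitting
    on "," is a simplification: it breaks on quoted local parts that
    themselves contain commas.
    """
    if not uri.lower().startswith("mailto:"):
        raise ValueError("not a mailto URI")
    path, _, query = uri[len("mailto:"):].partition("?")
    fields = {"to": [], "cc": []}

    def add(name: str, value: str) -> None:
        # Percent-decode first, then split the address list on commas.
        for addr in unquote(value).split(","):
            if addr.strip():
                fields[name].append(addr.strip())

    add("to", path)
    for pair in query.split("&") if query else []:
        hname, _, hvalue = pair.partition("=")
        hname = unquote(hname).lower()  # hnames are case-insensitive
        if hname in fields:
            add(hname, hvalue)
    return fields


print(recipients("mailto:addr1?cc=addr2"))
# {'to': ['addr1'], 'cc': ['addr2']}
```

With cc= the secondary recipient stays distinguishable, which is exactly what the "to"-merging form loses.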

Back to your corrections:

> A previous version of the mailto URI scheme had severe
> limitations for non-ASCII characters.

That's dubious.  All you really do is add unencoded IDNs, and
IDNs didn't exist when 2368 was written.  It's hardly a "severe
limitation" that <a href="mailto:martin@d%C3%BCrst.example">
might work with future browsers, when the equivalent form
<a href="mailto:martin@xn--drst-0ra.example"> works today.

Sure, the IRI version needs two bytes less than the punycode
version in this example.  OTOH it doesn't work with old user
agents.
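For anyone who wants to check the byte counts, Python's standard library produces both forms: `urllib.parse.quote` gives percent-encoded UTF-8, and the built-in "idna" codec (IDNA 2003, i.e. RFC 3490) gives the ACE form.  A small sketch using the example host from above:

```python
from urllib.parse import quote

host = "dürst.example"  # example host from the message above

# Percent-encoded UTF-8 form ("." is unreserved, so it stays as is).
pct_form = quote(host, safe="")

# IDNA (punycode/ACE) form via Python's built-in IDNA 2003 codec.
ace_form = host.encode("idna").decode("ascii")

print(pct_form, len(pct_form))  # d%C3%BCrst.example 18
print(ace_form, len(ace_form))  # xn--drst-0ra.example 20
```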

[Found later in your draft:  Okay, the body= UTF-8 stuff is
 really new, and that could be seen as a "severe limitation"
 today.  I certainly don't miss it, my MUA does not support
 the body= feature at all.]

> more straightforward and consistent internationalization.

Yes, in theory.  But not yet in practice: if I publish a
mailto:-URL somewhere, then I want it to work for almost all
users today.  You explain this later in chapter 6.

>        hname       = *urlc
>        hvalue      = *urlc

RfC 2368 and your draft use "urlc" without a proper syntax rule or
explanation; please add something like this:

         urlc = %d33-36 / %d38-60 / %d62 / %d64-126

RfC 3986 apparently says nothing about "<" and ">"; is this what
you want?  Otherwise you get %d33-36 / %d38-59 / %d64-126.
Plain text examples:

mailto:no@body.example?subject=is%20<this>%20okay%3F
<mailto:no@body.example?subject=%3Cthat%3E%20is%20clear!>

BTW, please add a note about mailto-IRIs in documents whose
charset is not UTF-8.  If I got your draft right, the idea is to
use percent-encoded UTF-8 even if the document charset is
something else, like Latin-1.  For example, this article is in
Latin-1, but <mailto:martin@dürst.example> is an invalid URL
here, and it's also not an IRI.
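If I read the draft right, a Latin-1 page would still have to carry the "ü" as percent-encoded UTF-8, not as the raw Latin-1 octet.  A small Python sketch of the difference, assuming the text has first been decoded using the document charset:

```python
from urllib.parse import quote, unquote

raw = b"d\xfcrst"             # "dürst" as stored in a Latin-1 document
text = raw.decode("latin-1")  # decode with the document charset first

utf8_form = quote(text, encoding="utf-8")      # what the draft asks for
latin1_form = quote(text, encoding="latin-1")  # legacy, charset-dependent

print(utf8_form)    # d%C3%BCrst
print(latin1_form)  # d%FCrst

# A reader expecting UTF-8 cannot decode the Latin-1 form strictly.
try:
    unquote(latin1_form, errors="strict")
except UnicodeDecodeError:
    print("d%FCrst is not percent-encoded UTF-8")
```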

Is <mailto:body@check.example?body=dürst> also invalid here, or
isn't it?  I'm not sure about these examples: there's no obvious
technical problem with this body=dürst parameter of a
2368-mailto-URL in a Latin-1 article.

> URI producers should provide these domain names in the IDNA
> encoding, rather than percent-encoded, if they wish to
> maximize interoperability with legacy mailto: URI
> interpreters

Indeed; unfortunately you can't say SHOULD here.
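A URI producer following that advice would convert only the domain to its IDNA form.  A minimal Python sketch; `mailto_for()` is a hypothetical helper, and it relies on Python's built-in "idna" codec (IDNA 2003):

```python
from urllib.parse import quote


def mailto_for(local_part: str, domain: str) -> str:
    """Build a legacy-friendly mailto URI (hypothetical helper).

    The domain is converted to its IDNA (ACE) form so that old mailto:
    interpreters see only ASCII; the local part is percent-encoded.
    """
    ace_domain = domain.encode("idna").decode("ascii")
    return "mailto:" + quote(local_part, safe="") + "@" + ace_domain


print(mailto_for("martin", "dürst.example"))
# mailto:martin@xn--drst-0ra.example
```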

> Percent-encoding in the LHS of an email address is reserved
> for potential future internationalization.  Non-ASCII
> characters must first be encoded according to UTF-8 [STD63]

The first statement is only correct for non-ASCII characters;
there's no general problem with percent-encoding in the LHS of
addresses in mailto URLs.  The "quoted-string" case of an LHS can
be very weird.
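Even a pure-ASCII LHS can need percent-encoding: an RFC 2822 quoted-string local part may contain spaces and double quotes, neither of which is allowed literally in a URI.  A sketch with a hypothetical address:

```python
from urllib.parse import quote, unquote

# A legal RFC 2822 address whose local part is a quoted string.
local_part = '"joe smith"'
domain = "an.example"

uri = "mailto:" + quote(local_part, safe="") + "@" + domain
print(uri)  # mailto:%22joe%20smith%22@an.example

# Decoding recovers the quoted string; rsplit because "@" may be quoted.
decoded = unquote(uri[len("mailto:"):].rsplit("@", 1)[0])
print(decoded)  # "joe smith"
```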

> Within mailto URIs, the characters "?", "=", "&" are reserved.

Maybe add a forward reference to chapter 5 here about NO-WS-CTL
and WSP.  I can't find a general rule about this issue in 3986;
probably I'm missing something obvious.
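For producers, the practical consequence is that "?", "=" and "&" (and whitespace) inside an hvalue must be percent-encoded.  A minimal Python sketch; `mailto_with_headers()` is hypothetical, and it assumes the "to" part is already a valid addr-spec:

```python
from urllib.parse import quote, urlencode


def mailto_with_headers(to: str, headers: dict) -> str:
    """Append percent-encoded header fields to a mailto URI (hypothetical).

    quote_via=quote with safe="" encodes "?", "=", "&" and encodes spaces
    as %20; quote_plus would turn spaces into "+", which is a literal "+"
    in mailto URIs.
    """
    query = urlencode(headers, quote_via=quote, safe="")
    return f"mailto:{to}?{query}" if query else f"mailto:{to}"


uri = mailto_with_headers("an@example", {"subject": "a=b & c?"})
print(uri)  # mailto:an@example?subject=a%3Db%20%26%20c%3F
```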

> 1.  MIME encoded words (as defined in [RFC2047]) are permitted
>     in header values, but not in an hvalue of a "body" hname.

That's clear.  You aren't planning to invent a mailto-IRI-body, or
are you?  Oops, I found body=caf%C3%A9 later, and that's a PITA:
with mailto-IRI-bodies the MUA is more or less forced to generate a
Content-Type: text/plain;charset=utf-8 with QP or Base64.  If you
really want this, please state it explicitly, not only in an
example.  This has side effects on systems where the default local
charset is _not_ Unicode (any of the UTFs).
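To illustrate what the MUA is pushed into, here is a sketch in Python's `email` package turning a UTF-8 body such as "café" (caf%C3%A9) into a MIME part.  The helper name is hypothetical; quoted-printable is one choice, base64 would work as well.

```python
from email.message import EmailMessage
from urllib.parse import unquote


def message_from_mailto_body(pct_body: str) -> EmailMessage:
    """Turn a percent-encoded UTF-8 body= value into a MIME message.

    Hypothetical helper.  Decoding is strict, so invalid UTF-8 raises
    UnicodeDecodeError instead of being silently replaced.
    """
    text = unquote(pct_body, encoding="utf-8", errors="strict")
    msg = EmailMessage()
    # Non-ASCII text needs a charset label and a 7bit-safe transfer
    # encoding such as quoted-printable (or base64).
    msg.set_content(text, charset="utf-8", cte="quoted-printable")
    return msg


msg = message_from_mailto_body("caf%C3%A9")
print(msg.get_content_type(), msg.get_content_charset())  # text/plain utf-8
```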

> MIME encoded words and UTF-8-based percent-encoding SHOULD not
> both be used in the same hvalue.

Maybe you need a MUST NOT here, and definitely a NOT.  Examples:

mailto:an@example?subject=%3D%3Fus-ascii%3FQ%3F1%3F%3D_2%3F%3D
mailto:an@example?subject=%3D%3Fus-ascii%3FQ%3FD%C3%BCrst3F%3D

Whatever that is, it's no Subject: 1?= 2 or Subject: dürst. (?)
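A MUA or validator could enforce such a MUST NOT mechanically.  The Python sketch below is hypothetical: it flags an hvalue whose percent-decoded octets contain both an RFC 2047 encoded-word opener and non-ASCII octets.  Note that it does not catch the first example, which is pure ASCII but still a dubious encoded word.

```python
import re
from urllib.parse import unquote_to_bytes

# RFC 2047 encoded-word opener: "=?charset?Q?" or "=?charset?B?"
ENCODED_WORD = re.compile(rb"=\?[^?\s]+\?[QqBb]\?")


def mixes_encodings(hvalue: str) -> bool:
    """True if an hvalue mixes an encoded word with percent-encoded UTF-8."""
    octets = unquote_to_bytes(hvalue)
    has_encoded_word = ENCODED_WORD.search(octets) is not None
    has_non_ascii = any(b > 0x7F for b in octets)
    return has_encoded_word and has_non_ascii


a = "%3D%3Fus-ascii%3FQ%3F1%3F%3D_2%3F%3D"
b = "%3D%3Fus-ascii%3FQ%3FD%C3%BCrst3F%3D"
print(unquote_to_bytes(a).decode("ascii"))     # =?us-ascii?Q?1?=_2?=
print(mixes_encodings(a), mixes_encodings(b))  # False True
```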

> The creator of a mailto URI cannot expect the resolver of a URI
> to understand more than the "subject" and "body" headers.
[...]

Here's a place where you could explain why clients should try
to support in-reply-to, and how "URI producers" should use it.

 [in the examples:]
> ?In-Reply-To=%3C3469A91.D10AF4C@example.com%3E>

Here's the place where you could say that this should be the
Message-ID of the mail in question.  One popular program gets
this wrong: it apparently uses the last Message-ID found in the
References or In-Reply-To field to construct its mailto-URL, which
confuses the threading of replies sent via that mailto-URL.
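A sketch of the correct behaviour in Python; `reply_mailto()` is a hypothetical helper, and the sample message is made up.  The point is that In-Reply-To comes from the message's own Message-ID, not from its References or In-Reply-To fields.

```python
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import quote


def reply_mailto(raw_message: str) -> str:
    """Build a reply mailto URI for a message (hypothetical helper)."""
    msg = message_from_string(raw_message)
    addr = parseaddr(msg["Reply-To"] or msg["From"] or "")[1]
    subject = msg["Subject"] or ""
    if not subject.lower().startswith("re:"):
        subject = "Re: " + subject
    uri = "mailto:" + quote(addr, safe="@") + "?subject=" + quote(subject, safe="")
    # Use this message's own Message-ID, never References/In-Reply-To.
    if msg["Message-ID"]:
        uri += "&In-Reply-To=" + quote(msg["Message-ID"].strip(), safe="@")
    return uri


raw = (
    "From: Joe <joe@an.example>\n"
    "Subject: hello\n"
    "Message-ID: <3469A91.D10AF4C@example.com>\n"
    "References: <older@example.com>\n"
    "In-Reply-To: <older@example.com>\n"
    "\n"
    "Hi.\n"
)
print(reply_mailto(raw))
```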

> Another way of expressing the same thing:
> <mailto:?to=joe@example.com&cc=bob@example.com&body=hello>

Please delete this example; it's ugly.  You already have this
variant in the paragraph about "to" as "hname".

> Click <a
> href="mailto:?to=joe@xyz.com&amp;cc=bob@xyz.com&amp;body=hello">
> mailto:?to=joe@xyz.com&amp;cc=bob@xyz.com&amp;body=hello</a> to
> send a greeting message to Joe and Bob.

I'd use an.example instead of xyz.com here, and replace the "to":

  Click <a
  href="mailto:joe@an.example?cc=bob@an.example&amp;body=hello">
  mailto:joe@an.example?cc=bob@an.example&amp;body=hello</a> to
  send a greeting message to Joe with a copy to Bob.
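The "&amp;" in such HTML examples is easy to forget when links are generated.  A minimal Python sketch using the standard `html.escape`; the address is the example one from above:

```python
import html

uri = "mailto:joe@an.example?cc=bob@an.example&body=hello"

# Inside an HTML attribute (and in text), a literal "&" must be written
# as "&amp;"; html.escape does that, and also escapes <, > and quotes.
escaped = html.escape(uri, quote=True)
link = f'<a href="{escaped}">{escaped}</a>'
print(link)
```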

> mailto:user@example.org?subject=%3D%3Futf-8%3FQ%3Fcaf%3DC3%3DA9%3F%3D

Maybe replace user@example.org by an@example if your examples
are otherwise too long for RfC lines.

> The software sending the email is not restricted to UTF-8, but
> can use other encodings.

It's more or less forced to stick to UTF-8 or maybe another UTF.
Otherwise it would have to analyze the mailto-IRI-body assuming
UTF-8 input.  That's a major difference from traditional mailto-
URLs.

> The security considerations of [STD66], [RFC3490], and also
> apply. [RFC3987]

s/apply. [RFC3987]/[RFC3987] apply./

IMHO "also apply" is not good enough.  Either add some of the
worst examples, say illegal UTF-8 encodings and phishing, or
urge the readers to really check out these "external" sources.
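As an example of the illegal-UTF-8 case: Python's `unquote` replaces malformed octets with U+FFFD by default, which hides the problem.  A sketch of strict decoding, with a hypothetical helper name; %C0%AF is the classic overlong encoding of "/".

```python
from urllib.parse import unquote


def strict_hvalue(pct: str) -> str:
    """Decode a percent-encoded hvalue as UTF-8, refusing malformed octets.

    Hypothetical helper.  urllib's default errors="replace" silently turns
    malformed input into U+FFFD; a MUA should rather reject such a URI.
    """
    return unquote(pct, encoding="utf-8", errors="strict")


print(strict_hvalue("caf%C3%A9"))  # café

# %C0%AF is an overlong (illegal) UTF-8 encoding of "/".
try:
    strict_hvalue("%C0%AF")
except UnicodeDecodeError:
    print("rejected %C0%AF")
```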

Please add a note that a plain-text <URL:mailto:an@example>
MUST NOT use any percent-encoded UTF-8, and is by definition
a "visible with any browser" URL, not an IRI.

                             Bye, Frank
Received on Monday, 14 February 2005 17:33:16 UTC