Updating the mailto URI scheme for better I18N

Dear URI experts,

I have just submitted the draft appended below to the Internet
Drafts Editor. Here's the abstract for those that don't want
to scroll:

    This document defines the format of Uniform Resource Identifiers
    (URI) for designating electronic mail addresses.  The syntax of
    'mailto' URIs from [RFC2368] is extended to be compatible with IRIs
    ([RFC3987]) for better internationalization.

Comments welcome!

Regards,    Martin.

P.S.: Just in case, this already works in at least one browser (Opera)

------------------------------------------------------------------------



Network Working Group                                          M. Duerst
Internet-Draft                                       W3C/Keio University
Obsoletes: 2368 (if approved)                                L. Masinter
Expires: August 18, 2005                      Adobe Systems Incorporated
                                                        February 14, 2005


                          The mailto URI scheme
                        draft-duerst-mailto-bis-00

Status of this Memo

    This document is an Internet-Draft and is subject to all provisions
    of Section 3 of RFC 3667.  By submitting this Internet-Draft, each
    author represents that any applicable patent or other IPR claims of
    which he or she is aware have been or will be disclosed, and any of
    which he or she become aware will be disclosed, in accordance with
    RFC 3668.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups.  Note that
    other groups may also distribute working documents as
    Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of six months
    and may be updated, replaced, or obsoleted by other documents at any
    time.  It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt.

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.

    This Internet-Draft will expire on August 18, 2005.

Copyright Notice

    Copyright (C) The Internet Society (2005).

Abstract

    This document defines the format of Uniform Resource Identifiers
    (URI) for designating electronic mail addresses.  The syntax of
    'mailto' URIs from [RFC2368] is extended to be compatible with IRIs
    ([RFC3987]) for better internationalization.




Duerst & Masinter        Expires August 18, 2005                [Page 1]

Internet-Draft            The mailto URI scheme            February 2005


Table of Contents

    1.   Introduction
    2.   Syntax of a mailto URL
    3.   Semantics and Operations
    4.   Unsafe Headers
    5.   Encoding
    6.   Deployment of UTF-8-Based Percent-Encoding
    7.   Examples
      7.1  Examples Conforming to RFC2368
      7.2  Examples Using UTF-8-Based Percent-Encoding
    8.   Security Considerations
    9.   IANA Considerations
    10.  Changes from RFC 2368
    11.  Acknowledgments
    12.  References
      12.1   Normative References
      12.2   Informative References
         Authors' Addresses
         Intellectual Property and Copyright Statements

1.  Introduction

    The mailto URI scheme is used to designate the Internet mailing
    address of an individual or service.  In its simplest form, a mailto
    URI contains an Internet mail address.  For interaction with
    resources that require message headers or message bodies to be
    specified, the mailto URI scheme also allows setting mail header
    fields and the message body.

    The previous version of the mailto URI scheme [RFC2368] had severe
    limitations for non-ASCII characters.  This document extends the
    scheme to also allow character data to be percent-encoded based on
    UTF-8, as already seen in some implementations, for more
    straightforward and consistent internationalization.

    Please send comments on this document to the mailing list uri@w3.org.

2.  Syntax of a mailto URL

    Following the syntax conventions of [STD66], and using the ABNF
    syntax defined in  [RFC2234], a "mailto" URI has the form:

       mailtoURI   = "mailto:" [ to ] [ headers ]
       to          = [ mailbox *("%2C" mailbox ) ]
       headers     = "?" header *( "&" header )
       header      = hname "=" hvalue
       hname       = *urlc
       hvalue      = *urlc

    "mailbox" is as specified in [RFC2822], i.e.  it is a mail address,
    possibly including "phrase" and "comment" components.  However, the
    following changes apply:

    1.  All characters that can appear in "mailbox" but are reserved or
        not allowed in URIs have to be percent-encoded.  Examples are
        parentheses, commas, and the percent sign ("%"), which commonly
        occur in the "mailbox" syntax.

    2.  Percent-encoding can be used to denote non-ASCII characters in
        the part of a "mailbox" that denotes a domain name, in order to
        denote an internationalized domain name.  The considerations for
        reg-name in [STD66] apply.  In particular, non-ASCII characters
        must first be encoded according to UTF-8 [STD63], and then each
        octet of the corresponding UTF-8 sequence must be percent-encoded
        to be represented as URI characters.  URI producing applications
        must not use percent-encoding in domain names unless it is used
        to represent a UTF-8 character sequence.  When the
        internationalized domain name is used to compose a message, the
        name must be transformed to the IDNA encoding [RFC3490].  URI
        producers should provide these domain names in the IDNA encoding,
        rather than percent-encoded, if they wish to maximize
        interoperability with legacy mailto: URI interpreters.

    3.  Percent-encoding in the left-hand side (LHS) of an email
        address, i.e. the part before the "@", is reserved for
        potential future internationalization.  Non-ASCII characters must
        first be encoded according to UTF-8 [STD63], and then each octet
        of the corresponding UTF-8 sequence must be percent-encoded to be
        represented as URI characters.  Any other percent-encoding of
        non-ASCII characters is prohibited.  When an LHS containing
        non-ASCII characters will be used to compose a message, the LHS
        must be transformed to conform to whatever encoding may be
        defined in a future specification for the internationalization of
        email addresses.
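
    As a non-normative illustration of item 2 above, the following
    Python sketch percent-decodes the domain name of an address and
    converts it to its IDNA form.  The function name is hypothetical,
    the address is split naively at the last "@", and the conversion
    relies on Python's built-in "idna" codec, which is assumed here to
    be an acceptable implementation of [RFC3490].

```python
from urllib.parse import unquote


def addr_to_idna(addr):
    """Return addr with its domain converted to the IDNA (ASCII) form.

    Hypothetical helper: the address is split at the last "@", and the
    local part is passed through unchanged.
    """
    local, _, domain = addr.rpartition("@")
    # Percent-decoding must yield valid UTF-8; errors="strict" makes
    # any other use of percent-encoding raise UnicodeDecodeError.
    domain = unquote(domain, encoding="utf-8", errors="strict")
    # The "idna" codec applies Nameprep and Punycode label by label.
    ascii_domain = domain.encode("idna").decode("ascii")
    return local + "@" + ascii_domain


print(addr_to_idna("user@%E7%B4%8D%E8%B1%86.example.org"))
# prints: user@xn--99zt52a.example.org
```

    An address whose domain is already plain ASCII passes through this
    sketch unchanged.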

    "hname" and "hvalue" are encodings of an [RFC2822] header name and
    value, respectively.  As with "to", all URI reserved characters must
    be encoded.

    The special hname "body" indicates that the associated hvalue is the
    body of the message.  This hvalue should contain the content for the
    first text/plain body part of the message.  The "body" hname is
    primarily intended for generation of short text messages for
    automatic processing (such as "subscribe" messages for mailing
    lists), not general MIME bodies.

    Within mailto URIs, the characters "?", "=", and "&" are reserved.

    Because the "&" (ampersand) character is reserved in HTML and XML,
    any mailto URI which contains an ampersand must be spelled
    differently in HTML and XML than in other contexts.  A mailto URI
    which appears in an HTML or XML document must escape the "&", e.g.
    as "&amp;".
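
    A minimal, non-normative Python sketch of this escaping step
    follows.  It assumes the standard library function html.escape is
    an adequate way to produce the HTML spelling; the example address
    and the variable names are illustrative only.

```python
from html import escape

uri = "mailto:joe@example.com?cc=bob@example.com&body=hello"

# escape() rewrites "&" (and also "<", ">", and quotes) as character
# references, so the result can be placed inside an attribute value.
html_uri = escape(uri, quote=True)
link = '<a href="{0}">{0}</a>'.format(html_uri)
print(link)
```

    The URI itself is unchanged by this step; an HTML or XML parser
    turns the escaped form back into a plain "&" before the URI is
    interpreted.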

    Non-ASCII characters can be encoded in hvalue as follows:

    1.  MIME encoded words (as defined in [RFC2047]) are permitted in
        header values, but not in an hvalue of a "body" hname.

    2.  Non-ASCII characters can be encoded according to UTF-8 [STD63],
        and then each octet of the corresponding UTF-8 sequence is
        percent-encoded to be represented as URI characters.  When
        hvalues encoded in this way are used to compose a message, the
        hvalue must be transformed into MIME encoded words, except for an
        hvalue of a "body" hname, which has to be encoded according to
        [RFC2045].  Please note that for MIME encoded words and for
        bodies in composed email messages, encodings other than UTF-8 MAY
        be used as long as the characters are properly transcoded.

    MIME encoded words and UTF-8-based percent-encoding SHOULD NOT both
    be used in the same hvalue.
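
    From the producer's side, UTF-8-based percent-encoding of hname and
    hvalue can be sketched in Python as follows.  This is illustrative
    only: the function name and its argument layout are hypothetical,
    and keeping "@" unencoded in the address part is a choice made for
    readability.

```python
from urllib.parse import quote


def build_mailto(to, headers=()):
    """Assemble a mailto URI from an address and (name, value) pairs.

    Hypothetical helper.  quote() encodes str input as UTF-8 before
    percent-encoding, so non-ASCII text needs no separate step.
    """
    uri = "mailto:" + quote(to, safe="@")
    fields = [quote(name, safe="") + "=" + quote(value, safe="")
              for name, value in headers]
    if fields:
        uri += "?" + "&".join(fields)
    return uri


print(build_mailto("user@example.org",
                   [("subject", "caf\u00e9"), ("body", "caf\u00e9")]))
# prints: mailto:user@example.org?subject=caf%C3%A9&body=caf%C3%A9
```

    Because every reserved character in a name or value is encoded, the
    only literal "?", "=", and "&" in the result are the delimiters.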

    Also note that it is legal to specify both "to" and an "hname" whose
    value is "to".  That is,

    mailto:addr1%2C%20addr2

    is equivalent to

    mailto:?to=addr1%2C%20addr2

    is equivalent to

    mailto:addr1?to=addr2
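
    The equivalence of these three forms can be checked with a small
    parser.  The following Python sketch is non-normative; the function
    names are hypothetical, and addresses are split on every comma
    after decoding, which would mis-split a quoted phrase that itself
    contains a comma.

```python
from urllib.parse import unquote


def _split_addresses(text):
    """Split a decoded, comma-separated address list (naively)."""
    return [a.strip() for a in text.split(",") if a.strip()]


def parse_mailto(uri):
    """Split a mailto URI into (recipients, other_headers)."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError("not a mailto URI: %r" % uri)
    # Split on the literal delimiters first, then percent-decode.
    to_part, _, query = rest.partition("?")
    recipients = _split_addresses(unquote(to_part))
    headers = []
    for field in filter(None, query.split("&")):
        name, _, value = field.partition("=")
        name, value = unquote(name), unquote(value)
        if name.lower() == "to":
            # A "to" header adds to the addresses before the "?".
            recipients.extend(_split_addresses(value))
        else:
            headers.append((name, value))
    return recipients, headers


for example in ("mailto:addr1%2C%20addr2",
                "mailto:?to=addr1%2C%20addr2",
                "mailto:addr1?to=addr2"):
    print(parse_mailto(example)[0])
```

    All three calls are expected to report the same two recipients,
    "addr1" and "addr2".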

3.  Semantics and Operations

    A mailto URI designates an "internet resource", which is the mailbox
    specified in the address.  When additional headers are supplied, the
    resource designated is the same address, but with an additional
    profile for accessing the resource.  While there are Internet
    resources that can only be accessed via electronic mail, the mailto
    URI is not intended as a way of retrieving such objects
    automatically.

    In current practice, resolving URIs such as those in the "http"
    scheme causes an immediate interaction between client software and a
    host running an interactive server.  The "mailto" URI has unusual
    semantics because resolving such a URI does not cause an immediate
    interaction.  Instead, the client creates a message to the designated
    address with the various header fields set as default.  The user can
    edit the message, send this message unedited, or choose not to send
    the message.  The operation of how any URI scheme is resolved is not
    mandated by the URI specifications.

4.  Unsafe Headers

    The user agent interpreting a mailto URI SHOULD choose not to create
    a message if any of the headers are considered dangerous; it may also
    choose to create a message with only a subset of the headers given in
    the URI.  Only the Subject, Keywords, and Body headers are believed
    to be both safe and useful.

    The creator of a mailto URI cannot expect the resolver of a URI to
    understand more than the "subject" and "body" headers.  Clients that
    resolve mailto URIs into mail messages should be able to correctly
    create [RFC2822]-compliant mail messages using the "subject" and
    "body" headers.
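
    One possible way for a user agent to apply this advice is an
    allow-list, sketched below in Python.  The set of accepted names is
    taken from the text above; the function name and the decision to
    report rejected headers rather than refuse the whole message are
    assumptions of this sketch, not requirements.

```python
# Header names believed to be both safe and useful (see above).
SAFE_HEADERS = {"subject", "keywords", "body"}


def filter_headers(headers):
    """Split (name, value) pairs into accepted and rejected lists."""
    accepted, rejected = [], []
    for name, value in headers:
        # Header names are compared without regard to case.
        target = accepted if name.lower() in SAFE_HEADERS else rejected
        target.append((name, value))
    return accepted, rejected


accepted, rejected = filter_headers([("Subject", "hi"),
                                     ("Bcc", "x@example.com"),
                                     ("body", "hello")])
print(accepted)
print(rejected)
```

    A stricter agent could instead decline to create any message as
    soon as the rejected list is non-empty.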

5.  Encoding

    [STD66] requires that many characters in URIs be encoded.  This
    affects the mailto scheme for some common characters that might
    appear in addresses, headers or message contents.  One such character
    is space (" ", ASCII hex 20).  Note the examples below that use "%20"
    for space in the message body.  Also note that line breaks in the
    body of a message MUST be encoded with "%0D%0A".
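
    A short, non-normative Python sketch of this encoding step for a
    message body follows.  The function name is hypothetical, and the
    sketch drops a trailing line break as a side effect of how it
    splits lines.

```python
from urllib.parse import quote


def encode_body(text):
    """Percent-encode a message body for use as a "body" hvalue."""
    # Normalize every line break to CRLF so it encodes as %0D%0A.
    lines = text.splitlines()
    return quote("\r\n".join(lines), safe="")


print(encode_body("send current-issue\nsend index"))
# prints: send%20current-issue%0D%0Asend%20index
```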

    People creating mailto URIs must be careful to encode any reserved
    characters that are used in the URIs so that properly-written URI
    interpreters can read them.  Also, client software that reads URIs
    must be careful to decode strings before creating the mail message so
    that the mail messages appear in a form that the recipient will
    understand.  These strings should be decoded before showing the
    message to the user.

    The mailto URI scheme is limited in that it does not provide for
    substitution of variables.  Thus, a message body that must include a
    user's email address cannot be encoded using the mailto URI.  This
    limitation also rules out mailto URIs that carry public-key
    signatures or other such variable information.

6.  Deployment of UTF-8-Based Percent-Encoding

    UTF-8-based percent-encoding should only be used in actual mailto
    URIs once it is well deployed in software that interprets mailto URIs
    (such as mail user agents).

7.  Examples

7.1  Examples Conforming to RFC2368

    A URI for an ordinary individual mailing address:

    <mailto:chris@example.com>

    A URI for a mail response system that requires the name of the file
    in the subject:

    <mailto:infobot@example.com?subject=current-issue>

    A mail response system that requires a "send" request in the body:

    <mailto:infobot@example.com?body=send%20current-issue>

    A similar URI could have two lines with different "send" requests (in
    this case, "send current-issue" and, on the next line, "send index"):

    <mailto:infobot@example.com?body=send%20current-issue%0D%0Asend%20index>

    An interesting use of mailto URIs is when browsing archives of
    messages.  Each browsed message might contain a mailto URI like:

    <mailto:foobar@example.com?In-Reply-To=%3C3469A91.D10AF4C@example.com%3E>


    A request to subscribe to a mailing list:

    <mailto:majordomo@example.com?body=subscribe%20bamboo-l>

    A URI for a single user which includes a CC of another user:

    <mailto:joe@example.com?cc=bob@example.com&body=hello>

    Another way of expressing the same thing:

    <mailto:?to=joe@example.com&cc=bob@example.com&body=hello>

    Note the use of the "&" reserved character, above.  The following
    example, by using "?" twice, is incorrect:

    <mailto:joe@example.com?cc=bob@example.com?body=hello>   ; WRONG!

    According to [RFC2822], the characters "?", "&", and even "%" may
    occur in addr-specs.  The fact that they are reserved characters in
    this URI scheme is not a problem: those characters may appear in
    mailto URIs; they just may not appear in unencoded form.  The
    standard URI encoding mechanisms ("%" followed by a two-digit hex
    number) must be used in these cases.

    To indicate the address "gorby%kremvax@example.com" one would do:

    <mailto:gorby%25kremvax@example.com>

    To indicate the address "unlikely?address@example.com", and include
    another header, one would do:

    <mailto:unlikely%3Faddress@example.com?blat=foop>
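
    The order of operations matters when reading such a URI: the
    literal "?" is located first, and percent-decoding happens only
    afterwards.  The following Python sketch illustrates this with the
    address from the example above; the variable names are illustrative
    and "@" is left unencoded by choice.

```python
from urllib.parse import quote, unquote

address = "unlikely?address@example.com"
uri = "mailto:" + quote(address, safe="@") + "?blat=foop"
print(uri)  # mailto:unlikely%3Faddress@example.com?blat=foop

# Split on the literal "?" first, then decode: the encoded "%3F" is
# not mistaken for the start of the headers.
to_part, _, query = uri[len("mailto:"):].partition("?")
decoded = unquote(to_part)
print(decoded, query)
```

    Decoding before splitting would instead cut the address at its own
    question mark.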

    As described above, the "&" (ampersand) character is reserved in HTML
    and must be replaced e.g.  with "&amp;".  Thus, a complex URI that
    has internal ampersands might look like:

    Click <a
    href="mailto:?to=joe@xyz.com&amp;cc=bob@xyz.com&amp;body=hello">
    mailto:?to=joe@xyz.com&amp;cc=bob@xyz.com&amp;body=hello</a> to send
    a greeting message to Joe and Bob.

7.2  Examples Using UTF-8-Based Percent-Encoding

    Sending a mail with the subject "coffee" in French, i.e.  "cafe"
    where the final e is an e-acute, using UTF-8 and percent-encoding:

    mailto:user@example.org?subject=caf%C3%A9

    The same subject, this time using an encoded-word (escaping the "="
    and "?" characters used in the encoded-word syntax, because they are
    reserved):

    mailto:user@example.org?subject=%3D%3Futf-8%3FQ%3Fcaf%3DC3%3DA9%3F%3D

    The same subject, this time encoded as iso-8859-1:

    mailto:user@example.org?subject=%3D%3Fiso-8859-1%3FQ%3Fcaf%3DE9%3F%3D

    Going back to straight UTF-8 and adding a body with the same value:

    mailto:user@example.org?subject=caf%C3%A9&body=caf%C3%A9

    This mailto URI may result in a message looking like this:

       From: sender@example.net
       To: user@example.org
       Subject: =?utf-8?Q?caf=C3=A9?=
       Content-Type: text/plain;charset=utf-8
       Content-Transfer-Encoding: quoted-printable

       caf=C3=A9

    The software sending the email is not restricted to UTF-8, but can
    use other encodings.  The following shows the same email using
    iso-8859-1 for both the subject and the body:

       From: sender@example.net
       To: user@example.org
       Subject: =?iso-8859-1?Q?caf=E9?=
       Content-Type: text/plain;charset=iso-8859-1
       Content-Transfer-Encoding: quoted-printable

       caf=E9

    Different content transfer encodings (e.g. "8bit" or "base64"
    instead of "quoted-printable") and different encodings in encoded
    words (e.g. "B" instead of "Q") can also be used.
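
    That these choices all carry the same text can be illustrated with
    Python's standard email package.  The sketch below is
    non-normative: it lets email.header.Header pick the "B" or "Q"
    encoding for each charset, so the exact encoded words it prints may
    differ from those shown above.

```python
from email.header import Header, decode_header
from urllib.parse import unquote

# The hvalue from the URI, percent-decoded as UTF-8.
subject = unquote("caf%C3%A9")

encoded_words = {}
for charset in ("utf-8", "iso-8859-1"):
    # Header.encode() produces an encoded word in the given charset.
    encoded_words[charset] = Header(subject, charset).encode()
    raw, label = decode_header(encoded_words[charset])[0]
    # Whatever charset was used, the recipient recovers the same text.
    assert raw.decode(label) == subject
    print(encoded_words[charset])
```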

    For more examples of encoding the word coffee in different languages,
    see [RFC2324].

    The following example uses the Japanese word "natto" (U+7D0D U+8C46)
    as a domain name label, sending a mail to a user at
    "natto".example.org:

    mailto:user@%E7%B4%8D%E8%B1%86.example.org?subject=Test&body=NATTO

    When constructing the email, the domain name label is converted to
    Punycode.  The resulting message may look as follows:

       From: sender@example.net
       To: user@xn--99zt52a.example.org
       Subject: Test
       Content-Type: text/plain
       Content-Transfer-Encoding: 7bit

       NATTO


8.  Security Considerations

    The mailto scheme can be used to send a message from one user to
    another, and thus can introduce many security concerns.  Mail
    messages can be logged at the originating site, the recipient site,
    and intermediary sites along the delivery path.  If the messages are
    not encrypted, they can also be read at any of those sites.

    A mailto URI gives a template for a message that can be sent by mail
    client software.  The contents of that template may be opaque or
    difficult to read by the user at the time of specifying the URI.
    Thus, a mail client should never send a message based on a mailto URI
    without first showing the user the full message that will be sent
    (including all headers that were specified by the mailto URI), fully
    decoded, and asking the user for approval to send the message as
    electronic mail.  The mail client should also make it clear that the
    user is about to send an electronic mail message, since the user may
    not be aware that this is the result of a mailto URI.

    A mail client should never send anything without complete disclosure
    to the user of what will be sent; it should disclose not only the
    message destination, but also any headers.  Unrecognized headers, or
    headers with values inconsistent with those the mail client would
    normally send, should be especially suspect.  MIME headers
    (MIME-Version, Content-*) are most likely inappropriate, as are those
    relating to routing (From, Bcc, Apparently-To, etc.).

    Note that some headers are inherently unsafe to include in a message
    generated from a URI.  For example, headers such as "From:", "Bcc:",
    and so on, should never be interpreted from a URI.  In general, the
    fewer headers interpreted from the URI, the less likely it is that a
    sending agent will create an unsafe message.

    Examples of problems with sending unapproved mail include:

       mail that breaks laws upon delivery, such as making illegal
       threats;

       mail that identifies the sender as someone interested in breaking
       laws;

       mail that identifies the sender to an unwanted third party;

       mail that causes a financial charge to be incurred on the sender;

       mail that causes an action on the recipient machine that causes
       damage that might be attributed to the sender.

    Programs that interpret mailto URIs should ensure that the SMTP
    "From" address is set and correct.

    The security considerations of [STD66], [RFC3490], and [RFC3987]
    also apply.

9.  IANA Considerations

    This document changes the definition of the mailto: URI scheme; the
    registry of URI schemes should refer to this document rather than its
    predecessor, [RFC2368].

10.  Changes from RFC 2368

       For interoperability with IRIs ([RFC3987]), allowed
       percent-encoding, fixed to UTF-8, in the domain name part of an
       email address, in the LHS part of an address (currently reserved
       because not operationally usable), and in hvalue parts.

       Changed from 'URL' to 'URI'.

       Updated references: ABNF to [RFC2234], message syntax to
       [RFC2822], and URI Generic Syntax to [STD66].

       Expanded "#mailbox", because the "#" shortcut is no longer
       available; needs checking


11.  Acknowledgments

    This document was derived from [RFC2368]; the acknowledgments from
    that specification still apply.  In addition, we thank Paul Hoffman
    and Jamie Zawinski for their work on [RFC2368].

    Valuable input on this document was received from: Paul Hoffman.

12.  References

12.1  Normative References

    [RFC2045]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
               Extensions (MIME) Part One: Format of Internet Message
               Bodies", RFC 2045, November 1996.

    [RFC2047]  Moore, K., "MIME Part Three: Message Header Extensions for
               Non-ASCII Text", RFC 2047, November 1996.

    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
               Requirement Levels", BCP 14, RFC 2119, March 1997.

    [RFC2234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
               Specifications: ABNF", RFC 2234, November 1997.

    [RFC2822]  Resnick, P., "Internet Message Format", RFC 2822, April
               2001.

    [RFC3490]  Faltstrom, P., Hoffman, P. and A. Costello,
               "Internationalizing Domain Names in Applications (IDNA)",
               RFC 3490, March 2003.

    [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
               Profile for Internationalized Domain Names (IDN)",
               RFC 3491, March 2003.

    [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
               Identifiers (IRIs)", RFC 3987, January 2005.

    [STD63]    Yergeau, F., "UTF-8, a transformation format of ISO
               10646", STD 63, RFC 3629, November 2003.

    [STD66]    Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
               Resource Identifier (URI): Generic Syntax", STD 66,
               RFC 3986, January 2005.

12.2  Informative References

    [RFC2324]  Masinter, L., "Hyper Text Coffee Pot Control Protocol
               (HTCPCP/1.0)", RFC 2324, April 1998.

    [RFC2368]  Hoffman, P., Masinter, L. and J. Zawinski, "The mailto URL
               scheme", RFC 2368, July 1998.


Authors' Addresses

    Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever
                   possible, for example as "D&#252;rst" in XML and HTML.)
    World Wide Web Consortium/Keio University
    5322 Endo
    Fujisawa, Kanagawa  252-8520
    Japan

    Phone: +81 466 49 1170
    Fax:   +81 466 49 1171
    Email: mailto:duerst@w3.org
    URI:   http://www.w3.org/People/D%C3%BCrst/


    Larry Masinter
    Adobe Systems Incorporated
    345 Park Ave
    San Jose, CA  95110
    USA

    Phone: +1-408-536-3024
    Email: LMM@acm.org
    URI:   http://larry.masinter.net/

Intellectual Property Statement

    The IETF takes no position regarding the validity or scope of any
    Intellectual Property Rights or other rights that might be claimed to
    pertain to the implementation or use of the technology described in
    this document or the extent to which any license under such rights
    might or might not be available; nor does it represent that it has
    made any independent effort to identify any such rights.  Information
    on the procedures with respect to rights in RFC documents can be
    found in BCP 78 and BCP 79.

    Copies of IPR disclosures made to the IETF Secretariat and any
    assurances of licenses to be made available, or the result of an
    attempt made to obtain a general license or permission for the use of
    such proprietary rights by implementers or users of this
    specification can be obtained from the IETF on-line IPR repository at
    http://www.ietf.org/ipr.

    The IETF invites any interested party to bring to its attention any
    copyrights, patents or patent applications, or other proprietary
    rights that may cover technology that may be required to implement
    this standard.  Please address the information to the IETF at
    ietf-ipr@ietf.org.


Disclaimer of Validity

    This document and the information contained herein are provided on an
    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
    ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
    INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
    INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

    Copyright (C) The Internet Society (2005).  This document is subject
    to the rights, licenses and restrictions contained in BCP 78, and
    except as set forth therein, the authors retain all their rights.


Acknowledgment

    Funding for the RFC Editor function is currently provided by the
    Internet Society.




 

Received on Monday, 14 February 2005 11:59:00 UTC