Re: [EAI] [Fwd: AD review of draft-duerst-mailto-bis-06.txt] from Martin J. Dürst on 2009-10-14 (public-iri@w3.org from October 2009)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Wed, 14 Oct 2009 13:37:31 +0900
To: Alexey Melnikov <alexey.melnikov@isode.com>
CC: "ima@ietf.org" <ima@ietf.org>, "public-iri@w3.org" <public-iri@w3.org>, Jamie Zawinski <jwz@jwz.org>
Message-ID: <4AD5558B.9060203@it.aoyama.ac.jp>
Hello Alex,

On 2009/08/04 20:54, Alexey Melnikov wrote:
> I believe I completed the todo item assigned to me during the Stockholm
> meeting.

Many thanks for your review. Very helpful.

 > as it seems to be blocking some EAI drafts.

Which? EAI seems to move forward nicely.

 > In Section 2:
 >
 > addr-spec = local-part "@" domain
 > local-part = dot-atom / quoted-string
 >
 > I don't think this change goes all the way to clarify that obsolete RFC
 > 5322 syntax and comments are disallowed.
 > RFC 5322:
 > domain = dot-atom / domain-literal / obs-domain
 >
 > domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
 >
 > dot-atom-text = 1*atext *("." 1*atext)
 >
 > dot-atom = [CFWS] dot-atom-text [CFWS]
 >
 > atom = [CFWS] 1*atext [CFWS]
 >
 > obs-domain = atom *("." atom)
 >
 > I think "obs-domain" and "domain-literal" definitions are problematic
 > (at least).

I changed 'domain' to simply be 'dot-atom'. I hope this works.
(This is not exactly my area of expertise.)
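
As a rough illustration only (not text from the draft), restricting the domain to a dot-atom could be checked along the following lines. The sketch matches just the <dot-atom-text> production quoted above, so comments, folding white space, domain literals and the obsolete forms are all rejected. The names ATEXT, DOT_ATOM_TEXT and is_dot_atom_text are hypothetical, and the <atext> character list is taken from RFC 5322.

```python
import re

# <atext> characters as listed in RFC 5322 (letters, digits, and a fixed
# set of printable specials). Name and layout are illustrative only.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom-text = 1*atext *("." 1*atext), without any surrounding CFWS.
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(candidate: str) -> bool:
    """Return True if the whole string matches <dot-atom-text>."""
    return DOT_ATOM_TEXT.fullmatch(candidate) is not None
```

With this check, "example.org" is accepted, while "[192.0.2.1]" (a domain literal) and "(comment)example.org" (CFWS) are not.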

 > Within 'mailto' URIs, the characters "?", "=", and "&" are reserved.
 >
 > "Reserved" in URI sense? If yes, I think this can be made clearer.

Changed the text to read:

Within 'mailto' URIs, the characters "?", "=", and "&" are reserved, 
serving as delimiters. They must be escaped (as "%3F", "%3D", and "%26", 
respectively) when not serving as delimiters.

But these are explained elsewhere in the spec, too, so that may now be 
too much, and may get reduced again on proofreading (after careful 
cross-checking).
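
A minimal sketch of what that escaping means for someone producing a 'mailto' URI, assuming Python's urllib.parse.quote as the encoder. The helper name encode_hfvalue and the example address are hypothetical. Passing safe="" encodes more than the three delimiters (for instance ":" and "+"), which is harmless over-encoding rather than something the draft requires.

```python
from urllib.parse import quote


def encode_hfvalue(value: str) -> str:
    """Percent-encode a header field value for use after the '?'.

    safe="" makes quote() escape "?", "=" and "&" (as %3F, %3D, %26)
    along with every other character outside the unreserved set.
    """
    return quote(value, safe="")


subject = encode_hfvalue("Q&A: is 1+1=2?")
uri = f"mailto:user@example.org?subject={subject}"
print(uri)
```

In the resulting URI, the only literal "?" and "=" are the ones acting as delimiters.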

 > 4. Percent-encoding can be used in the <domain> part of an <addr-
 > spec>, in order to denote an internationalized domain name. The
 > considerations for <reg-name> in [STD66] apply. In particular,
 > non-ASCII characters must
 >
 > s/must/MUST ?
 >
 > first be encoded according to UTF-8
 > [STD63], and then each octet of the corresponding UTF-8 sequence
 > must
 >
 > s/must/MUST ?
 >
 > be percent-encoded to be represented as URI characters. URI
 > producing applications must not
 >
 > s/must not/MUST NOT ?

Fixed all of those. Sometimes adjusted wording, sometimes upper-casing 
and sometimes using clearly non-normative wording.

 > use percent-encoding in domain
 > names unless it is used to represent a UTF-8 character sequence.
 > When the internationalized domain name is used to compose a
 > message, the name must be transformed to the IDNA encoding where
 > appropriate [RFC3490]. URI producers should provide these domain
 > names in the IDNA encoding, rather than percent-encoded, if they
 > wish to maximize interoperability with legacy 'mailto' URI
 > interpreters.
 >
 > As per IRI bar BOF in Stockholm: this needs to be aligned with any
 > [potential] changes to the IRI spec.

Yes. I personally don't think we need to change this (except for some 
more careful wording).
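
To make the two forms concrete, here is a small sketch assuming Python's built-in "idna" codec, which implements the IDNA rules of RFC 3490. The function names and the example domain are hypothetical, and real code would also have to handle invalid input.

```python
from urllib.parse import quote, unquote


def percent_encode_domain(domain: str) -> str:
    """Encode non-ASCII characters as UTF-8, then percent-encode each octet."""
    return quote(domain, safe="", encoding="utf-8")


def to_idna(percent_encoded_domain: str) -> str:
    """Undo the percent-encoding, then apply the IDNA (RFC 3490) encoding."""
    unicode_domain = unquote(percent_encoded_domain, encoding="utf-8")
    return unicode_domain.encode("idna").decode("ascii")


in_uri = percent_encode_domain("bücher.example")
on_the_wire = to_idna(in_uri)
print(in_uri, on_the_wire)
```

The first value is what may appear in the URI; the second is what a client would use when composing the message, and what a producer can put into the URI directly for the benefit of legacy interpreters.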


 > 5. Percent-encoding of non-ASCII octets in the <local-part> of an
 > <addr-spec> is reserved for the internationalization of the
 > <local-part>. Non-ASCII characters must
 >
 > s/must/MUST ?
 >
 > first be encoded
 > according to UTF-8 [STD63], and then each octet of the
 > corresponding UTF-8 sequence must
 >
 > s/must/MUST ?
 >
 > be percent-encoded to be
 > represented as URI characters. Any other percent-encoding of
 > non-ASCII characters is prohibited. When a <local-part>
 > containing non-ASCII characters will be used to compose a
 > message, the <local-part> must
 >
 > s/must/MUST ?
 >
 > be transformed to conform to
 > whatever encoding may be defined in a future specification for
 > the internationalization of email addresses.
 >
 > [...]
 >
 > Non-ASCII characters can be encoded in hfvalue as follows:
 > [...]
 >
 > 2. Non-ASCII characters can be encoded according to UTF-8 [STD63],
 > and then each octet of the corresponding UTF-8 sequence is
 > percent-encoded to be represented as URI characters. When header
 > field values encoded in this way are used to compose a message,
 > the <hfvalue> must
 >
 > s/must/MUST ?

Done (sometimes with wording changes).

 > be transformed into MIME encoded words
 > [RFC2047], except for an <hfvalue> of a "body" <hfname>, which
 > has to be encoded according to [RFC2045]. Please note that for
 > MIME encoded words and for bodies in composed email messages,
 > encodings other than UTF-8 MAY be used as long as the characters
 > are properly transcoded.
 >
 > [...]
 >
 > MIME encoded words and UTF-8-based percent-encoding SHOULD NOT both
 > be used sequentially in the same <hfvalue>, and MUST NOT be combined.
 >
 > Can you clarify what you are trying to say here?
 > In particular I am not clear on the meaning of "sequentially" here.

Ok. Sequentially means e.g. using MIME for the first word in the 
subject, and UTF-8-based percent-encoding for the second word.

As for the "MUST NOT be combined", that either makes MIME completely 
impossible ('?' and '=' used in MIME encoded words have to be reencoded, 
but that isn't allowed) or leaves that provision hanging in the air ('?' 
and '=' are US-ASCII, so UTF-8 is irrelevant when percent-encoding them) 
depending on the interpretation of 'UTF-8'. So that has to be fixed.

First I was thinking about replacing the paragraph with something like:
"In mailto: URIs, UTF-8-based percent-encoding is preferred to MIME 
encoded words because for the latter, the '=' and '?' characters have to 
be percent-encoded."

But then that's also slightly inappropriate because MIME encoded words 
may work in some old implementations where UTF-8 doesn't. Then I went 
ahead and deleted that paragraph (because even 'sequential' mixing may 
be okay assuming implementations peel off one encoding layer after the 
other), and just inserted a short notice about the need to 
percent-encode '=' and '?' in point 1. a few lines above.
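
The difference between the two encodings, and the "peel off one layer after the other" assumption, can be sketched as follows. This is an illustration only, using Python's urllib.parse and email.header modules; ENCODED_WORD, via_mime, via_utf8 and decode_subject are hypothetical names, and the encoded word is written out by hand rather than generated.

```python
from email.header import decode_header, make_header
from urllib.parse import quote, unquote

# A hand-written RFC 2047 encoded word for "Dürst" (Q encoding, UTF-8).
ENCODED_WORD = "=?UTF-8?Q?D=C3=BCrst?="

# Option 1: MIME encoded word; its "=" and "?" must be percent-encoded.
via_mime = quote(ENCODED_WORD, safe="")

# Option 2: UTF-8-based percent-encoding of the characters themselves.
via_utf8 = quote("Dürst", safe="")


def decode_subject(hfvalue: str) -> str:
    """Peel off the percent-encoding first, then any MIME encoded words."""
    text = unquote(hfvalue)
    return str(make_header(decode_header(text)))


print(via_mime)
print(via_utf8)
print(decode_subject(via_mime), decode_subject(via_utf8))
```

Both forms decode to the same subject text; the MIME form is simply longer because every "=" and "?" of the encoded word needs its own escape.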


 > In Section 3:
 >
 > In current practice, resolving URIs such as those in the 'http' URI
 > scheme causes an immediate interaction between client software and a
 > host running an interactive server. The 'mailto' URI has unusual
 > semantics because resolving such a URI does not cause an immediate
 > interaction. Instead, the client creates a message to the designated
 > address with the various header fields set as default. The user can
 > edit the message, send this message unedited, or choose not to send
 > the message. The operation of how any URI scheme is resolved is not
 > mandated by the URI specifications.
 >
 > The last sentence doesn't seem to be related to the rest of the
 > paragraph. Should it be deleted or moved to a separate paragraph?

This sentence is giving the motivation for why the paragraph starts with 
"in current practice" and why there isn't a more normative definition 
along the lines of "to resolve a 'mailto' URI scheme, you MUST ...". So 
the position of this sentence seems okay to me. If you have any proposal 
for how to make this clearer, I'll be glad to use that.


 > In Section 4:
 >
 > The creator of a 'mailto' URI cannot expect the resolver of a URI to
 > understand more than the "subject" header field and "body".
 >
 > What about the "To" header field?

I don't know too much about actual implementations, but the fact that 
what corresponds to 'To' is usually given before the '?' seems to suggest 
to me that universal support for 'To' is neither necessary nor therefore 
guaranteed.
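
For illustration, a resolver that splits a 'mailto' URI at the first '?' sees the address part separately from the header fields. The following is a simplified sketch under that assumption, not a complete parser: split_mailto is a hypothetical name, and it ignores error handling and repeated fields.

```python
from urllib.parse import unquote


def split_mailto(uri: str) -> tuple[str, list[tuple[str, str]]]:
    """Split a 'mailto' URI into the address part and (hfname, hfvalue) pairs."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError("not a mailto URI")
    # Everything before the first "?" is the address part.
    to_part, _, query = rest.partition("?")
    fields = []
    for pair in filter(None, query.split("&")):
        name, _, value = pair.partition("=")
        fields.append((unquote(name).lower(), unquote(value)))
    return unquote(to_part), fields


address, fields = split_mailto(
    "mailto:user@example.org?subject=Hello%20there&body=Hi"
)
print(address, fields)
```

Here the recipient never appears as a "to" header field at all, which is the common case described above.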


 > Clients
 > that resolve 'mailto' URIs into mail messages MUST be able to
 > correctly create [RFC5322]-compliant mail messages using the
 > "subject" header field and "body".
 >
 > In Section 8:
 >
 > A 'mailto' URI gives a template for a message that can be sent by
 > mail client software. The contents of that template may be opaque or
 > difficult to read by the user at the time of specifying the URI.
 > Thus, a mail client should never send a message based on a 'mailto'
 >
 > s/should/SHOULD ?
 >
 > URI without first showing the full message that will be sent to the
 > user (including all header fields that were specified by the 'mailto'
 > URI), fully decoded, and asking the user for approval to send the
 > message as electronic mail. The mail client should also make it
 >
 > s/should/SHOULD
 >
 > clear that the user is about to send an electronic mail message,
 > since the user may not be aware that this is the result of a 'mailto'
 > URI.
 >
 > A mail client should never send anything without complete disclosure
 >
 > s/should/SHOULD
 >
 > to the user of what will be sent; it should disclose not only the
 >
 > s/should/SHOULD

Done.


 > message destination, but also any header fields. Unrecognized header
 > fields, or header fields with values inconsistent with those the mail
 > client would normally send should be especially suspect. MIME header
 > fields (MIME- Version, Content-*) are most likely inappropriate,
 > except when added by the MUA to correctly encode the text(s) being
 > sent, as are those relating to routing (From, Apparently-To, etc.)
 >
 >
 > 9. IANA Considerations
 >
 > This document changes the definition of the 'mailto' URI scheme; the
 > registry of URI schemes needs to be updated to refer to this document
 > rather than its predecessor, [RFC2368].
 >
 > It doesn't look like the proper URI registration template was ever
 > specified in this document or its predecessor.

Of course not in its predecessor, that was before we had any templates, 
I guess. Anyway, I added a template, please have a look at it when I 
post the draft.

Regards,    Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Wednesday, 14 October 2009 15:42:03 UTC