Re: I-D ACTION:draft-duerst-mailto-bis-00.txt from Martin Duerst on 2006-03-06 (uri@w3.org from March 2006)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Mon, 06 Mar 2006 22:04:33 +0900
To: uri <uri@w3.org>
Cc: Jamie Zawinski <jwz@jwz.org>, "Larry Masinter" <masinter@adobe.com>
Message-Id: <6.0.0.20.2.20060306211856.038e4bc0@localhost>
Hello Bruce,

At 03:55 05/04/09, Bruce Lilly wrote:
 >
 >On Wed February 16 2005 10:20, Internet-Drafts@ietf.org wrote:
 >> A New Internet-Draft is available from the on-line Internet-Drafts
 >> directories.
 >>
 >>
 >> 	Title		: The mailto URI scheme
 >> 	Author(s)	: M. Duerst, L. Masinter
 >> 	Filename	: draft-duerst-mailto-bis-00.txt
 >> 	Pages		: 13
 >> 	Date		: 2005-2-15
 >
 >Comments:
 >
 >The Abstract states "for designating electronic mail addresses", the
 >section 1 text states "the Internet mailing address of an individual or
 >service", section 3 says "an internet resource", and the reality seems
 >to be specification of a prototype internet message (RFCs 822, 2822) as
 >alluded to briefly in draft section 8.

And also discussed in detail in Section 3.

 >Claims regarding the purpose of a mailto URI should be consistent.

While I understand your complaint (I haven't written the text),
I seem to remember that this kind of language was carefully worked
out by Larry and others.

The fact that the Abstract, the Intro, and the actual text don't
say exactly the same thing is at least partially due to the fact that
the former two are summaries. I'd be wary of changing anything
here, but if Larry has a better proposal, I'll integrate it.


 >Section 1 claims that "a previous version of the mailto URI scheme had
 >severe limitations for non-ASCII characters",

That language has been toned down a bit.

 >which is untrue; RFC 2047
 >mechanisms which (as amended by errata and RFC 2231) provide not only
 >for non-ASCII text but also for language tagging as required by RFC
 >2277 for text.

Well, judging from implementation and use of RFC 2231, in the case
of message header fields, the corresponding provisions of RFC 2277
seem to be going slightly too far.

 >The UTF-8 scheme presented is claimed as "more straightforward and
 >consistent internationalization", but it is not backwards compatible
 >with existing implementations and fails to provide any mechanism for
 >language tagging as required by BCP 18.  When foisted upon existing
 >mailto URI parsers, illegal message content will be generated, causing
 >loss of interoperability due to the lack of backwards compatibility of
 >that provision in the draft under discussion.

This is clearly addressed in section 6.

 >Section 2 ABNF uses "urlc", which is not defined anywhere.

Fixed.

 >Note that
 >per http://www.ietf.org/ID-Checklist.html, all ABNF is supposed to be
 >checked for such errors.  The text implies that "mailbox" and "address"
 >per RFC 2822 are equivalent, whereas they are defined quite differently
 >in that RFC; moreover, the field body of an RFC 2822 To field is an
 >address-list, which is not mentioned in the draft under discussion.

Yes, this was folded out to make the need for the %-encoding
of the comma explicit.

In any case, ABNF is an extremely poor means to describe the issue
at hand, where we are describing several levels of encoding and
overlapping syntaxes (of considerable complexity).
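
To illustrate why the comma needs %-encoding, here is a minimal Python
sketch (not text from the draft). It assumes a receiver that splits the
"to" part on literal commas first and only then percent-decodes each
address; the function name and the example addresses are hypothetical.

```python
from urllib.parse import unquote

def split_recipients(to_part):
    """Split the 'to' part of a mailto URI into individual addresses.

    Literal commas separate addresses, so the split happens BEFORE
    percent-decoding; a comma that belongs to an address must therefore
    arrive as %2C to survive the split.
    """
    return [unquote(addr) for addr in to_part.split(",") if addr]

# Hypothetical example: a quoted local-part that contains a comma.
to_part = "%22Doe%2C%20John%22@example.org,jane@example.org"
for address in split_recipients(to_part):
    print(address)
```

If the comma inside the first address were left unencoded, the split
would cut that address in two.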

 >Text states that "reserved" characters must be encoded, but does not
 >give a list of "reserved" characters or a reference.  RFC 3986 (listed
 >as a normative reference, but not specifically mentioned w.r.t.
 >"reserved") defines URI reserved characters as:
 >      reserved    = gen-delims / sub-delims
 >
 >      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
 >
 >      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
 >                  / "*" / "+" / "," / ";" / "="
 >The draft text specifically mentions "parentheses, comma, and the
 >percent sign" as common in mailbox syntax; parentheses and comma are
 >forbidden in a mailbox (they are RFC 822/2822 "specials"), percent is
 >not "reserved" (but has other issues in URIs) and is rather uncommon in
 >mailboxes, and the required '@' character which appears in every
 >address-list is not mentioned and is not encoded in the examples.  And
 >square brackets ('[' and ']') are explicitly used in domain literals
 >which may be used in the domain of a mailbox.  Colon appears in the RFC
 >822/2822 syntax of addresses which are named groups, and appear in the
 >route portion of RFC 822 route-addrs.  Forward slashes appear in X.400
 >derived mailboxes, and '!' can appear in local-parts (RFC 976).
 >Finally, '<' and '>' are specials explicitly used in RFC 822/2822
 >angle-addrs (which may appear in mailboxes and addresses); while these
 >are not "reserved", they may not appear (unencoded) in URIs.  I believe
 >that the '@' "reserved" character issue w.r.t. encoding has recently
 >been discussed at length w.r.t. RFC 2368.

This has been reworded. It should now be consistent.

 >Percent-encoding is
 >recommended for non-ASCII octets, but that is incompatible with
 >existing mailto URI-to-message prototype implementations,

It works already at least in some implementations.

 >and will
 >result in illegal and incompatible content in the resulting message
 >prototypes.

This is discussed in Section 6.

 >the simple fact is that the proposed
 >change is not backwards compatible, full stop.  The topic is carried
 >to ridiculous extremes by requiring developers to implement something
 >which is nowhere defined (paragraph labeled "3."; especially see the
 >last sentence in that paragraph).

Well, that work is now underway (BOF stage).

 >Non-standard terminology which is inconsistent with standard
 >terminology as defined and used in normative references (esp. RFC 2822)
 >appears in the draft (except, curiously, in the second paragraph of
 >draft section 3, which does use standard terminology).  E.g. instead of
 >"header name", the standard term is "header field name" or "field name"
 >(RFC 2822 section 2.2).

One instance fixed, no others found. If you find other problems,
please tell me.

 >The draft uses "body" in the same syntax as would be used for a
 >header field name, but lacks any indication of how a generator or
 >parser is supposed to differentiate message body from a header field
 >named "Body", nor is there a message header field name registration
 >template (BCP 90) reserving the header field name "Body".

Ah, so you agree with my suggestion in the previous mail.
Added as a todo this time around.

 >Message
 >header field names are comprised of printing characters excluding
 >colon, and can therefore include characters such as '?', '=', and '&'.
 >The draft does not specifically discuss how those or "reserved"
 >characters are to be handled when they appear within a header field
 >name (as opposed to parts of a mailto URI intended to be part of a
 >field body or message body).

There is now a requirement for escaping for both hname and hvalue
parts.
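
As a rough illustration of what escaping both parts means for a URI
generator, here is a short Python sketch. It is only one possible
reading, not the draft's normative algorithm: build_mailto is a
hypothetical helper, and leaving "@" unencoded in the "to" part is an
assumption made for this example.

```python
from urllib.parse import quote

def build_mailto(to, headers):
    """Build a mailto URI, percent-encoding every hname and hvalue.

    safe="" makes quote() encode "?", "=", "&", "%" and "/" as well,
    so none of them can be mistaken for mailto delimiters.
    """
    query = "&".join(
        quote(name, safe="") + "=" + quote(value, safe="")
        for name, value in headers.items()
    )
    # Assumption for this sketch: "@" stays literal in the "to" part.
    uri = "mailto:" + quote(to, safe="@")
    return uri + "?" + query if query else uri

print(build_mailto("user@example.org",
                   {"subject": "Q&A: 1+1=2?", "body": "50% done"}))
```

With this approach, a "&" or "=" inside a subject can no longer be
confused with the separators between hname/hvalue pairs.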

 >The draft seems to have a number of formatting/content anomalies:
 >
 >idnits reports:
 >
 >  * The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure
 >Acknowledgement .
 >  * There are 52 instances of too long lines in the document, the longest
 >one being 5 characters in excess of 72.

That should be fixed.

 >[...]
 >  - Line 140 has weird spacing: '..." hname  is...'
 >  - Line 161 has weird spacing: '...hvalues  encod...'
 >
 >There are also 3 empty lines following the formfeed after the last
 >page (nothing is supposed to follow that formfeed character).
 >
 >The examples at the end of section 2 do not meet syntax requirements;
 >in particular the address-lists do not meet RFC 2822 syntax
 >requirements as specified at the beginning of draft section 2 ("addr1",
 >for example, is not a valid RFC 2822 address (or mailbox)).

Fixed.

 >Specification revisions, such as those proposed in the draft under
 >discussion, should ideally be designed in a backwards compatible
 >fashion.  When that is not possible, a "flag day" for universal change
 >from the "old" to "new" format may be specified.  Flag days are highly
 >undesirable due to the disruption caused.  The draft does something
 >much worse; it requires a non-specific, poorly defined flag day: "once
 >it is well deployed in software" (draft section 6).  No mechanism is
 >defined for determining precisely when that flag day is to take place.

This is not a flag day. There have been numerous improvements to
the Web and Internet infrastructure, and many of them were made by
upgrading stepwise, based on judgement about a specific user base.
E.g. Web sites change from overly crappy HTML to CSS based on
indications that e.g. 99% of their audience's browsers understand
CSS.

 >The examples in section 7.1 are bracketed with less-than and
 >greater-than symbols, unlike the examples in earlier draft sections.
 >The examples fail to percent-encode "reserved" characters as required
 >by earlier provisions in the draft. Section 7.2 compounds inconsistency
 >by returning to unbracketed examples. The first example in 7.2 will
 >result in illegal content with existing, deployed mailto URI handlers.
 >The second and third examples fail to percent-encode "reserved"
 >characters.

Encoding issues fixed (earlier in the spec). Bracketing issues
fixed (by trying to bracket everything).

 >The fourth example will also result in illegal content
 >with existing, deployed, mailto URI handlers; moreover, the draft
 >implies that header fields which are NOT specified in the mailto URI
 >are magically generated (Content-Type and Content-Transfer-Encoding
 >fields are presented as having resulted from the example, but are
 >nowhere specified in that example).  It is unclear how the supposed
 >determination of media type was made; for all I know, the content
 >might have been intended by the mailto URI generator as describing
 >a message body with media type image/png.

It says earlier "The "body" hname should contain the content for
the first text/plain body part of the message."

 >The Subject field in the
 >message prototype shows a charset specified, but the mailto URI
 >specifies no such charset,

Yes, but one is obviously needed to correctly encode the
characters in question.

 >and there is no indication of language.

Same as for the absolutely overwhelming majority of emails sent
around the world every day, so no big surprise.

 >It is unclear how the Content-Transfer-Encoding field was created
 >out of thin air, nor why quoted-printable (vs. base64) encoding
 >was specified.  The remaining examples in the section have similar
 >issues.

Well, for QP, that was just one possible choice, and the one that
was easier to create and verify. If you send me the example in base64,
I can add it. I would have loved to use a straightforward 8bit example,
but there is a strict ASCII-only limitation for Internet drafts and
RFCs.
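
To show how Content-Type and Content-Transfer-Encoding can come from
the MUA rather than from the URI, here is a hedged Python sketch using
the standard email package. It is not how any particular MUA works:
build_prototype is a hypothetical helper, and both UTF-8 and
quoted-printable are choices made by this example, exactly the kind of
choice the URI itself does not carry.

```python
from email.message import EmailMessage
from urllib.parse import unquote

def build_prototype(to, subject_enc, body_enc):
    """Turn percent-encoded UTF-8 hvalues into a message prototype."""
    msg = EmailMessage()
    msg["To"] = to
    # unquote() decodes the percent-encoded octets as UTF-8 by default.
    msg["Subject"] = unquote(subject_enc)
    # The MUA, not the URI, picks the charset and transfer encoding;
    # cte="base64" would be an equally legitimate choice here.
    msg.set_content(unquote(body_enc), charset="utf-8",
                    cte="quoted-printable")
    return msg

msg = build_prototype("user@example.org", "caf%C3%A9", "caf%C3%A9")
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(msg.as_string())
```

Both MIME header fields appear in the output although neither was
present in the input, which is the behaviour the examples in the draft
rely on.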

 >Draft section 8 contains the incomprehensible text "of what is will be
 >sent".

Removed the "is".

 >Section 8 also states that "MIME header[ field]s" are
 >inappropriate, despite the fact that earlier examples use them
 >(apparently generated from thin air).  That text also mentions
 >"Apparently-To", but there is no such message header field (RFC 4021).

Added "except when added by the MUA to correctly encode the text(s) being sent",
to make sure it is clear that this applies to headers supplied directly
in the URI.

 >The same section mentions "SMTP 'Form' address", but it is unclear what
 >that is supposed to mean (perhaps the SMTP envelope return path as
 >specified as the SMTP MAIL FROM command argument, which is used for
 >delivery notifications?).

I would guess so. Larry?

 >The last sentence of that section says "[RFC3490], and also apply".
 >And what?

[RFC3987]. Fixed.

 >The IANA Considerations section has no mention of registration of
 >a message header field name "Body" (see above).

Added as a TODO.

 >There is no indication in the draft announcement, the draft heading, or
 >in the draft Abstract of the intended status sought for this draft. The
 >substantial changes proposed in the draft as currently written (viz.
 >UTF-8 not encoded per RFCs 2047/2231 and errata) would preclude
 >advancement to Draft status if they remain,

Understood.

 >but Draft status might be
 >feasible w/o those incompatible changes (of course draft status would
 >require a separate enumeration of at least two interoperable and
 >independent implementations which fully conform with all provisions
 >of the specification).

Once we go to Proposed, my guess is that going to Draft won't
be that much of a problem. There is already at least one
implementation that does the new things in the draft, and
I'm expecting more.

 >Some issues reported regarding RFC 2368 remain unaddressed by the draft
 >under discussion:
 >
 >The syntax permits some constructs corresponding to peculiar messages,
 >e.g. a completely empty specification (save for "mailto:"), message
 >body without any header fields.  While it may be difficult or
 >impractical to prevent some of that via ABNF, the normative text
 >should probably warn against naive implementations that might
 >generate invalid messages.

As said in my previous message, added a sentence about that to
the security section. If you have a better place for this,
please tell me.

 >   Within mailto URLs, the characters "?", "=", "&" are reserved.
 >
 >As with URL reserved characters, there does not appear to be any
 >technical requirement to reserve all three of those characters in
 >all parts of a mailto URL. For example, neither "=" nor "&" should
 >cause trouble in the "to" part of a mailto URL. Likewise "?" should
 >be safe in "header".

See above.
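
For readers following the point about "?", "=", and "&", here is a
small Python sketch of a position-sensitive parser. It is an
illustration only, not the parsing rule the draft mandates; parse_mailto
and the example URI are hypothetical.

```python
from urllib.parse import unquote

def parse_mailto(uri):
    """Split a mailto URI into its 'to' part and (hname, hvalue) pairs."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError("not a mailto URI")
    # Only the FIRST "?" separates the 'to' part from the header fields.
    to_part, _, query = rest.partition("?")
    fields = []
    for pair in (query.split("&") if query else []):
        # Only the FIRST "=" separates hname from hvalue.
        name, _, value = pair.partition("=")
        fields.append((unquote(name), unquote(value)))
    # The 'to' part is returned still percent-encoded.
    return to_part, fields

print(parse_mailto("mailto:a=b&c@example.org?subject=why?"))
```

A parser written this way is not confused by "=" or "&" in the "to"
part, or by a later "?" in a header value. Escaping them anyway keeps
simpler parsers safe.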



Regards,    Martin.
Received on Monday, 6 March 2006 13:12:52 UTC