Re: I-D ACTION:draft-duerst-mailto-bis-00.txt from Bruce Lilly on 2005-04-08 (uri@w3.org from April 2005)

From: Bruce Lilly <blilly@erols.com>
Date: Fri, 8 Apr 2005 14:55:19 -0400
To: uri@w3.org
Message-Id: <200504081455.19877.blilly@erols.com>

On Wed February 16 2005 10:20, Internet-Drafts@ietf.org wrote:
> A New Internet-Draft is available from the on-line Internet-Drafts directories.
>
>
> Title : The mailto URI scheme
> Author(s) : M. Duerst, L. Masinter
> Filename : draft-duerst-mailto-bis-00.txt
> Pages : 13
> Date : 2005-2-15

Comments:

The Abstract states "for designating electronic mail addresses", the
section 1 text states "the Internet mailing address of an individual or
service", section 3 says "an internet resource", and the reality seems
to be specification of a prototype internet message (RFCs 822, 2822) as
alluded to briefly in draft section 8. Claims regarding the purpose of
a mailto URI should be consistent.

Section 1 claims that "a previous version of the mailto URI scheme had
severe limitations for non-ASCII characters", which is untrue; RFC 2047
mechanisms (as amended by errata and RFC 2231) provide not only for
non-ASCII text but also for language tagging, as required by RFC 2277
for text.

The UTF-8 scheme presented is claimed as "more straightforward and
consistent internationalization", but it is not backwards compatible
with existing implementations and fails to provide any mechanism for
language tagging as required by BCP 18. When foisted upon existing
mailto URI parsers, illegal message content will be generated, causing
loss of interoperability due to the lack of backwards compatibility of
that provision in the draft under discussion.
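
To make the compatibility concern concrete, here is a minimal Python
sketch. The function legacy_subject is hypothetical: it stands in for a
deployed handler that percent-decodes an hvalue and copies the octets
into the field body unchanged. The sample subject text is likewise
invented for illustration. The sketch only contrasts raw 8-bit octets
with an RFC 2047 encoded-word; it does not show RFC 2231 language
tagging.

```python
from email.header import Header, decode_header
from urllib.parse import unquote_to_bytes

# Hypothetical hvalue: "caf\u00e9" as UTF-8, then percent-encoded.
URI_HVALUE = "caf%C3%A9"


def legacy_subject(hvalue: str) -> bytes:
    """Assumed legacy behaviour: decode %XX and copy the octets verbatim."""
    return b"Subject: " + unquote_to_bytes(hvalue)


def is_7bit(data: bytes) -> bool:
    """True if every octet is in the US-ASCII range."""
    return all(octet < 128 for octet in data)


raw_field = legacy_subject(URI_HVALUE)

# The same text carried as an RFC 2047 encoded-word stays within ASCII.
encoded_word = Header("caf\u00e9", charset="utf-8").encode()

print(raw_field, is_7bit(raw_field))
print(encoded_word, is_7bit(encoded_word.encode("ascii")))
```

The first field body contains octets above 127, which a message header
may not carry unencoded; the second is pure ASCII and round-trips
through decode_header.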

Section 2 ABNF uses "urlc", which is not defined anywhere. Note that
per http://www.ietf.org/ID-Checklist.html, all ABNF is supposed to be
checked for such errors. The text implies that "mailbox" and "address"
per RFC 2822 are equivalent, whereas they are defined quite differently
in that RFC; moreover, the field body of an RFC 2822 To field is an
address-list, which is not mentioned in the draft under discussion.
Text states that "reserved" characters must be encoded, but does not
give a list of "reserved" characters or a reference. RFC 3986 (listed
as a normative reference, but not specifically mentioned w.r.t.
"reserved") defines URI reserved characters as:

   reserved    = gen-delims / sub-delims

   gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
               / "*" / "+" / "," / ";" / "="

The draft text specifically mentions "parentheses, comma, and the
percent sign" as common in mailbox syntax; parentheses and comma are
forbidden in a mailbox (they are RFC 822/2822 "specials"), percent is
not "reserved" (but has other issues in URIs) and is rather uncommon in
mailboxes, and the required '@' character which appears in every
address-list is not mentioned and is not encoded in the examples. And
square brackets ('[' and ']') are explicitly used in domain literals,
which may be used in the domain of a mailbox. Colon appears in the RFC
822/2822 syntax of addresses which are named groups, and appears in the
route portion of RFC 822 route-addrs. Forward slashes appear in X.400
derived mailboxes, and '!' can appear in local-parts (RFC 976).
Finally, '<' and '>' are specials explicitly used in RFC 822/2822
angle-addrs (which may appear in mailboxes and addresses); while these
are not "reserved", they may not appear (unencoded) in URIs. I believe
that the '@' "reserved" character issue w.r.t. encoding has recently
been discussed at length w.r.t. RFC 2368. Percent-encoding is
recommended for non-ASCII octets, but that is incompatible with
existing mailto URI-to-message prototype implementations, and will
result in illegal and incompatible content in the resulting message
prototypes. There is some wishy-washy wording about "wish to
maximize interoperability"; the simple fact is that the proposed
change is not backwards compatible, full stop. The topic is carried
to ridiculous extremes by requiring developers to implement something
which is nowhere defined (paragraph labeled "3."; especially see the
last sentence in that paragraph).
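
As a rough illustration of what a literal reading of "reserved
characters must be encoded" would imply under the RFC 3986 definition,
the following Python sketch lists the reserved characters present in a
few mailboxes and percent-encodes them. The sample mailboxes and the
helper names are invented for the example. Note that quote() with
safe="" encodes everything outside the RFC 3986 "unreserved" set, which
is broader than "reserved" alone (it would also encode '<' and '>').

```python
from urllib.parse import quote, unquote

# Reserved characters as defined in RFC 3986 section 2.2.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = set(GEN_DELIMS + SUB_DELIMS)


def reserved_in(mailbox: str) -> list:
    """Return the sorted RFC 3986 reserved characters found in mailbox."""
    return sorted(set(mailbox) & RESERVED)


def encode_mailbox(mailbox: str) -> str:
    """Percent-encode every character except letters, digits and "-._~"."""
    return quote(mailbox, safe="")


# Invented sample mailboxes: plain, domain literal, and a '!' local-part.
SAMPLES = [
    "user@example.com",
    "user@[192.0.2.1]",
    "host!user@example.com",
]

for sample in SAMPLES:
    print(sample, reserved_in(sample), encode_mailbox(sample))
```

Every sample contains at least '@', so under that reading none of them
could appear in a mailto URI without some percent-encoding.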

Non-standard terminology which is inconsistent with standard
terminology as defined and used in normative references (esp. RFC 2822)
appears in the draft (except, curiously, in the second paragraph of
draft section 3, which does use standard terminology). E.g. instead of
"header name", the standard term is "header field name" or "field name"
(RFC 2822 section 2.2).

The draft uses "body" in the same syntax as would be used for a
header field name, but lacks any indication of how a generator or
parser is supposed to differentiate message body from a header field
named "Body", nor is there a message header field name registration
template (BCP 90) reserving the header field name "Body". Message
header field names are comprised of printing characters excluding
colon, and can therefore include characters such as '?', '=', and '&'.
The draft does not specifically discuss how those or "reserved"
characters are to be handled when they appear within a header field
name (as opposed to parts of a mailto URI intended to be part of a
field body or message body).
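
The ambiguity can be seen with a small Python sketch. The header field
name "X-A&B" and the helper split_hfields are invented for
illustration, and parse_qsl is a generic query-string parser used here
as a stand-in for a naive mailto handler (it also turns '+' into a
space, which is a form-encoding convention rather than anything mailto
defines).

```python
from urllib.parse import parse_qsl


def split_hfields(mailto_uri: str) -> list:
    """Split the part after the first '?' into (hname, hvalue) pairs."""
    _, _, query = mailto_uri.partition("?")
    return parse_qsl(query, keep_blank_values=True)


def is_body(hname: str) -> bool:
    """Naive test a handler might use to pick out the message body."""
    return hname.lower() == "body"


# Unencoded '&' inside a field name is read as a field separator.
print(split_hfields("mailto:joe@example.com?X-A&B=1"))

# Only when the '&' is percent-encoded does the name survive intact.
print(split_hfields("mailto:joe@example.com?X-A%26B=1"))

# "body" and a header field actually named "Body" cannot be told apart.
for name, value in split_hfields("mailto:joe@example.com?body=hi&Body=hi"):
    print(name, value, is_body(name))
```

Nothing in the URI tells the parser whether "Body" was meant as a
header field name or as the message body.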

The draft seems to have a number of formatting/content anomalies:

idnits reports:

* The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure Acknowledgement .
* There are 52 instances of too long lines in the document, the longest one being 5 characters in excess of 72.
[...]
- Line 140 has weird spacing: '..." hname is...'
- Line 161 has weird spacing: '...hvalues encod...'

There are also 3 empty lines following the formfeed after the last
page (nothing is supposed to follow that formfeed character).

The examples at the end of section 2 do not meet syntax requirements;
in particular the address-lists do not meet RFC 2822 syntax
requirements as specified at the beginning of draft section 2 ("addr1",
for example, is not a valid RFC 2822 address (or mailbox)).

Specification revisions, such as those proposed in the draft under
discussion, should ideally be designed in a backwards compatible
fashion. When that is not possible, a "flag day" for universal change
from the "old" to "new" format may be specified. Flag days are highly
undesirable due to the disruption caused. The draft does something
much worse; it requires a non-specific, poorly defined flag day: "once
it is well deployed in software" (draft section 6). No mechanism is
defined for determining precisely when that flag day is to take place.

The examples in section 7.1 are bracketed with less-than and
greater-than symbols, unlike the examples in earlier draft sections.
The examples fail to percent-encode "reserved" characters as required
by earlier provisions in the draft. Section 7.2 compounds inconsistency
by returning to unbracketed examples. The first example in 7.2 will
result in illegal content with existing, deployed mailto URI handlers.
The second and third examples fail to percent-encode "reserved"
characters. The fourth example will also result in illegal content
with existing, deployed, mailto URI handlers; moreover, the draft
implies that header fields which are NOT specified in the mailto URI
are magically generated (Content-Type and Content-Transfer-Encoding
fields are presented as having resulted from the example, but are
nowhere specified in that example). It is unclear how the supposed
determination of media type was made; for all I know, the content
might have been intended by the mailto URI generator as describing
a message body with media type image/png. The Subject field in the
message prototype shows a charset specified, but the mailto URI
specifies no such charset, and there is no indication of language.
It is unclear how the Content-Transfer-Encoding field was created
out of thin air, nor why quoted-printable (vs. base64) encoding
was specified. The remaining examples in the section have similar
issues.

Draft section 8 contains the incomprehensible text "of what is will be
sent". Section 8 also states that "MIME header[ field]s" are
inappropriate, despite the fact that earlier examples use them
(apparently generated from thin air). That text also mentions
"Apparently-To", but there is no such message header field (RFC 4021).

The same section mentions "SMTP 'Form' address", but it is unclear what
that is supposed to mean (perhaps the SMTP envelope return path as
specified as the SMTP MAIL FROM command argument, which is used for
delivery notifications?).

The last sentence of that section says "[RFC3490], and also apply".
And what?

The IANA Considerations section has no mention of registration of
a message header field name "Body" (see above).

There is no indication in the draft announcement, the draft heading, or
in the draft Abstract of the intended status sought for this draft. The
substantial changes proposed in the draft as currently written (viz.
UTF-8 not encoded per RFCs 2047/2231 and errata) would preclude
advancement to Draft status if they remain, but Draft status might be
feasible w/o those incompatible changes (of course Draft status would
require a separate enumeration of at least two interoperable and
independent implementations which fully conform with all provisions
of the specification).

Some issues reported regarding RFC 2368 remain unaddressed by the draft
under discussion:

The syntax permits some constructs corresponding to peculiar messages,
e.g. a completely empty specification (save for "mailto:"), or a
message body without any header fields. While it may be difficult or
impractical to prevent some of that via ABNF, the normative text
should probably warn against naive implementations that might
generate invalid messages.

Within mailto URLs, the characters "?", "=", "&" are reserved.

As with URL reserved characters, there does not appear to be any
technical requirement to reserve all three of those characters in
all parts of a mailto URL. For example, neither "=" nor "&" should
cause trouble in the "to" part of a mailto URL. Likewise "?" should
be safe in "header".
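
A short Python sketch of that point, assuming a parser that treats only
the first "?" as the delimiter between the "to" part and the headers.
The function parse_mailto and the sample URI are invented for the
example; this is not how any particular deployed handler works.

```python
from urllib.parse import parse_qsl, unquote


def parse_mailto(uri: str) -> tuple:
    """Split a mailto URI into its "to" part and (hname, hvalue) pairs."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError("not a mailto URI")
    # Only the first '?' separates the "to" part from the headers.
    to_part, _, headers = rest.partition("?")
    return unquote(to_part), parse_qsl(headers, keep_blank_values=True)


# '=' and '&' in the "to" part, and a second '?' in a header value,
# are all left unencoded here.
to_part, hfields = parse_mailto("mailto:a=b&c@example.com?subject=why?")
print(to_part)
print(hfields)
```

With that splitting order, "=" and "&" before the first "?" and a "?"
after it are parsed without ambiguity.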

Received on Friday, 8 April 2005 18:55:29 UTC