Re: Requesting a revision of RFC3023 from Tim Bray on 2003-09-17 (www-tag@w3.org from September 2003)

From: Tim Bray <tbray@textuality.com>
Date: Wed, 17 Sep 2003 10:38:47 -0700
To: MURATA Makoto <murata@hokkaido.email.ne.jp>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Message-ID: <3F689C27.50407@textuality.com>

MURATA Makoto wrote:

> First, Simon and I were asked by the W3C team not to take any action
> on RFC 3023.  This is because the MIME type registration procedure was
> expected to change (see [1] and [2]).  So, Simon, Dan, and I can't do 
> anything right now.

Hmm, the TAG is pretty convinced that 3023 needs to change, so maybe Dan 
or Chris or TimBL could take this up internally.  I disagree that this 
should be frozen at the moment, since the TAG is quite likely to publish 
a document saying "RFC 3023 is wrong".

> As for the charset parameter, I am still uneasy to disallow or
> deprecate it.  But I agree to make "clear that nobody sending a
> media-type should send a charset for an XML media-type unless it
> REALLY REALLY KNOWS what it's sending," and to deprecate text/xml not
> because the charset parameter is harmful but because most XML is not
> text for casual users.

I think I provided a detailed explanation of why the charset is in fact 
actively harmful in the context of XML.  If you're not convinced, it 
would be helpful if you could address those points.  If you already 
have, my apologies; perhaps you could give a pointer.

> I have repeatedly asked (e.g., [3]) what is the position of the TAG on
> charset detection for non-XML formats.  The latest version of the TAG
> finding document "Client handling of MIME headers" appears to
> recommend:

I read [3] and while I agree with much of it, it's obviously far too 
late to change the XML encoding declaration.  For the moment, I think 
that the architecturally-sound position is, for Web data formats, either 
(a) use XML, or (b) use the charset parameter.  I'm generally in favor 
of a general-purpose encoding-detection scheme such as you propose, but 
I'm pessimistic about getting it widely deployed for legacy formats.

>     (1) non-self-describing data formats should rely on the
>         charset parameter, and
>     (2) self-describing data formats should introduce their own
>         mechanism for specifying charsets.

I'll review the webarch doc; I suspect we haven't thought closely enough 
about this.

> As far as I know, the charset parameter is the only generic mechanism.  I 
> know the charset parameter is not working well, but I do not see any other 
> generic mechanisms.

I agree, but for XML formats, I still think the charset parameter is 
actively harmful and should be deprecated or even forbidden.  This is 
orthogonal to the larger question you (correctly) raise, of charset 
detection for non-XML formats.
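
A minimal sketch, in Python, of one way the two labels can disagree: the 
charset parameter on the media type says one thing and the XML encoding 
declaration inside the entity says another.  The function names are 
hypothetical, and the sketch assumes an ASCII-compatible encoding so that 
the declaration can be read directly from the leading bytes (it does not 
handle UTF-16 or a byte order mark).

```python
import codecs
import re
from typing import Optional


def _normalize(label: str) -> str:
    """Map encoding aliases (e.g. 'latin-1', 'ISO-8859-1') to one name."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        # Unknown to Python's codec registry: compare case-insensitively.
        return label.lower()


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    m = re.search(r'charset\s*=\s*"?([A-Za-z0-9._:-]+)"?', content_type, re.I)
    return _normalize(m.group(1)) if m else None


def encoding_from_xml_declaration(body: bytes) -> Optional[str]:
    """Return the encoding named in a leading XML declaration, if present.

    Assumption: the body starts with '<?xml' in an ASCII-compatible encoding.
    """
    m = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
        body,
    )
    return _normalize(m.group(1).decode("ascii")) if m else None


def labels_conflict(content_type: str, body: bytes) -> bool:
    """True when both labels are present and name different encodings."""
    header = charset_from_content_type(content_type)
    declared = encoding_from_xml_declaration(body)
    return header is not None and declared is not None and header != declared


if __name__ == "__main__":
    doc = b'<?xml version="1.0" encoding="UTF-8"?><greeting/>'
    print(labels_conflict("text/xml; charset=iso-8859-1", doc))
    print(labels_conflict("application/xml", doc))
```

With no charset parameter there is only one label, so the two cannot 
disagree; the mismatch arises only when a sender adds a charset that does 
not match what the document declares.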

-- 
Cheers, Tim Bray
         (ongoing fragmented essay: http://www.tbray.org/ongoing/)

Received on Wednesday, 17 September 2003 13:38:43 UTC