
Re: Requesting a revision of RFC3023

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 19 Sep 2003 03:50:11 +0200
To: Tim Bray <tbray@textuality.com>
Cc: ietf-xml-mime@imc.org, WWW-Tag <www-tag@w3.org>
Message-ID: <3f744d82.1687358032@smtp.bjoern.hoehrmann.de>

* Tim Bray wrote:
>Agreed, which is another of the advantages of XML, since it doesn't need 
>a charset parameter.  You are right about the shortcomings of the 
>charset parameter but for the moment it's the best tool we have.

Depends on the format. Formats should provide a means to specify
the encoding; if they do not, they are BAD, broken as designed.

>>>I agree, but for XML formats, I still think the charset parameter is 
>>>actively harmful and should be deprecated or even forbidden.
>> Deprecating something useful just because it could cause trouble when
>> used improperly does not make sense to me.
>The argument is precisely that it is not in the slightest useful.

Which makes me wonder why there is such a parameter. I think W3C should
have raised this concern during IESG review of RFC 2376. Complaining
about it now seems a bit late.

>Please read appendix F to the XML specification.  Then please suggest a 
>plausible scenario in which an XML instance unaccompanied by a charset 
>parameter can cause breakage.  You'll have to work hard.  Then suggest a 
>dozen ways in which deployed software is known to get the charset wrong. 
>You'll have no trouble.

Nor will I have any trouble suggesting ways in which deployed software
is known to get things wrong when the encoding declaration or the
byte order mark is involved, especially when they are used improperly.
But my logical conclusion is not to forbid the byte order mark or the
encoding declaration.
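For readers who have not looked at Appendix F recently: a processor can
usually tell the encoding from the first few bytes of the entity, either
from a byte order mark or from how the characters "<?xm" are encoded,
and then narrow it down via the encoding declaration. Below is a rough
Python sketch of that table, not a conforming implementation. The helper
name sniff_xml_encoding is hypothetical, UCS-4 is approximated as
UTF-32, and the simplified regular expression assumes the XML
declaration is in an ASCII-compatible encoding.

```python
import re

# Simplified match for the encoding declaration, e.g.
#   <?xml version="1.0" encoding="ISO-8859-1"?>
# Assumption: the declaration itself is in an ASCII-compatible encoding.
_DECL = re.compile(rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding family of an XML entity from its first bytes,
    loosely following XML 1.0 Appendix F (hypothetical helper)."""
    # 1. Byte order marks (UTF-32 checks come first: FF FE 00 00 vs FF FE).
    if head.startswith(b"\x00\x00\xfe\xff"):
        return "UTF-32BE"
    if head.startswith(b"\xff\xfe\x00\x00"):
        return "UTF-32LE"
    if head.startswith(b"\xfe\xff"):
        return "UTF-16BE"
    if head.startswith(b"\xff\xfe"):
        return "UTF-16LE"
    if head.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    # 2. No BOM: infer the family from how "<?" / "<?xm" is encoded.
    #    (Appendix F says the declaration must still be read for the exact name.)
    if head.startswith(b"\x00\x00\x00\x3c"):
        return "UTF-32BE"  # approximation for UCS-4 big-endian
    if head.startswith(b"\x3c\x00\x00\x00"):
        return "UTF-32LE"  # approximation for UCS-4 little-endian
    if head.startswith(b"\x00\x3c\x00\x3f"):
        return "UTF-16BE"
    if head.startswith(b"\x3c\x00\x3f\x00"):
        return "UTF-16LE"
    if head.startswith(b"\x4c\x6f\xa7\x94"):
        return "EBCDIC"  # the exact code page comes from the declaration
    if head.startswith(b"\x3c\x3f\x78\x6d"):  # "<?xm" in an ASCII-compatible encoding
        m = _DECL.match(head)
        if m:
            return m.group(1).decode("ascii").upper()
    # 3. No BOM and no encoding declaration: XML requires UTF-8.
    return "UTF-8"
```

The point relevant to this thread is the last line: absent any external
information, a document without BOM or declaration is UTF-8 by rule.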

You want to change something that has been STRONGLY RECOMMENDED for over
five years to (ideally) MUST NOT just because it could cause trouble
when used improperly or with broken implementations. Today I am good
with web standards if I use the charset parameter; tomorrow I am bad
with web standards if I do. What's next on #W3C? Use tables for layout
because people could get CSS wrong and old browsers get some CSS wrong?
I don't think this leads anywhere.

The charset parameter is useful if you cannot or do not want to use an
encoding declaration, for content negotiation, for view-source
functionality, when you perform protocol operations that change the
encoding without changing the document, or when you have to deal with
legacy applications that could break your document if no charset
parameter is present. I admit that there is probably no use case strong
enough to justify introducing it, but we have the parameter already and
it has been STRONGLY RECOMMENDED for ages across various W3C technologies.
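To make the precedence concrete: under RFC 3023 a charset parameter,
when present, overrides whatever the BOM or encoding declaration says;
without it, text/xml defaults to us-ascii, while application/xml falls
back to the in-band rules of the XML specification. That is what lets
an intermediary transcode a document and simply relabel it, and also
why a wrong charset parameter breaks an otherwise self-describing
document. A minimal Python sketch, with a hypothetical helper name and
media-type handling simplified to a prefix check:

```python
from typing import Optional


def effective_xml_charset(media_type: str,
                          charset_param: Optional[str],
                          in_band: str) -> str:
    """Pick the encoding an RFC 3023 receiver would use (simplified sketch).

    media_type    -- e.g. "text/xml" or "application/xml", parameters stripped
    charset_param -- value of the MIME/HTTP charset parameter, or None if absent
    in_band       -- encoding found via BOM / encoding declaration (XML Appendix F)
    """
    if charset_param:
        # The charset parameter is authoritative and wins over the
        # BOM and encoding declaration, e.g. after a transcoding proxy.
        return charset_param.upper()
    if media_type.lower().startswith("text/"):
        # text/xml without charset: the MIME default for text/* applies.
        return "US-ASCII"
    # application/xml without charset: use the XML spec's in-band rules.
    return in_band
```

For example, a proxy that re-encodes an ISO-8859-1 document as UTF-8
and sends `charset=utf-8` without touching the encoding declaration
still yields "UTF-8" here.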

I can live with removing the STRONGLY RECOMMENDED status and adding an
informative note that you typically do not need to specify the
charset parameter, but anything beyond that goes much too far.

>To put it another way, quoting Larry Wall: "An XML document knows what 
>encoding it's in."


  Having to re-learn how to do something is costly, creating new
  programs to do the same thing in a different way is costly, and
  converting existing documents and other resources to a different
  format is also costly, so changes with little or no benefit should
  be avoided.
Received on Thursday, 18 September 2003 21:50:32 UTC
