Re: Comments on charmod from Chris

From: Martin Duerst <duerst@w3.org>
Date: Wed, 29 May 2002 16:51:11 +0900
Message-Id: <4.2.0.58.J.20020529162133.02744e70@localhost>
To: Chris Lilley <chris@w3.org>, www-tag@w3.org

In http://www.w3.org/2001/tag/ilist#charmodReview-17, Chris Lilley wrote:

 > 3.6.2

 > "Because of the layered Web architecture (e.g. formats used over protocols),
 > there may be multiple and at times conflicting information about character
 > encoding. [S] Specifications MUST define conflict-resolution mechanisms
 > (e.g. priorities) for cases where there is multiple or conflicting
 > information about character encoding."
 >
 > Yes. Better though to not define such layering; the XML MIME RFC messed
 > this up by allowing the charset and the xml encoding declaration to differ
 > and for the former to take precedence; this requires "save as" to rewrite
 > the XML otherwise it is no longer well formed.... better to require any
 > transcoders to leave XML alone or to know how to rewrite the encoding
 > declaration if they change the encoding.

Chris, I, and others have had quite a lot of discussion about this over
the years, and I don't want to repeat all of it here.

But I want to point out that there is quite a bit of a difference
here between XML and other formats. First, XML was carefully designed
to have the encoding indication up front, in an easily parsable way.
Second, XML is used very frequently.

[Having done a little bit of work on the (X)HTML Validator
(validator.w3.org), I wish the XML declaration had been defined
by using #x20 (space) only, rather than S (http://www.w3.org/TR/REC-xml#NT-S),
so that it's guaranteed to be on the first line.]
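
To illustrate the point about S: because S also allows tab, CR and LF, a
well-formed XML declaration may span several lines, so a check that reads
only the first line can miss the 'encoding'. The following is a minimal
sketch, not the Validator's actual code. The function names are
hypothetical, and it assumes an ASCII-compatible encoding so that the
declaration can be scanned as bytes.

```python
import re

# Whitespace here follows the XML production S: space, tab, CR, LF.
_DECL_START = re.compile(rb'<\?xml[ \t\r\n]')
_ENCODING = re.compile(
    rb'[ \t\r\n]encoding[ \t\r\n]*=[ \t\r\n]*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def encoding_from_first_line(data):
    """Naive approach: look for 'encoding' on the first line only."""
    first_line = data.split(b"\n", 1)[0]
    m = _ENCODING.search(first_line)
    return m.group(1).decode("ascii") if m else None


def encoding_from_declaration(data):
    """Scan the whole declaration, up to the closing '?>'."""
    if not _DECL_START.match(data):
        return None
    end = data.find(b"?>")
    if end == -1:
        return None
    m = _ENCODING.search(data[:end])
    return m.group(1).decode("ascii") if m else None


# A well-formed declaration whose 'encoding' sits on the second line.
doc = b'<?xml version="1.0"\n      encoding="ISO-8859-1"?>\n<doc/>'
print(encoding_from_first_line(doc))
print(encoding_from_declaration(doc))
```

Had the declaration been restricted to #x20, the first-line check alone
would have been sufficient.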

The two reasons above make XML-specific transcoders that take
care of changing the 'encoding' feasible. [I'm not sure they
actually exist.]
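
As a rough illustration of what such an XML-specific transcoder could
look like, here is a minimal sketch. It is not an existing tool. The
function name transcode_xml is hypothetical, the caller is assumed to
already know the source encoding, and cases such as external entities or
text declarations are not handled.

```python
import re

# XML declaration at the very start of the document: the version part is
# captured, an existing 'encoding' part is skipped, the rest is captured.
_DECL = re.compile(
    r'^<\?xml([ \t\r\n]+version[ \t\r\n]*=[ \t\r\n]*(?:"[^"]*"|\'[^\']*\'))'
    r'(?:[ \t\r\n]+encoding[ \t\r\n]*=[ \t\r\n]*(?:"[^"]*"|\'[^\']*\'))?'
    r'(.*?)\?>',
    re.DOTALL,
)


def transcode_xml(data, source_encoding, target_encoding):
    """Re-encode an XML document and keep its 'encoding' in sync."""
    text = data.decode(source_encoding)
    if text.startswith("\ufeff"):
        text = text[1:]  # drop a byte order mark left over from decoding
    m = _DECL.match(text)
    if m:
        # Rebuild the declaration with the new encoding name.
        decl = f'<?xml{m.group(1)} encoding="{target_encoding}"{m.group(2)}?>'
        text = decl + text[m.end():]
    else:
        # No declaration present: add one that names the new encoding.
        text = f'<?xml version="1.0" encoding="{target_encoding}"?>\n' + text
    return text.encode(target_encoding)


latin1 = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<p>caf\u00e9</p>'
print(transcode_xml(latin1.encode("iso-8859-1"), "iso-8859-1", "utf-8"))
```

This only works because the declaration has a fixed position and a
simple syntax, which is the property of XML described above.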

But the character model is written for all kinds of formats, not only
XML. Indeed, it matters much more for other, potential formats, because
XML is already done and got most things right. Most other formats
are neither as widespread as XML nor do they have the 'encoding' in
such a neat place. And some (starting with text/plain) don't even
have anything like an 'encoding' parameter.
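
For such formats, a priority-based conflict-resolution rule of the kind
3.6.2 asks for can be sketched generically. This is only an illustration
under assumptions: the function name, the labels, and the particular
ordering are hypothetical, and each specification would have to define
its own.

```python
def resolve_encoding(candidates):
    """Return the first available encoding from a prioritized list.

    candidates: (label, encoding or None) pairs, highest priority first.
    Returns (label, encoding), or (None, None) if no source has a value.
    """
    for label, encoding in candidates:
        if encoding:
            return label, encoding
    return None, None


# External information wins over an in-band declaration in this ordering.
print(resolve_encoding([
    ("charset parameter", "iso-8859-1"),
    ("in-band declaration", "utf-8"),
]))

# A format such as text/plain has no in-band declaration to consult.
print(resolve_encoding([("charset parameter", None)]))
```

With formats that have no in-band declaration at all, the list simply
has fewer entries, and the rule still applies.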

So if you have suggestions on how to improve 3.6.2, please try to
make them so that they work with the generality of formats out there,
including new formats where there would be a deployment problem for
specific transcoders.

Regards,    Martin.
Received on Wednesday, 29 May 2002 03:57:18 GMT
