- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 29 May 2002 16:51:11 +0900
- To: Chris Lilley <chris@w3.org>, www-tag@w3.org
In http://www.w3.org/2001/tag/ilist#charmodReview-17, Chris Lilley wrote:

> 3.6.2
> "Because of the layered Web architecture (e.g. formats used over protocols),
> there may be multiple and at times conflicting information about character
> encoding. [S] Specifications MUST define conflict-resolution mechanisms
> (e.g. priorities) for cases where there is multiple or conflicting
> information about character encoding."
>
> Yes. Better though to not define such layering; the XML MIME RFC messed
> this up by allowing the charset and the xml encoding declaration to differ
> and for the former to take precedence; this requires "save as" to rewrite
> the XML otherwise it is no longer well formed.... better to require any
> transcoders to leave XML alone or to know how to rewrite the encoding
> declaration if they change the encoding.

Chris, I, and others have had quite a bit of discussion about this over the years, and I don't want to repeat all of it. But I want to point out that there is quite a difference here between XML and other formats.

First, XML was carefully designed to have the encoding indication up front, in an easily parsable way. Second, XML is used very frequently.

[Having done a little bit of work on the (X)HTML Validator (validator.w3.org), I wish the XML declaration had been defined using #x20 (space) only, rather than S (http://www.w3.org/TR/REC-xml#NT-S), so that it would be guaranteed to be on the first line.]

The two reasons above make XML-specific transcoders that take care of changing the 'encoding' feasible. [I'm not sure they actually exist.]

But the character model is written for all kinds of formats, not only XML. Indeed, it is written much more for potential other formats, because XML is already done and got most things right. Most other formats are neither as widespread as XML nor do they have the 'encoding' in such a neat place. And some (starting with text/plain) don't have anything like an 'encoding' parameter at all.
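To make the idea of an XML-specific transcoder concrete, here is a rough sketch in Python of what one might look like. It is only an illustration, not a description of any existing tool: the names declared_encoding and transcode_xml are hypothetical, the input is assumed to be in an ASCII-compatible encoding with no byte order mark, and an externally supplied charset (the optional source argument) is assumed to take precedence over the in-document declaration, which is the priority Chris describes for the XML MIME RFC.

```python
import re

# XML declaration at the very start of the document (bytes and text forms).
_DECL_BYTES = re.compile(rb'^<\?xml\s[^>]*?\?>')
_DECL_TEXT = re.compile(r'^<\?xml\s[^>]*?\?>')
_ENC_BYTES = re.compile(rb'encoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1')


def declared_encoding(data):
    """Return the 'encoding' named in the XML declaration, or None.

    Assumption: the bytes are ASCII-compatible, so the declaration can be
    scanned before decoding (this does not hold for UTF-16 or EBCDIC).
    """
    decl = _DECL_BYTES.match(data)
    if decl is None:
        return None
    enc = _ENC_BYTES.search(decl.group(0))
    return enc.group(2).decode('ascii') if enc else None


def transcode_xml(data, target, source=None):
    """Re-encode an XML document and keep its declaration consistent.

    'source' stands for an external charset (e.g. from a MIME header);
    if given, it wins over the in-document declaration. With neither
    present, UTF-8 is assumed.
    """
    source = source or declared_encoding(data) or 'utf-8'
    text = data.decode(source)

    decl = _DECL_TEXT.match(text)
    if decl is None:
        # No declaration at all: prepend one naming the target encoding.
        text = '<?xml version="1.0" encoding="%s"?>\n%s' % (target, text)
    else:
        old = decl.group(0)
        if re.search(r'encoding\s*=', old):
            # Replace the existing encoding name, keeping the quote style.
            new = re.sub(
                r'(encoding\s*=\s*)(["\'])[^"\']*\2',
                lambda m: m.group(1) + m.group(2) + target + m.group(2),
                old,
                count=1,
            )
        else:
            # Add an encoding pseudo-attribute right after 'version'.
            new = re.sub(
                r'(version\s*=\s*(["\'])[^"\']*\2)',
                r'\1 encoding="%s"' % target,
                old,
                count=1,
            )
        text = new + text[decl.end():]

    return text.encode(target)
```

Nothing comparable can be written for a format like text/plain, which has no place in the document to record the encoding, so the sketch does not generalize beyond XML-like formats.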
So if you have suggestions on how to improve 3.6.2, please try to make them so that they work with the generality of formats out there, including new formats where there would be a deployment problem for specific transcoders.

Regards,
Martin.
Received on Wednesday, 29 May 2002 03:57:18 UTC