Re: Document Character Set, UNICODE, xhtml-access, and @key

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Wed, 21 May 2008 16:10:08 +0200
To: "Shane McCarron" <shane@aptest.com>
Cc: "XHTML WG" <public-xhtml2@w3.org>
Message-ID: <op.ubica6wqsmjzpq@acer3010>


>> Correct me if I am wrong, but the document character set is always ISO  
>> 10646 in XML (and modern HTML). So the only mapping that needs to be  
>> done is the standard mapping from the character encoding as the  
>> document comes in to ISO 10646.
>>
>> Or do I misunderstand your question?
> No.   I think maybe I don't know what the document character set is. Is  
> it your belief that the document character set has nothing to do with  
> the encoding that is specified in the document header or media type?


That is absolutely my belief. The document character set of every XML and
HTML document is ISO 10646. A document may be served in a different
encoding, in which case the UA must transform it to ISO 10646 on input.

> For example, if I have an xml declaration that indicates an encoding of  
> Shift-JIS what does that mean for the DCS?  If the DCS is still ISO  
> 10646, and user agents are expected to transform the content from/to the  
> encoding, then that's great.   I think anyway.

Yes, that's how it works, and it is great.
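
As a rough illustration of that transformation (a sketch only, not taken
from any spec text; the sample string and variable names are assumptions
chosen for the example), the Python below serializes the same characters
as Shift-JIS and as UTF-8, then decodes both the way a UA would. The bytes
on the wire differ, but the resulting ISO 10646 code points are identical,
and a numeric character reference resolves to the same code point either
way.

```python
import html

# Hypothetical sample content: three characters of the document.
text = "日本語"

# The same characters serialized in two different encodings.
sjis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")

# What the UA does on input: map the incoming bytes to ISO 10646
# characters, using the encoding named in the header or declaration.
from_sjis = sjis_bytes.decode("shift_jis")
from_utf8 = utf8_bytes.decode("utf-8")

# The document character set view: a list of code points, independent
# of whichever encoding the bytes arrived in.
code_points = [f"U+{ord(ch):04X}" for ch in from_sjis]

# A numeric character reference names a code point in the document
# character set, so it does not depend on the encoding either.
from_reference = html.unescape("&#x65E5;")

print(sjis_bytes.hex(), utf8_bytes.hex())
print(code_points, from_reference == text[0])
```

So the encoding declaration only tells the UA how to read the bytes; it
does not change which character set the document is in.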

Steven
Received on Wednesday, 21 May 2008 14:10:44 GMT