- From: Larry Masinter <LMM@acm.org>
- Date: Fri, 19 Jan 2007 13:51:25 -0800
- To: "'Liam R. E. Quin'" <liam@w3.org>
- Cc: <public-qt-comments@w3.org>
About charset: This was a really big issue with XML and charset declarations. RFC 3023 went into this at great length -- for at least 10 pages -- in order to clarify all of the cases. And in the end, they came to the conclusion that they needed an (optional) charset parameter to handle all of the cases.

The language in your document would seem to allow EBCDIC or UTF-32 or any of a number of other encodings, so that you wouldn't be able to tell what the encoding declaration within the data stream actually said! In the end, you are better off either restricting charsets or else following the application/xml conventions exactly, rather than generating some slightly different set of rules.

> if it's in UTF-16 it
> has to start with a byte order mark and we can tell

RFC 2781 (which defines 'utf-16') says

   Any labelling application that uses UTF-16 character encoding, and
   puts an explicit charset label on the text, and does not know the
   serialization order of the characters in text, MUST label the text
   as "UTF-16", and SHOULD make sure the text starts with 0xFEFF.

so the BOM is recommended (SHOULD) but isn't required.

Larry
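
To illustrate the point about the BOM, here is a minimal Python sketch of BOM sniffing. The function name `sniff_bom` and the sample text are hypothetical, chosen only for illustration; this is not taken from any specification or from the document under discussion. It shows that a receiver relying on the BOM alone has nothing to go on when a conforming UTF-16 sender omits it.

```python
import codecs

def sniff_bom(data: bytes):
    """Return an encoding name if data starts with a known BOM, else None.

    Hypothetical helper for illustration only.
    """
    # UTF-32 is checked first because the UTF-32-LE BOM (FF FE 00 00)
    # begins with the same two bytes as the UTF-16-LE BOM (FF FE).
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    return None

sample = "sample text"  # placeholder payload, not from the thread

# Big-endian UTF-16 with the recommended (SHOULD) BOM.
with_bom = codecs.BOM_UTF16_BE + sample.encode("utf-16-be")

# The same text without a BOM, which the quoted RFC 2781 wording permits.
without_bom = sample.encode("utf-16-be")

print(sniff_bom(with_bom))
print(sniff_bom(without_bom))
```

In the second case the sniffer finds no BOM, so the receiver would need an external charset label or some other heuristic to decide how to read the bytes.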
Received on Friday, 19 January 2007 21:51:50 UTC