
RE: Media Type registration for XSLT 2, XQuery 1.0 and XQueryX 1.0

From: Larry Masinter <LMM@acm.org>
Date: Fri, 19 Jan 2007 13:51:25 -0800
To: "'Liam R. E. Quin'" <liam@w3.org>
Cc: <public-qt-comments@w3.org>
Message-ID: <000001c73c13$f750c7a0$08f0070a@adobenet.global.adobe.com>

About charset:

This was a really big issue with XML and charset.

RFC 3023 went into this at great length -- for
at least 10 pages -- in order to clarify all of the
cases. And in the end, they came to the conclusion
that they needed an (optional) charset parameter
to handle all of the cases.

The language in your document would seem to
allow EBCDIC or UTF-32 or any of a number of other
encodings, so you wouldn't be able to tell
what the encoding declaration within the data
stream actually said without already knowing
the encoding!
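To make the point concrete: a processor can only read the encoding declaration after it has guessed enough about the encoding to decode it. Below is a minimal sketch of the first-bytes heuristic from XML 1.0 Appendix F. The function name and the family labels it returns are illustrative only and do not come from any registration. For EBCDIC it narrows things to a family at best; the actual code page still has to come from the declaration or from a charset parameter.

```python
import codecs
from typing import Optional


def sniff_xml_encoding(data: bytes) -> Optional[str]:
    """Guess an encoding family from the leading bytes of an XML entity,
    loosely following XML 1.0 Appendix F. Returns None if there is no hint.
    (Hypothetical helper, for illustration only.)"""
    # Byte order marks. UTF-32LE's BOM (FF FE 00 00) begins with UTF-16LE's
    # (FF FE), so the UTF-32 entries must be tested first.
    boms = [
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    # No BOM: look for how "<" or "<?xm" appears in the candidate encodings.
    patterns = {
        b"\x00\x00\x00<": "utf-32-be",
        b"<\x00\x00\x00": "utf-32-le",
        b"\x00<\x00?": "utf-16-be",
        b"<\x00?\x00": "utf-16-le",
        b"<?xm": "ascii-compatible",    # UTF-8, ISO-8859-x, ...: must read the declaration
        b"\x4c\x6f\xa7\x94": "ebcdic",  # some EBCDIC code page: must read the declaration
    }
    return patterns.get(data[:4])
```

For example, `'<?xml version="1.0"?>'.encode("cp500")` begins with the bytes 4C 6F A7 94, which the sketch can only report as "ebcdic".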

In the end, you are better off either restricting
charsets or else following the application/xml
conventions exactly, rather than generating
some slightly different set of rules.
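As I read RFC 3023, following the application/xml conventions means an explicit charset parameter is authoritative, and only when it is absent does the processor fall back to what the document says about itself. A rough sketch of that precedence follows. The function `effective_encoding` and its `declared` argument are hypothetical; the caller is assumed to have already extracted any BOM or encoding declaration from the data.

```python
from email.message import Message
from typing import Optional


def effective_encoding(content_type: str, declared: Optional[str]) -> str:
    """Pick the encoding for an XML entity under application/xml-style rules.
    (Hypothetical helper; 'declared' is whatever the document itself says.)"""
    msg = Message()
    msg["Content-Type"] = content_type
    # 1. An explicit charset parameter wins, even over the document's declaration.
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset:
        return charset
    # 2. Otherwise defer to the BOM / encoding declaration found in the data.
    if declared:
        return declared.lower()
    # 3. Nothing said anywhere: XML's default is UTF-8.
    return "utf-8"
```

So `effective_encoding('application/xml; charset="UTF-16"', "utf-8")` yields "utf-16": the parameter overrides a conflicting declaration.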

> if it's in UTF-16 it
> has to start with a byte order mark and we can tell

RFC 2781 (which defines 'utf-16') says

   Any labelling application that uses UTF-16 character encoding, and
   puts an explicit charset label on the text, and does not know the
   serialization order of the characters in text, MUST label the text as
   "UTF-16", and SHOULD make sure the text starts with 0xFEFF.

so the BOM isn't required (it's only a SHOULD).
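One consequence is that a consumer of text labelled "UTF-16" can't count on a BOM being present. RFC 2781 (section 4.3) says that without a BOM the text SHOULD be interpreted as big-endian. A minimal sketch of that rule follows; the helper name is hypothetical. It avoids Python's generic "utf-16" codec on purpose, since that codec (in CPython, at least) assumes the machine's native byte order when there is no BOM.

```python
import codecs


def decode_labelled_utf16(data: bytes) -> str:
    """Decode bytes labelled charset="UTF-16" per RFC 2781 section 4.3:
    honour a BOM if present, otherwise assume big-endian.
    (Hypothetical helper, for illustration only.)"""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: legal under RFC 2781, and the default reading is big-endian.
    return data.decode("utf-16-be")
```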

Received on Friday, 19 January 2007 21:51:50 UTC
