Re: allow UTF-16 not just UTF-8 (PR#6774) from BIGELOW,JIM (HP-Boise,ex1) on 2003-10-08 (www-html@w3.org from October 2003)

From: BIGELOW,JIM (HP-Boise,ex1) <jim.bigelow@hp.com>
Date: Wed, 8 Oct 2003 10:24:45 -0400
To: don@lexmark.com
Cc: elliott.bradshaw@zoran.com, www-html@w3.org
Message-ID: <020A3CF87FB5AC47AA67966B33845755063CA7B6@xboi22.boise.itc.hp.com>

From
http://hades.mn.aptest.com/cgi-bin/voyager-issues/XHTML-Print?id=6774;user=guest - reply #3

Date: Wed Oct  1 12:43:54 2003

Don and Elliott,

On October 1, 2003, the HTML working group discussed my question of why an
XHTML-Print processor must be a conforming XML processor (in particular, why
it must support both UTF-8 and UTF-16 encodings).

The answer is that an XHTML-Print processor must be a conforming XML
processor and support both UTF-8 and UTF-16 encodings to preserve
compatibility between XML-based applications.

If XHTML-Print processors supported only UTF-8, an XML-based application
could not be relied upon to emit an XHTML-Print document that the
XHTML-Print processor could process.  For example, the XHTML-Print
specification cannot restrict an XML-based XForms application's XHTML-Print
output to UTF-8, since the application may not be able to control the
encoding.

Section 4.3.3 [1] and Appendix F [2] of the XML specification [3] give
heuristics for determining a document's encoding when the charset parameter
of the MIME type [4] is absent.
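
As a rough illustration of those heuristics, here is a minimal Python sketch
that covers only the two encodings at issue in this thread.  The function
name guess_xml_encoding is my own, not something from the XML specification,
and the sketch deliberately skips the UCS-4 and EBCDIC cases that Appendix F
also lists, as well as reading the encoding declaration itself.

```python
import codecs


def guess_xml_encoding(data: bytes) -> str:
    """Guess a Python codec name for an XML entity from its first bytes.

    Hypothetical helper: handles only UTF-8 and UTF-16, not the UCS-4 or
    EBCDIC rows of the Appendix F table.
    """
    # Case 1: a byte order mark is present.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the UTF-8 BOM when decoding
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order

    # Case 2: no BOM, so look at how the leading "<?" is encoded.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"

    # Anything else is treated as UTF-8, the default for XML entities
    # that carry no other encoding information.
    return "utf-8"
```

A caller would then decode with the result, for example
data.decode(guess_xml_encoding(data)), and still check any encoding
declaration found in the decoded text.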

An example UTF-16 decoder is available at [5]; decoders for other encodings
are at [6].
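
To give a sense of how little a UTF-16 decoder involves, here is a minimal
Python sketch.  It is not the Interscript code referenced above; the name
decode_utf16 and its big_endian parameter are my own, and it assumes the
byte order is already known and any byte order mark has been removed.

```python
def decode_utf16(data: bytes, big_endian: bool = True) -> str:
    """Decode BOM-less UTF-16 bytes into a string (illustrative sketch)."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must contain an even number of bytes")
    order = "big" if big_endian else "little"

    # Split the input into 16-bit code units.
    units = [int.from_bytes(data[i:i + 2], order) for i in range(0, len(data), 2)]

    chars = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            low = units[i + 1]
            # Combine the pair into a code point above U+FFFF.
            chars.append(chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)))
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            # A code unit outside the surrogate range is the code point itself.
            chars.append(chr(unit))
            i += 1
    return "".join(chars)
```

The only part beyond reading 16-bit units is the surrogate-pair step, which
is needed for characters outside the Basic Multilingual Plane.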

Jim Bigelow

[1] http://www.w3.org/TR/REC-xml#charencoding
[2] http://www.w3.org/TR/REC-xml#sec-guessing
[3] http://www.w3.org/TR/REC-xml
[4] http://www.ietf.org/rfc/rfc3023.txt
[5] http://interscript.sourceforge.net/interscript/doc/en_iscr_0282.html
[6] http://interscript.sourceforge.net/interscript/doc/en_iscr_0275.html

Jim
http://oz.boi.hp.com/~jhb/

Received on Wednesday, 8 October 2003 10:24:50 UTC