Re: CES of the result tree from James Clark on 1999-04-30 (xsl-editors@w3.org from April to June 1999)

From: James Clark <jjc@jclark.com>
Date: Fri, 30 Apr 1999 18:55:43 +0700
To: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
CC: xsl-editors@w3.org
Message-ID: <37299A3F.813B70D@jclark.com>

MURATA Makoto wrote:
> 
> 1) CES of the result tree in files
> 
> When the result tree is created as a sequence of bytes, how does
> an XSLT implementation choose the character encoding scheme (CES)?

At the moment the implementation is free to use any CES that is capable
of representing the result tree correctly.
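One way to read "any CES that is capable of representing the result tree correctly" is as a check over a list of candidate encodings. The sketch below is illustrative only. It assumes the serialized result tree is available as a Python `str`, and that the hypothetical `preferred` list holds Python codec names (for example `"shift_jis"`). It tests only whether every raw character is encodable and falls back to UTF-8, which can represent any XML character.

```python
def choose_encoding(text: str, preferred: list[str]) -> str:
    """Return the first candidate CES that can represent every character.

    Hypothetical helper: `preferred` is an ordered list of Python codec names.
    """
    for enc in preferred:
        try:
            text.encode(enc)
        except UnicodeEncodeError:
            continue  # this CES cannot represent some character; try the next one
        return enc
    # UTF-8 can encode every character allowed in XML, so it is a safe fallback.
    return "utf-8"
```

For example, `choose_encoding("\u65e5\u672c", ["iso-8859-1", "shift_jis"])` skips ISO-8859-1, which has no kanji, and selects Shift JIS.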

> In the spirit of XML, I think that any charset should be allowed
> although most implementations support UTF-8 or UTF-16 only.  One
> implementation of XSL in Japan already supports Shift JIS.  Though
> I am not a fan of Shift JIS and encourage people to switch to Unicode,
> I do not think we are quite ready to throw away legacy encodings.
> 
> 2) Generating encoding declarations
> 
> We also need a mechanism for easily creating encoding declarations

The implementation is responsible for generating any encoding
declaration that is necessary for the generated XML file to be correctly
parsed (otherwise it will fail to meet the requirements of the third
paragraph of section 5).
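The following minimal sketch shows what generating that declaration could look like. The function name `serialize` is hypothetical. The sketch assumes that `encoding` is a charset name both Python and XML parsers recognize (such as `utf-8`, `utf-16`, `iso-8859-1`, or `shift_jis`) and that `body` is the already-serialized markup. The declaration is strictly required only for encodings other than UTF-8 and UTF-16, but emitting it unconditionally keeps the logic simple.

```python
def serialize(body: str, encoding: str) -> bytes:
    """Prefix an XML declaration naming `encoding`, then encode the whole document.

    Hypothetical helper: the declared name must match the bytes that follow,
    otherwise a conforming parser will misread or reject the document.
    """
    decl = f'<?xml version="1.0" encoding="{encoding}"?>\n'
    # Python's "utf-16" codec writes a byte order mark first, which XML
    # requires for UTF-16 entities.
    return (decl + body).encode(encoding)
```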

> or META tags containing the charset info.

The result tree is XML, and the META tag isn't relevant for parsing the
XML.

>  It might be a good
> idea to automatically generate XML declarations or META tags.

> 3) CES of the result tree in the main memory
> 
> In the case that the result tree is created in the main
> memory and then directly used for imaging, the program should just know
> the CES of the result tree.

Up to a point.  The result tree internally might use multiple encodings
(e.g. iso-8859-1 for text nodes that contain only characters with codes <
0x100, ucs-2 for text nodes that contain only characters with codes <
0x10000, and ucs-4 otherwise).
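The per-node scheme described above can be sketched as follows. The helper name `encode_node` is hypothetical, and so is the mapping onto Python codecs: UCS-2 is approximated with UTF-16LE, which is byte-identical for BMP characters (XML text never contains lone surrogates), and UCS-4 is approximated with UTF-32LE. The sketch picks the narrowest fixed-width form based on the largest code point in a single text node.

```python
def encode_node(text: str) -> tuple[str, bytes]:
    """Store one text node in the narrowest fixed-width encoding that fits it.

    Hypothetical helper illustrating per-node encodings in an in-memory tree.
    """
    max_code = max((ord(ch) for ch in text), default=0)
    if max_code < 0x100:
        return "iso-8859-1", text.encode("latin-1")   # 1 byte per character
    if max_code < 0x10000:
        return "ucs-2", text.encode("utf-16-le")      # 2 bytes per character (BMP only)
    return "ucs-4", text.encode("utf-32-le")          # 4 bytes per character
```

Because each node can use a different width, a program consuming the tree in memory has to track the encoding per node instead of assuming one CES for the whole tree.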

James

Received on Friday, 30 April 1999 08:01:44 UTC