XSLT 15th Nov: Text Output Method: unencoded characters

   If the result tree contains a character that cannot be represented in
   the encoding that the processor is using for output, the implementation
   should signal a serialization error.

This is compatible with XSLT1 but it would be useful extra functionality
if there was an option available in xsl:output method="text" to output
unencoded characters. The format for unencded characters isn't so
important, and I'd be happy for the format to be fixed in the
specification, although obviously one could imagine a more complex
scheme that allowed this to be specified.

 obvious candidates would be

&1234;
\uabc
U+1234
possibly the latter is most "plain text like", being Unicode's format
for references to unicode characters in plain text.

In XSLT 1  I often find myself using the xml output method (with ascii or
latin 1 encoding) even when outputting text files, just so that I get
all characters output in a consistent manner. (The exact format doesn't
matter as I post process the output with sed or perl to pick up all the
non ascii characters and encode them as needed (as TeX commands, as
often as not). It is tiresome in XSLT1 to detect all non ascii
characters and output them in some non standard format. XSLT 2 regexp
would make this a little easier but it would still complicate the
stylesheet greatly if every template generating text in the result
document had to run a template to quote every non ascii character. It's
much more convenient to let the characters go to the result tree as
characters and deal with the quoting required for the text format as a
serialisation issue.


David

_____________________________________________________________________
This message has been checked for all known viruses by Star Internet
delivered through the MessageLabs Virus Scanning Service. For further
information visit http://www.star.net.uk/stats.asp or alternatively call
Star Internet for details on the Virus Scanning Service.

Received on Monday, 18 November 2002 07:13:45 UTC