RE: XSLT 15th Nov: Text Output Method: unencoded characters

> -----Original Message-----
> From: David Carlisle [mailto:davidc@nag.co.uk] 
> Sent: 18 November 2002 12:13
> To: public-qt-comments@w3.org
> Subject: XSLT 15th Nov: Text Output Method: unencoded characters
> 
>    If the result tree contains a character that cannot be represented
>    in the encoding that the processor is using for output, the
>    implementation should signal a serialization error.
> 
> This is compatible with XSLT1 but it would be useful extra 
> functionality if there was an option available in xsl:output 
> method="text" to output unencoded characters. The format for 
> unencoded characters isn't so important, and I'd be happy for 
> the format to be fixed in the specification, although 
> obviously one could imagine a more complex scheme that 
> allowed this to be specified.
> 
> Obvious candidates would be
> 
> &1234;
> \uabc
> U+1234
> 
> Possibly the last is the most "plain text like", being 
> Unicode's format for references to Unicode characters in plain text.
> 
> In XSLT 1 I often find myself using the xml output method 
> (with ASCII or Latin-1 encoding) even when outputting text 
> files, just so that I get all characters output in a 
> consistent manner. (The exact format doesn't matter, as I 
> post-process the output with sed or perl to pick up all the 
> non-ASCII characters and encode them as needed, as TeX 
> commands as often as not.) It is tiresome in XSLT 1 to detect 
> all non-ASCII characters and output them in some non-standard 
> format. XSLT 2 regexp would make this a little easier but it 
> would still complicate the stylesheet greatly if every template 
> generating text in the result document had to run a template 
> to quote every non-ASCII character. It's much more convenient 
> to let the characters go to the result tree as characters and 
> deal with the quoting required for the text format as a 
> serialisation issue.
> 

Please see issues 15 and 124. We are considering a proposal to allow the
serialization of individual characters to be defined. This is seen as an
alternative to "sticky disable-output-escaping", on the basis that the only
known use cases for sticky d-o-e are to include "non-standard" characters in
the output. The thinking is to allow users to include characters from the
private use area in text nodes, and then control how these are
subsequently serialized. I would envisage this applying to all output
methods.
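
To make the idea concrete, here is a rough sketch of per-character
serialization, written in Python rather than as a stylesheet. It is only an
illustration: the function name serialize_text, the char_map argument, and
the choice of U+XXXX as the fallback notation are assumptions made for this
example and are not taken from any draft.

```python
def serialize_text(text, char_map=None, encoding="ascii"):
    """Serialize text one character at a time.

    char_map is a hypothetical mapping from a single character to the
    string that should replace it in the output (for example a
    private-use character mapped to a TeX command). Replacement strings
    are written as given and are not checked against the encoding.
    Any other character that the target encoding cannot represent is
    written in U+XXXX notation instead of raising an error.
    """
    char_map = char_map or {}
    pieces = []
    for ch in text:
        if ch in char_map:
            # User-defined serialization for this character.
            pieces.append(char_map[ch])
            continue
        try:
            ch.encode(encoding)
            pieces.append(ch)
        except UnicodeEncodeError:
            # Fallback notation for characters outside the encoding.
            pieces.append("U+{:04X}".format(ord(ch)))
    return "".join(pieces)


if __name__ == "__main__":
    # U+E000 is a private-use character standing in for a TeX command.
    tex_map = {"\ue000": r"\alpha{}"}
    print(serialize_text("x\ue000y caf\u00e9", tex_map))
```

The point of the sketch is that templates can write characters to the
result as characters, and the quoting is decided once, at serialization
time.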

Michael Kay

Received on Monday, 18 November 2002 15:14:22 UTC