- From: Michael Kay <mhk@mhk.me.uk>
- Date: Mon, 23 Aug 2004 15:37:09 +0100
- To: "'David Carlisle'" <davidc@nag.co.uk>, <public-qt-comments@w3.org>
unparsed-text() is not actually an F&O function, it is an XSLT-only
function.
I think there's some merit in this suggestion, though as we get closer to
the finishing line the WG is starting to apply a fairly draconian approach
to the way it handles suggested improvements.
Michael Kay
> -----Original Message-----
> From: public-qt-comments-request@w3.org
> [mailto:public-qt-comments-request@w3.org] On Behalf Of David Carlisle
> Sent: 23 August 2004 15:17
> To: public-qt-comments@w3.org
> Subject: [F&O] line ends in unparsed-text()
>
>
>
> unparsed-text() cannot be used to input arbitrary byte streams, as the
> resulting string needs to conform to the character restrictions in the
> data model, and the input is subject to character encoding, which may
> change the bytes anyway. So it is principally useful for "text files"
> (as its name suggests). However, one of the main distinguishing features
> of text files is that their line endings are platform-dependent, and
> unparsed-text() (unlike an XML parser, and so the doc() function) does
> not take account of this.
>
> This means that given a file test.txt
>
> one
> two
> three
>
> on Windows (to take a specific example)
>
> <x>
> <xsl:value-of select="unparsed-text('test.txt','UTF-8')"/>
> </x>
>
> will produce
>
> <x>
> one@#xD;
> two@#xD;
> three@#xD;
> </x>
>
> (I am using @ rather than & here in case the character references get
> lost in some mail reading programs or translation to HTML in the
> archives.)
>
> One can avoid this by going, for example
>
> translate(unparsed-text('test.txt','US-ASCII'),'@#xD;','')
>
> to get rid of the ^M characters, but then if the whole thing is run on a
> Mac, where lines end in a bare @#xD;, you'd get
> <x>
> onetwothree
> </x>
> so to get a reliable cross-platform result you will need to use
> something like
>
> replace(unparsed-text('test.txt','US-ASCII'),'[@#xD;@#xA;]+','@#xA;')
>
> which isn't exactly difficult, but
>
> a) It's a pain to have to do this every time.
> b) People developing on Unix won't notice the problem, so are liable to
>    use unparsed-text() directly and will find their stylesheets producing
>    strange white space errors when run on other platforms.
> c) It's going to be an endless source of confused questions on user
>    forums.
>
> Any chance that unparsed-text() could _always_ do line-end translation to
> #10, modelled after XML parsers, just as it always does character
> encoding handling?
>
> David
>
>
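
Until something like this is built in, the workaround quoted above can be wrapped in a stylesheet function so that it only has to be written once. What follows is a minimal sketch, assuming an XSLT 2.0 processor. The function name my:unparsed-text-lf and the namespace URI http://example.com/my are made up for illustration and are not part of any specification. It maps each CR LF pair and each lone CR to a single LF, which is the XML 1.0 line-end rule rather than the exact expression in the quoted message, so blank lines are kept. It uses codepoints-to-string(10) for the replacement so that no character references are needed.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: read a text file and normalize its line ends.
       In the regex, \r\n? matches CR LF or a lone CR; a lone LF is already
       in the target form and is left alone. -->
  <xsl:function name="my:unparsed-text-lf" as="xs:string">
    <xsl:param name="href" as="xs:string"/>
    <xsl:param name="encoding" as="xs:string"/>
    <xsl:sequence
        select="replace(unparsed-text($href, $encoding),
                        '\r\n?',
                        codepoints-to-string(10))"/>
  </xsl:function>

  <!-- Usage: same shape as the example in the quoted message. -->
  <xsl:template match="/">
    <x>
      <xsl:value-of select="my:unparsed-text-lf('test.txt', 'UTF-8')"/>
    </x>
  </xsl:template>

</xsl:stylesheet>
```

With this in place the text inside x should use the same line ends whether test.txt was saved with CR LF, CR, or LF line ends, though how those line ends are finally written out still depends on the serializer.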
Received on Monday, 23 August 2004 14:37:42 UTC