- From: Michael Kay <mhk@mhk.me.uk>
- Date: Mon, 23 Aug 2004 15:37:09 +0100
- To: "'David Carlisle'" <davidc@nag.co.uk>, <public-qt-comments@w3.org>
unparsed-text() is not actually an F&O function; it is an XSLT-only
function.

I think there's some merit in this suggestion, though as we get closer to
the finishing line the WG is starting to apply a fairly draconian approach
to the way it handles suggested improvements.

Michael Kay

> -----Original Message-----
> From: public-qt-comments-request@w3.org
> [mailto:public-qt-comments-request@w3.org] On Behalf Of David Carlisle
> Sent: 23 August 2004 15:17
> To: public-qt-comments@w3.org
> Subject: [F&O] line ends in unparsed-text()
>
> unparsed-text() cannot be used to input arbitrary byte streams, as the
> resulting string needs to conform to the character restrictions in the
> data model, and the input is subject to character encoding, which may
> change the bytes anyway. So it is principally useful for "text files"
> (as its name suggests). However, one of the main distinguishing features
> of text files is that their line endings are platform-dependent, and
> unparsed-text() (unlike an XML parser, and so the doc() function) does
> not take account of this.
>
> This means that given a file test.txt
>
>   one
>   two
>   three
>
> on Windows (to take a specific example)
>
>   <x>
>   <xsl:value-of select="unparsed-text('test.txt','UTF-8')"/>
>   </x>
>
> will produce
>
>   <x>
>   one@#xD;
>   two@#xD;
>   three@#xD;
>   </x>
>
> (I am using @ rather than & here in case the character references get
> lost in some mail reading programs or translation to html in the
> archives)
>
> One can avoid this by going, for example
>
>   translate(unparsed-text('test.txt','US-ASCII'),'@#xD;','')
>
> to get rid of the ^M characters, but then if the whole thing is run on a
> Mac, you'd get
>
>   <x>
>   onetwothree
>   </x>
>
> so to get a reliable cross-platform result you will need to use
> something like
>
>   replace(unparsed-text('test.txt','US-ASCII'),'[@#xD;@#xA;]+','@#xA;')
>
> which isn't exactly difficult but
>
> a) it's a pain to have to do this every time
> b) People developing on Unix won't notice the problem, so are liable to
>    use unparsed-text() directly and will find the stylesheets producing
>    strange white space errors when run on other platforms.
> c) It's going to be an endless source of confused questions on user
>    forums
>
> Any chance that unparsed-text could _always_ do line end translation to
> #10, modelled after XML parsers, just as it always does character
> encoding handling?
>
> David
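
The workaround described in the quoted message could be wrapped once in a
stylesheet function, so that callers do not have to repeat it. The
following is only a sketch under stated assumptions: the function name
my:unparsed-text-lines and the namespace http://example.com/my are
hypothetical, and the regex follows XML-parser-style rules (CR LF and lone
CR each become a single LF) rather than the '[@#xD;@#xA;]+' pattern
above, which would also collapse blank lines.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper (not part of XSLT or F&O): reads a text file
       and maps CR LF pairs and lone CR characters to a single LF. -->
  <xsl:function name="my:unparsed-text-lines" as="xs:string">
    <xsl:param name="href" as="xs:string"/>
    <xsl:param name="encoding" as="xs:string"/>
    <!-- '\r\n?' matches a CR optionally followed by LF;
         the replacement string is a literal LF (#xA). -->
    <xsl:sequence
        select="replace(unparsed-text($href, $encoding), '\r\n?', '&#xA;')"/>
  </xsl:function>

  <!-- Usage: same as the example in the message, but via the helper. -->
  <xsl:template name="main">
    <x>
      <xsl:value-of select="my:unparsed-text-lines('test.txt', 'UTF-8')"/>
    </x>
  </xsl:template>

</xsl:stylesheet>
```

With a helper like this, the same stylesheet should give the same line
ends whether test.txt was saved with Windows, classic Mac, or Unix
conventions, assuming the processor implements unparsed-text() and
replace() as specified in XSLT 2.0.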
Received on Monday, 23 August 2004 14:37:42 UTC