[F&O] line ends in unparsed-text()

unparsed-text() cannot be used to input arbitrary byte streams, as the
resulting string needs to conform to the character restrictions in the
data model, and the input is subject to character decoding, which may
change the bytes anyway. So it is principally useful for "text files"
(as its name suggests). However, one of the main distinguishing features
of text files is that their line endings are platform-dependent, and
unparsed-text() (unlike an XML parser, and so the doc() function) does
not take account of this.

This means that given a file test.txt

one
two
three

on Windows (to take a specific example, where each line ends in #xD #xA)

<x>
<xsl:value-of select="unparsed-text('test.txt','UTF-8')"/>
</x>

will produce

<x>
one@#xD;
two@#xD;
three@#xD;
</x>

(I am using @ rather than & here in case the character references get
lost in some mail reading programs or in translation to HTML in the
archives.)

One can avoid this by using, for example,

translate(unparsed-text('test.txt','US-ASCII'),'@#xD;','')

to get rid of the ^M characters, but then if the whole thing is run on a
Mac, where #xD on its own is the line end, you'd get

<x>
onetwothree
</x>

so to get a reliable cross-platform result you will need to use
something like


replace(unparsed-text('test.txt','US-ASCII'),'@#xD;@#xA;?','@#xA;')

which isn't exactly difficult, but:

a) It's a pain to have to do this every time.
b) People developing on Unix won't notice the problem, so are liable to
   use unparsed-text() directly and will find the stylesheets producing
   strange whitespace errors when run on other platforms.
c) It's going to be an endless source of confused questions on user
   forums.
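
For what it's worth, the workaround can at least be written once and
reused. The following is only a minimal sketch, assuming an XSLT 2.0
processor; the my: namespace URI and the function name
my:normalized-text are made up for illustration and are not part of
F&O. It uses the \r and \n regex escapes and codepoints-to-string(10)
instead of character references, so nothing in it depends on references
surviving the mail archive.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: reads a text file and normalizes line ends. -->
  <xsl:function name="my:normalized-text" as="xs:string">
    <xsl:param name="href" as="xs:string"/>
    <xsl:param name="encoding" as="xs:string"/>
    <!-- CR LF, or a CR on its own, becomes a single LF (#10),
         mirroring what an XML parser does; a lone LF is left alone. -->
    <xsl:sequence select="replace(unparsed-text($href, $encoding),
                                  '\r\n?',
                                  codepoints-to-string(10))"/>
  </xsl:function>

  <!-- Entry point for the sketch: start the transformation at the
       named template "main" (no source document is needed). -->
  <xsl:template name="main">
    <x>
      <xsl:value-of select="my:normalized-text('test.txt', 'UTF-8')"/>
    </x>
  </xsl:template>

</xsl:stylesheet>
```

With this, the three-line test.txt should serialize the same way whether
it was saved with Windows, Mac or Unix line ends, but every stylesheet
author still has to know to write (or import) such a function.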

Any chance that unparsed-text() could _always_ do line-end translation to
#10, modelled after XML parsers, just as it always does character
encoding handling?

David

Received on Monday, 23 August 2004 14:18:13 UTC