RE: [F&O] line ends in unparsed-text() from Michael Kay on 2004-08-23 (public-qt-comments@w3.org from August 2004)

From: Michael Kay <mhk@mhk.me.uk>
Date: Mon, 23 Aug 2004 15:37:09 +0100
To: "'David Carlisle'" <davidc@nag.co.uk>, <public-qt-comments@w3.org>
Message-ID: <E1BzFx8-0005LN-8x@frink.w3.org>

unparsed-text() is not actually an F&O function; it is an XSLT-only
function.

I think there's some merit in this suggestion, though as we get closer to
the finishing line the WG is starting to apply a fairly draconian approach
to the way it handles suggested improvements.

Michael Kay 

> -----Original Message-----
> From: public-qt-comments-request@w3.org 
> [mailto:public-qt-comments-request@w3.org] On Behalf Of David Carlisle
> Sent: 23 August 2004 15:17
> To: public-qt-comments@w3.org
> Subject: [F&O] line ends in unparsed-text()
> 
> 
> 
> unparsed-text() cannot be used to input arbitrary byte streams, as the
> resulting string needs to conform to the character restrictions in the
> data model, and the input is subject to character encoding, which may
> change the bytes anyway. So it is principally useful for "text files"
> (as its name suggests). However, one of the main distinguishing features
> of text files is that their line endings are platform-dependent, and
> unparsed-text() (unlike an XML parser, and so the doc() function) does
> not take account of this.
> 
> This means that given a file test.txt
> 
> one
> two
> three
> 
> on Windows (to take a specific example)
> 
> <x>
> <xsl:value-of select="unparsed-text('test.txt','UTF-8')"/>
> </x>
> 
> will produce
> 
> <x>
> one@#xD;
> two@#xD;
> three@#xD;
> </x>
> 
> (I am using @ rather than & here in case the character references get
> lost in some mail reading programs or translation to HTML in the
> archives)
> 
> One can avoid this by going, for example
> 
> translate(unparsed-text('test.txt','US-ASCII'),'@#xD;','')
> 
> to get rid of the ^M characters, but then if the whole thing is run on a
> Mac, you'd get
> <x>
> onetwothree
> </x>
> so to get a reliable cross platform result you will need to use
> something like
> 
> 
> replace(unparsed-text('test.txt','US-ASCII'),'[@#xD;@#xA;]+','@#xA;')
> 
> which isn't exactly difficult but
> 
> a) It's a pain to have to do this every time.
> b) People developing on Unix won't notice the problem, so are liable to
>    use unparsed-text() directly and will find the stylesheets producing
>    strange white space errors when run on other platforms.
> c) It's going to be an endless source of confused questions on user
>    forums.
> 
> Any chance that unparsed-text() could _always_ do line end translation to
> #10, modelled after XML parsers, just as it always does character
> encoding handling?
> 
> David
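
For anyone who needs this behaviour today, the workaround described in the
quoted message can be wrapped once in a stylesheet function rather than
repeated at every call. The following is only a sketch, assuming an XSLT 2.0
processor: the function name my:unparsed-text-lf, the namespace
urn:example:line-ends and the named template "main" are invented for
illustration and are not part of any specification. It follows the XML 1.0
end-of-line rule (a #xD #xA pair, or a #xD on its own, becomes a single
#xA), so blank lines are kept rather than collapsed. Real & character
references are used here instead of the @ placeholder.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:line-ends"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper (not a standard function): read a text file and
       normalize its line ends to #xA, as an XML parser would.
       The regex matches #xD optionally followed by #xA; a bare #xA is
       already in the target form and is left alone. -->
  <xsl:function name="my:unparsed-text-lf" as="xs:string">
    <xsl:param name="href" as="xs:string"/>
    <xsl:param name="encoding" as="xs:string"/>
    <xsl:sequence
        select="replace(unparsed-text($href, $encoding),
                        '&#xD;&#xA;?', '&#xA;')"/>
  </xsl:function>

  <!-- Usage: the same example as in the quoted message, with the helper
       in place of the direct unparsed-text() call. -->
  <xsl:template name="main">
    <x>
      <xsl:value-of select="my:unparsed-text-lf('test.txt', 'UTF-8')"/>
    </x>
  </xsl:template>

</xsl:stylesheet>
```

With this in place, the three-line test.txt should give the same string
whether the file was saved with Windows, classic Mac or Unix line ends.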
> 

Received on Monday, 23 August 2004 14:37:42 UTC