- From: David A. Lee <dlee@calldei.com>
- Date: Fri, 01 May 2009 09:26:48 -0400
- To: Philip Fennell <Philip.Fennell@bbc.co.uk>
- CC: XProc Dev <xproc-dev@w3.org>
- Message-ID: <49FAF898.6090404@calldei.com>
This is exactly why xmlsh supports both XML and non-XML data.
As much as I'd love to believe otherwise, the fact is the world is not
100% XML yet ... maybe someday (see http://www.xmlsh.org/Philosophy).
So I do believe even XML-centric scripting languages are much more
useful if they can support non-XML data as equal citizens.
xmlsh with the calabash extension can meet the first of your design
goals, but not both.
1) it can encode the entire pipeline including non-xml code in one file,
including embedding the xproc pipeline and passing it to calabash all
inline in one file.
but
2) it's not XML ... I personally don't find XML syntax very appetizing
for writing procedural (or functional) programs, which is why I prefer
XQuery to XSLT, even though XSLT is more powerful. But I do appreciate
the alternate view ...
If you're willing to put up with a document that is not XML but *is*
self-contained, your pipeline example in xmlsh, assuming the calabash
extension is loaded and tidy is on the path, would be something like
this:
tidy < http://somefile.com | xproc <[
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
<p:identity/>
</p:pipeline>
]>
A similar thing could be written in a more common unix shell script but
with more convoluted syntax.
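
For comparison, here is a rough sketch of what that plain POSIX shell
version might look like. It keeps the pipeline inline in a here-document,
writes it to a temporary file, and pipes the tidied page into the XProc
processor. The function names, the XPROC_CMD variable and the assumption
that the processor takes a pipeline file as its argument and reads the
source document from stdin are all hypothetical, not the documented
calabash command line. curl stands in for wget, and the tidy options are
taken from Henry's example.

```sh
#!/bin/sh
# Sketch only. XPROC_CMD is a hypothetical placeholder for however your
# XProc processor is launched; it is ASSUMED to accept a pipeline file as
# its argument and to read the source document from stdin.
XPROC_CMD=${XPROC_CMD:-xproc}

# Emit the inline pipeline. Quoting 'EOF' stops the shell from expanding
# anything inside the here-document.
emit_pipeline() {
cat <<'EOF'
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <p:identity/>
</p:pipeline>
EOF
}

# Fetch a URI, tidy it into XML, and feed the result to the pipeline.
fetch_tidy_xproc() {
  pipeline=$(mktemp) || return 1
  emit_pipeline > "$pipeline"
  # XPROC_CMD is left unquoted on purpose so it may hold a command plus
  # arguments.
  curl -s "$1" |
    tidy -asxml --quiet yes --show-warnings no |
    $XPROC_CMD "$pipeline"
  status=$?
  rm -f "$pipeline"
  return $status
}
```

Usage would be along the lines of "fetch_tidy_xproc http://somefile.com"
after sourcing the file. The temporary file and cleanup are the
"convoluted" part that the xmlsh inline XML syntax avoids.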
Philip Fennell wrote:
> Henry,
>
> Thanks too, but my ideal scenario is to encapsulate the entire
> description of the pipeline into a single document of one format (XML)
> that can be executed with as few dependencies as possible. The reason
> that I'm so adamant about XProc being able to do this is that I think
> the inability of XProc to either allow an expression in p:data/@href or
> use p:data/p:with-option is somewhat of an omission.
>
> There are potentially other use-cases beyond HTML Tidy. Taking URIs from
> source documents that point to non-XML files (CSS), retrieving them so
> that they can be processed using XSLT + Regular Expressions is not so
> much of an edge-case that it can't/shouldn't be easily supported.
>
>
> Regards
>
> Philip Fennell
>
>
>
> -----Original Message-----
> From: Henry S. Thompson [mailto:ht@inf.ed.ac.uk]
> Sent: 01 May 2009 13:16
> To: Philip Fennell
> Cc: XProc Dev
> Subject: Re: How do you pass step options to p:data/@href???
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Philip Fennell writes:
>
>
>> Thanks Norm, but I don't think that helps.
>>
>>
>>> If you're reading a document flowing through a pipeline, then it is
>>> XML.
>>
>> That's exactly what I'm not trying to do. I'm wanting to invoke Tidy
>> on an HTML document that is not well-formed XML so that I can do
>> further processing on it. Therefore I need to use p:data to get hold
>> of a non-XML document.
>>
>
> An alternative for this, and as you point out other similar
> up-translation/input conversion pipelines, is to define a script which
> calls wget/curl/your-choice and pipes the result to tidy, along the
> lines of
>
> fetch-and-tidy.sh:
> #!/bin/sh
> uri=$1
> shift
> wget --output-document - "$uri" 2>/dev/null | tidy "$@"
>
> fetch-and-tidy.bat:
> @echo off
> set file=%1
> shift
> wget --output-document - %file% 2>NUL: | tidy %1 %2 %3 %4 %5 %6 %7 %8 %9
>
> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
> xmlns:my="http://www.ltg.ed.ac.uk/~ht/"
> xmlns:c="http://www.w3.org/ns/xproc-step">
> <p:declare-step name="fetch-and-tidy" type="my:tidy">
> <p:option name="href"/>
> <p:output port="result" primary="true"/>
>
> <p:exec command="fetch-and-tidy.bat" source-is-xml="false"
> result-is-xml="true" wrap-result-lines="false" name="ft">
> <p:with-option name="args" select="concat('&quot;',$href,'&quot;
> -asxml --quiet yes --show-warnings no --doctype omit --numeric-entities
> yes --output-xml yes')">
> <p:empty/>
> </p:with-option>
> <p:input port="source">
> <p:empty/>
> </p:input>
> </p:exec>
> <p:unwrap match="c:result"/>
> </p:declare-step>
> <my:tidy href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
> </p:pipeline>
>
> The above works in Calabash 0.9.9
>
> Note that in any case you need to tweak your pipeline a bit from where
> you and Norm left it, to get the xmlness of things accurately reflected.
> The following works in Calabash 0.9.9:
>
> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
> xmlns:my="http://www.ltg.ed.ac.uk/~ht/"
> xmlns:c="http://www.w3.org/ns/xproc-step">
> <p:declare-step type="my:tidy">
> <p:input port="source"/>
> <p:output port="result"/>
>
> <p:exec command="tidy"
> source-is-xml="false"
> result-is-xml="true"
> wrap-result-lines="false">
> <p:with-option name="args" select="'-asxml --quiet yes
> --show-warnings no --doctype omit --numeric-entities yes --output-xml
> yes'"/>
> </p:exec>
> <p:unwrap match="c:result"/>
> </p:declare-step>
>
> <my:tidy>
> <p:input port="source">
> <p:data href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
> </p:input>
> </my:tidy>
> </p:pipeline>
>
>
> ht
> - --
> Henry S. Thompson, School of Informatics, University of Edinburgh
> Half-time member of W3C Team
> 10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131
> 650-4440
> Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
> URL: http://www.ltg.ed.ac.uk/~ht/ [mail really
> from me _always_ has this .sig -- mail without it is forged spam]
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.6 (GNU/Linux)
>
> iD8DBQFJ+ufskjnJixAXWBoRAp3sAKCA85FaAoslPBpqcQBvi0PCRuRNWgCcCuEw
> BOOYaFTWQNCluPfeEy15f/Y=
> =o/Gz
> -----END PGP SIGNATURE-----
--
David A. Lee
dlee@calldei.com
http://www.calldei.com
http://www.xmlsh.org
812-482-5224
Received on Friday, 1 May 2009 13:27:56 UTC