- From: David A. Lee <dlee@calldei.com>
- Date: Fri, 01 May 2009 09:26:48 -0400
- To: Philip Fennell <Philip.Fennell@bbc.co.uk>
- CC: XProc Dev <xproc-dev@w3.org>
- Message-ID: <49FAF898.6090404@calldei.com>
This is exactly why xmlsh supports both XML and non-XML data. As much as I'd love to believe otherwise, the fact is the world is not 100% XML yet ... maybe someday (see http://www.xmlsh.org/Philosophy). So I do believe even XML-centric scripting languages are much more useful if they can support non-XML data as an equal citizen.

xmlsh with the Calabash extension can meet the first of your design goals but not both:

1) It can encode the entire pipeline, including the non-XML code, in one file, with the XProc pipeline embedded inline and passed to Calabash.
2) But it's not XML ...

I personally don't find XML syntax very appetizing for writing procedural (or functional) programs, which is why I prefer XQuery to XSLT, even though XSLT is more powerful. But I do appreciate the alternate view ...

If you're willing to live with a document that is not XML but *is* self-contained, your pipeline example in xmlsh, assuming the Calabash extension is loaded and tidy is in the path, would be something like this:

    tidy < http://somefile.com | xproc <[
    <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
      <p:identity/>
    </p:pipeline>
    ]>

A similar thing could be written in a more common Unix shell script, but with more convoluted syntax.

Philip Fennell wrote:
> Henry,
>
> Thanks too, but my ideal scenario is to encapsulate the entire
> description of the pipeline into a single document of one format (XML)
> that can be executed with as few dependencies as possible. The reason
> that I'm so adamant about XProc being able to do this is that I think
> the inability of XProc to either allow an expression in p:data/@href or
> use p:data/p:with-option is somewhat of an omission.
>
> There are potentially other use-cases beyond HTML Tidy.
> Taking URIs from source documents that point to non-XML files (CSS) and
> retrieving them so that they can be processed using XSLT + Regular
> Expressions is not so much of an edge-case that it can't/shouldn't be
> easily supported.
>
>
> Regards
>
> Philip Fennell
>
>
>
> -----Original Message-----
> From: Henry S. Thompson [mailto:ht@inf.ed.ac.uk]
> Sent: 01 May 2009 13:16
> To: Philip Fennell
> Cc: XProc Dev
> Subject: Re: How do you pass step options to p:data/@href???
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Philip Fennell writes:
>
>> Thanks Norm, but I don't think that helps.
>>
>>> If you're reading a document flowing through a pipeline, then it is XML.
>>
>> That's exactly what I'm not trying to do. I'm wanting to invoke Tidy
>> on an HTML document that is not well-formed XML so that I can do
>> further processing on it. Therefore I need to use p:data to get hold
>> of a non-XML document.
>
> An alternative for this, and as you point out other similar
> up-translation/input conversion pipelines, is to define a script which
> calls wget/curl/your-choice and pipes the result to tidy, along the
> lines of
>
> fetch-and-tidy.sh:
> #!/bin/sh
> uri=$1
> shift
> wget --output-document - "$uri" 2>/dev/null | tidy "$@"
>
> fetch-and-tidy.bat:
> @echo off
> set file=%1
> shift
> wget --output-document - %file% 2>NUL: | tidy %1 %2 %3 %4 %5 %6 %7 %8 %9
>
> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
>             xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
>   <p:declare-step name="fetch-and-tidy" type="my:tidy">
>     <p:option name="href"/>
>     <p:output port="result" primary="true"/>
>
>     <p:exec command="fetch-and-tidy.bat" source-is-xml="false"
>             result-is-xml="true" wrap-result-lines="false" name="ft">
>       <p:with-option name="args" select="concat('&quot;',$href,'&quot;
> -asxml --quiet yes --show-warnings no --doctype omit --numeric-entities
> yes --output-xml yes')">
>         <p:empty/>
>       </p:with-option>
>       <p:input port="source">
>         <p:empty/>
>       </p:input>
>     </p:exec>
>     <p:unwrap match="c:result"/>
>   </p:declare-step>
>   <my:tidy href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
> </p:pipeline>
>
> The above works in Calabash 0.9.9
>
> Note that in any case you need to tweak your pipeline a bit from where
> you and Norm left it, to get the xmlness of things accurately reflected.
> The following works in Calabash 0.9.9:
>
> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
>             xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
>   <p:declare-step type="my:tidy">
>     <p:input port="source"/>
>     <p:output port="result"/>
>
>     <p:exec command="tidy"
>             source-is-xml="false"
>             result-is-xml="true"
>             wrap-result-lines="false">
>       <p:with-option name="args" select="'-asxml --quiet yes
> --show-warnings no --doctype omit --numeric-entities yes --output-xml
> yes'"/>
>     </p:exec>
>     <p:unwrap match="c:result"/>
>   </p:declare-step>
>
>   <my:tidy>
>     <p:input port="source">
>       <p:data href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
>     </p:input>
>   </my:tidy>
> </p:pipeline>
>
>
> ht
> - --
> Henry S. Thompson, School of Informatics, University of Edinburgh
> Half-time member of W3C Team
> 10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
> Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
> URL: http://www.ltg.ed.ac.uk/~ht/
> [mail really from me _always_ has this .sig -- mail without it is forged spam]
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.6 (GNU/Linux)
>
> iD8DBQFJ+ufskjnJixAXWBoRAp3sAKCA85FaAoslPBpqcQBvi0PCRuRNWgCcCuEw
> BOOYaFTWQNCluPfeEy15f/Y=
> =o/Gz
> -----END PGP SIGNATURE-----
>
> http://www.bbc.co.uk/
> This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated.
> If you have received it in error, please delete it from your system.
> Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately.
> Please note that the BBC monitors e-mails sent or received.
> Further communication will signify your consent to this.
>
>

--
David A. Lee
dlee@calldei.com
http://www.calldei.com
http://www.xmlsh.org
812-482-5224
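
The remark that a similar thing could be written in a more common Unix shell can be made concrete with a rough sketch. It is not taken from the thread: the function name `fetch_tidy_xproc` and the `run-xproc` wrapper are hypothetical, and `run-xproc` is assumed to take a pipeline file as its argument and read the source document on standard input. The wget and tidy options are the ones used in the quoted fetch-and-tidy script and pipelines.

```shell
#!/bin/sh
# Sketch only: a plain-sh counterpart of the xmlsh one-liner.
# FETCH, TIDY and XPROC can be overridden from the environment.
# ASSUMPTION: run-xproc is a hypothetical wrapper around an XProc processor
# that takes the pipeline file as $1 and reads the source document on stdin.
: "${FETCH:=wget --output-document -}"
: "${TIDY:=tidy -asxml --quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes}"
: "${XPROC:=run-xproc}"

fetch_tidy_xproc() {
    uri=$1
    # stdin is already used for the tidied document, so the inline pipeline
    # is written to a temporary file first. This is the "more convoluted"
    # part compared with the xmlsh version.
    pipeline=$(mktemp) || return 1
    cat > "$pipeline" <<'EOF'
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <p:identity/>
</p:pipeline>
EOF
    # Word splitting of $FETCH, $TIDY and $XPROC is intentional here.
    $FETCH "$uri" 2>/dev/null | $TIDY | $XPROC "$pipeline"
    status=$?
    rm -f "$pipeline"
    return "$status"
}
```

Called as `fetch_tidy_xproc http://somefile.com`, this keeps everything in one file like the xmlsh version, at the cost of the temporary file and the extra quoting.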
Received on Friday, 1 May 2009 13:27:56 UTC