- From: Philip Fennell <Philip.Fennell@bbc.co.uk>
- Date: Fri, 1 May 2009 13:52:20 +0100
- To: "XProc Dev" <xproc-dev@w3.org>
Henry,

Thanks too, but my ideal scenario is to encapsulate the entire description of the pipeline in a single document of one format (XML) that can be executed with as few dependencies as possible.

The reason I'm so adamant about XProc being able to do this is that I think the inability of XProc either to allow an expression in p:data/@href or to use p:data/p:with-option is something of an omission. There are potentially other use cases beyond HTML Tidy. Taking URIs from source documents that point to non-XML files (CSS, for example) and retrieving them so that they can be processed using XSLT and regular expressions is not such an edge case that it can't, or shouldn't, be easily supported.

Regards

Philip Fennell

-----Original Message-----
From: Henry S. Thompson [mailto:ht@inf.ed.ac.uk]
Sent: 01 May 2009 13:16
To: Philip Fennell
Cc: XProc Dev
Subject: Re: How do you pass step options to p:data/@href???

Philip Fennell writes:

> Thanks Norm, but I don't think that helps.
>
>> If you're reading a document flowing through a pipeline, then it is XML.
>
> That's exactly what I'm not trying to do. I'm wanting to invoke Tidy
> on an HTML document that is not well-formed XML so that I can do
> further processing on it. Therefore I need to use p:data to get hold
> of a non-XML document.
An alternative for this (and, as you point out, for other similar up-translation/input-conversion pipelines) is to define a script which calls wget/curl/your-choice and pipes the result to tidy, along the lines of:

fetch-and-tidy.sh:

  #!/bin/sh
  uri=$1
  shift
  wget --output-document - "$uri" 2>/dev/null | tidy "$@"

fetch-and-tidy.bat:

  @echo off
  set file=%1
  shift
  wget --output-document - %file% 2>NUL: | tidy %1 %2 %3 %4 %5 %6 %7 %8 %9

and then to call it from a p:exec step:

  <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
              xmlns:c="http://www.w3.org/ns/xproc-step"
              xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
    <p:declare-step name="fetch-and-tidy" type="my:tidy">
      <p:option name="href"/>
      <p:output port="result" primary="true"/>
      <p:exec command="fetch-and-tidy.bat" source-is-xml="false"
              result-is-xml="true" wrap-result-lines="false" name="ft">
        <p:with-option name="args"
            select="concat('&quot;', $href, '&quot; -asxml --quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes')">
          <p:empty/>
        </p:with-option>
        <p:input port="source">
          <p:empty/>
        </p:input>
      </p:exec>
      <p:unwrap match="c:result"/>
    </p:declare-step>
    <my:tidy href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
  </p:pipeline>

The above works in Calabash 0.9.9.

Note that in any case you need to tweak your pipeline a bit from where you and Norm left it, to get the XML-ness of things accurately reflected. The following works in Calabash 0.9.9:

  <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
              xmlns:c="http://www.w3.org/ns/xproc-step"
              xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
    <p:declare-step type="my:tidy">
      <p:input port="source"/>
      <p:output port="result"/>
      <p:exec command="tidy" source-is-xml="false" result-is-xml="true"
              wrap-result-lines="false">
        <p:with-option name="args"
            select="'-asxml --quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes'"/>
      </p:exec>
      <p:unwrap match="c:result"/>
    </p:declare-step>
    <my:tidy>
      <p:input port="source">
        <p:data href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
      </p:input>
    </my:tidy>
  </p:pipeline>

ht

--
Henry S. Thompson, School of Informatics, University of Edinburgh
Half-time member of W3C Team
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
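On the narrower point that p:data/@href cannot take an expression: one workaround that keeps everything in a single XProc document is to construct a c:request element at run time and pass it to p:http-request. Because c:request is an ordinary document flowing through the pipeline, its href attribute can be set from an option with p:add-attribute. The following is a minimal sketch that has not been tested against Calabash 0.9.9. It assumes an XProc 1.0 processor that implements p:http-request and p:exec, that tidy is on the PATH, and that the server returns the page with a text/* media type, in which case p:http-request delivers the body as text inside a c:body element. It also assumes that p:exec with source-is-xml="false" accepts that text-only c:body wrapper in the same way that the second pipeline above relies on it accepting the c:data wrapper produced by p:data. The step type my:fetch-and-tidy and the namespace URI http://example.org/ns/my are hypothetical.

```xml
<!-- Sketch only. Assumes an XProc 1.0 processor with p:http-request and p:exec,
     and tidy on the PATH. The my: namespace and step type are hypothetical. -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:my="http://example.org/ns/my"
                type="my:fetch-and-tidy">
  <!-- The URI to fetch; unlike p:data/@href, a caller can compute this. -->
  <p:option name="href" required="true"/>
  <p:output port="result"/>

  <!-- Start from a request template that has no href yet. -->
  <p:identity>
    <p:input port="source">
      <p:inline><c:request method="GET"/></p:inline>
    </p:input>
  </p:identity>

  <!-- Copy the run-time value of $href onto c:request/@href. -->
  <p:add-attribute match="/c:request" attribute-name="href">
    <p:with-option name="attribute-value" select="$href"/>
  </p:add-attribute>

  <!-- Fetch the page; a text/* response arrives as text inside c:body. -->
  <p:http-request/>

  <!-- Hand the text to tidy (assumed to behave as with the c:data wrapper)
       and parse tidy's output as XML. -->
  <p:exec command="tidy" source-is-xml="false" result-is-xml="true"
          wrap-result-lines="false"
          args="-asxml --quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes"/>

  <!-- Drop the c:result wrapper that p:exec puts around its output. -->
  <p:unwrap match="c:result"/>
</p:declare-step>
```

A pipeline that has imported this declaration with p:import could then supply the URI from an expression, for example <my:fetch-and-tidy><p:with-option name="href" select="string(/links/link[1]/@href)"/></my:fetch-and-tidy>, where the select expression is purely illustrative and assumes a hypothetical source document. This removes the wget call and the wrapper script, though tidy itself remains an external dependency. For the CSS case mentioned at the top of the thread, dropping the p:exec and p:unwrap steps would leave a c:body element containing the stylesheet text, ready for XSLT and regular expressions.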
Received on Friday, 1 May 2009 12:52:57 UTC