- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Fri, 01 May 2009 13:15:39 +0100
- To: "Philip Fennell" <Philip.Fennell@bbc.co.uk>
- Cc: "XProc Dev" <xproc-dev@w3.org>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Philip Fennell writes:

> Thanks Norm, but I don't think that helps.
>
>> If you're reading a document flowing through a pipeline, then it is
>> XML.
>
> That's exactly what I'm not trying to do. I'm wanting to invoke Tidy on
> an HTML document that is not well-formed XML so that I can do further
> processing on it. Therefore I need to use p:data to get hold of a
> non-XML document.

An alternative for this, and, as you point out, for other similar
up-translation/input-conversion pipelines, is to define a script which
calls wget/curl/your-choice and pipes the result to tidy, along the
lines of:

fetch-and-tidy.sh:

  #!/bin/sh
  uri=$1
  shift
  wget --output-document - "$uri" 2>/dev/null | tidy "$@"

fetch-and-tidy.bat:

  @echo off
  set file=%1
  shift
  wget --output-document - %file% 2>NUL: | tidy %1 %2 %3 %4 %5 %6 %7 %8 %9

  <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
              xmlns:c="http://www.w3.org/ns/xproc-step"
              xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
    <p:declare-step name="fetch-and-tidy" type="my:tidy">
      <p:option name="href"/>
      <p:output port="result" primary="true"/>
      <p:exec command="fetch-and-tidy.bat" source-is-xml="false"
              result-is-xml="true" wrap-result-lines="false" name="ft">
        <p:with-option name="args"
            select="concat('&quot;',$href,'&quot; -asxml --quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes')">
          <p:empty/>
        </p:with-option>
        <p:input port="source">
          <p:empty/>
        </p:input>
      </p:exec>
      <p:unwrap match="c:result"/>
    </p:declare-step>
    <my:tidy href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
  </p:pipeline>

The above works in Calabash 0.9.9.

Note that in any case you need to tweak your pipeline a bit from where
you and Norm left it, to get the xmlness of things accurately reflected.
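Since the choice of fetcher is left open above, here is a minimal sketch of a curl-based counterpart to fetch-and-tidy.sh, written as a shell function so it can be sourced. The function name fetch_and_tidy is hypothetical; the curl flags used (--fail, --silent, --show-error, --location) are standard curl options. As in the wget version, every argument after the URI is passed straight through to tidy.

```shell
#!/bin/sh
# Hypothetical curl-based counterpart to fetch-and-tidy.sh (name is illustrative).
# Usage: fetch_and_tidy URI [tidy options...]
fetch_and_tidy() {
  uri=$1
  shift
  # --fail: on an HTTP error, output nothing and exit non-zero instead of
  #         piping the server's error page into tidy
  # --silent --show-error: no progress meter, but real failures still go to stderr
  # --location: follow redirects
  curl --fail --silent --show-error --location "$uri" | tidy "$@"
}
```

For example: fetch_and_tidy "http://www.ltg.ed.ac.uk/~ht/xx.html" -asxml --quiet yes. Unlike the wget script, this keeps the fetcher's error messages on stderr rather than discarding them.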
The following works in Calabash 0.9.9:

  <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
              xmlns:c="http://www.w3.org/ns/xproc-step"
              xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
    <p:declare-step type="my:tidy">
      <p:input port="source"/>
      <p:output port="result"/>
      <p:exec command="tidy" source-is-xml="false" result-is-xml="true"
              wrap-result-lines="false">
        <p:with-option name="args"
            select="'-asxml --quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes'"/>
      </p:exec>
      <p:unwrap match="c:result"/>
    </p:declare-step>
    <my:tidy>
      <p:input port="source">
        <p:data href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
      </p:input>
    </my:tidy>
  </p:pipeline>

ht
- -- 
Henry S. Thompson, School of Informatics, University of Edinburgh
Half-time member of W3C Team
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFJ+ufskjnJixAXWBoRAp3sAKCA85FaAoslPBpqcQBvi0PCRuRNWgCcCuEw
BOOYaFTWQNCluPfeEy15f/Y=
=o/Gz
-----END PGP SIGNATURE-----
Received on Friday, 1 May 2009 12:17:19 UTC