- From: Philip Fennell <Philip.Fennell@bbc.co.uk>
- Date: Fri, 1 May 2009 09:10:12 +0100
- To: "XProc Dev" <xproc-dev@w3.org>
Thanks Norm, but I don't think that helps.

> If you're reading a document flowing through a pipeline, then it is XML.

That's exactly what I'm not trying to do. I want to invoke Tidy on an HTML document that is not well-formed XML so that I can do further processing on it. Therefore I need to use p:data to get hold of a non-XML document.

My problem is that p:data, and p:document for that matter, do not allow you to use p:with-option, so you cannot supply an expression (XPath) in place of a string literal (URI). I can get around the p:document problem by using p:load, but I cannot see an equivalent for this particular use case. I imagine it will be a popular use case if people want to use XProc to build legacy-content conversion pipelines, where they may have large numbers of pages in HTML/SGML, or want to do screen-scraping of existing web sites.

Regards

Philip Fennell

-----Original Message-----
From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf Of Norman Walsh
Sent: 30 April 2009 17:51
To: XProc Dev
Subject: Re: How do you pass step options to p:data/@href???

Philip Fennell <Philip.Fennell@bbc.co.uk> writes:
[...]
> It works fine as it is, but, I'd like to use the 'href' option I've
> declared for the step to pass in the location of the source HTML file.
> However, the value of: p:data/@href won't take an expression e.g.
> $href and p:data doesn't allow <p:with-option name="href"
> select="$href"/>
>
> How do I pass the HTML source document URI to p:data as there is no
> other mechanism to get hold of unparsed character data?

Use p:pipe to connect the source port of the exec command to the source
port on your declared step, like this:

  <p:declare-step name="main" type="tidy:html">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:option name="href"/>

    <p:exec command="tidy" source-is-xml="true" result-is-xml="false"
            wrap-result-lines="false" method="xml">
      <p:input port="source">
        <p:pipe step="main" port="source"/>
      </p:input>
      <p:with-option name="args"
        select="'--quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes'"/>
    </p:exec>

    <p:unescape-markup/>

    <p:unwrap match="c:result"/>
  </p:declare-step>

Note that I added a name to your declare step and changed source-is-xml
to true. If you're reading a document flowing through a pipeline, then
it is XML.

Since the source is the default readable port, you could also do this:

  <p:declare-step type="tidy:html">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:option name="href"/>

    <p:exec command="tidy" source-is-xml="true" result-is-xml="false"
            wrap-result-lines="false" method="xml">
      <p:with-option name="args"
        select="'--quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes'"/>
    </p:exec>

By default, the source port on p:exec will be connected to the default
readable port, which is the source port on your declare step in this
case.

                                        Be seeing you,
                                          norm

--
Norman Walsh <ndw@nwalsh.com> | It is well to remember that the entire
http://nwalsh.com/            | universe, with one trifling exception,
                              | is composed of others.--John Andrew
                              | Holmes
Received on Friday, 1 May 2009 08:10:48 UTC