- From: Philip Fennell <Philip.Fennell@bbc.co.uk>
- Date: Fri, 1 May 2009 14:19:31 +0100
- To: "XProc Dev" <xproc-dev@w3.org>
Norm wrote:

> The workaround is p:http-request, even though that's inelegant in some ways.

Thanks Norm, that'll do nicely.

> I understand that use case, but I don't understand what you plan to
> pass *to* your tidy:html step. The things that appear on p:input ports
> *must* be well-formed XML.

The tidy:html step is, in effect, stand-alone. The input is irrelevant, but the output is the 'source' of the pipeline.

Regards

Philip Fennell

-----Original Message-----
From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf Of Norman Walsh
Sent: 01 May 2009 14:06
To: XProc Dev
Subject: Re: How do you pass step options to p:data/@href???

Philip Fennell <Philip.Fennell@bbc.co.uk> writes:
> That's exactly what I'm not trying to do. I'm wanting to invoke Tidy
> on an HTML document that is not well-formed XML so that I can do
> further processing on it.

I understand that use case, but I don't understand what you plan to pass *to* your tidy:html step. The things that appear on p:input ports *must* be well-formed XML.

> Therefore I need to use p:data to get hold of a non-XML document. My
> problem is that p:data, and p:document for that matter, do not allow
> you to use p:with-option so that you can use an expression (XPath)
> instead of a string literal (URI).
>
> I can get around the p:document problem by using p:load, but I cannot
> see an equivalent for this particular use-case; and I imagine it will
> be a popular use-case if people want to use XProc to build
> legacy-content conversion pipelines where they may have large amounts
> of pages in HTML/SGML or do screen-scraping of existing web sites.

The workaround is p:http-request, even though that's inelegant in some ways.

From 2.2.2 Non-XML Documents:

    It is not a goal of XProc that it should be a general-purpose
    pipeline language for manipulating arbitrary, non-XML resources.

    There are two standard ways that a non-XML document may enter a
    pipeline: directly through p:data or as the result of performing a
    p:http-request step. Loading non-XML data with a computed URI
    requires the p:http-request step. Implementors are encouraged to
    support the file: URI scheme so that users can load local data from
    computed URIs.

So, if you have the computed URI of a document in $uri, you can load it with p:http-request:

    <p:http-request method="get">
      <p:with-option name="href" select="$uri"/>
    </p:http-request>

Implementors are encouraged to make that work for file: URIs as well as http(s): URIs. XML Calabash supports it.

Be seeing you,
norm

P.S. What, you may ask, possessed the WG to use p:*HTTP*-request to load documents from file: URIs? On the one hand, adding another step to load from file: URIs would have largely reproduced the p:http-request step, and on the other, there wasn't any obviously better name for p:http-request.

--
Norman Walsh <ndw@nwalsh.com> | One stops being a child when one
http://nwalsh.com/            | realizes that telling one's trouble
                              | does not make it better.--Cesare Pavese

http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
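For reference, the computed-URI workaround discussed in this message can be written against the XProc 1.0 Recommendation, where p:http-request reads its request as a c:request document on the source port. The following is only a minimal sketch under stated assumptions, not code from the thread: the pipeline option name `uri` and the `text/plain` content-type override are illustrative choices, and whether file: URIs work depends on the processor.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">

  <!-- Hypothetical option holding the computed URI of the non-XML document -->
  <p:option name="uri" required="true"/>

  <!-- The c:body wrapper returned by p:http-request -->
  <p:output port="result"/>

  <!-- Start from a skeleton request; href is filled in by the next step -->
  <p:identity>
    <p:input port="source">
      <p:inline>
        <c:request method="get" override-content-type="text/plain"/>
      </p:inline>
    </p:input>
  </p:identity>

  <!-- Write the computed URI into the request's href attribute -->
  <p:add-attribute match="/c:request" attribute-name="href">
    <p:with-option name="attribute-value" select="$uri"/>
  </p:add-attribute>

  <!-- Perform the request; a non-XML response comes back wrapped in c:body -->
  <p:http-request/>

</p:declare-step>
```

The result is a c:body element containing the document text, which later steps in the pipeline can unescape or clean up as needed.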
Received on Friday, 1 May 2009 13:20:07 UTC