
RE: How do you pass step options to p:data/@href???

From: Philip Fennell <Philip.Fennell@bbc.co.uk>
Date: Fri, 1 May 2009 14:19:31 +0100
Message-ID: <F3685F4A877F284F8054E328390447D14C6E82@bbcxues17.national.core.bbc.co.uk>
To: "XProc Dev" <xproc-dev@w3.org>

Norm wrote:

> The workaround is p:http-request, even though that's inelegant in some
> ways.

Thanks Norm, that'll do nicely.


> I understand that use case, but I don't understand what you plan to 
> pass *to* your tidy:html step. The things that appear on p:input ports
> *must* be well-formed XML.

The tidy:html step is, in effect, stand-alone. The input is irrelevant,
but the output is the 'source' of the pipeline.


Regards

Philip Fennell

-----Original Message-----
From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On
Behalf Of Norman Walsh
Sent: 01 May 2009 14:06
To: XProc Dev
Subject: Re: How do you pass step options to p:data/@href???

Philip Fennell <Philip.Fennell@bbc.co.uk> writes:
> That's exactly what I'm not trying to do. I'm wanting to invoke Tidy 
> on an HTML document that is not well-formed XML so that I can do 
> further processing on it.

I understand that use case, but I don't understand what you plan to pass
*to* your tidy:html step. The things that appear on p:input ports
*must* be well-formed XML.

> Therefore I need to use p:data to get hold of a non-XML document. My 
> problem is that p:data, and p:document for that matter, do not allow 
> you to use p:with-option so that you can use an expression (XPath) 
> instead of a string literal (URI).
>
> I can get around the p:document problem by using p:load, but I cannot 
> see an equivalent for this particular use-case; and I imagine it will 
> be a popular use-case if people want to use XProc to build 
> legacy-content conversion pipelines where they may have large amounts 
> of pages in HTML/SGML or do screen-scraping of existing web
> sites.

The workaround is p:http-request, even though that's inelegant in some
ways. From 2.2.2 Non-XML Documents:

  It is not a goal of XProc that it should be a general-purpose
  pipeline language for manipulating arbitrary, non-XML resources.

  There are two standard ways that a non-XML document may enter a
  pipeline: directly through p:data or as the result of performing a
  p:http-request step. Loading non-XML data with a computed URI
  requires the p:http-request step. Implementors are encouraged to
  support the file: URI scheme so that users can load local data from
  computed URIs.

So, if you have the computed URI of a document in $uri, you can load it
with p:http-request:

  <p:http-request method="get">
    <p:with-option name="href" select="$uri"/>
  </p:http-request>

Implementors are encouraged to make that work for file: URIs as well as
http(s): URIs. XML Calabash supports it.
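
A minimal sketch of the same idea against the p:http-request signature
in the later XProc 1.0 Recommendation, where the step reads a c:request
document from its source port rather than taking method and href
options. It assumes $uri is a variable or option already in scope; the
c:request element and the p:add-attribute options come from that
Recommendation, not from this message:

  <p:add-attribute xmlns:c="http://www.w3.org/ns/xproc-step"
                   match="/c:request" attribute-name="href">
    <!-- Template request; its href is filled in from $uri below. -->
    <p:input port="source">
      <p:inline><c:request method="get"/></p:inline>
    </p:input>
    <p:with-option name="attribute-value" select="$uri"/>
  </p:add-attribute>
  <!-- Reads the completed c:request from the preceding step through
       the default readable port. -->
  <p:http-request/>

For a non-XML response such as tag-soup HTML, the result should arrive
wrapped in a c:body element, as text or base64-encoded depending on the
media type, so a following step still has to unwrap or tidy it.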

                                        Be seeing you,
                                          norm

P.S. What, you may ask, possessed the WG to use p:*HTTP*-request to load
documents from file: URIs? On the one hand, adding another step to load from
file: URIs would have largely reproduced the p:http-request step, and on
the other, there wasn't any obviously better name for p:http-request.

--
Norman Walsh <ndw@nwalsh.com> | One stops being a child when one
http://nwalsh.com/            | realizes that telling one's trouble
                              | does not make it better.--Cesare Pavese

Received on Friday, 1 May 2009 13:20:07 GMT
