RE: How do you pass step options to p:data/@href??? from Philip Fennell on 2009-05-01 (xproc-dev@w3.org from May 2009)

From: Philip Fennell <Philip.Fennell@bbc.co.uk>
Date: Fri, 1 May 2009 09:10:12 +0100
To: "XProc Dev" <xproc-dev@w3.org>
Message-ID: <F3685F4A877F284F8054E328390447D14C6E7E@bbcxues17.national.core.bbc.co.uk>
Thanks Norm, but I don't think that helps.

> If you're reading a document flowing through a pipeline, then it is XML.

That's exactly what I'm not trying to do. I want to invoke Tidy on
an HTML document that is not well-formed XML so that I can do further
processing on it. Therefore I need to use p:data to get hold of a
non-XML document. My problem is that p:data (and p:document, for that
matter) does not allow p:with-option, so I cannot supply the href as an
XPath expression instead of a string literal URI.
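
For the non-XML case, one possible route in XProc 1.0 is sketched
below; it is untested here and is not something proposed in this
thread. It builds a c:request document whose href comes from the step's
option, then fetches the resource with p:http-request, overriding the
content type so the HTML arrives as text instead of being parsed as
XML. Assumptions: the tidy namespace URI is a placeholder; whether
p:http-request accepts file: URIs is implementation-dependent; and the
processor is assumed to pass the text inside the resulting c:body
wrapper to Tidy when source-is-xml is false.

  <p:declare-step type="tidy:html" version="1.0"
      xmlns:p="http://www.w3.org/ns/xproc"
      xmlns:c="http://www.w3.org/ns/xproc-step"
      xmlns:tidy="http://example.org/ns/tidy">
    <!-- placeholder namespace for tidy: above -->
    <p:output port="result"/>
    <p:option name="href" required="true"/>

    <!-- Copy the option value into c:request/@href -->
    <p:add-attribute match="/c:request" attribute-name="href">
      <p:input port="source">
        <p:inline>
          <c:request method="GET" override-content-type="text/plain"/>
        </p:inline>
      </p:input>
      <p:with-option name="attribute-value" select="$href"/>
    </p:add-attribute>

    <!-- Fetch the page as text (assumed to come back wrapped in c:body) -->
    <p:http-request/>

    <!-- Feed the text to Tidy; its output comes back as escaped text -->
    <p:exec command="tidy"
        source-is-xml="false"
        result-is-xml="false"
        wrap-result-lines="false">
      <p:with-option name="args" select="'--quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes'"/>
    </p:exec>

    <!-- Parse Tidy's output and drop the c:result wrapper -->
    <p:unescape-markup/>
    <p:unwrap match="c:result"/>
  </p:declare-step>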

I can get around the p:document problem by using p:load, but I cannot
see an equivalent for this particular use case, and I imagine it will
be a popular one if people want to use XProc to build legacy-content
conversion pipelines, where they may have large numbers of pages in
HTML/SGML, or to screen-scrape existing web sites.
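
The p:load workaround for the p:document case looks roughly like the
fragment below. It assumes an enclosing step that declares
<p:option name="href"/>; because p:load parses what it reads, it only
helps when the target is well-formed XML, which is why it does not
cover raw HTML.

  <!-- Fragment: assumes the enclosing step declares an 'href' option -->
  <p:load>
    <p:with-option name="href" select="$href"/>
  </p:load>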


Regards

Philip Fennell



-----Original Message-----
From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On
Behalf Of Norman Walsh
Sent: 30 April 2009 17:51
To: XProc Dev
Subject: Re: How do you pass step options to p:data/@href???

Philip Fennell <Philip.Fennell@bbc.co.uk> writes:
[...]
> It works fine as it is, but, I'd like to use the 'href' option I've 
> declared for the step to pass in the location of the source HTML file.
> However, the value of: p:data/@href won't take an expression e.g. 
> $href and p:data doesn't allow <p:with-option name="href" 
> select="$href"/>
>
> How do I pass the HTML source document URI to p:data as there is no 
> other mechanism to get hold of unparsed character data?

Use p:pipe to connect the source port of the exec command to the source
port on your declared step, like this:

  <p:declare-step name="main" type="tidy:html">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:option name="href"/>

    <p:exec command="tidy"
        source-is-xml="true"
        result-is-xml="false"
        wrap-result-lines="false"
        method="xml">
      <p:input port="source">
        <p:pipe step="main" port="source"/>
      </p:input>
      <p:with-option name="args" select="'--quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes'"/>
    </p:exec>

    <p:unescape-markup/>
    <p:unwrap match="c:result"/>
  </p:declare-step>
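
As a rough illustration of what the last three steps do, the document
passes through shapes like the following. The HTML content is a
hypothetical placeholder, and the c: prefix stands for the XProc step
namespace (http://www.w3.org/ns/xproc-step).

  <!-- After p:exec (result-is-xml="false"): Tidy's output as escaped text -->
  <c:result>&lt;html&gt;&lt;head&gt;…&lt;/head&gt;&lt;body&gt;…&lt;/body&gt;&lt;/html&gt;</c:result>

  <!-- After p:unescape-markup: the text parsed into real elements -->
  <c:result><html><head>…</head><body>…</body></html></c:result>

  <!-- After p:unwrap match="c:result": the wrapper removed -->
  <html><head>…</head><body>…</body></html>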

Note that I added a name to your declare step and changed source-is-xml
to true. If you're reading a document flowing through a pipeline, then
it is XML.

Since the source is the default readable port, you could also do this:

  <p:declare-step type="tidy:html">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:option name="href"/>

    <p:exec command="tidy"
        source-is-xml="true"
        result-is-xml="false"
        wrap-result-lines="false"
        method="xml">
      <p:with-option name="args" select="'--quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes'"/>
    </p:exec>

    <p:unescape-markup/>
    <p:unwrap match="c:result"/>
  </p:declare-step>

By default, the source port on p:exec will be connected to the default
readable port, which is the source port on your declare step in this
case.

                                        Be seeing you,
                                          norm

--
Norman Walsh <ndw@nwalsh.com> | It is well to remember that the entire
http://nwalsh.com/            | universe, with one trifling exception,
                              | is composed of others.--John Andrew
                              | Holmes

Received on Friday, 1 May 2009 08:10:48 UTC