Re: How do you pass step options to p:data/@href???

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Fri, 01 May 2009 13:15:39 +0100
To: "Philip Fennell" <Philip.Fennell@bbc.co.uk>
Cc: "XProc Dev" <xproc-dev@w3.org>
Message-ID: <f5b4ow5f28k.fsf@hildegard.inf.ed.ac.uk>

Philip Fennell writes:

> Thanks Norm, but I don't think that helps.
>
>> If you're reading a document flowing through a pipeline, then it is XML.
>
> That's exactly what I'm not trying to do. I'm wanting to invoke Tidy on
> an HTML document that is not well-formed XML so that I can do further
> processing on it. Therefore I need to use p:data to get hold of a
> non-XML document.

An alternative for this and, as you point out, for other similar
up-translation/input-conversion pipelines, is to define a script which
calls wget/curl/your-choice and pipes the result to tidy, along the
lines of

 fetch-and-tidy.sh:
  #!/bin/sh
  uri=$1
  shift
  wget --output-document - "$uri" 2>/dev/null | tidy "$@"

 fetch-and-tidy.bat:
  @echo off
  set file=%1
  shift
  wget --output-document - %file% 2>NUL: | tidy %1 %2 %3 %4 %5 %6 %7 %8 %9

  <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
              xmlns:c="http://www.w3.org/ns/xproc-step"
              xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
   <p:declare-step name="fetch-and-tidy" type="my:tidy">
     <p:option name="href"/>
     <p:output port="result" primary="true"/>

     <p:exec command="fetch-and-tidy.bat" source-is-xml="false"
             result-is-xml="true" wrap-result-lines="false" name="ft">
       <p:with-option name="args" select="concat('&quot;',$href,'&quot; -asxml --quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes')">
         <p:empty/>
       </p:with-option>
       <p:input port="source">
         <p:empty/>
       </p:input>
     </p:exec>
     <p:unwrap match="c:result"/>
   </p:declare-step>
   <my:tidy href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
  </p:pipeline>

The above works in Calabash 0.9.9.
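
To see how fetch-and-tidy.sh divides up its arguments, here is a
minimal sketch. It assumes a hypothetical stand-in function, show_args
(not part of the script above), which only prints what would be handed
to wget and to tidy, so it needs neither tool nor any network access:

```shell
#!/bin/sh
# Sketch only: show_args is a hypothetical stand-in for fetch-and-tidy.sh.
# It reports how the arguments are split instead of running wget or tidy.
show_args() {
  uri=$1          # first argument: the document to fetch
  shift           # drop it, so "$@" now holds only the tidy options
  printf 'wget gets: %s\n' "$uri"
  for opt in "$@"; do
    printf 'tidy gets: %s\n' "$opt"
  done
}

show_args "http://www.ltg.ed.ac.uk/~ht/xx.html" -asxml --quiet yes
```

Because the script uses "$@" rather than $*, each option reaches tidy
as a separate word, with whatever argument boundaries the caller
supplied.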

Note that in any case you need to tweak your pipeline a bit from where
you and Norm left it, to get the xmlness of things accurately
reflected.  The following works in Calabash 0.9.9:

 <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
             xmlns:c="http://www.w3.org/ns/xproc-step"
             xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
  <p:declare-step type="my:tidy">
    <p:input port="source"/>
    <p:output port="result"/>

    <p:exec command="tidy"
        source-is-xml="false"
        result-is-xml="true"
        wrap-result-lines="false">
      <p:with-option name="args" select="'-asxml --quiet yes --show-warnings no --doctype omit --numeric-entities yes --output-xml yes'"/>
    </p:exec>
    <p:unwrap match="c:result"/>
  </p:declare-step>

  <my:tidy>
    <p:input port="source">
      <p:data href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
    </p:input>
  </my:tidy>
 </p:pipeline>


ht
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
                         Half-time member of W3C Team
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Friday, 1 May 2009 12:17:19 GMT