RE: How do you pass step options to p:data/@href???

From: Philip Fennell <Philip.Fennell@bbc.co.uk>
Date: Fri, 1 May 2009 13:52:20 +0100
Message-ID: <F3685F4A877F284F8054E328390447D14C6E81@bbcxues17.national.core.bbc.co.uk>
To: "XProc Dev" <xproc-dev@w3.org>
Henry,

Thanks too, but my ideal scenario is to encapsulate the entire
description of the pipeline in a single document in one format (XML)
that can be executed with as few dependencies as possible. The reason
I'm so adamant that XProc should be able to do this is that I think
its inability either to allow an expression in p:data/@href or to
support p:data/p:with-option is something of an omission.

There are potentially other use cases beyond HTML Tidy. Taking URIs from
source documents that point to non-XML files (CSS, for example) and
retrieving them so that they can be processed using XSLT and regular
expressions is not so much of an edge case that it can't, or shouldn't,
be easily supported.


Regards

Philip Fennell



-----Original Message-----
From: Henry S. Thompson [mailto:ht@inf.ed.ac.uk] 
Sent: 01 May 2009 13:16
To: Philip Fennell
Cc: XProc Dev
Subject: Re: How do you pass step options to p:data/@href???

Philip Fennell writes:

> Thanks Norm, but I don't think that helps.
>
>> If you're reading a document flowing through a pipeline, then it is
> XML.
>
> That's exactly what I'm not trying to do. I'm wanting to invoke Tidy 
> on an HTML document that is not well-formed XML so that I can do 
> further processing on it. Therefore I need to use p:data to get hold 
> of a non-XML document.

An alternative for this and, as you point out, for other similar
up-translation/input conversion pipelines, is to define a script which
calls wget/curl/your-choice and pipes the result to tidy, along the
lines of

 fetch-and-tidy.sh:
  #!/bin/sh
  uri=$1
  shift
  wget --output-document - "$uri" 2>/dev/null | tidy "$@"

 fetch-and-tidy.bat:
  @echo off
  set file=%1
  shift
  wget --output-document - %file% 2>NUL: | tidy %1 %2 %3 %4 %5 %6 %7 %8 %9

  <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
              xmlns:c="http://www.w3.org/ns/xproc-step"
              xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
   <p:declare-step name="fetch-and-tidy" type="my:tidy">  
     <p:option name="href"/> 
     <p:output port="result" primary="true"/>

     <p:exec command="fetch-and-tidy.bat" source-is-xml="false"
             result-is-xml="true" wrap-result-lines="false" name="ft"> 
        <p:with-option name="args" select="concat('&quot;',$href,'&quot;
-asxml --quiet yes --show-warnings no --doctype omit --numeric-entities
yes --output-xml yes')">
         <p:empty/>
        </p:with-option>
        <p:input port="source">
         <p:empty/>            
        </p:input> 
      </p:exec> 
      <p:unwrap match="c:result"/>
    </p:declare-step> 
   <my:tidy href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
  </p:pipeline>

The above works in Calabash 0.9.9
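
As a rough sketch of how the shell wrapper above handles its arguments,
the following uses printf as a stand-in for both wget and tidy so that
it runs without network access or either tool installed. The function
name fetch_and_show and the example.org URI are illustrative
assumptions, not part of the original script.

```shell
#!/bin/sh
# Sketch only: mirrors the argument handling of fetch-and-tidy.sh,
# with printf standing in for wget and tidy.
fetch_and_show() {
  uri=$1   # first argument: the URI of the document to fetch
  shift    # drop it, so "$@" now holds only the options meant for tidy
  # Quoting "$uri" keeps characters such as ? and & in one argument.
  printf 'fetch: %s\n' "$uri"
  # printf repeats its format once for each remaining argument.
  printf 'tidy arg: %s\n' "$@"
}

fetch_and_show "http://example.org/page.html?a=1&b=2" -asxml --quiet yes
```

The call prints one "fetch:" line with the whole URI, followed by one
"tidy arg:" line for each of -asxml, --quiet and yes, which is the
split between the URI and the tidy options that the wrapper relies on.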

Note that in any case you need to tweak your pipeline a bit from where
you and Norm left it, to get the xmlness of things accurately reflected.
The following works in Calabash 0.9.9:

 <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
             xmlns:c="http://www.w3.org/ns/xproc-step"
             xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
  <p:declare-step type="my:tidy">
    <p:input port="source"/>
    <p:output port="result"/>

    <p:exec command="tidy"
        source-is-xml="false"
        result-is-xml="true"
        wrap-result-lines="false">
      <p:with-option name="args" select="'-asxml --quiet yes
--show-warnings no --doctype omit --numeric-entities yes --output-xml
yes'"/>
    </p:exec>
    <p:unwrap match="c:result"/>
   </p:declare-step> 

   <my:tidy>
    <p:input port="source">
     <p:data href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
    </p:input>
   </my:tidy>
  </p:pipeline>


ht
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
                         Half-time member of W3C Team
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Friday, 1 May 2009 12:52:57 GMT
