
Re: How do you pass step options to p:data/@href???

From: David A. Lee <dlee@calldei.com>
Date: Fri, 01 May 2009 09:26:48 -0400
Message-ID: <49FAF898.6090404@calldei.com>
To: Philip Fennell <Philip.Fennell@bbc.co.uk>
CC: XProc Dev <xproc-dev@w3.org>
This is exactly why xmlsh supports both XML and non-XML data.
As much as I'd love to believe otherwise, the fact is the world is not
100% XML yet ... maybe someday (see http://www.xmlsh.org/Philosophy),
so I do believe even XML-centric scripting languages are much more
useful if they can support non-XML data as an equal citizen.

xmlsh with the calabash extension can meet the first of your design
goals (a single self-contained document) but not the second (one
format, XML).

1) It can encode the entire pipeline, including the non-XML code, in one
file, with the XProc pipeline embedded and passed to calabash inline.
but
2) It's not XML ... I personally don't find XML syntax very appetizing
for writing procedural (or functional) programs, which is why I prefer
XQuery to XSLT, even though XSLT is more powerful. But I do appreciate
the alternate view ...

But if you're willing to put up with your document not being XML, as
long as it *is* self-contained, your pipeline example in xmlsh, assuming
the calabash extension is loaded and tidy is on the path, would be
something like this:

    tidy < http://somefile.com | xproc <[
        <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
                    xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
            <p:identity/>
        </p:pipeline>
    ]>


A similar thing could be written in a more common Unix shell script, but
with more convoluted syntax.
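
For comparison, here is a rough sketch of what that plain shell version
might look like. It is only an illustration under stated assumptions:
write_pipeline and fetch_tidy_xproc are hypothetical helper names, the
wget and tidy options are the ones from Henry's script quoted below, and
the XProc processor command is left to the caller, because I am assuming
(not asserting) that it takes the pipeline file as its last argument and
reads the source document on standard input.

```shell
#!/bin/sh
# Sketch only: keep the XProc pipeline inline in the script via a
# here-document, roughly like xmlsh's inline XML expression.

# write_pipeline FILE (hypothetical helper)
# Writes the embedded pipeline to FILE.
write_pipeline() {
    cat > "$1" <<'EOF'
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
  <p:identity/>
</p:pipeline>
EOF
}

# fetch_tidy_xproc URI PROCESSOR [ARGS...] (hypothetical helper)
# Fetches URI, runs it through tidy, and pipes the result to the given
# XProc processor command. Assumption: the processor accepts the
# pipeline file as its final argument and reads its source from stdin.
fetch_tidy_xproc() {
    uri=$1
    shift
    pipeline=$(mktemp) || return 1
    write_pipeline "$pipeline"
    wget --output-document - "$uri" 2>/dev/null |
        tidy -asxml --quiet yes |
        "$@" "$pipeline"
    status=$?
    rm -f "$pipeline"
    return $status
}
```

The temporary file is the convoluted part: a plain shell has no way to
hand an inline XML fragment to a command as a value, so the pipeline has
to be written out before the processor can read it.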




Philip Fennell wrote:
> Henry,
>
> Thanks too, but my ideal scenario is to encapsulate the entire
> description of the pipeline into a single document of one format (XML)
> that can be executed with as few dependencies as possible. The reason
> that I'm so adamant about XProc being able to do this is that I think
> the inability of XProc to either allow an expression in p:data/@href or
> use p:data/p:with-option is somewhat of an omission.
>
> There are potentially other use-cases beyond HTML Tidy. Taking URIs from
> source documents that point to non-XML files (CSS), retrieving them so
> that they can be processed using XSLT + Regular Expressions is not so
> much of an edge-case that it can't/shouldn't be easily supported.
>
>
> Regards
>
> Philip Fennell
>
>
>
> -----Original Message-----
> From: Henry S. Thompson [mailto:ht@inf.ed.ac.uk] 
> Sent: 01 May 2009 13:16
> To: Philip Fennell
> Cc: XProc Dev
> Subject: Re: How do you pass step options to p:data/@href???
>
> Philip Fennell writes:
>
>   
>> Thanks Norm, but I don't think that helps.
>>
>>> If you're reading a document flowing through a pipeline, then it is
>>> XML.
>>
>> That's exactly what I'm not trying to do. I'm wanting to invoke Tidy 
>> on an HTML document that is not well-formed XML so that I can do 
>> further processing on it. Therefore I need to use p:data to get hold 
>> of a non-XML document.
>>     
>
> An alternative for this, and as you point out other similar
> up-translation/input conversion pipelines, is to define a script which
> calls wget/curl/your-choice and pipes the result to tidy, along the
> lines of
>
>  fetch-and-tidy.sh:
>   #!/bin/sh
>   uri=$1
>   shift
>   wget --output-document - "$uri" 2>/dev/null | tidy "$@"
>
>  fetch-and-tidy.bat:
>   @echo off
>   set file=%1
>   shift
>   wget --output-document - %file% 2>NUL: | tidy %1 %2 %3 %4 %5 %6 %7 %8 %9
>
>   <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
>               xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
>    <p:declare-step name="fetch-and-tidy" type="my:tidy">  
>      <p:option name="href"/> 
>      <p:output port="result" primary="true"/>
>
>      <p:exec command="fetch-and-tidy.bat" source-is-xml="false"
>              result-is-xml="true" wrap-result-lines="false" name="ft"> 
>         <p:with-option name="args" select="concat('&quot;',$href,'&quot;
> -asxml --quiet yes --show-warnings no --doctype omit --numeric-entities
> yes --output-xml yes')">
>          <p:empty/>
>         </p:with-option>
>         <p:input port="source">
>          <p:empty/>            
>         </p:input> 
>       </p:exec> 
>       <p:unwrap match="c:result"/>
>     </p:declare-step> 
>    <my:tidy href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
>   </p:pipeline>
>
> The above works in Calabash 0.9.9
>
> Note that in any case you need to tweak your pipeline a bit from where
> you and Norm left it, to get the xmlness of things accurately reflected.
> The following works in Calabash 0.9.9:
>
>  <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
>              xmlns:my="http://www.ltg.ed.ac.uk/~ht/">
>   <p:declare-step type="my:tidy">
>     <p:input port="source"/>
>     <p:output port="result"/>
>
>     <p:exec command="tidy"
>         source-is-xml="false"
>         result-is-xml="true"
>         wrap-result-lines="false">
>       <p:with-option name="args" select="'-asxml --quiet yes
> --show-warnings no --doctype omit --numeric-entities yes --output-xml
> yes'"/>
>     </p:exec>
>     <p:unwrap match="c:result"/>
>    </p:declare-step> 
>
>    <my:tidy>
>     <p:input port="source">
>      <p:data href="http://www.ltg.ed.ac.uk/~ht/xx.html"/>
>     </p:input>
>    </my:tidy>
>   </p:pipeline>
>
>
> ht
> - -- 
>        Henry S. Thompson, School of Informatics, University of Edinburgh
>                          Half-time member of W3C Team
>       10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131
> 650-4440
>                 Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
>                        URL: http://www.ltg.ed.ac.uk/~ht/ [mail really
> from me _always_ has this .sig -- mail without it is forged spam]

-- 
David A. Lee
dlee@calldei.com  
http://www.calldei.com
http://www.xmlsh.org
812-482-5224
Received on Friday, 1 May 2009 13:27:56 GMT
