Re: Unzipping .bz2 ?

From: Stefanie Haupt <st.haupt@gmail.com>
Date: Wed, 9 Feb 2011 15:31:17 +0100
Message-ID: <AANLkTimW6FJzE+8tgvM8PiR9UjFiYXwOdSs2kNMbnrM1@mail.gmail.com>
To: xproc-dev@w3.org
Vojtech,
using concat would be a good idea here, but neither Calumet 1.0.12 nor
Calabash 0.9.32 interprets XPath functions inside the args option of p:exec.
Thank you for explaining the issue with p:data and the base64
encoding, as I did not understand the relevance before.

Regards,
Stefanie



On Wed, Feb 9, 2011 at 2:41 PM,  <vojtech.toman@emc.com> wrote:
> I think this should work:
> <p:with-option name="args" select="concat('-d -k ', $filename)"/>
>
> The output of p:data cannot be processed by bunzip2 because p:data base64-encodes the byte stream and wraps it in an XML wrapper element, which, from bunzip2's perspective, completely destroys the original byte stream.
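A minimal sketch of Vojtech's suggestion as a complete pipeline, assuming the archive name is passed in as a pipeline option called `filename` (that option name is only an illustration, and the pipeline has not been tested against Calumet 1.0.12 or Calabash 0.9.32). As far as I know, XProc 1.0 treats option shortcut attributes such as `args="-d -k {$filename}"` as literal strings rather than XPath, so the expression has to sit in the `select` attribute of a `p:with-option` element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: decompress a .bz2 archive whose name is supplied as an option.
     The option name "filename" is illustrative, not taken from the thread.
     bzip2 -d writes the decompressed file to disk; -k keeps the .bz2. -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>
  <p:option name="filename" required="true"/>

  <p:exec command="/bin/bzip2" result-is-xml="false" wrap-result-lines="true">
    <!-- bzip2 opens the file named in args, so nothing is sent to stdin -->
    <p:input port="source">
      <p:empty/>
    </p:input>
    <!-- select is evaluated as XPath, so $filename is substituted here;
         the resulting string is split on spaces (the default arg-separator) -->
    <p:with-option name="args" select="concat('-d -k ', $filename)"/>
  </p:exec>
</p:declare-step>
```

With Calabash this might be invoked as `calabash unzip.xpl filename=test.xml.bz2`. Because args is split on spaces, a file name that itself contains spaces would need a different arg-separator.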
>
>
> Regards,
> Vojtech
>
>
> --
> Vojtech Toman
> Consultant Software Engineer
> EMC | Information Intelligence Group
> vojtech.toman@emc.com
> http://developer.emc.com/xmltech
>
>
>> -----Original Message-----
>> From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf Of Stefanie Haupt
>> Sent: Wednesday, February 09, 2011 2:24 PM
>> To: xproc-dev@w3.org
>> Subject: Re: Unzipping .bz2 ?
>>
>> Hi list,
>>
>> I've found the error in my attempt and thought I'd share:
>>
>> This works (there's still one thing that bothers me):
>> <?xml version="1.0" encoding="UTF-8"?>
>> <p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
>>
>>   <p:exec command="/bin/bzip2" result-is-xml="false"
>>     args="-d -k filename.xml.bz2" wrap-result-lines="true">
>>
>>     <p:input port="source">
>>       <p:empty/>
>>     </p:input>
>>
>>   </p:exec>
>>
>>   <p:identity/>
>> </p:pipeline>
>>
>> You might notice that the filename is hardcoded inside the arguments.
>> I have not been able to use bzip2 without doing this. I've tried
>> <p:data href="filename.xml.bz2"/> instead of p:empty, without luck.
>> Reading the filename from the command line using p:option and replacing
>> the args string with "-d -k $filename" or "-d -k {$filename}" didn't
>> work either. If you know how to solve this, I'd be happy to hear!
>> Perhaps it's just a silly mistake I've made, but I don't see the
>> answer, so any help is appreciated, thank you! Even if you simply tell
>> me: You can't! Many thanks in advance and
>> best regards
>> Stefanie
>>
>>
>>
>> On Wed, Feb 2, 2011 at 11:43 AM, Stefanie Haupt <st.haupt@gmail.com> wrote:
>> > Hi Jostein,
>> >
>> > I did that, sorry, I should have mentioned it; it does not change the error.
>> > I have the impression that the engine somehow chokes on bzip2/bunzip2
>> > (I tried both variants). I've never seen a *module with no systemId*
>> > error message before and can't find anything helpful by googling. And
>> > the error message would be different if the engine were not able
>> > to access bzip2/bunzip2 at all.
>> >
>> > Kind Regards
>> > Stefanie
>> >
>> > On Wed, Feb 2, 2011 at 11:31 AM, Jostein Austvik Jacobsen <josteinaj@gmail.com> wrote:
>> >> Are you sure that the result of the p:exec is valid XML? You could
>> >> try result-is-xml="false" and see if that produces valid output...
>> >> Regards
>> >> Jostein
>> >>
>> >> 2011/2/2 Stefanie Haupt <st.haupt@gmail.com>
>> >>>
>> >>> Hello list,
>> >>>
>> >>> I'm trying to unzip some .bz2 files using XProc (with Calabash
>> >>> 0.9.32). Since they are not handled by cx:unzip (the archive is read
>> >>> as empty), I thought I'd write a p:exec step. But that fails with a
>> >>> fatal error. Can you tell me what's wrong? I guess the most
>> >>> interesting line of the error message is this:
>> >>> *module with no systemId*:1:java.io.IOException: Broken pipe
>> >>> However, I've included the complete pipeline and error message below.
>> >>>
>> >>> Many thanks in advance and kind regards,
>> >>> Stefanie
>> >>>
>> >>> This is the pipeline:
>> >>> <?xml version="1.0" encoding="UTF-8"?>
>> >>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>> >>> xmlns:c="http://www.w3.org/ns/xproc-step"
>> >>>  xmlns:cx="http://xmlcalabash.com/ns/extensions" version="1.0">
>> >>>
>> >>>  <p:input port="source">
>> >>>    <p:data href="test.xml.bz2"/>
>> >>>  </p:input>
>> >>>
>> >>>  <p:exec command="/bin/bunzip2" source-is-xml="false" result-is-xml="true"
>> >>>    wrap-result-lines="false" name="unzip">
>> >>>    <p:with-option name="args"
>> >>>      select="'--keep'" />
>> >>>  </p:exec>
>> >>>
>> >>>  <p:store href="test-unzipped.xml"/>
>> >>>
>> >>> </p:declare-step>
>> >>>
>> >>>
>> >>> Error message:
>> >>> calabash --debug unzip.xpl
>> >>> 02.02.2011 10:08:40 com.xmlcalabash.util.DefaultXProcMessageListener info
>> >>> INFO: Running pipeline !1
>> >>> 02.02.2011 10:08:40 com.xmlcalabash.util.DefaultXProcMessageListener info
>> >>> INFO: Running exec unzip
>> >>> 02.02.2011 10:08:40 com.xmlcalabash.util.DefaultXProcMessageListener info
>> >>> INFO: unzip.xpl:10:44:Exec: /bin/bunzip2 --keep
>> >>> 02.02.2011 10:08:40 com.xmlcalabash.util.DefaultXProcMessageListener error
>> >>> SEVERE: *module with no systemId*:1:java.io.IOException: Broken pipe
>> >>> 02.02.2011 10:08:40 com.xmlcalabash.util.DefaultXProcMessageListener error
>> >>> SEVERE: java.io.IOException: Broken pipe
>> >>> 02.02.2011 10:08:40 com.xmlcalabash.drivers.Main error
>> >>> SEVERE: Pipeline failed: net.sf.saxon.s9api.SaxonApiException: java.io.IOException: Broken pipe
>> >>> 02.02.2011 10:08:40 com.xmlcalabash.drivers.Main error
>> >>> SEVERE: Underlying exception:
>> >>> net.sf.saxon.trans.XPathException: java.io.IOException: Broken pipe
>> >>> net.sf.saxon.s9api.SaxonApiException: java.io.IOException: Broken pipe
>> >>>        at net.sf.saxon.s9api.XQueryEvaluator.run(XQueryEvaluator.java:303)
>> >>>        at com.xmlcalabash.library.Exec.run(Unknown Source)
>> >>>        at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source)
>> >>>        at com.xmlcalabash.runtime.XPipeline.doRun(Unknown Source)
>> >>>        at com.xmlcalabash.runtime.XPipeline.run(Unknown Source)
>> >>>        at com.xmlcalabash.drivers.Main.run(Unknown Source)
>> >>>        at com.xmlcalabash.drivers.Main.main(Unknown Source)
>> >>> Caused by: net.sf.saxon.trans.XPathException: java.io.IOException: Broken pipe
>> >>>        at net.sf.saxon.serialize.TEXTEmitter.characters(TEXTEmitter.java:101)
>> >>>        at net.sf.saxon.event.ProxyReceiver.characters(ProxyReceiver.java:186)
>> >>>        at net.sf.saxon.event.ComplexContentOutputter.characters(ComplexContentOutputter.java:165)
>> >>>        at net.sf.saxon.tree.tiny.TinyTextImpl.copy(TinyTextImpl.java:76)
>> >>>        at net.sf.saxon.event.ComplexContentOutputter.append(ComplexContentOutputter.java:521)
>> >>>        at net.sf.saxon.expr.Expression.process(Expression.java:503)
>> >>>        at net.sf.saxon.query.XQueryExpression.run(XQueryExpression.java:390)
>> >>>        at net.sf.saxon.s9api.XQueryEvaluator.run(XQueryEvaluator.java:299)
>> >>>        ... 6 more
>> >>> Caused by: java.io.IOException: Broken pipe
>> >>>        at java.io.FileOutputStream.writeBytes(Native Method)
>> >>>        at java.io.FileOutputStream.write(FileOutputStream.java:297)
>> >>>        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>> >>>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
>> >>>        at net.sf.saxon.serialize.UTF8Writer.write(UTF8Writer.java:286)
>> >>>        at net.sf.saxon.serialize.UTF8Writer.write(UTF8Writer.java:253)
>> >>>        at net.sf.saxon.serialize.TEXTEmitter.characters(TEXTEmitter.java:99)
>> >>>        ... 13 more
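A sketch of how the original pipeline might be reworked along the lines discussed in this thread, assuming the decompressed file is well-formed XML (untested with Calabash 0.9.32). Instead of sending the p:data result to bunzip2's standard input, the archive name is passed as an argument, and bunzip2's -c flag writes the decompressed data to standard output so that it arrives on the result port and can be stored. The stack trace above shows Calabash failing while writing text to bunzip2's input, which would be consistent with bunzip2 rejecting the base64-wrapped data and exiting early, but that reading is an inference.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: decompress test.xml.bz2 to stdout and store the parsed XML. -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

  <p:exec command="/bin/bunzip2" result-is-xml="true" name="unzip">
    <!-- bunzip2 opens the archive itself, so no document goes to stdin -->
    <p:input port="source">
      <p:empty/>
    </p:input>
    <!-- -c writes the decompressed data to stdout and leaves the .bz2 in place;
         the relative path is resolved against the process working directory -->
    <p:with-option name="args" select="'-c test.xml.bz2'"/>
  </p:exec>

  <!-- stdout was parsed as XML (result-is-xml="true"), so it can be stored -->
  <p:store href="test-unzipped.xml"/>
</p:declare-step>
```

The same `concat()` approach could replace the literal file name if it should come from a pipeline option.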
>> >>>
>> >>>
>> >>> --
>> >>> Stefanie Haupt, M.A.
>> >>>
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Stefanie Haupt, M.A.
>> >
>>
>>
>>
>> --
>> Stefanie Haupt, M.A.
>>
>
>



-- 
Stefanie Haupt, M.A.
Received on Wednesday, 9 February 2011 14:31:51 GMT
