RE: Missing something basic . . ?

From: Christopher Ball <christopher.r.ball@gmail.com>
Date: Wed, 21 Apr 2010 09:54:10 -0400
To: <Toman_Vojtech@emc.com>, <xproc-dev@w3.org>
Message-ID: <007901cae15a$1f914200$0301a8c0@cgifederal.com>
Tom,

 

From the perspective of a novice with the language, I would have to strongly
agree with your statement, "I actually think this is a bug in the EXProc
specification".

 

Greater wisdom may prevail, but to me one of the great values of XML
technology is in helping to "get the unstructured structured" which often
means starting out with non XML such as HTML or just plain text and then
transforming it into well structured content compliant with an XSD. 

 

Is there any centralized place I can cast my vote on the matter?

 

In the meantime, I need a workaround, so I suppose I will try an XPath
function or, if need be, write my own extension function . . . suggestions
welcome.

 

Christopher

 

  _____  

From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf
Of Toman_Vojtech@emc.com
Sent: Wednesday, April 21, 2010 9:30 AM
To: xproc-dev@w3.org
Subject: RE: Missing something basic . . ?

 

Well, if you look closer at the specification of pxp:unzip
(http://exproc.org/proposed/steps/other.html), this is actually the
'correct' behavior. The data is returned without base64 encoding only if the
content type is an XML content type. All other content types (including
text types) always result in base64 encoded data. I actually think this is a
bug in the EXProc specification and that the result of pxp:unzip should be
made consistent with what p:data does (i.e. not base64 encoding text content
types).
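
For comparison, here is a minimal sketch of the p:data behavior referred to
above, assuming an XProc 1.0 processor and a text file that has already been
extracted from the archive. The path ../unzipped/InputFile1.txt is
hypothetical, and the exact result may vary by implementation. For a text
content type, the c:data wrapper is expected to carry the characters as text
rather than as base64:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <!-- p:identity simply passes the wrapped document through to "result" -->
  <p:identity>
    <p:input port="source">
      <!-- Hypothetical path; content-type is the hint used when the
           resource itself carries no media type (e.g. a local file) -->
      <p:data href="../unzipped/InputFile1.txt" content-type="text/plain"/>
    </p:input>
  </p:identity>
</p:declare-step>
```

This does not read from the zip archive itself, so it only illustrates the
output shape that pxp:unzip could be aligned with.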

 

Regards,

Vojtech

 

From: Christopher Ball [mailto:christopher.r.ball@gmail.com] 
Sent: Wednesday, April 21, 2010 3:22 PM
To: Toman, Vojtech; xproc-dev@w3.org
Subject: RE: Missing something basic . . ?

 

Tom,

 

Thanks for the suggestion.

 

Unfortunately, I forgot to mention in my original email that I had tried
that permutation as well . . . without getting the desired effect =(

 

With the single quotes, the content-type gets passed through but still seems
to be getting ignored and I end up with an output file of the following
nature:

 

<!-- Output Snippet -->

<c:data xmlns:c="http://www.w3.org/ns/xproc-step"
name="1stFranklinFinancialCorp_CIK0000038723.txt"
content-type="text/plain">LS0tLS1CRUdJTiBQUklWQUNZLUVOSEFOQ0VEIE1FU1NBR0UtLS
0tLQ0KUHJvYy1UeXBlOiAyMDAx
LE1JQy1DTEVBUg0KT3JpZ2luYXRvci1OYW1lOiB3ZWJtYXN0ZXJAd3d3LnNlYy5nb3YNCk9yaWdp
. . . </c:data>

 

Dare I say this is a bug? If so, I suppose a workaround would be to cast
back from base64 to string using an XPath function . . ?

 

Thoughts?

 

Christopher

 

  _____  

From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf
Of Toman_Vojtech@emc.com
Sent: Wednesday, April 21, 2010 3:27 AM
To: xproc-dev@w3.org
Subject: RE: Missing something basic . . ?

 

Christopher,

 

Try the following:

 

  <cx:unzip>
    .
    <p:with-option name="content-type" select="'text/plain'"/>
    .
  </cx:unzip>

 

(Single quotes around the text/plain value so that it is treated as a string
and not as an XPath expression)
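
To illustrate the difference, here is a sketch of the two forms side by side
(the surrounding options are omitted, as above). Without the inner quotes the
select value is parsed as the relative location path text/plain, which
presumably matches nothing in the current context and so yields an empty
string; that would be consistent with the content-type="" seen in the output
from the original message.

```xml
<!-- XPath location path: child element "plain" of child element "text" -->
<p:with-option name="content-type" select="text/plain"/>

<!-- XPath string literal: the value is the string text/plain -->
<p:with-option name="content-type" select="'text/plain'"/>
```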

 

That might help.

 

Vojtech

 

 

From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf
Of Christopher Ball
Sent: Wednesday, April 21, 2010 3:20 AM
To: xproc-dev@w3.org
Subject: Missing something basic . . ?

 

Hello,

 

I am trying to process some zipped text files in XProc (leveraging a
Calabash extension), but I am getting tripped up by base64 encoding.

 

My first draft of the XProc pipeline is below. Unfortunately, the content-type
option on cx:unzip seems to be getting ignored and I end up with an output
file of the following nature:

 

<!-- Output Snippet -->

<c:data xmlns:c="http://www.w3.org/ns/xproc-step" name="InputFile1.txt"
content-type="">LS0tLS1CRUdJTiBQUklWQUNZLUVOSEFOQ0VEIE1FU1NBR0UtLS0tLQ0KUHJv
Yy1UeXBlOiAyMDAx . . . </c:data>

 

Am I missing the obvious . . . or trying to do the impossible?

 

Most grateful for any feedback,

 

Christopher

 

 

<!-- xProc File -->

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:html="http://www.w3.org/1999/xhtml"
                name="aMeaninglessName"
                version="1.0">

    <p:input port="source">
        <p:empty/>
    </p:input>

    <p:declare-step type="cx:unzip" version="1.0">
        <p:output port="result"/>
        <p:option name="href" required="true"/>
        <p:option name="file"/>
        <p:option name="content-type"/>
    </p:declare-step>

    <p:variable name="startingFileNumber" select="'1'"/>
    <p:variable name="endingFileNumber" select="'1'"/>

    <p:variable name="source-folder" select="'../zippedFiles/'"/>

    <p:directory-list>
        <p:with-option name="path" select="$source-folder">
            <p:empty/>
        </p:with-option>
    </p:directory-list>

    <p:for-each name="ZipedHTMLFile">
        <p:iteration-source
            select="//c:file[position() ge number($startingFileNumber)
                    and position() le number($endingFileNumber)]"/>

        <p:variable name="filename" select="c:file/@name"/>

        <!-- Load from Zip file -->
        <cx:unzip name="get-XML">
            <p:with-option name="href" select="concat($source-folder,$filename)"/>
            <p:with-option name="file" select="replace($filename,'.zip','.txt')"/>
            <p:with-option name="content-type" select="text/plain"/>
        </cx:unzip>

        <p:store href="../output/processed.xml" name="store"/>

    </p:for-each>

</p:declare-step>
Received on Wednesday, 21 April 2010 13:57:22 GMT
