- From: Alex Muir <alex.g.muir@gmail.com>
- Date: Wed, 21 Apr 2010 15:12:02 +0000
- To: Toman_Vojtech@emc.com
- Cc: xproc-dev@w3.org
- Message-ID: <i2p88b533b91004210812l591d72dbt3c45c2ae3c93509@mail.gmail.com>
Hi,

I was having trouble with the unzip function as well.

I have an XProc process that does not use zip: it loads HTML files via the unparsed-text function in XSLT and converts each HTML file into XML for further processing. I don't want to use TagSoup or Tidy to clean the HTML into XML; I would rather analyze the HTML content and create my own XML representation of the data.

I then wanted to use zipped HTML files to save space, but I wasn't able to get it working. I was thinking that I would be able to unzip the HTML and do something similar to the unparsed-text($input_uri, 'UTF-8') function to get the data into XML without using TagSoup or Tidy.

Is there a means to do that in XProc?

Regards
Alex

On Wed, Apr 21, 2010 at 1:30 PM, <Toman_Vojtech@emc.com> wrote:

> Well, if you look closer at the specification of pxp:unzip
> (http://exproc.org/proposed/steps/other.html), this is actually the
> 'correct' behavior. Only if the content type is an XML content type is the
> data returned without base64 encoding. All other content types (including
> text types) always result in base64 encoded data. I actually think this is a
> bug in the EXProc specification and that the result of pxp:unzip should be
> made consistent with what p:data does (i.e. not base64 encoding text content
> types).
>
> Regards,
>
> Vojtech
>
> *From:* Christopher Ball [mailto:christopher.r.ball@gmail.com]
> *Sent:* Wednesday, April 21, 2010 3:22 PM
> *To:* Toman, Vojtech; xproc-dev@w3.org
> *Subject:* RE: Missing something basic . . ?
>
> Tom,
>
> Thanks for the suggestion.
>
> Unfortunately, I forgot to mention in my original email that I had tried
> that permutation as well . . . without getting the desired effect =(
>
> With the single quotes, the content-type gets passed through but still
> seems to be getting ignored and I end up with an output file of the
> following nature:
>
> <!-- Output Snippet -->
> <c:data xmlns:c="http://www.w3.org/ns/xproc-step"
>         name="1stFranklinFinancialCorp_CIK0000038723.txt"
>         content-type="text/plain">
> LS0tLS1CRUdJTiBQUklWQUNZLUVOSEFOQ0VEIE1FU1NBR0UtLS0tLQ0KUHJvYy1UeXBlOiAyMDAx
> LE1JQy1DTEVBUg0KT3JpZ2luYXRvci1OYW1lOiB3ZWJtYXN0ZXJAd3d3LnNlYy5nb3YNCk9yaWdp
> . . . </c:data>
>
> Dare I say this is a bug? If so, I suppose a workaround would be to cast
> back from base64 to string using an XPath function . . ?
>
> Thoughts?
>
> Christopher
>
> ------------------------------
>
> *From:* xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] *On
> Behalf Of *Toman_Vojtech@emc.com
> *Sent:* Wednesday, April 21, 2010 3:27 AM
> *To:* xproc-dev@w3.org
> *Subject:* RE: Missing something basic . . ?
>
> Christopher,
>
> Try the following:
>
> <cx:unzip>
>   …
>   <p:with-option name="content-type" select="'text/plain'"/>
>   …
> </cx:unzip>
>
> (Single quotes around the text/plain value so that it is treated as a
> string and not as an XPath expression)
>
> That might help.
>
> Vojtech
>
> *From:* xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] *On
> Behalf Of *Christopher Ball
> *Sent:* Wednesday, April 21, 2010 3:20 AM
> *To:* xproc-dev@w3.org
> *Subject:* Missing something basic . . ?
>
> Hello,
>
> I am trying to process some zipped text files in XProc (leveraging a
> Calabash extension), but I am getting tripped up by base64 encoding.
>
> My first draft of the XProc is below. Unfortunately, the content-type
> option on cx:unzip seems to be getting ignored and I end up with an output
> file of the following nature:
>
> <!-- Output Snippet -->
> <c:data xmlns:c="http://www.w3.org/ns/xproc-step" name="InputFile1.txt"
>         content-type="">LS0tLS1CRUdJTiBQUklWQUNZLUVOSEFOQ0VEIE1FU1NBR0UtLS0tLQ0KUHJvYy1UeXBlOiAyMDAx
> . . . </c:data>
>
> Am I missing the obvious . . . or trying to do the impossible?
>
> Most grateful for any feedback,
>
> Christopher
>
> <!-- xProc File -->
> <?xml version="1.0" encoding="UTF-8"?>
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>                 xmlns:cx="http://xmlcalabash.com/ns/extensions"
>                 xmlns:c="http://www.w3.org/ns/xproc-step"
>                 xmlns:html="http://www.w3.org/1999/xhtml"
>                 name="aMeaninglessName"
>                 version="1.0">
>
>   <p:input port="source">
>     <p:empty/>
>   </p:input>
>
>   <p:declare-step type="cx:unzip" version="1.0">
>     <p:output port="result"/>
>     <p:option name="href" required="true"/>
>     <p:option name="file"/>
>     <p:option name="content-type"/>
>   </p:declare-step>
>
>   <p:variable name="startingFileNumber" select="'1'"/>
>   <p:variable name="endingFileNumber" select="'1'"/>
>
>   <p:variable name="source-folder" select="'../zippedFiles/'"/>
>
>   <p:directory-list>
>     <p:with-option name="path" select="$source-folder">
>       <p:empty/>
>     </p:with-option>
>   </p:directory-list>
>
>   <p:for-each name="ZipedHTMLFile">
>     <p:iteration-source
>         select="//c:file[position() ge number($startingFileNumber) and
>                 position() le number($endingFileNumber)]"/>
>
>     <p:variable name="filename" select="c:file/@name"/>
>
>     <!-- Load from Zip file -->
>     <cx:unzip name="get-XML">
>       <p:with-option name="href"
>                      select="concat($source-folder,$filename)"/>
>       <p:with-option name="file"
>                      select="replace($filename,'.zip','.txt')"/>
>       <p:with-option name="content-type" select="text/plain"/>
>     </cx:unzip>
>
>     <p:store href="../output/processed.xml" name="store"/>
>
>   </p:for-each>
>
> </p:declare-step>

--
Alex
https://sites.google.com/a/utg.edu.gm/alex
Some Good Music
http://sites.google.com/site/greigconteh/
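On the workaround raised in the thread (turning the base64 payload of c:data back into text), here is a minimal sketch that works outside the pipeline. It is written in Python rather than XProc because the thread does not establish a built-in XProc 1.0 way to do this. It assumes the c:data document is available as a string and that the zipped file is UTF-8 text. The function name decode_c_data and the variable names are illustrative only. The sample payload is just the first base64 line of the output snippet quoted above, so it decodes to only the beginning of the file.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace of the c:data wrapper element shown in the thread.
C_NS = "http://www.w3.org/ns/xproc-step"


def decode_c_data(xml_text: str, encoding: str = "utf-8") -> str:
    """Return the text hidden in a base64-encoded c:data element.

    Hypothetical helper: it assumes the element content is base64,
    which is what the thread reports for non-XML content types.
    """
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{C_NS}}}data":
        raise ValueError("expected a c:data element")
    # Base64 payloads are usually line-wrapped; drop all whitespace first.
    payload = "".join((root.text or "").split())
    return base64.b64decode(payload).decode(encoding)


# First base64 line of the snippet from the thread (the rest is elided there).
sample = (
    '<c:data xmlns:c="http://www.w3.org/ns/xproc-step" '
    'name="InputFile1.txt" content-type="text/plain">'
    "LS0tLS1CRUdJTiBQUklWQUNZLUVOSEFOQ0VEIE1FU1NBR0UtLS0tLQ0KUHJvYy1UeXBlOiAyMDAx"
    "</c:data>"
)

decoded = decode_c_data(sample)
print(decoded)
```

The decoded string could then be handed to whatever custom HTML or text analysis follows, in the same spirit as reading the file with unparsed-text.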
Received on Wednesday, 21 April 2010 15:20:51 UTC