- From: Christopher Ball <christopher.r.ball@gmail.com>
- Date: Wed, 21 Apr 2010 09:54:10 -0400
- To: <Toman_Vojtech@emc.com>, <xproc-dev@w3.org>
- Message-ID: <007901cae15a$1f914200$0301a8c0@cgifederal.com>
Tom,

From the perspective of a novice with the language, I have to strongly agree with your statement, "I actually think this is a bug in the EXProc specification". Greater wisdom may prevail, but to me one of the great values of XML technology is in helping to "get the unstructured structured", which often means starting out with non-XML such as HTML or just plain text and then transforming it into well-structured content compliant with an XSD.

Is there any centralized place where I can cast my vote on the matter?

In the meantime, I need a workaround, so I suppose I will try an XPath function or, if need be, write my own extension function . . . suggestions welcome.

Christopher

_____

From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf Of Toman_Vojtech@emc.com
Sent: Wednesday, April 21, 2010 9:30 AM
To: xproc-dev@w3.org
Subject: RE: Missing something basic . . ?

Well, if you look closer at the specification of pxp:unzip (http://exproc.org/proposed/steps/other.html), this is actually the 'correct' behavior. The data is returned without base64 encoding only if the content type is an XML content type. All other content types (including text types) always result in base64-encoded data.

I actually think this is a bug in the EXProc specification and that the result of pxp:unzip should be made consistent with what p:data does (i.e. not base64 encoding text content types).

Regards,
Vojtech

From: Christopher Ball [mailto:christopher.r.ball@gmail.com]
Sent: Wednesday, April 21, 2010 3:22 PM
To: Toman, Vojtech; xproc-dev@w3.org
Subject: RE: Missing something basic . . ?

Tom,

Thanks for the suggestion. Unfortunately, I forgot to mention in my original email that I had tried that permutation as well . . .
without getting the desired effect =(

With the single quotes, the content-type gets passed through but still seems to be ignored, and I end up with an output file of the following nature:

<!-- Output Snippet -->
<c:data xmlns:c="http://www.w3.org/ns/xproc-step" name="1stFranklinFinancialCorp_CIK0000038723.txt" content-type="text/plain">LS0tLS1CRUdJTiBQUklWQUNZLUVOSEFOQ0VEIE1FU1NBR0UtLS 0tLQ0KUHJvYy1UeXBlOiAyMDAx LE1JQy1DTEVBUg0KT3JpZ2luYXRvci1OYW1lOiB3ZWJtYXN0ZXJAd3d3LnNlYy5nb3YNCk9yaWdp
. . .
</c:data>

Dare I say this is a bug? If so, I suppose a workaround would be to cast back from base64 to string using an XPath function . . ?

Thoughts?

Christopher

_____

From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf Of Toman_Vojtech@emc.com
Sent: Wednesday, April 21, 2010 3:27 AM
To: xproc-dev@w3.org
Subject: RE: Missing something basic . . ?

Christopher,

Try the following:

<cx:unzip>
  .
  <p:with-option name="content-type" select="'text/plain'"/>
  .
</cx:unzip>

(Single quotes around the text/plain value so that it is treated as a string and not as an XPath expression.)

That might help.

Vojtech

From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf Of Christopher Ball
Sent: Wednesday, April 21, 2010 3:20 AM
To: xproc-dev@w3.org
Subject: Missing something basic . . ?

Hello,

I am trying to process some zipped text files in XProc (leveraging a Calabash extension), but I am getting tripped up by base64 encoding.

My first draft of the XProc is below. Unfortunately, the content-type option on cx:unzip seems to be ignored, and I end up with an output file of the following nature:

<!-- Output Snippet -->
<c:data xmlns:c="http://www.w3.org/ns/xproc-step" name="InputFile1.txt" content-type="">LS0tLS1CRUdJTiBQUklWQUNZLUVOSEFOQ0VEIE1FU1NBR0UtLS0tLQ0KUHJv Yy1UeXBlOiAyMDAx
. . .
</c:data>

Am I missing the obvious . . . or trying to do the impossible?
Most grateful for any feedback,

Christopher

<!-- xProc File -->
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:html="http://www.w3.org/1999/xhtml"
                name="aMeaninglessName"
                version="1.0">

  <p:input port="source">
    <p:empty/>
  </p:input>

  <p:declare-step type="cx:unzip" version="1.0">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="file"/>
    <p:option name="content-type"/>
  </p:declare-step>

  <p:variable name="startingFileNumber" select="'1'"/>
  <p:variable name="endingFileNumber" select="'1'"/>
  <p:variable name="source-folder" select="'../zippedFiles/'"/>

  <p:directory-list>
    <p:with-option name="path" select="$source-folder">
      <p:empty/>
    </p:with-option>
  </p:directory-list>

  <p:for-each name="ZipedHTMLFile">
    <p:iteration-source select="//c:file[position() ge number($startingFileNumber) and position() le number($endingFileNumber)]"/>
    <p:variable name="filename" select="c:file/@name"/>

    <!-- Load from Zip file -->
    <cx:unzip name="get-XML">
      <p:with-option name="href" select="concat($source-folder,$filename)"/>
      <p:with-option name="file" select="replace($filename,'.zip','.txt')"/>
      <p:with-option name="content-type" select="text/plain"/>
    </cx:unzip>

    <p:store href="../output/processed.xml" name="store"/>
  </p:for-each>

</p:declare-step>
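
The thread leaves the base64 workaround open. As a rough sketch only, and assuming the decoding is done outside the pipeline rather than with an XPath or extension function, the stored c:data document could be post-processed in Python. The helper name decode_c_data and the UTF-8 default are assumptions for illustration; the sample reuses only the visible part of the truncated payload from the first output snippet.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace of c:data, as shown in the output snippets in this thread.
C_NS = "http://www.w3.org/ns/xproc-step"


def decode_c_data(xml_text, encoding="utf-8"):
    """Hypothetical helper: return (name, decoded text) for a base64 c:data element.

    The text encoding is an assumption; the thread does not state it.
    """
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{C_NS}}}data":
        raise ValueError("expected a c:data element")
    # The payload may be wrapped across lines, so strip all whitespace first.
    payload = "".join((root.text or "").split())
    return root.get("name"), base64.b64decode(payload).decode(encoding)


# Only the visible portion of the (truncated) payload from the thread.
sample = (
    '<c:data xmlns:c="http://www.w3.org/ns/xproc-step" '
    'name="InputFile1.txt" content-type="">'
    "LS0tLS1CRUdJTiBQUklWQUNZLUVOSEFOQ0VEIE1FU1NBR0UtLS0tLQ0KUHJv\n"
    "Yy1UeXBlOiAyMDAx"
    "</c:data>"
)

name, text = decode_c_data(sample)
print(name)
print(text)
```

Decoding the visible prefix yields the opening "-----BEGIN PRIVACY-ENHANCED MESSAGE-----" line followed by "Proc-Type: 2001", which suggests the archive entries are plain text that was base64-encoded on output.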
Received on Wednesday, 21 April 2010 13:57:22 UTC