Some notes about pxp:zip and pxp:unzip

> * A-206-02: Jim to start drafting a note for p:zip/p:unzip

Perhaps Jim has already run into these, but I thought I should bring up some of the issues I remember encountering while implementing the current EXProc pxp:zip/pxp:unzip steps.

- What about source files that are not listed in the pxp:zip manifest? Is that an error, or do they end up in the ZIP archive under their original base URI?
- Serialization. At the moment, pxp:zip provides no way to specify how XML documents are serialized in the ZIP archive. I ended up adding serialization options to pxp:zip; they apply to every XML file and are therefore archive-global. It might be useful, though, to be able to specify different serialization options per file, but that would probably require putting the serialization options into the pxp:zip manifest somehow.
- I am not sure about the compression level names "smallest" | "fastest" | "default" | "huffman" | "none". They are lifted directly from the Java API. Moreover, "huffman" is not a compression level but a compression strategy, so I think it should not be in the list.
- The pxp:zip step returns a c:zipfile representation of the ZIP archive on the "result" port. While I understand that this might be useful, it is inconsistent with the existing standard steps that write output to an external location (p:store, p:xsl-formatter), which return a URI reference to the external data.
- I think that for non-XML data, the step should behave like p:data or p:http-request. Right now, the pxp:unzip spec says: "If the content-type specified is not an XML content type, the file is base64 encoded and returned in a single c:data element." This obviously does not match the behavior of p:data with respect to text media types. The pxp:unzip step also does not add the "content-type" and "encoding" attributes to the c:data wrapper.
- What happens if the file specified through the "file" option is not found in the archive? (I assume it is a dynamic error.)
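To illustrate the serialization point, here is a minimal Java sketch of per-file serialization using the JDK's identity Transformer, where each archive entry is serialized with its own output properties. The class and method names, and the idea of looking the properties up by entry name (as they might be if read from the pxp:zip manifest), are hypothetical and not part of any existing pxp:zip implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PerEntrySerialization {

    // Serializes one XML document with its own output properties (encoding, indent, ...).
    public static byte[] serialize(String xml, Properties props) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperties(props);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    // Hypothetical: serialization properties are looked up per entry name;
    // entries without an explicit setting fall back to the serializer defaults.
    public static byte[] zip(Map<String, String> docs, Map<String, Properties> serialization)
            throws IOException, TransformerException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> doc : docs.entrySet()) {
                Properties props = serialization.getOrDefault(doc.getKey(), new Properties());
                zip.putNextEntry(new ZipEntry(doc.getKey()));
                zip.write(serialize(doc.getValue(), props));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Properties noDeclaration = new Properties();
        noDeclaration.setProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        Properties latin1 = new Properties();
        latin1.setProperty(OutputKeys.ENCODING, "ISO-8859-1");

        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.xml", "<a/>");
        docs.put("b.xml", "<b/>");
        byte[] archive = zip(docs, Map.of("a.xml", noDeclaration, "b.xml", latin1));
        System.out.println("archive size in bytes: " + archive.length);
    }
}
```

With archive-global options, the two entries above could not differ in encoding or in whether they carry an XML declaration.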
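On the compression level names: they appear to correspond to constants in java.util.zip.Deflater (BEST_COMPRESSION, BEST_SPEED, DEFAULT_COMPRESSION, HUFFMAN_ONLY, NO_COMPRESSION), which is an assumption about which part of the Java API is meant. In that class, HUFFMAN_ONLY is passed to setStrategy rather than setLevel, which is exactly the level/strategy distinction. A small sketch with a hypothetical levelFor helper that accepts only the four genuine levels:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class CompressionLevels {

    // Hypothetical mapping of the genuine level names onto java.util.zip.Deflater levels.
    public static int levelFor(String name) {
        switch (name) {
            case "smallest": return Deflater.BEST_COMPRESSION;    // 9
            case "fastest":  return Deflater.BEST_SPEED;          // 1
            case "default":  return Deflater.DEFAULT_COMPRESSION; // -1
            case "none":     return Deflater.NO_COMPRESSION;      // 0
            default:
                // "huffman" is rejected here on purpose: it is a strategy, not a level.
                throw new IllegalArgumentException("not a compression level: " + name);
        }
    }

    // Huffman-only coding is selected with setStrategy, independently of the level.
    public static Deflater huffmanOnly(int level) {
        Deflater deflater = new Deflater(level);
        deflater.setStrategy(Deflater.HUFFMAN_ONLY);
        return deflater;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // The level applies to subsequent DEFLATED entries.
            zip.setLevel(levelFor("smallest"));
            zip.putNextEntry(new ZipEntry("doc.xml"));
            zip.write("<doc/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        System.out.println("archive size in bytes: " + bytes.size());

        Deflater huffman = huffmanOnly(levelFor("default"));
        huffman.end(); // release the deflater's native resources
    }
}
```

Dropping "huffman" from the level list leaves a clean one-to-one mapping; if Huffman-only coding is wanted at all, it would need an option of its own.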
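A rough Java sketch of the pxp:unzip behavior suggested in the last two points, under these assumptions: a missing entry raises an error (DynamicError is a hypothetical stand-in for an XProc dynamic error; no error code is implied), and non-XML content is wrapped in c:data roughly the way p:data is read here, with text/* media types inlined as text, anything else base64-encoded with encoding="base64", and content-type always set. All class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class UnzipEntry {

    // Hypothetical stand-in for the dynamic error raised when "file" is not in the archive.
    public static class DynamicError extends RuntimeException {
        public DynamicError(String message) { super(message); }
    }

    // Returns the bytes of the named entry, or raises DynamicError if it is absent.
    public static byte[] readEntry(byte[] archive, String file) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                if (entry.getName().equals(file)) {
                    return zip.readAllBytes(); // reads up to the end of this entry only
                }
            }
        }
        throw new DynamicError("file not found in archive: " + file);
    }

    // Escapes text for use in XML content and double-quoted attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Wraps non-XML content in c:data: text/* inline as escaped text, anything else
    // base64-encoded with encoding="base64"; content-type is always present.
    public static String wrap(byte[] data, String contentType, Charset charset) {
        String open = "<c:data xmlns:c=\"http://www.w3.org/ns/xproc-step\" content-type=\""
                + escape(contentType) + "\"";
        if (contentType.startsWith("text/")) {
            return open + ">" + escape(new String(data, charset)) + "</c:data>";
        }
        return open + " encoding=\"base64\">" + Base64.getEncoder().encodeToString(data) + "</c:data>";
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("notes.txt"));
            zip.write("a < b".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        byte[] archive = bytes.toByteArray();
        byte[] text = readEntry(archive, "notes.txt");
        System.out.println(wrap(text, "text/plain", StandardCharsets.UTF_8));
        // prints: <c:data xmlns:c="http://www.w3.org/ns/xproc-step" content-type="text/plain">a &lt; b</c:data>
    }
}
```

Asking readEntry for a name that is not in the archive throws DynamicError instead of returning an empty result.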


Vojtech Toman
Consultant Software Engineer
EMC | Information Intelligence Group

Received on Thursday, 19 April 2012 10:17:49 UTC