RE: new draft of Zip/Unzip Note from Toman, Vojtech on 2013-08-05 (public-xml-processing-model-wg@w3.org from August 2013)

From: Toman, Vojtech <vojtech.toman@emc.com>
Date: Mon, 5 Aug 2013 03:59:48 -0400
To: XProc WG <public-xml-processing-model-wg@w3.org>
Message-ID: <F3C7EBECE80AC346BE4D1C5A9BB4A41F2F6AD291FB@MX11A.corp.emc.com>

Good work. Here are some of my immediate comments:

3. p:zip

- "update: will add files to the existing archive. When a manifest is provided will add non-existing new files, only if it has been modified more recently than the version already in the zip archive. This will also add files to an archive that do not exist previously."

Personally, I am not comfortable with the "if it has been modified more recently than the version already in the zip archive" bit. Does it mean that the processor will need to keep timestamps for the data flowing through the pipeline?
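
To make the timestamp concern concrete, here is a minimal sketch in Python (not XProc). It assumes the "update" rule is read as "add the entry if it is missing, or if the incoming document is newer than the stored one"; the function names and the incoming_mtime parameter are hypothetical and do not come from the draft. A ZIP entry does carry a modification time, but a document produced inside a pipeline has none unless the processor tracks one:

```python
import io
import zipfile
from datetime import datetime


def make_archive(name, data, date_time):
    """Build an in-memory ZIP with one entry and an explicit modification time."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(zipfile.ZipInfo(name, date_time=date_time), data)
    return buf.getvalue()


def entry_mtime(archive_bytes, name):
    """Return the stored modification time of a ZIP entry, or None if it is absent."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        try:
            info = zf.getinfo(name)
        except KeyError:
            return None
    return datetime(*info.date_time)


def should_update(archive_bytes, name, incoming_mtime):
    """Hypothetical reading of 'update': add if missing, or if the input is newer."""
    stored = entry_mtime(archive_bytes, name)
    if stored is None:
        return True  # entry does not exist yet, so it is always added
    if incoming_mtime is None:
        # A document generated in the pipeline has no timestamp of its own.
        raise ValueError("no timestamp available for the incoming document")
    return incoming_mtime > stored
```

The ValueError branch is the open question: for dynamically generated data the comparison has nothing to compare against.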

- If I understand it correctly, there is no "manifest" input port anymore in p:zip; the manifest (in the form of c:* documents) can be passed through the "source" input port.
  - I would like to see an example of how manifest entries can be matched with the input documents (for example, to apply different serialization options to different documents).
  - The "regular" and "c:*" documents can be mixed freely on the "source" port. Does it mean that the p:zip step has to first collect all input documents (and cache them) until it has collected the complete manifest information and knows what to do with the input documents?
  - How can I put a c:* document into the zip archive?
  - Can I just p:pipe some dynamically created data into the ZIP without a corresponding manifest entry?
  - I think the description of the manifest needs to be clearer.
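
The buffering question above can be illustrated with a short Python sketch (not XProc). It models the "source" port as a sequence of items that are either manifest fragments or ordinary documents. The item layout, the plan_zip name, and matching by a "name" key are all assumptions made for illustration; the draft does not say how entries are matched. Because a manifest entry may arrive after the document it describes, nothing can be written until the whole sequence has been read:

```python
def plan_zip(source):
    """Read the whole 'source' sequence, then pair each document with its options.

    Each item is a dict with a hypothetical layout:
      {"kind": "manifest", "entries": {name: options}}
      {"kind": "doc", "name": name, "content": text}
    """
    manifest = {}
    pending = []
    for item in source:
        if item["kind"] == "manifest":
            manifest.update(item["entries"])
        else:
            # The document has to be cached: its manifest entry may come later.
            pending.append(item)
    # Only now is it known which options apply to which document.
    return [(doc["name"], manifest.get(doc["name"], {})) for doc in pending]


source = [
    {"kind": "doc", "name": "a.xml", "content": "<a/>"},
    {"kind": "manifest", "entries": {"a.xml": {"indent": "true"}}},
    {"kind": "doc", "name": "b.xml", "content": "<b/>"},
]
plan = plan_zip(source)
```

In this sketch "b.xml" has no manifest entry and simply gets empty options, which is one possible answer to the question about piping in data without a corresponding entry; treating it as an error would be another.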

- Various "It is an error if ..." - which error?

- The description of the step should say something like: "The p:zip step operates on a ZIP archive. It returns a manifest document describing..." It took me quite some time to see what the step actually returns.

-----

4. p:unzip

- The "result" output port is non-sequential. Is that correct? What if I pass a c:archive input document to extract multiple files? (Is that possible?)

- "it is a dynamic error" - which error?

-----

Example 5.13:

- How do you imagine <p:document href="file:///var/html/mydoczip.zip"/> working?

-----

I think we need to say more about base URI and name handling:
  - If you put XML data into a ZIP, what will be the properties (name, ...) of the created ZIP entry?
  - If you extract an XML document from the ZIP, what will be its base URI?


Regards,
Vojtech

--
Vojtech Toman
Consultant Software Engineer
EMC | Information Intelligence Group
vojtech.toman@emc.com
http://developer.emc.com/xmltech

> -----Original Message-----
> From: James Fuller [mailto:jim@webcomposite.com]
> Sent: Sunday, August 04, 2013 6:55 PM
> To: Norman Walsh
> Cc: XProc WG
> Subject: new draft of Zip/Unzip Note
> 
> Inspired by Norm's recent work,
> 
> was able to get to a new draft for the zip/unzip steps
> 
>    http://www.w3.org/XML/XProc/docs/xproc-zip_unzip.html
> 
> Its rough around the edges but in a good place for discussion to start.
> 
> Jim Fuller
>

Received on Monday, 5 August 2013 08:00:37 UTC