Zip/Unzip - the Minimalist Version for EPUB

Here's a minimalist version that can address the needs of EPUB:

1. We have a zip step that just zips files and directories, with control
over compression.

<p:declare-step type="p:zip">
     <p:input port="source" primary="true"/>
     <p:output port="result" primary="true"/>
     <p:option name="target"/>
     <p:option name="brief" select="'true'"/>
</p:declare-step>

The input is a c:archive element and the output is a c:archive.  If
the 'brief' option is true, only the c:archive element is output.
Otherwise, the full list of every entry is provided on the output.

If the 'target' option is not specified, the c:archive element must
have an 'href' attribute.
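
As a rough illustration of the intended behaviour (not part of the proposal), the zip step could be sketched in Python with the standard zipfile module.  The function name zip_from_manifest is hypothetical.  The sketch assumes that 'target' overrides the 'href' attribute, that 'base' is the directory the entry paths are relative to, and that a directory entry pulls in everything beneath it.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# Namespace conventionally bound to the c: prefix in XProc.
C_NS = "http://www.w3.org/ns/xproc-step"


def zip_from_manifest(manifest_xml, target=None):
    """Write the archive described by a c:archive manifest; return the stored names."""
    archive = ET.fromstring(manifest_xml)
    href = target or archive.get("href")  # assumption: 'target' wins over 'href'
    if href is None:
        raise ValueError("no 'target' given and c:archive has no 'href'")
    base = archive.get("base", ".")  # assumption: entry paths are relative to 'base'
    with zipfile.ZipFile(href, "w") as zf:
        for entry in archive.findall(f"{{{C_NS}}}entry"):
            path = entry.get("path")
            # compressed="false" stores the data as-is; anything else deflates it.
            method = (zipfile.ZIP_STORED if entry.get("compressed") == "false"
                      else zipfile.ZIP_DEFLATED)
            source = os.path.join(base, path)
            if entry.get("directory") == "true":
                for root, dirs, files in os.walk(source):
                    dirs.sort()  # deterministic traversal order
                    rel = os.path.relpath(root, base).replace(os.sep, "/")
                    zf.write(root, rel)  # directory entry, stored as "rel/"
                    for name in sorted(files):
                        zf.write(os.path.join(root, name), f"{rel}/{name}",
                                 compress_type=method)
            else:
                zf.write(source, path, compress_type=method)
        return zf.namelist()
```

The returned list corresponds to the full (non-brief) output; a brief result would report only the archive itself.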

2. We have an unzip step that:

     * can list a manifest of what is in the zip file
     * can extract the zip locally (e.g. on disk) to a location
specified via an option.

<p:declare-step type="p:unzip">
     <p:output port="result" primary="true"/>
     <p:option name="href" required="true"/>
     <p:option name="target"/>
     <p:option name="brief" select="'true'"/>
     <p:option name="manifest-only" select="'false'"/>
</p:declare-step>

The archive is specified via the 'href' option.  The result is
extracted to the target location specified by the 'target' option.  If
that option is not specified, the target is generated from the source.

The output of the step is a c:directory element.  If 'brief' is true,
only the directory is listed.  Otherwise, every file and subdirectory
is listed in the output.

Alternatively, if the 'manifest-only' option is true, the output is a
c:archive element listing all the entries in the zip file.  The
'target' and 'brief' options are ignored when 'manifest-only' is true.
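
The two modes of the unzip step can be sketched in Python as follows.  This is only an approximation: the function name unzip is hypothetical, the results are plain lists rather than c:directory or c:archive documents, and deriving a missing target by dropping the archive's extension is an assumption about what "generated from the source" means.

```python
import os
import zipfile


def unzip(href, target=None, brief=True, manifest_only=False):
    """List the archive (manifest-only) or extract it to 'target'."""
    with zipfile.ZipFile(href) as zf:
        if manifest_only:
            # 'target' and 'brief' are ignored here and nothing is written.
            return zf.namelist()
        if target is None:
            # Assumption: derive the target from the archive name,
            # e.g. book.epub -> book
            target = os.path.splitext(href)[0]
        zf.extractall(target)
        if brief:
            return [target]  # only the directory itself
        # Otherwise list every extracted file and subdirectory as well.
        return [target] + [f"{target}/{name}" for name in zf.namelist()]
```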

3. The manifest uses c:entry elements instead of c:file elements:

element c:archive {
  attribute href { text }?,
  attribute base { text }?,
  c:entry*
}

element c:entry {
  attribute path { text },
  attribute modified { text },
  attribute size { text },
  attribute comment { text }?,
  attribute compressed { "true" | "false" }?,
  attribute directory { "true" | "false" }?
}
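
To make the manifest shape concrete, here is a sketch in Python that builds a c:archive element from an existing zip file.  The function name archive_manifest is hypothetical, and writing 'modified' as an ISO 8601 timestamp is an assumption, since the schema above only says "text".

```python
import datetime
import zipfile
import xml.etree.ElementTree as ET

# Namespace conventionally bound to the c: prefix in XProc.
C_NS = "http://www.w3.org/ns/xproc-step"


def archive_manifest(href):
    """Return a c:archive element with one c:entry per entry in the zip file."""
    ET.register_namespace("c", C_NS)
    archive = ET.Element(f"{{{C_NS}}}archive", {"href": href})
    with zipfile.ZipFile(href) as zf:
        for info in zf.infolist():
            attrs = {
                "path": info.filename,
                # Assumption: ISO 8601 text for the modification time.
                "modified": datetime.datetime(*info.date_time).isoformat(),
                "size": str(info.file_size),
                "compressed": ("false" if info.compress_type == zipfile.ZIP_STORED
                               else "true"),
                "directory": "true" if info.is_dir() else "false",
            }
            if info.comment:  # optional attribute, only when a comment exists
                attrs["comment"] = info.comment.decode("utf-8", "replace")
            ET.SubElement(archive, f"{{{C_NS}}}entry", attrs)
    return archive
```

Calling ET.tostring(archive_manifest("book.epub")) would serialize the result for inspection.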

4. A new step p:zip-extract extracts a single entry from a zip file as
the output of the step:

<p:declare-step type="p:zip-extract">
     <p:input port="source" primary="true"/>
     <p:output port="result" primary="true"/>
     <p:option name="href" required="true"/>
</p:declare-step>

The input is expected to be a single c:entry element.
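
A minimal Python sketch of that behaviour, assuming the entry's 'path' attribute names the member to read (zip_extract is a hypothetical function name, and the result is returned as raw bytes rather than a parsed document):

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace conventionally bound to the c: prefix in XProc.
C_NS = "http://www.w3.org/ns/xproc-step"


def zip_extract(href, entry_xml):
    """Return the bytes of the single entry named by a c:entry element."""
    entry = ET.fromstring(entry_xml)
    if entry.tag != f"{{{C_NS}}}entry":
        raise ValueError("expected a single c:entry element")
    with zipfile.ZipFile(href) as zf:
        # Raises KeyError if the archive has no entry with that path.
        return zf.read(entry.get("path"))
```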

We could consider allowing a c:archive element to extract multiple
files.  We would need to provide a way to designate whether the
results are outputs or written to local storage.

We could consider allowing a 'target' option so that the entries are
extracted to local storage.

5. In the future, a p:zip-modify step can handle updating or deleting
entries as well as merging zip files.

6. In the future, we could consider allowing directory entries to have
inclusion/exclusion patterns for handling file inclusion.  This would
allow one to zip only files of certain extensions within a directory.
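
One way such filtering might work, sketched in Python.  The pattern syntax is not defined above, so glob-style patterns are an assumption here, and select_files is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def select_files(paths, include=("*",), exclude=()):
    """Keep paths matching any include pattern and no exclude pattern."""
    return [
        p for p in paths
        if any(fnmatchcase(p, pat) for pat in include)
        and not any(fnmatchcase(p, pat) for pat in exclude)
    ]
```

For example, select_files(paths, include=("*.opf", "*.ncx")) would keep only the package and navigation files of a directory listing.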


Use cases:

1. Creating an EPUB file:

<p:zip brief="false">
   <p:input port="source">
      <p:inline>
           <c:archive href="book.epub" base="book">
                <c:entry path="mimetype" compressed="false"/>
                <c:entry path="META-INF" directory="true"/>
                <c:entry path="content" directory="true"/>
           </c:archive>
      </p:inline>
   </p:input>
</p:zip>

 produces (for example):

    <c:archive href="book.epub">
         <c:entry path="mimetype" compressed="false"/>
         <c:entry path="META-INF/" directory="true"/>
         <c:entry path="META-INF/container.xml" compressed="true"/>
         <c:entry path="content/" directory="true"/>
         <c:entry path="content/book.opf" compressed="true"/>
         <c:entry path="content/book.ncx" compressed="true"/>
         <c:entry path="content/book.xhtml" compressed="true"/>
    </c:archive>

2. Unpack an EPUB:

   <p:unzip href="book.epub" target="book" brief="false"/>

   produces (for example):

   <c:directory href="book/">
         <c:file href="book/mimetype"/>
         <c:directory href="book/META-INF/">
             <c:file href="book/META-INF/container.xml"/>
         </c:directory>
         <c:directory href="book/content/">
             <c:file href="book/content/book.opf"/>
             <c:file href="book/content/book.ncx"/>
             <c:file href="book/content/book.xhtml"/>
         </c:directory>
   </c:directory>

3. Getting content from an EPUB file:

   <p:zip-extract href="book.epub">
      <p:input port="source">
          <p:inline>
              <c:entry path="content/book.xhtml"/>
          </p:inline>
      </p:input>
   </p:zip-extract>

   produces (for example):

     <html xmlns="http://www.w3.org/1999/xhtml"> ... </html>

-- 
--Alex Milowski

Received on Tuesday, 3 June 2014 21:14:22 UTC