Re: Reminder: Archive Module

On 21/10/2013 12:54, Christian Grün wrote:
> Hi John,
>
> thanks for editing the Archive Module.
>
> • As indicated before, I think that a convenience function for
> extracting ZIP archives to disk would be beneficial. I see three
> reasons for this: Extracting files is one of the most frequent
> operations done with archives. Next, a pure XPath or XQuery solution
> is, by nature, pretty cumbersome and not intuitive. Last but not
> least, we should provide a solution for extracting very large files,
> as it’s pretty tricky to rewrite the example in 3.3, and related ones,
> for streaming IO. However, I agree that it is easy to find SoC arguments
> against providing such a function.
I've generalized the example in 3.3 into a function that can either be
supplied as an XSLT package (which assumes the use of the EXPath File
module) or provided as a built-in function with the same name, for which
an entry appears in the function catalog:

    <xsl:function name="arch:extract-to-files">
         <xsl:param name="archive" as="xs:base64Binary"/>
         <xsl:variable name="entries" select="arch:entries($archive)"/>
         <xsl:variable name="dirs" select="$entries[ends-with(.,'/')]"/>
         <xsl:variable name="required.dirs"
             select="distinct-values(for $r in ($entries except $dirs)
                       return replace($r,'/[^/]+$','/'))[ends-with(.,'/')]"/>
         <xsl:sequence
             select="for $d in distinct-values(($required.dirs,$dirs))
                     return file:create-dir(replace($d,'/$',''))"/>
         <xsl:sequence
             select="for $f in ($entries except $dirs)
                     return file:write-binary($f,arch:extract-binary($archive,$f))"/>
    </xsl:function>

It is then implementation-dependent whether the extension function can
operate in streaming mode. Perhaps, for the sake of completeness, we
might define the inverse,
arch:archive-from-files($files as xs:string*, $options) as xs:base64Binary,
which descends the file trees to gather the data?
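
To make the proposed inverse concrete, here is a minimal sketch of the
tree-descending behaviour. It is written in Python with the standard
zipfile module purely as a stand-in, not in terms of the Archive Module
itself. The name archive_from_files, the choice to name entries relative
to each argument's parent directory, and the omission of $options are all
assumptions made for illustration. Directories are recorded as entries
ending in '/', so that empty ones survive a round trip.

```python
import io
import os
import zipfile


def _entry_name(path, base):
    # Archive entry names always use '/' as the separator.
    return os.path.relpath(path, base).replace(os.sep, "/")


def archive_from_files(files):
    """Return a ZIP archive (as bytes) built from files and directory trees.

    Hypothetical stand-in for the proposed arch:archive-from-files.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for path in files:
            # Assumption: entries are named relative to the parent directory.
            base = os.path.dirname(os.path.abspath(path))
            if os.path.isdir(path):
                # Descend the tree, recording every directory explicitly.
                for root, _dirs, names in os.walk(path):
                    zf.writestr(_entry_name(root, base) + "/", b"")
                    for name in names:
                        full = os.path.join(root, name)
                        zf.write(full, _entry_name(full, base))
            else:
                zf.write(path, _entry_name(path, base))
    return buffer.getvalue()
```

Whether names should instead be kept exactly as supplied is the kind of
choice an $options argument could control.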
>
> • Another open issue is the handling of directories in archives. I
> don’t have an easy answer for that, but we still need to find some way
> to create empty directories via the Archive Module.
Since entries are named with the solidus as the separator, any entry
whose name ends with '/' could be taken to be a directory. Any (usually
empty) content supplied for it, whether in the parallel content argument
or in the corresponding 'content' map entry, would be ignored, and an
empty directory added to the archive?
>
> • Some additional errors could be added for handling unsupported
> archive formats or algorithms, or entry descriptors (unless we want to
> generally use FORG0006 for unknown function arguments)
More on this as stuff develops. The big issue for me at present is how 
we distinguish between map and element-based forms.
>
> Hope this helps,
> Christian
>


-- 
*John Lumley* MA PhD CEng FIEE
john@saxonica.com
on behalf of Saxonica Ltd

Received on Wednesday, 13 November 2013 12:13:17 UTC