Re: archive module from Liam R E Quin on 2012-07-13 (public-expath@w3.org from July 2012)

From: Liam R E Quin <liam@w3.org>
Date: Fri, 13 Jul 2012 11:50:19 -0400
To: Matthias Brantner <matthias.brantner@28msec.com>
Cc: public-expath@w3.org, expath@googlegroups.com
Message-ID: <1342194619.6124.113.camel@localhost.localdomain>

On Thu, 2012-06-28 at 10:26 -0700, Matthias Brantner wrote:

> There are still a couple of questions that need to be answered and it
> would be great to get your opinion:
> 
> We would like to make the support for the archive format ZIP with
> compression algorithms STORE and DEFLATE mandatory. All other formats
> or compression algorithms will probably have to be implementation
> dependent.  For example, Zorba's implementation is based on libarchive
> and allows for creating compressed tar archives. BaseX's
> implementation is in Java and doesn't allow for creating a tar archive
> but provides a way to only compress a single entry with gzip. Does
> this make sense?

Seems to me there are two separate things, plus two possible extras:
(1) load an archive, should support tar and zip
(2) create an archive, optional, should be able to create at least one
of tar and zip
(3) maybe, uncompress a gzip'd file
(4) maybe, compress a file with gzip
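Items (1), (3) and (4) can be illustrated with Python's standard library, used here only as a stand-in for an XQuery implementation. The function names load_archive, gzip_file and gunzip_file are hypothetical. Format detection relies on zipfile.is_zipfile and on tarfile's "r:*" mode, which transparently handles gzip, bzip2 and xz compressed tar files:

```python
import gzip
import io
import tarfile
import zipfile

def load_archive(data):
    """(1) Load a ZIP or tar archive (the tar may be gzip/bzip2/xz compressed).

    Hypothetical helper: returns {entry name: bytes} for regular file entries.
    Everything is read into memory, which is fine for a sketch only.
    """
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        buf.seek(0)
        with zipfile.ZipFile(buf) as zf:
            return {i.filename: zf.read(i) for i in zf.infolist() if not i.is_dir()}
    buf.seek(0)
    # "r:*" lets tarfile detect and undo the compression itself.
    with tarfile.open(fileobj=buf, mode="r:*") as tf:
        return {m.name: tf.extractfile(m).read() for m in tf.getmembers() if m.isfile()}

def gzip_file(data):
    """(4) Compress a single file with gzip."""
    return gzip.compress(data)

def gunzip_file(data):
    """(3) Uncompress a single gzip'd file."""
    return gzip.decompress(data)
```

Item (2), creating an archive, is left out here because the thread treats it as optional and format-specific.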
> 
> Many archive formats and compression algorithms can be parameterized
> with various options. Hence, those options need to be passed in an
> implementation-dependent way. We are not yet sure what those
> parameters would look like.
maps, probably! :)
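Passing options as a map could look like the following Python sketch, in which a plain dict plays the role of the XQuery map. The function create_archive and the option keys "format", "compression" and "level" are hypothetical; only ZIP with STORE or DEFLATE is built in, matching the mandatory baseline proposed above, and unrecognised keys are ignored to stand in for implementation-dependent options:

```python
import io
import zipfile

# Hypothetical option names: the thread leaves the actual keys open.
DEFAULT_OPTIONS = {"format": "zip", "compression": "deflate", "level": 6}
ZIP_METHODS = {"store": zipfile.ZIP_STORED, "deflate": zipfile.ZIP_DEFLATED}

def create_archive(entries, options=None):
    """Create an archive from {name: bytes}, configured by an options map.

    Keys this sketch does not recognise are ignored, standing in for
    implementation-dependent options.
    """
    opts = {**DEFAULT_OPTIONS, **(options or {})}
    if opts["format"] != "zip":
        # Formats beyond ZIP would be implementation dependent.
        raise NotImplementedError(f"format {opts['format']!r} is not supported")
    if opts["compression"] not in ZIP_METHODS:
        raise ValueError(f"unknown compression {opts['compression']!r}")
    buf = io.BytesIO()
    # compresslevel (Python 3.7+) only affects DEFLATE; STORE ignores it.
    with zipfile.ZipFile(buf, "w", compression=ZIP_METHODS[opts["compression"]],
                         compresslevel=opts["level"]) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

Merging the caller's map over a map of defaults keeps the common case short, e.g. create_archive(entries) for DEFLATE or create_archive(entries, {"compression": "store"}) for STORE.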

> 
> With the current interface it's not possible to extract all
> information out of an archive in a streaming fashion. Specifically, there
> is one function which returns the metadata of all the entries (e.g.
> their names) and another set of functions that provide ways to extract
> a particular set of entries from the archive given the names of the
> entries. In order to do this, one needs to be able to seek back and
> forth in the archive. For example, this might not be possible if the
> archive comes from an HTTP resource which is too big to be
> materialized.

Yes. Not all archive formats can be unpacked in a streaming way either -
some (I think zip is one?) have the table of contents at the end of the
file.
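The difference can be seen with Python's standard library, used here only as an illustration. A tar archive is a sequence of header-plus-data records, so tarfile's stream mode ("r|*") can read it strictly forward; ZIP keeps its central directory (the table of contents) at the end of the file, and Python's zipfile needs a seekable input to find it (some readers, such as Java's ZipInputStream, can still walk a ZIP sequentially through its per-entry local headers). The class NonSeekable, standing in for an HTTP response body, and the generator stream_entries are hypothetical names for this sketch:

```python
import io
import tarfile

class NonSeekable(io.RawIOBase):
    """Forward-only stream over some bytes, like an HTTP response body."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        return self._buf.readinto(b)

def stream_entries(stream):
    """Yield (name, bytes) for each file entry, reading strictly forward.

    Each entry's data must be consumed before moving on: in stream mode
    tarfile cannot go back to an earlier member.
    """
    with tarfile.open(fileobj=stream, mode="r|*") as tf:
        for member in tf:
            if member.isfile():
                yield member.name, tf.extractfile(member).read()
```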

> There are several ways to return metadata and data at the same time
> but none of them seems really appealing. For example, there could be
> one function that returns a heterogeneous sequence alternating metadata
> and data, but the result might be hard to process in XQuery.

You could represent an item as a function that takes one of three
arguments: get-info, get-data, get-next-function, where
get-next-function returns a new function for the next entry...

Then it's fairly easy to process recursively.
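The function-per-entry idea can be sketched in Python, with closures standing in for XQuery function items. The three operation names come from the mail; entry_functions, collect and the fields of the info map are hypothetical. The archive is read strictly forward in tar stream mode, each entry's bytes are read when its function is created, and get-next-function is memoised so that calling it twice does not skip an entry:

```python
import io
import tarfile

def entry_functions(stream):
    """Return the function for the first entry of a tar stream, or None."""
    tf = tarfile.open(fileobj=stream, mode="r|*")

    def make(member):
        if member is None:  # end of archive
            tf.close()
            return None
        # Read this entry's bytes now, before the stream moves past them.
        data = tf.extractfile(member).read() if member.isfile() else b""
        info = {"name": member.name, "size": member.size, "is-file": member.isfile()}
        next_fn = []  # memo: the stream must advance only once per entry

        def entry(op):
            if op == "get-info":
                return info
            if op == "get-data":
                return data
            if op == "get-next-function":
                if not next_fn:
                    next_fn.append(make(tf.next()))
                return next_fn[0]
            raise ValueError(f"unknown operation: {op}")

        return entry

    return make(tf.next())

def collect(entry, acc=None):
    """Process entries recursively: handle this one, then recurse on the next."""
    if acc is None:
        acc = []
    if entry is None:
        return acc
    info = entry("get-info")
    if info["is-file"]:
        acc.append((info["name"], entry("get-data")))
    return collect(entry("get-next-function"), acc)
```

collect recurses once per entry. Python does not eliminate tail calls, so for large archives a loop that repeatedly replaces entry with entry("get-next-function") would be the safer choice there.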

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml

Received on Friday, 13 July 2012 15:50:45 UTC