Archive Module -revival from John Lumley on 2014-02-04 (public-expath@w3.org from February 2014)

From: John Lumley <john@saxonica.com>
Date: Tue, 04 Feb 2014 10:52:03 +0000
To: EXPath ML <public-expath@w3.org>
CC: John Lumley <john@saxonica.com>
Message-ID: <52F0C653.4090904@saxonica.com>
Gentlefolk,
The Archive Module http://expath.org/spec/archive has been somewhat put 
on the back burner since the September draft while getting the Binary 
and File modules close to or at 1.0 status, but it's about time it got a 
little more attention.

Some discussion took place in November and December (with an unpublished 
Editor's draft) suggesting the following easy changes:

 1. Adding functions arch:to-files() and arch:from-files() to give
    single-call transfers between archive and file directory trees.
    [These are effectively the examples in the Sept. draft, with some
    corrections and made more complete. It has been suggested that a
    'target-path' argument should be added to arch:to-files().]
 2. Improving the actions dealing with empty 'directory' entries.
 3. Adding some further convenience functions for content, such as
    arch:text() which is effectively bin:encode-string(), or arch:xml()
    which is a compound bin:encode-string(fn:serialize()). [Similar for
    arch:html() with different serialization controls.] Both of these
    would require dependency on the Binary module, unless someone wishes
    to rewrite their own implementation.
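
As a rough sketch of what the convenience functions in item 3 amount to, assuming a processor that exposes the Binary module's bin:encode-string() (some processors may need an explicit module import rather than a namespace declaration). The local: functions are hypothetical stand-ins, not definitions from the draft:

```xquery
xquery version "3.0";

declare namespace bin = "http://expath.org/ns/binary";

(: Hypothetical stand-in for the proposed arch:text(): encode a string. :)
declare function local:text($content as xs:string) as xs:base64Binary? {
  bin:encode-string($content, 'UTF-8')
};

(: Hypothetical stand-in for the proposed arch:xml(): serialize, then encode. :)
declare function local:xml($content as node()) as xs:base64Binary? {
  bin:encode-string(fn:serialize($content), 'UTF-8')
};

(: Two binary values, ready to be used as archive entry contents. :)
local:text('application/epub+zip'),
local:xml(<container/>)
```

An arch:html() along the same lines would differ only in the serialization parameters passed to fn:serialize().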

The first two have already been altered on the Editor's draft. Views on 
the third are welcome. Florent suggested examples:

        arch:create(
            ('mimetype', 'META-INF/container.xml', ...),
            (arch:text($mimetype), arch:xml($container), ...))

    The ideal solution, from an API and usability point of view, would be something like:

         arch:create((
            arch:text-entry('mimetype', $mimetype),
            arch:xml-entry('META-INF/container.xml', $container),
            ...))

    and this last example becomes "easy" to represent if arch:create() accepts maps.

Whilst the first form for arch:create() works easily with the current 
'non-map' archive mechanism, of two parallel sequences, the second does 
not. It could conceivably work for a single-argument form of
arch:create($in as item()*), taking the members of $in by pairs,
assumed to be (xs:string, xs:base64Binary)*, with arch:xxx-entry()
producing such a pair. Not the most elegant, but certainly coherent. If
you really wanted, the first argument could be treated polymorphically -
if you want 'per entry' control on properties, an element structure 
could be used and implementations make type-directed choices - but I'd 
rather not open that can of worms.
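
To make the 'by pairs' reading concrete, here is a minimal sketch in plain XQuery of how an implementation might walk such a flat sequence. It does not build an archive. local:pairs() and the <pair> wrapper are hypothetical names used only for illustration:

```xquery
xquery version "3.0";

(: Hypothetical helper, not part of the Archive Module draft:
   walks a flat sequence as (name, content) pairs. :)
declare function local:pairs($in as item()*) as element(pair)* {
  (: A well-formed input must have an even number of members. :)
  if (count($in) mod 2 ne 0)
  then error(xs:QName('local:odd-length'), 'Expected (name, content) pairs')
  else
    for $i in 1 to count($in) idiv 2
    let $name    := $in[2 * $i - 1]   (: odd positions: entry name :)
    let $content := $in[2 * $i]       (: even positions: entry content :)
    return
      <pair name="{$name}">{ string($content) }</pair>
};

(: Sample contents are base64 for 'application/epub+zip' and '<container/>'. :)
local:pairs((
  'mimetype',               xs:base64Binary('YXBwbGljYXRpb24vZXB1Yit6aXA='),
  'META-INF/container.xml', xs:base64Binary('PGNvbnRhaW5lci8+')
))
```

The even-length check is the main cost of this form: a name with no matching content can only be detected at run time, whereas two parallel sequences or a map keep the pairing explicit.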

Excluding the use of maps, other (minor) issues that are outstanding 
include:

 1. Whether options for an archive should be read or set through
    attributes on an element, or child elements, viz <arch:options
    format="zip"> or <arch:options><arch:format>zip</ .. Currently
    <arch:entry> uses attributes. We should perhaps try to be consistent
    and establish a coherent policy. Personally I much prefer attribute
    mechanisms (they're textually denser!), though structured options,
    such as character maps (!) will require elements, and the options
    for fn:serialize() are now in the form of elements e.g.
    <output:serialization-parameters><output:method value="html"> -
    though they still use a @value attribute!


But the largest issue is how to address the use of maps, which is by far 
the most coherent mechanism, enabling content and all properties to be 
kept together.

 1. One objection might have been that the order of entries in a map is
    undefined (i.e. the return from map:keys()) - this can be
    accommodated by consistent use of a 'position' property attached to
    each entry - written when reading from an archive,  used for order
    (if present) when creating an archive.
 2. How does the 'element' - based mechanism (used for describing
    options and properties) co-exist with a map based one? My initial
    suggestion is that they don't - map-based Archive is in a totally
    different namespace and uses (almost) ONLY maps. Apart from some
    probable sharing of implementation code, and being joined at the
    specification, both have totally separate function catalogs,
    examples and test suites. [For example arch:to-files() and
    archM:to-files() would have different internal operational
    definitions (actually in terms of their use of entries()) though
    their external function would be identical.]
 3. Maps require some support for XPath 3.0 - at the absolute minimum the
    functions map:entry(), map:new(), map:keys() and map:get() from
    XSLT 3.0 in the absence of the map{} syntax of XPath 3.0 - how many
    implementations will have this?
 4. I've used the prefix archM: in the spec for all the map-based stuff,
    but it's a little awkward in reading. Of course you're free to use
    (almost) any prefix you wish in code, and could re-use arch:, but in
    practice we tend to stick to the spec's conventional prefix to aid
    understanding. Any suggestions for a better differentiation between
    arch: and archM: ?
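
As an illustration of points 1 and 3, a minimal sketch of recovering entry order from a 'position' property using only map functions, with no map{} syntax. It assumes a processor with the finalised XPath 3.1 map library, where map:merge() stands in for the map:new() of the XSLT 3.0 draft. The 'content' key and the sample entries are hypothetical:

```xquery
xquery version "3.1";

(: Each archive entry is a map holding its properties. 'position' follows
   the suggestion above; 'content' is an assumed key name. :)
let $entries := map:merge((
  map:entry('META-INF/container.xml',
    map:merge((map:entry('position', 2), map:entry('content', '<container/>')))),
  map:entry('mimetype',
    map:merge((map:entry('position', 1), map:entry('content', 'application/epub+zip'))))
))
(: map:keys() returns keys in an implementation-dependent order,
   so sort explicitly on the stored position. :)
for $name in map:keys($entries)
order by map:get(map:get($entries, $name), 'position')
return $name
(: yields 'mimetype' followed by 'META-INF/container.xml' :)
```

Whatever order map:keys() happens to return, the explicit sort gives the same result when creating an archive.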

My suggestion is that we aim in version 1.0 to support both 
element-structure and map-based libraries, but make it clear that 
additional functionality (1.1 etc...) will be focussed almost 
exclusively on using maps.


Reactions are more than welcome.
-- 
*John Lumley* MA PhD CEng FIEE
john@saxonica.com
on behalf of Saxonica Ltd
Received on Tuesday, 4 February 2014 10:52:23 UTC