Re: [AudioTF] Agenda 2018-12-14

On Tue, 11 Dec 2018 at 12:14, Laurent Le Meur <laurent.lemeur@edrlab.org>
wrote:
>
> Reading apps use common zip libraries for their system, and look for the
> mimetype file wherever it is. Up to developers who developed differently to
> raise their hand.
> Logically, this mimetype-first constraint has been crafted for streaming
> usage of EPUB (where the reader opens the zip file and gets sequential
> chunks out of it). I doubt this is a standard way of handling EPUB files.

I am not aware of any reading system implementation that makes use of (i.e.
checks for) the initial "mimetype" file in EPUB containers. They probably
exist, I am just not aware of them (leaving aside validation tools, of
course ;)

More importantly, here are some technical details about ZIP: the "central
directory" of a zip archive lists and describes the files that are
stored/compressed within the container. This data structure is located at
the end of the stream of binary data, in a predictable / discoverable
location.
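
To make the "discoverable location" concrete, here is a minimal sketch (not
taken from any particular reading system) that finds the
end-of-central-directory record by scanning backwards from the end of the
data. The function name locate_central_directory is made up for illustration,
and the sketch assumes a single-disk archive without ZIP64 extensions.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"   # end-of-central-directory marker
EOCD_FIXED_SIZE = 22             # record size when the trailing comment is empty
MAX_COMMENT_SIZE = 0xFFFF        # the comment length is a 16-bit field


def locate_central_directory(archive: bytes):
    """Return (offset, size, entry_count) of the central directory.

    Hypothetical helper: assumes no ZIP64 and a single-disk archive.
    """
    # Only the tail of the archive can contain the record.
    window_start = max(0, len(archive) - (EOCD_FIXED_SIZE + MAX_COMMENT_SIZE))
    pos = archive.rfind(EOCD_SIGNATURE, window_start)
    if pos < 0 or len(archive) - pos < EOCD_FIXED_SIZE:
        raise ValueError("end-of-central-directory record not found")
    # Layout: signature, disk numbers (x2), entry counts (x2),
    # directory size, directory offset, comment length.
    (_sig, _disk, _cd_disk, _entries_on_disk, entry_count,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(
        "<4s4H2LH", archive, pos)
    return cd_offset, cd_size, entry_count
```

At most the last 22 + 65535 bytes need to be inspected, which is why a reader
can get at the directory without touching the rest of the file.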

I implemented support for "remote EPUBs" on at least two separate occasions
(if my memory serves me well), relying on HTTP/1.1 range ("partial") requests to
fetch arbitrary byte ranges from "packaged / packed" (i.e. non-exploded)
publications. This allows "seeking" into the zip archive (i.e. moving a
pointer/cursor into the binary asset at arbitrary locations, back and
forth). This seek-and-fetch mechanism is necessary in order to extract
zip-directory information, and ultimately to "stream" (i.e. extract)
individual publication resources out of the container (decompressing /
inflating data on the fly, unless the files are stored uncompressed /
non-deflated).
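
The seek-and-fetch mechanism described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the code I
actually shipped: the names http_fetcher, read_directory and read_resource
are invented here, the total size is assumed to be known already (e.g. from a
Content-Length header), file names are assumed to be UTF-8, and ZIP64,
encryption and empty archives are ignored.

```python
import struct
import urllib.request
import zlib
from typing import Callable, Dict, Tuple

# A fetcher returns the bytes in [start, end], inclusive, like an HTTP Range.
FetchRange = Callable[[int, int], bytes]


def http_fetcher(url: str) -> FetchRange:
    """Build a fetcher that issues 'Range: bytes=start-end' requests."""
    def fetch(start: int, end: int) -> bytes:
        req = urllib.request.Request(
            url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:  # 206 = Partial Content
                raise OSError("server did not honour the Range header")
            return resp.read()
    return fetch


def read_directory(fetch: FetchRange, size: int) -> Dict[str, Tuple[int, int, int]]:
    """Map entry name -> (method, compressed_size, local_header_offset)."""
    tail_start = max(0, size - (22 + 0xFFFF))
    tail = fetch(tail_start, size - 1)
    pos = tail.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("not a zip archive")
    count, cd_size, cd_offset = struct.unpack_from("<H2L", tail, pos + 10)
    cd = fetch(cd_offset, cd_offset + cd_size - 1)
    entries, p = {}, 0
    for _ in range(count):
        (method,) = struct.unpack_from("<H", cd, p + 10)
        (csize,) = struct.unpack_from("<L", cd, p + 20)
        name_len, extra_len, comment_len = struct.unpack_from("<3H", cd, p + 28)
        (offset,) = struct.unpack_from("<L", cd, p + 42)
        name = cd[p + 46:p + 46 + name_len].decode("utf-8")
        entries[name] = (method, csize, offset)
        p += 46 + name_len + extra_len + comment_len
    return entries


def read_resource(fetch: FetchRange, entries, name: str) -> bytes:
    """Fetch one entry and inflate it if it was stored deflated."""
    method, csize, offset = entries[name]
    header = fetch(offset, offset + 29)  # fixed part of the local file header
    name_len, extra_len = struct.unpack_from("<2H", header, 26)
    start = offset + 30 + name_len + extra_len
    raw = fetch(start, start + csize - 1) if csize else b""
    if method == 0:   # stored, no compression
        return raw
    if method == 8:   # deflate: raw stream, hence the negative wbits
        return zlib.decompress(raw, -15)
    raise NotImplementedError(f"unsupported compression method {method}")
```

Passing the fetcher in as a function keeps the zip logic independent of the
transport: the same code runs against a local byte buffer by supplying
`lambda start, end: data[start:end + 1]` instead of http_fetcher(url).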

I use the term "stream" loosely here: the key difference with "proper
streaming" is that we cannot do useful things when processing a flow of zip
archive data from beginning to end; instead, we need to seek into the binary
information just as if this were a buffer of pre-determined length and
structure.

Note that the processing model for "remote packed publications" would
typically also include a fallback to regular non-partial HTTP requests
(either because of lack of HTTP/1.1 range support on the server side, or
because of insufficient information in HTTP headers, such as CORS in a pure
web-browser context). This fallback strategy basically entails downloading
the entire zip resource in order to access its directory suffix. The
downloaded data can reside in a transient memory blob, or in more
persistent local storage if system capabilities allow for it.
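
A rough sketch of that fallback decision, again with invented names
(open_with_fallback, and an opener parameter that exists only so the function
can be exercised without a network). It assumes that a server which ignores
the Range header answers 200 with the whole body, and it keeps the download
in a transient in-memory blob.

```python
import io
import urllib.request
import zipfile
from typing import Optional


def open_with_fallback(url: str,
                       opener=urllib.request.urlopen) -> Optional[zipfile.ZipFile]:
    """Probe for range support; fall back to a full download if absent.

    Returns None when ranges are honoured (the caller can then seek and
    fetch), otherwise a ZipFile backed by the fully downloaded archive.
    """
    probe = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with opener(probe) as resp:
        ranged = (resp.status == 206
                  and resp.headers.get("Content-Range") is not None)
        if ranged:
            return None
        # The Range header was ignored, so this response already carries
        # the entire archive: read it into memory.
        blob = resp.read()
    return zipfile.ZipFile(io.BytesIO(blob))
```

Swapping io.BytesIO for a temporary file would give the "more persistent
local storage" variant.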

I hope this helps.
Regards, Daniel

Received on Thursday, 13 December 2018 13:57:20 UTC