- From: Daniel Weck <daniel.weck@gmail.com>
- Date: Thu, 13 Dec 2018 13:56:46 +0000
- To: Laurent Le Meur <laurent.lemeur@edrlab.org>
- Cc: Matt Garrish <matt.garrish@gmail.com>, W3C Publishing Working Group <public-publ-wg@w3.org>
- Message-ID: <CA+FkZ9GnfmO7yhDtmin=qtd3Jtamab7AQqp-SJqijNxCKVrvkg@mail.gmail.com>
On Tue, 11 Dec 2018 at 12:14, Laurent Le Meur <laurent.lemeur@edrlab.org> wrote:
>
> Reading apps use common zip libraries for their system, and look for the mimetype file wherever it is. Up to developers who developed differently to raise their hand.
>
> Logically, this mimetype-first constraint has been crafted for streaming usage of EPUB (where the reader opens the zip file and gets sequential chunks out of it). I doubt this is a standard way of handling EPUB files.

I am not aware of any reading system implementation that makes use of (i.e. checks for) the initial "mimetype" file in EPUB containers. They probably exist, I am just not aware of them (leaving aside validation tools, of course ;)

More importantly, here are some technical details about ZIP: the "central directory" of a zip archive lists and describes the files that are stored/compressed within the container. This data structure is located at the end of the stream of binary data, in a predictable / discoverable location.

I have implemented support for "remote EPUBs" on at least two separate occasions (if memory serves), relying on HTTP/1.1 "partial requests" to fetch arbitrary byte ranges from "packaged / packed" (i.e. non-exploded) publications. This allows "seeking" into the zip archive (i.e. moving a pointer/cursor to arbitrary locations in the binary asset, back and forth). This seek-and-fetch mechanism is necessary in order to extract the zip directory information, and ultimately to "stream" (i.e. extract) individual publication resources out of the container (decompressing / inflating data on the fly, unless the files are stored uncompressed / non-deflated).

I use the term "stream" loosely here: the key difference from "proper streaming" is that we cannot do anything useful by processing a flow of zip archive data from beginning to end; instead we need to seek into the binary information just as if it were a buffer of pre-determined length and structure.
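The seek-and-fetch mechanism described above can be sketched as follows. This is a minimal illustration, not any particular reading system's implementation: `fetch_range` stands in for an HTTP/1.1 partial request, and the demo operates on an in-memory EPUB-like container. The record layouts come from the ZIP format (End Of Central Directory record, then the central directory file headers).

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # End Of Central Directory record signature
CDFH_SIG = b"PK\x01\x02"   # Central Directory File Header signature

def fetch_range(blob: bytes, start: int, length: int) -> bytes:
    # Stand-in for an HTTP/1.1 partial request, i.e. a GET with a
    # "Range: bytes=start-(start+length-1)" header against the remote EPUB.
    return blob[start:start + length]

def locate_central_directory(blob: bytes):
    # The EOCD record is 22 bytes plus an optional archive comment of up
    # to 65535 bytes, so a single tail fetch is guaranteed to contain it.
    tail_len = min(len(blob), 22 + 0xFFFF)
    tail = fetch_range(blob, len(blob) - tail_len, tail_len)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("EOCD not found: not a ZIP archive")
    # Decode entry count, central-directory size and offset.
    (_, _, _, total_entries, cd_size,
     cd_offset, _) = struct.unpack_from("<HHHHIIH", tail, pos + 4)
    return total_entries, cd_size, cd_offset

def list_entry_names(blob: bytes):
    total, cd_size, cd_offset = locate_central_directory(blob)
    cd = fetch_range(blob, cd_offset, cd_size)  # second partial request
    names, p = [], 0
    for _ in range(total):
        if cd[p:p + 4] != CDFH_SIG:
            raise ValueError("corrupt central directory")
        # Variable-length fields: name, extra, comment (lengths at offset 28).
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", cd, p + 28)
        names.append(cd[p + 46:p + 46 + name_len].decode("utf-8"))
        p += 46 + name_len + extra_len + comment_len
    return names

# Demo on an in-memory EPUB-like container:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("mimetype", "application/epub+zip",
               compress_type=zipfile.ZIP_STORED)
    z.writestr("META-INF/container.xml", "<container/>",
               compress_type=zipfile.ZIP_DEFLATED)
print(list_entry_names(buf.getvalue()))  # -> ['mimetype', 'META-INF/container.xml']
```

With only these two range requests (one for the tail, one for the central directory) a client learns every entry's name and local-header offset, and can then fetch and inflate individual resources on demand.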
Note that the processing model for "remote packed publications" would typically also include a fallback to regular non-partial HTTP requests (either because of a lack of HTTP/1.1 support on the server side, or because of insufficient information in HTTP headers, such as CORS in a pure web-browser context). This fallback strategy basically entails downloading the entire zip resource in order to access its directory suffix. The downloaded data can reside in a transient memory blob, or in more persistent local storage if system capabilities allow for it.

I hope this helps.

Regards,
Daniel
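The fallback strategy can be sketched like this (a hypothetical helper, not a prescribed API): try a partial request first, and if the server replies 200 OK instead of 206 Partial Content, keep the full body and slice the requested range out of it locally.

```python
import urllib.request

def slice_response(status: int, body: bytes, start: int, length: int) -> bytes:
    # 206 Partial Content: the server honoured the Range header and the
    # body is exactly the requested byte range.
    if status == 206:
        return body
    # 200 OK: the server ignored the Range header (no partial-request
    # support, or insufficient headers, e.g. CORS in a pure browser
    # context). We received the entire zip resource; keep it (transient
    # memory blob or local storage) and slice the range out locally.
    if status == 200:
        return body[start:start + length]
    raise IOError(f"unexpected HTTP status {status}")

def fetch_range_or_all(url: str, start: int, length: int) -> bytes:
    req = urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{start + length - 1}"})
    with urllib.request.urlopen(req) as resp:
        return slice_response(resp.status, resp.read(), start, length)
```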
Received on Thursday, 13 December 2018 13:57:20 UTC