Re: [AudioTF] Agenda 2018-12-14

Just to summarize what I conclude from Daniel's detailed prose: 

- the mimetype file in OCF is of no use to most reading systems.
- having it first in the zip makes no sense, even for streaming purposes, as a reading system accessing a "remote EPUB" will still need to access the "central directory", which sits at the end of the zip archive (see the sketch below).
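
As an illustration, here is a minimal Python sketch (the "book.epub" path is just a placeholder): a standard zip library such as Python's zipfile resolves entries through the central directory at the end of the archive, so the mimetype entry is found wherever it happens to sit in the zip.

import zipfile

# A common zip library locates entries via the central directory at the
# end of the archive, so "mimetype" is found regardless of its position.
with zipfile.ZipFile("book.epub") as epub:
    media_type = epub.read("mimetype").decode("ascii").strip()
    print(media_type)  # expected: application/epub+zip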

In conclusion: 
1/ EPUB / OCF, because it is based on ZIP, is a useful interchange and download format but is not designed to be used remotely. "Streaming" use cases are much better served by Web Publications.
2/ we can relax the constraint on the position of the mimetype file in the next version of OCF, for EPUB 3.2 and for packaged Audiopub, and we can even make it optional.
This next OCF version will therefore be strongly backward compatible with the previous one, which is great.
 
Laurent

> On 13 Dec 2018, at 14:56, Daniel Weck <daniel.weck@gmail.com> wrote:
> 
> On Tue, 11 Dec 2018 at 12:14, Laurent Le Meur <laurent.lemeur@edrlab.org> wrote:
> >
> > Reading apps use common zip libraries for their platform, and look for the mimetype file wherever it is. It is up to developers who implemented things differently to raise their hand.
> > Logically, this mimetype-first constraint has been crafted for streaming usage of EPUB (where the reader opens the zip file and gets sequential chunks out of it). I doubt this is a standard way of handling EPUB files.
> 
> I am not aware of any reading system implementation that makes use of (i.e. checks for) the initial "mimetype" file in EPUB containers. They probably exist; I am just not aware of them (leaving aside validation tools, of course ;)
> 
> More importantly, here are some technical details about ZIP: the "central directory" of a zip archive lists and describes the files that are stored/compressed within the container. This data structure is located at the end of the stream of binary data, in a predictable / discoverable location.
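> 
> To make this concrete, here is a minimal Python sketch (not my actual implementation; the helper name is hypothetical) that locates the "End of Central Directory" record by scanning the tail of the archive for its signature (0x06054b50) and then reads the size and start offset of the central directory itself:
> 
> import struct
> 
> EOCD_SIG = b"PK\x05\x06"  # "End of Central Directory" signature (0x06054b50)
> 
> def locate_central_directory(tail: bytes):
>     # The EOCD record lives within the last 22 + 65535 bytes of the archive
>     # (22 fixed bytes plus an optional trailing comment of up to 64 KB).
>     pos = tail.rfind(EOCD_SIG)
>     if pos < 0:
>         raise ValueError("EOCD not found: not a zip archive, or tail too short")
>     # Fixed EOCD layout (little-endian): disk numbers, entry counts,
>     # then the size and start offset of the central directory.
>     (_, _, _, total_entries,
>      cd_size, cd_offset, _) = struct.unpack("<HHHHIIH", tail[pos + 4:pos + 22])
>     return cd_offset, cd_size, total_entries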
> 
> I implemented support for "remote EPUBs" on at least two separate occasions (if my memory serves me well), relying on HTTP 1.1 "partial requests" to fetch arbitrary byte ranges from "packaged / packed" (i.e. non-exploded) publications. This allows "seeking" into the zip archive (i.e. moving a pointer/cursor into the binary asset at arbitrary locations, back and forth). This seek-and-fetch mechanism is necessary in order to extract zip-directory information, and ultimately to "stream" (i.e. extract) individual publication resources out of the container (decompressing / inflating data on the fly, unless the files are stored uncompressed / non-deflated).
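> 
> As a rough sketch of this seek-and-fetch approach (the URL, the helper names and the 64 KB tail size are arbitrary; it assumes a server that honours Range headers and answers with 206 Partial Content): fetch the tail of the archive with one partial request, locate the central directory with something like the locate_central_directory() helper above, fetch the directory with another partial request, and from there fetch individual entries on demand.
> 
> import urllib.request
> 
> def fetch_range(url: str, start: int, length: int) -> bytes:
>     # One HTTP 1.1 partial request for an explicit byte range.
>     req = urllib.request.Request(
>         url, headers={"Range": f"bytes={start}-{start + length - 1}"})
>     with urllib.request.urlopen(req) as resp:
>         if resp.status != 206:
>             raise RuntimeError("server ignored the Range header")
>         return resp.read()
> 
> def fetch_tail(url: str, length: int = 64 * 1024) -> bytes:
>     # Suffix range: the last `length` bytes of the resource.
>     req = urllib.request.Request(url, headers={"Range": f"bytes=-{length}"})
>     with urllib.request.urlopen(req) as resp:
>         if resp.status != 206:
>             raise RuntimeError("server ignored the Range header")
>         return resp.read()
> 
> # Hypothetical usage:
> # tail = fetch_tail("https://example.org/pub.epub")
> # cd_offset, cd_size, _ = locate_central_directory(tail)
> # central_dir = fetch_range("https://example.org/pub.epub", cd_offset, cd_size)
> # ...then parse the directory entries and fetch_range() each entry's local
> # header and data, inflating deflated entries with zlib.decompressobj(-15).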
> 
> I use the term "stream" loosely here: the key difference from "proper streaming" is that we cannot do useful things when processing a flow of zip archive data from beginning to end; instead, we need to seek into the binary information as if it were a buffer of pre-determined length and structure.
> 
> Note that the processing model for "remote packed publications" would typically also include a fallback to regular, non-partial HTTP requests (either because the server lacks HTTP 1.1 partial-request support, or because the required response headers are not exposed, e.g. due to CORS restrictions in a pure web-browser context). This fallback strategy basically entails downloading the entire zip resource in order to access its directory suffix. The downloaded data can reside in a transient memory blob, or in more persistent local storage if system capabilities allow for it.
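> 
> The fallback can be as simple as the following sketch (the function name is hypothetical): a plain, non-partial GET that pulls the whole archive into a transient memory blob and hands it to a regular zip library, which then seeks into the buffer instead of issuing partial requests.
> 
> import io
> import urllib.request
> import zipfile
> 
> def open_remote_epub_fallback(url: str) -> zipfile.ZipFile:
>     # Download the entire zip resource; a real implementation might spill
>     # to more persistent local storage instead of keeping it all in memory.
>     with urllib.request.urlopen(url) as resp:
>         blob = io.BytesIO(resp.read())
>     return zipfile.ZipFile(blob)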
> 
> I hope this helps.
> Regards, Daniel
> 

Received on Thursday, 13 December 2018 14:24:52 UTC