
Re: [AudioTF] Agenda 2018-12-14

From: Laurent Le Meur <laurent.lemeur@edrlab.org>
Date: Thu, 13 Dec 2018 19:09:54 +0100
Message-Id: <EF16805A-0DD0-4340-A9CF-EEB8E80EE18C@edrlab.org>
Cc: Dave Cramer <dauwhe@gmail.com>, Matt Garrish <matt.garrish@gmail.com>, W3C Publishing Working Group <public-publ-wg@w3.org>
To: Brady Duga <duga@google.com>
So Brady, your take is that the signature file (aka the mimetype file) is in fact a set of magic numbers <https://asecuritysite.com/forensics/magic> that follows the zip signature (I found 50 4B 03 04 on Wikipedia).
Ok, that's a logical use of it, thanks for the info. 

But as you say, it doesn't mean we MUST keep it in a new OCF version: it means that if we remove it for packaged WPs, it may be more difficult for general systems to filter these out of the mass of zipped files they have to process.

Laurent

> On 13 Dec 2018, at 17:38, Brady Duga <duga@google.com> wrote:
> 
> There seem to be some misconceptions around the use of the signature in EPUB files that might need some clearing up. I hear that signatures are a pain to add and are of no use to reading systems, particularly for the streaming case. This is largely true, but also completely misses the point. If the only type of zip-based binary blob you deal with is EPUB, then the signature is useless, since the only sensible thing for you to do with it is treat it as an EPUB. However, if you are a more general system that can open multiple file types, you want to know which path to take and potentially which module to route the data to. For instance, Google (*not* Play Books) makes use of the signature in a number of other products to figure out what to do with EPUB files. I expect there are other companies that have products which do the same thing (and are *not* Reading Systems). This is true of most file types - if your blob starts with 137 80 78 71 13 10 26 10 it is probably a PNG, and in the absence of a file extension that is a good way to set a mimetype for that file. The same is true of our signature - if we get something that looks like a zip, but has no file extension (or the wrong one), we can easily check for a specific byte sequence at a known location. That is why the file MUST be first and MUST NOT be encoded. Removing either of those restrictions makes the file entirely useless to everyone.
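To make the "specific byte sequence at a known location" point concrete, here is a minimal sketch of the kind of check a general system might run. It assumes the usual ZIP layout: a 30-byte local file header, so that when the first entry is named `mimetype`, is stored uncompressed, and has no extra field, the name sits at offset 30 and the content at offset 38. The function name `sniff_media_type` and the fallback types are illustrative, not taken from any product mentioned in this thread.

```python
# Byte signatures quoted in the thread.
PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])
ZIP_LOCAL_HEADER = b"PK\x03\x04"  # 50 4B 03 04

# Assumed layout: the ZIP local file header is 30 bytes long, followed by
# the file name, then (with no extra field) the stored file content.
NAME_OFFSET = 30
EPUB_NAME = b"mimetype"
EPUB_CONTENT = b"application/epub+zip"
CONTENT_OFFSET = NAME_OFFSET + len(EPUB_NAME)  # 38


def sniff_media_type(blob: bytes) -> str:
    """Guess a media type from leading bytes only (hypothetical helper)."""
    if blob.startswith(PNG_SIGNATURE):
        return "image/png"
    if blob.startswith(ZIP_LOCAL_HEADER):
        name = blob[NAME_OFFSET:CONTENT_OFFSET]
        content = blob[CONTENT_OFFSET:CONTENT_OFFSET + len(EPUB_CONTENT)]
        # This only matches if the mimetype entry is first and unencoded.
        if name == EPUB_NAME and content == EPUB_CONTENT:
            return "application/epub+zip"
        return "application/zip"
    return "application/octet-stream"
```

The check reads at most 58 bytes and never parses the archive, which is why it stops working as soon as the mimetype entry is compressed or is not the first entry.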
> 
> So:
> 1. The EPUB signature file is useless for streaming EPUBs.
> 2. The EPUB signature file is useless for EPUB Reading Systems that have another way to identify the file type (e.g. EPUB is the only type of ZIP archive supported).
> 3. The EPUB signature file is useless for people creating content, since they obviously know what they just created.
> 4. The EPUB signature file is USEFUL for general systems that want to handle an array of files, either internally or through use of external modules (e.g. an add-on editor component, or an OS that wants to route the file to the correct app).
> 
> Typically the stakeholders here represent items 1-3, but item 4 still seems like a useful case and is one that is widely supported in other file types. Items 1-3 apply equally well to PNG files, or other file types with a signature.
> 
> As for difficulty in generating the file - perhaps. Most publishers seem to have figured out how to do it. I am not sure how many person hours they waste daily adding signatures to epub files. I expect (hope) the answer is 0. It is fairly trivial to do from the command line in any *nix environment. For dedicated epub creation tools - well, again, you do this once and it then just works. I expect we have spent more time discussing the issue than engineers have spent adding the file.
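For a dedicated creation tool, the "do this once" step can be sketched with Python's standard `zipfile` module. This is a sketch under stated assumptions, not a description of any publisher's actual toolchain: the function name `build_epub` and the `files` mapping are made up for illustration, and the only point shown is writing the `mimetype` entry first and stored (uncompressed) before everything else.

```python
import io
import zipfile


def build_epub(files: dict) -> bytes:
    """Package `files` (archive name -> bytes) with the signature entry first.

    Hypothetical helper: it does not validate that the result is a
    complete EPUB, it only handles the ordering and storage of mimetype.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # First entry: stored, not deflated, and with no extra field, so its
        # content lands at a fixed offset right after the local header.
        zf.writestr(
            zipfile.ZipInfo("mimetype"),
            b"application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        # Every other entry can be compressed as usual.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Passing a fresh `ZipInfo("mimetype")` rather than a bare name keeps the entry's extra field empty, which is what puts the content at the expected offset.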
> 
> My only real concern with the signature is claiming that a WP is an EPUB, which seems like a good reason to change it.
> 
> On Thu, Dec 13, 2018 at 7:58 AM Dave Cramer <dauwhe@gmail.com> wrote:
> On Tue, Dec 11, 2018 at 6:40 AM Matt Garrish <matt.garrish@gmail.com> wrote:
> The one “feature” of OCF that everyone seems to hate is that it requires the mimetype file be the first in the ZIP container. That makes packaging an EPUB more complicated than just zipping up all the files, since the mimetype typically won’t get inserted first in general zipping scenarios.
>
> If we remove this restriction from OCF 3.2, then we possibly break the loading of publications in reading systems that won’t process a publication without first encountering the mimetype. I have no idea how many that is, or if it’s common to fall back to finding a mimetype elsewhere in the zip if it’s not first.
>
> Yesterday I made an EPUB without a mimetype, and just zipped it. Changed the file extension to EPUB. It would not load at all in iBooks, Adobe Digital Editions, Kobo, or AZARDI. It worked in Google Play Books. Kindle Previewer did process it, and the resulting Mobi worked in Kindle/Mac. 
> 
> Dave
> 
> 


Received on Thursday, 13 December 2018 18:10:22 UTC
