- From: Brady Duga <duga@google.com>
- Date: Thu, 13 Dec 2018 10:21:24 -0800
- To: Laurent Le Meur <laurent.lemeur@edrlab.org>
- Cc: Dave Cramer <dauwhe@gmail.com>, Matt Garrish <matt.garrish@gmail.com>, W3C Publishing Working Group <public-publ-wg@w3.org>
- Message-ID: <CAH_p_eVEYR63NWbBZudM9Rb8fnkmVjWiMA74UiLbUgVQ5Ti4VA@mail.gmail.com>
Yes, that seems correct. There is certainly no reason a file format must have a signature (though, from your link, it is clear many do). But the two arguments I have seen against it are that it is hard to add and that it is useless. I disagree with both of those points.

On Thu, Dec 13, 2018 at 10:09 AM Laurent Le Meur <laurent.lemeur@edrlab.org> wrote:

> So Brady, your take is that the signature file (aka mimetype file) is in
> fact a set of magic numbers <https://asecuritysite.com/forensics/magic>,
> which follows the zip signature (I found 50 4B 03 04 on Wikipedia).
> Ok, that's a logical use of it, thanks for the info.
>
> But as you say, it doesn't mean we MUST keep it in a new OCF version: it
> means that if we remove it for packaged WPs, it may be more difficult for
> general systems to filter these out of the mass of zipped files they have to
> process.
>
> Laurent
>
> On 13 Dec 2018, at 17:38, Brady Duga <duga@google.com> wrote:
>
> There seem to be some misconceptions around the use of the signature in
> EPUB files that might need some clearing up. I hear that signatures are a
> pain to add and are of no use to reading systems, particularly for the
> streaming case. This is largely true, but also completely misses the point.
> If the only type of zip based binary blob you deal with is EPUB, then the
> signature is useless since the only sensible thing for you to do with it is
> treat it as an EPUB. However, if you are a more general system that can
> open multiple file types you want to know which path to take and
> potentially which module to route the data to. For instance, Google (*not*
> Play books) makes use of the signature in a number of other products to
> figure out what to do with EPUB files. I expect there are other companies
> that have products which do the same thing (and are *not* Reading Systems).
> This is true of most file types - if your blob starts with 137 80 78 71 13
> 10 26 10 it is probably a PNG, and in the absence of a file extension that
> is a good way to set a mimetype for that file. The same is true of our
> signature - if we get something that looks like a zip, but has no file
> extension (or the wrong one) we can easily check for a specific byte
> sequence at a known location. That is why the file MUST be first and MUST
> NOT be encoded. Removing either of those restrictions makes the file
> entirely useless to everyone.
>
> So:
> 1. The EPUB signature file is useless for streaming epubs.
> 2. The EPUB signature file is useless for EPUB Reading Systems that have
> another way to identify the file type (e.g. the only type of ZIP archive
> supported).
> 3. The EPUB signature file is useless for people creating content, since
> they obviously know what they just created.
> 4. The EPUB signature file is USEFUL for general systems that want to
> handle an array of files, either internally or through use of external
> modules (e.g. an add-on editor component, or an OS that wants to route the
> file to the correct app).
>
> Typically the stakeholders here represent items 1-3, but item 4 still
> seems like a useful case and is one that is widely supported in other file
> types. Items 1-3 apply equally well to PNG files, or other file types with
> a signature.
>
> As for difficulty in generating the file - perhaps. Most publishers seem
> to have figured out how to do it. I am not sure how many person hours they
> waste daily adding signatures to epub files. I expect (hope) the answer is
> 0. It is fairly trivial to do from the command line in any *nix
> environment. For dedicated epub creation tools - well, again, you do this
> once and it then just works. I expect we have spent more time discussing
> the issue than engineers have spent adding the file.
>
> My only real concern with the signature is claiming that a WP is an EPUB,
> which seems like a good reason to change it.
>
> On Thu, Dec 13, 2018 at 7:58 AM Dave Cramer <dauwhe@gmail.com> wrote:
>
>> On Tue, Dec 11, 2018 at 6:40 AM Matt Garrish <matt.garrish@gmail.com>
>> wrote:
>>
>>> The one “feature” of OCF that everyone seems to hate is that it requires
>>> the mimetype file be the first in the ZIP container. That makes packaging
>>> an EPUB more complicated than just zipping up all the files, since the
>>> mimetype typically won’t get inserted first in general zipping scenarios.
>>>
>>> If we remove this restriction from OCF 3.2, then we possibly break the
>>> loading of publications in reading systems that won’t process a publication
>>> without first encountering the mimetype. I have no idea how many that is,
>>> or if it’s common to fall back to finding a mimetype elsewhere in the zip
>>> if it’s not first.
>>
>> Yesterday I made an EPUB without a mimetype, and just zipped it. Changed
>> the file extension to EPUB. It would not load at all in iBooks, Adobe
>> Digital Editions, Kobo, or AZARDI. It worked in Google Play Books. Kindle
>> Previewer did process it, and the resulting Mobi worked in Kindle/Mac.
>>
>> Dave
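For illustration only, here is a minimal Python sketch of the kind of byte check discussed in this thread. It is not the code of any product mentioned above. The function names `sniff` and `build_archive` are hypothetical, and the offsets 30 and 38 are an assumption that follows from the ZIP local file header being 30 bytes long, the file name `mimetype` being 8 bytes, and the entry carrying no extra field. The sketch also shows why the file has to be first and stored uncompressed: otherwise those bytes are not at the known location.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # 50 4B 03 04, the ZIP local file header signature
PNG_MAGIC = bytes([137, 80, 78, 71, 13, 10, 26, 10])
EPUB_MIMETYPE = b"application/epub+zip"


def sniff(blob: bytes) -> str:
    """Guess a media type from the leading bytes alone (no file extension)."""
    if blob.startswith(PNG_MAGIC):
        return "image/png"
    if blob.startswith(ZIP_MAGIC):
        # Assumed layout: 30-byte local header, then the 8-byte name
        # "mimetype", then the stored (uncompressed) content at offset 38.
        name = blob[30:38]
        content = blob[38:38 + len(EPUB_MIMETYPE)]
        if name == b"mimetype" and content == EPUB_MIMETYPE:
            return "application/epub+zip"
        return "application/zip"
    return "application/octet-stream"


def build_archive(mimetype_first: bool) -> bytes:
    """Build a tiny in-memory ZIP, with or without the signature file first."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        if mimetype_first:
            # First entry, stored without compression.
            zf.writestr("mimetype", EPUB_MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>")
        if not mimetype_first:
            # Present, but neither first nor stored: the sniffer cannot see it.
            zf.writestr("mimetype", EPUB_MIMETYPE)
    return buf.getvalue()


if __name__ == "__main__":
    print(sniff(build_archive(mimetype_first=True)))
    print(sniff(build_archive(mimetype_first=False)))
```

A general-purpose system only needs the first 58 bytes of the blob for this check, which is the case item 4 above describes.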
Received on Thursday, 13 December 2018 18:21:59 UTC