
Re: [Moderator Action] [Moderator Action] Proposal: PDF alternative using HTML (ZIP/GZIP)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Mon, 18 Jan 2016 17:13:24 +0000
To: Craig Francis <craig.francis@gmail.com>, Ivan Herman <ivan@w3.org>
CC: W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <55CAA86B-F7A8-4250-B7BA-9E87C56D7F5A@adobe.com>
While a PDF file does need a “reader”, it should be pointed out that EVERY MAJOR browser (Safari, Chrome, Edge, Firefox) includes PDF viewing natively, so a user browsing PDFs on the web doesn’t need anything extra.

Leonard




On 1/18/16, 11:43 AM, "Craig Francis" <craig.francis@gmail.com> wrote:

>On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org> wrote:
>
>> Yeah. That will take time. On MacOS (starting from, I believe, Mavericks) the system comes with an epub reader, so files of this kind are automatically opened much like PDF files. Yes, it is an ebook reader on the OS, but that is not much different than using a PDF reader.
>> 
>> To be incorporated into browsers is a big step (and would be a big step forward) which will need additional spec work. We are kept busy:-)
>
>
>
>Good to know, and good point about PDF files needing a reader.
>
>If I could push the format in any way (more so how the software works), I would like to be able to send a document that is opened, read, and closed without it being imported into some kind of library.
>
>Maybe some ability for email clients to open the file for a "quick look" (as per the OSX term), then optionally import.
>
>But I realise this is going away from the idea of using this format primarily for books.
>
>Anyway, thanks for the heads up.
>
>Craig
>
>
>
>
>
>
>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org> wrote:
>> 
>>> 
>>> On 18 Jan 2016, at 16:58, Craig Francis <craig.francis@gmail.com> wrote:
>>> 
>>> Hi Ivan,
>>> 
>>> Just to follow up on this, I've been reading the spec at:
>>> 
>>> http://www.idpf.org/epub/30/spec/epub30-overview.html
>>> 
>>> And it does seem pretty much what I'm after.
>>> 
>>> I'm not sure I like the extra meta files, but maybe they are useful (e.g. the possibility of containing multiple HTML documents, one for each language).
>>> 
>> 
>> For example. A book may also consist of many chapters, each in its own file, where the reading order would otherwise not be clear. Etc.
>> 
>>> So really the only remaining problem is getting email clients, browsers, and OSes to be able to open these files quickly/easily... rather than just automatically importing the file into an ebook reader.
>> 
>> Yeah. That will take time. On MacOS (starting from, I believe, Mavericks) the system comes with an epub reader, so files of this kind are automatically opened much like PDF files. Yes, it is an ebook reader on the OS, but that is not much different than using a PDF reader.
>> 
>> To be incorporated into browsers is a big step (and would be a big step forward) which will need additional spec work. We are kept busy:-)
>> 
>> Cheers
>> 
>> Ivan
>> 
>> 
>> 
>>> 
>>> Craig
>>> 
>>> 
>>> 
>>> 
>>>> On 14 Jan 2016, at 11:17, Ivan Herman <ivan@w3.org> wrote:
>>>> 
>>>> 
>>>>> On 14 Jan 2016, at 12:05, Craig Francis <craig@craigfrancis.co.uk> wrote:
>>>>> 
>>>>> Thanks Ivan,
>>>>> 
>>>>> You are right, I normally focus more on the security side of things.
>>>>> 
>>>>> But out of interest, EPUB3, is that likely to get the same integration as how PDFs work at the moment?
>>>>> 
>>>>> As in, you can email someone an EPUB3 file, and the recipient can click/tap on it to quickly view in their email client?
>>>>> 
>>>>> Or simply have the web browser open it, rather than needing a dedicated EPUB3 reader?
>>>> 
>>>> In theory, all this is possible, but the infrastructure is not as widespread as for PDF. E.g., you need extensions for Firefox to open an EPUB directly.
>>>> 
>>>> 
>>>>> 
>>>>> So far I've really only considered EPUB as more of a format for books (which is probably my lack of understanding of the format), so I've never really thought of its use for reports, leaflets, etc (i.e. things that PDFs tend to be used for).
>>>>> 
>>>> 
>>>> EPUB is perfectly capable of handling that out of the box.
>>>> 
>>>> Ivan
>>>> 
>>>> 
>>>>> In the meantime I'll have a read up on the PWP group.
>>>>> 
>>>>> Craig
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 14 Jan 2016, at 10:52, Ivan Herman <ivan@w3.org> wrote:
>>>>>> 
>>>>>> Craig,
>>>>>> 
>>>>>> thanks for your note. Two comments:
>>>>>> 
>>>>>> - The EPUB3 format, defined by IDPF, already does much of what you describe. On a very high level, it takes a (slightly constrained) Web site and puts it into, essentially, a zip file. For many applications, this is a worthy replacement for PDF. Note that almost all the electronic books you buy today are in EPUB3 or its predecessor...
>>>>>> 
>>>>>> - The DPUB IG also looks further down the line on a stronger integration of digital publishing and the OWP:
>>>>>> 
>>>>>> http://www.w3.org/TR/pwp
>>>>>> 
>>>>>> which may lead to significant changes in the future.
>>>>>> 
>>>>>> Bottom line: this evolution is already happening!
>>>>>> 
>>>>>> I understand you come more from the security area; there may be security issues with EPUB3 or PWP which we do not fully appreciate, so any comment is welcome of course!
>>>>>> 
>>>>>> Cheers
>>>>>> 
>>>>>> Ivan
>>>>>> 
>>>>>> 
>>>>>>> On 14 Jan 2016, at 11:34, Craig Francis <craig@craigfrancis.co.uk> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Recently I've been thinking of some of the problems with PDFs, which are useful for creating a document that can be archived, emailed, printed, etc.
>>>>>>> 
>>>>>>> HTML has solutions for many of PDF's problems though, for example structured text (accessibility), ability to change layout depending on screen size (no need for small screen devices to zoom into a fixed A4 layout), can change font size, better indexing support (searching for documents), etc.
>>>>>>> 
>>>>>>> Unfortunately you can't just email an HTML document to someone, as this causes a range of security problems, and including resources can be difficult (you can inline them, or use MHTML, but these are tricky to create).
>>>>>>> 
>>>>>>> So I was wondering if we could take the approach that Microsoft Word did with the docx format, Java with JAR, PHP with PHAR, etc...
>>>>>>> 
>>>>>>> Have a new file format, associated with the browser, which is just a ZIP/GZIP file that contains an index.html file, and everything else needed for the document.
>>>>>>> 
>>>>>>> Then from a security point of view, it can be locked down to its own little box, so no access to other files on the file system, probably no access to cookies/localstorage, no ability to connect to another host.
>>>>>>> 
>>>>>>> And from the user's point of view, the document could be protected with a password (a feature that ZIP provides already, and which the browser can prompt for when opening).
>>>>>>> 
>>>>>>> So would this help with the security aspects of emailing HTML files to people (e.g. reports), and be better than PDFs?
>>>>>>> 
>>>>>>> Craig
>>>>>>> 
>>>>>>> ---
>>>>>>> 
>>>>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0063.html
>>>>>>> 
>>>>>>> https://code.google.com/p/chromium/issues/detail?id=575677
>>>>>>> 
>>>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1237990
>>>>>>> 
>>>>>>> https://wpdev.uservoice.com/forums/257854-microsoft-edge-developer/suggestions/11443002-webpage-zip-as-alternative-to-pdf
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ----
>>>>>> Ivan Herman, W3C
>>>>>> Digital Publishing Lead
>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>> mobile: +31-641044153
>>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
>
>
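For readers following the thread, the packaging idea in the original proposal (a ZIP file that contains an index.html plus everything else the document needs) can be sketched roughly as below. This is only an illustration under stated assumptions, not a format defined by EPUB3 or by any browser: the function names `build_package` and `read_entry_point` and the constant `ENTRY_POINT` are hypothetical, and the sandboxing and password aspects raised in the proposal are not covered.

```python
import io
import zipfile

# Assumed entry-point name, taken from the proposal in the thread.
ENTRY_POINT = "index.html"


def build_package(files):
    """Pack a mapping of archive path -> bytes into a ZIP held in memory."""
    if ENTRY_POINT not in files:
        raise ValueError(f"package must contain {ENTRY_POINT}")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, data in files.items():
            archive.writestr(path, data)
    return buffer.getvalue()


def read_entry_point(package):
    """Return the decoded entry-point document without extracting to disk."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return archive.read(ENTRY_POINT).decode("utf-8")


# Hypothetical two-file document: one HTML page and its stylesheet.
package = build_package({
    "index.html": b"<!doctype html><link rel=stylesheet href=style.css><h1>Report</h1>",
    "style.css": b"h1 { font-family: sans-serif; }",
})
print(read_entry_point(package))
```

Reading the entry point straight from the archive, rather than unpacking it, is in the spirit of the "opened, read, and closed" use case mentioned above: nothing is written to the file system or imported into a library.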
Received on Monday, 18 January 2016 17:13:55 UTC
