
Re: Proposal: PDF alternative using HTML (ZIP/GZIP)

From: Craig Francis <craig.francis@gmail.com>
Date: Wed, 27 Jan 2016 11:25:08 +0000
Cc: Nick Ruffilo <nickruffilo@gmail.com>, Leonard Rosenthol <lrosenth@adobe.com>, Ivan Herman <ivan@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-Id: <15968885-FF05-4743-9C37-EE48CFFAA89D@gmail.com>
To: Mike Perlman <perlmanm@me.com>
On 26 Jan 2016, at 20:03, Mike Perlman <perlmanm@me.com> wrote:

> Still, I was looking for a better user experience and users love to double click on the download and see the content without any fuss.


Yep, this is what I'm hoping for :-)

As to the external links to media, I would argue that for my use case (e.g. dealing with legal documents) the file format should never be allowed to do this... but I see your point, as some videos can be pretty big.
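
To make that restriction concrete, here is a rough sketch (Python, standard library only) of how a packaging tool or viewer might flag resources that would be fetched from outside the document. The names find_external_resources and LOADING_ATTRS are made up for this illustration, the list of tag/attribute pairs is far from exhaustive (it ignores CSS url(...), srcset, and anything JavaScript requests at runtime), and inline data: URIs are treated as internal. It is not part of any proposal or spec.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical, non-exhaustive list of (tag, attribute) pairs that make a
# browser load a resource when the page is opened.
LOADING_ATTRS = {
    ("img", "src"), ("script", "src"), ("link", "href"), ("iframe", "src"),
    ("video", "src"), ("video", "poster"), ("audio", "src"),
    ("source", "src"), ("embed", "src"), ("object", "data"),
}


def is_external(url):
    """True if the URL points outside the package (assumption: data: is inline)."""
    parsed = urlparse(url)
    # A scheme (http, https, file, ...) or a network location ("//host/...")
    # means the resource is not a relative path inside the package.
    return parsed.scheme not in ("", "data") or bool(parsed.netloc)


class ExternalResourceFinder(HTMLParser):
    """Collects (tag, url) pairs for resources loaded from outside the document."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in LOADING_ATTRS and value and is_external(value):
                self.external.append((tag, value))


def find_external_resources(html_text):
    """Return every external resource reference found in the HTML text."""
    finder = ExternalResourceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.external
```

A document with any hits would either be rejected, or have those resources copied into the package before it is sent.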

As for reveal.js, that was the example I was looking at yesterday, just to see how you were handling the download as a single file.

Unfortunately I can't do the same thing. Opening HTML files directly on a desktop computer is too dangerous: for example, I could try to include additional resources/files in the page that I should not be able to access, but that the person looking at the document can.
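
The ZIP idea avoids that by only ever answering requests from inside the archive. A minimal sketch of the lookup, again in Python and under my own assumptions: read_packaged_resource and build_package are hypothetical names, paths are treated as POSIX-style, and a real browser implementation would also need the origin/CSP restrictions discussed elsewhere in this thread.

```python
import io
import posixpath
import zipfile


def build_package(files):
    """Build an in-memory ZIP from a {path: content} mapping (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name, content in files.items():
            package.writestr(name, content)
    return buffer.getvalue()


def read_packaged_resource(package_bytes, requested_path):
    """Return the bytes of a resource stored in the package, or None.

    Nothing is ever read from the local file system: a path is only
    honoured if, once normalised, it names an entry in the archive.
    """
    normalised = posixpath.normpath(requested_path)
    # Refuse absolute paths and anything that climbs out of the package root.
    if (
        normalised.startswith("/")
        or normalised == ".."
        or normalised.startswith("../")
    ):
        return None
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        if normalised not in package.namelist():
            return None
        return package.read(normalised)
```

With a lookup like this, a page that asks for "../secret.txt" or "/etc/passwd" simply gets nothing back, whereas the same markup opened as a plain local HTML file would be resolved against the reader's file system.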

Craig




> On 26 Jan 2016, at 20:03, Mike Perlman <perlmanm@me.com> wrote:
> 
> Hi Craig
> 
> Initially I was working on an EPUB3 authoring system and I looked at several existing WordPress plugins, so outputting into a zip of separate files would just be a matter of re-coding. 
> The process itself is just different, not more complex. Media wouldn’t have to be encoded LOL, although encoding is kind of cool and mostly happens on the fly - apart from media in CSS (which could happen on the fly) and fonts.
> 
> Still, I was looking for a better user experience and users love to double click on the download and see the content without any fuss.
> 
> The issue of external links to media - video or audio for example - is a matter of context. Some folks would hate this while others might need it as a requirement.
> 
> BTW speaking of presentations - I have a reveal.js sample - http://samples.5doc.org/revealjs/modular-front-end/
> 
> Cheers
> Mike
> 
> 
>> On 26 Jan 2016, at 07:54, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>> 
>> Hi Mike,
>> 
>> Likewise I'm not in the W3C, so I can't speak on their behalf.
>> 
>> I am impressed that you have managed to make something that can inline all of the resources like that.
>> 
>> Out of curiosity (and because I seem to be a bit obsessed with putting HTML into a ZIP at the moment), would development be made easier if you could keep those resources separate?
>> 
>> As in, you could create a ZIP with many files in them, where there is an index.html and separate css/js/img/video/html files?
>> 
>> There are also other advantages with keeping them as separate files within a ZIP, for example, if the JavaScript file managed to contain the string </script>, which hadn't been escaped :-)
>> 
>> This setup would also allow you to have multiple HTML files, which I suspect is the same for ePub and PWP.
>> 
>> That said, I personally wouldn't want the document needing to (or being able to) download additional resources (well, at least for what I want to use this for).
>> 
>> For example, if you sent a presentation to someone, they receive the file, put it on a USB stick, but when going to present, they do not have an internet connection to download the additional resources.
>> 
>> As to why I wouldn't like it to be *possible*, it's because you could track how often the document is opened, which browser is used (User Agent), or maybe replace content from an external source (which for my examples, such as legal documents, really isn't a good thing).
>> 
>> Craig
>> 
>> 
>> 
>> 
>> 
>>> On 26 Jan 2016, at 17:19, Mike Perlman <perlmanm@me.com <mailto:perlmanm@me.com>> wrote:
>>> 
>>> Hi all
>>> 
>>> I am not in the W3C, but Ivan let me know I could follow the list and respond directly to recipients.
>>> I apologize in advance if this is an affront, but there is a world beyond the W3C’s virtual walls where people are doing things about PWP.
>>> 
>>> I am the developer of 5doc.org (http://5doc.org/) and I have come up with my own vision of PWP - both as a downloadable web page and as an authored document for offline use.
>>> 
>>> I have several different samples on the site. Hit the 5DOC button on a sample and the PWP is created in real time and downloaded.
>>> 
>>> The biggest differences between what I am doing and the W3C - and discussed in this thread - are the following:
>>> 
>>> 1. In a 5DOC, all the content is stuffed into one file -  text, CSS, Javascript, SVG, fonts, media. Of course, large media could be sourced from the web.
>>> 
>>> 2. The downloaded document works in a browser. No reader necessary.
>>> 
>>> 3. My samples work now, today, with a few rough edges, but they work.
>>> 
>>> 4. I am using WordPress for authoring. Thus the creation of a 5DOC can be a group effort.
>>> 
>>> 
>>> Nevertheless I have similar issues to what I have seen discussed:
>>> 
>>> 1. It would be great if the browser could open/read compressed documents. We need any result possible ASAP and over time the kinks can be worked out.
>>> 
>>> 2. Internal links. For complex documents I am using fullpage.js, which uses conventional internal links to navigate to chapters and sections. A TOC to subheads doesn’t work. I imagine this problem would appear for other JavaScript code that creates virtual spaces.
>>> 
>>> One final comment: there should be little to no restrictions for what a PWP can be and as little to no requirements as well. 
>>> Anything goes, especially Javascript.
>>> 
>>> Cheers
>>> Mike
>>> 
>>> 
>>> 
>>>> On 26 Jan 2016, at 05:04, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>> 
>>>> Thanks for the reply Nick,
>>>> 
>>>> And please don't get me wrong, I certainly like the flexibility of what you're proposing.
>>>> 
>>>> Hopefully my previous email to Leonard might explain why I think we might need two (very similar) formats... maybe.
>>>> 
>>>> Craig
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 26 Jan 2016, at 15:15, Nick Ruffilo <nickruffilo@gmail.com <mailto:nickruffilo@gmail.com>> wrote:
>>>>> 
>>>>> Craig,
>>>>> 
>>>>> Great points.  I'm going to dive into the crazy waters here a bit - and this is aimed at everyone for us to think a bit uniquely here:
>>>>> 
>>>>> There will always be a tension between high-quality packaged content and protecting users.  How can you allow an external call without the potential for malicious content being injected, so that things go bad when an unsuspecting user runs it?  While this is a very real case, I think it is a great detriment to the quality of the format to prevent ALL code simply to protect against that case. Cookies, storage, JavaScript, etc. are things that can create huge value (imagine a textbook where you could save your answers and have your quizzes be dynamically graded - and all of that saved with the copy of the book!)
>>>>> 
>>>>> We will always have to rely on external security applications to help us police.  Otherwise we might as well just print everything out - there are no viruses on paper.  Simply because a format has the ability to be mutable doesn't mean that every file sent using that format will be edited or should be edited.  I don't believe there is a case or need to make PWP inflexible and uneditable as PDFs are today (but let's be honest, a pro copy of Acrobat makes any PDF editable...)
>>>>> 
>>>>> I believe we also talked about scope - what is PWP's responsibility, and what is the responsibility of the reading system.  If I were developing a mail app, I would open every PWP in a complete sandbox environment and limit access to external files and such.  I'd include a warning that says: "Certain features may be disabled in this document.  If you wish to see it in full, download it and run it in the app of your choice."  For 99% of PWPs, it wouldn't change a thing.  In the few that required storage or other files, oh well, the user can download and read it outside of my app.
>>>>> 
>>>>> I just think - especially from the perspective of PWP - doing anything to try to stop malicious code - beyond what is already available & part of the technologies we're using - will greatly limit the potential of what could be created with PWPs.
>>>>> 
>>>>> -Nick
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Jan 26, 2016 at 7:42 AM, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>> Hi Nick,
>>>>> 
>>>>> Personally I think the ability for PDF and MS Office files being able to do scripting like they do is a major problem.
>>>>> 
>>>>> This is because it's done in such a way that it can be very dangerous/problematic (e.g. referencing other files on the computer, and making network requests), which is why the user needs to be prompted.
>>>>> 
>>>>> And we all know how well users respond to the "are you sure?" prompt (aka the "press [yes] to make this thing I don't understand work").
>>>>> 
>>>>> When it comes to these kinds of documents (i.e. reports), I would like it so that virus/spam scanners did not need to understand and inspect every file, just to guess if it's malicious.
>>>>> 
>>>>> Which is why IT departments would like to block MS Office documents if they could, but instead have to rely on virus/spam filters.
>>>>> 
>>>>> Ideally this HTML+ZIP+Sandbox proposal would result in a file format that can be freely emailed without any concerns about the security implications.
>>>>> 
>>>>> Now I realise that JavaScript is typically seen as necessary (e.g. interactive content), but if we take the mechanisms already in place in browsers to support CSP (Content Security Policy), and the work being done to make sub-origins work, we could still allow JavaScript within the document, and block everything else (e.g. outbound requests, cookies, local storage, etc).
>>>>> 
>>>>> So in regards to PWP, this security approach seems to be basically the same (which is good to hear). But as mentioned before, there are times when you want to simply receive a document that you know cannot be changed.
>>>>> 
>>>>> Which is why I think both of these solutions would work side by side.
>>>>> 
>>>>> And yes, they both use open web technology, but shouldn't be given any more access than they actually need :-)
>>>>> 
>>>>> Craig
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 22 Jan 2016, at 17:58, Nick Ruffilo <nickruffilo@gmail.com <mailto:nickruffilo@gmail.com>> wrote:
>>>>>> 
>>>>>> Craig,
>>>>>> 
>>>>>> Let's nail down exactly why the PWP wouldn't work for that situation.  Currently PDF does allow you some "scripting", but before it runs, the user is prompted: "this PDF has scripting, do you wish to turn it on?"  Would something like that (the choice of the reading system) suffice?
>>>>>> 
>>>>>> Additionally, it is my understanding that the HTML and Javascript would be in a sandbox environment, and have limited access (if any) to manipulate external files.  It would be the reading system's responsibility to feed any data that the PWP would require externally.  So the security issues then lay outside of the PWP itself, and more in the reading system - something that PWP could possibly address as a note to implementors...
>>>>>> 
>>>>>> As a note - pretty much any MS Office file can have scripting in it, and can actually manipulate files on the filesystem (there are viruses written in word and excel).  Because of this, Microsoft warns you before you run a script in these formats.  This hasn't stopped business in any way (or IT) from trusting the storage and download of such files.
>>>>>> 
>>>>>> My understanding is that even though the contents are HTML - this is not to be thought of as the "open web" but a package format that uses all of the open web technology.  
>>>>>> 
>>>>>> -Nick
>>>>>> 
>>>>>> On Fri, Jan 22, 2016 at 12:12 PM, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>> Hi Nick,
>>>>>> 
>>>>>> Yes, I certainly like the ideas behind PWP, and I'm glad to see this is happening.
>>>>>> 
>>>>>> I just don't think it works for the original proposal, which is an alternative to PDFs, having all the benefits of HTML, but still remaining read-only files that can be emailed, and that IT Departments can trust being on their computers (ref the security restrictions that can be applied).
>>>>>> 
>>>>>> Craig
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 21 Jan 2016, at 14:16, Nick Ruffilo <nickruffilo@gmail.com <mailto:nickruffilo@gmail.com>> wrote:
>>>>>>> 
>>>>>>> Craig,
>>>>>>> 
>>>>>>> To your point of PWP being a format that has an interaction with a server - I don't disagree, but I think that's only 1 of the two main use cases for PWP.  One of those cases is to be able to be a quality container for ebooks.  Ebooks are expected to be read in an offline mode on devices that may not have any connectivity to the internet.  In these cases, online is simply not an option - therefore the PWP must work in a 100% offline mode.  The content creator ultimately has the choice to build their PWP the way they see fit.
>>>>>>> 
>>>>>>> I imagine a significant majority of PWPs created will be "offline" assuming that popular word processors adopt it as a format.  Mainly because of the business case you brought up - an employee generating an offline-mode file for sharing and archival purposes.  But, there will be many use cases where an updateable, benefiting-from-access-to-the-internet document format is superior.
>>>>>>> 
>>>>>>> -Nick
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Jan 21, 2016 at 7:02 AM, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>> Hi Nick,
>>>>>>> 
>>>>>>> I'm glad to see that you're not trying to dilute PWP with too many use cases.
>>>>>>> 
>>>>>>> With your comment about exporting it as a HTML file, and emailing that, this is where the problems currently lie, and why I'm making this proposal.
>>>>>>> 
>>>>>>> I'm not sure which mailing lists you are subscribed to, but in summary, a HTML file on its own is a big security problem, and it's difficult to include resources (in terms of development time/tooling)... for more info, please see:
>>>>>>> 
>>>>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0090.html
>>>>>>> 
>>>>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0089.html
>>>>>>> 
>>>>>>> In regards to PWP, I feel that it is a good idea, and definitely has its use cases.
>>>>>>> 
>>>>>>> But I suspect that whatever file format PWP comes to be known as will be seen as something that has an interaction with a server, and allows for the document to be updated.
>>>>>>> 
>>>>>>> That definitely has its uses, but as with PDFs, there are cases where it's good to know that the file sent cannot change, or communicate with an external server for any reason (instead it's seen as being locked down, in a read-only state, via a sandbox that the browser provides).
>>>>>>> 
>>>>>>> So where you see PWP being a more versatile format than PDF, that is good, but I believe we also need a second branch which takes some of the strengths of PDF, and uses existing technology to fix some of its problems (which I hope my previous emails explain, but I am happy to discuss if not).
>>>>>>> 
>>>>>>> Craig
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 19 Jan 2016, at 14:39, Nick Ruffilo <nickruffilo@gmail.com <mailto:nickruffilo@gmail.com>> wrote:
>>>>>>>> 
>>>>>>>> Craig,
>>>>>>>> 
>>>>>>>> These are great questions, and I hope I can address some of them.  First off - PWP - like any potential document format - is not aimed at solving all possible use cases, nor should it.  That said, we also realize that there is potentially a gap in what software capabilities are today and what might be needed for a high-quality PWP to function as smoothly as a PDF would today.
>>>>>>>> 
>>>>>>>> To speak to your specific case - the PDF sales report.  Using today's technology, you could export that sales report as an HTML file, attach that, and open that in your browser.  It can be archived, the local copy can only be changed by the user, etc.  What is not yet native in most browsers is the ability to have a package of HTML files.
>>>>>>>> 
>>>>>>>> For the case of a completely offline file - something more static - PWP completely allows for that, as long as the package is created referencing static files that can be grabbed when making the offline package.  That is completely within scope and a use case that has been considered. PWP does go one step further and let you have files that reference external resources.  This would let you keep data charts up-to-date, make quick updates to color schemes, or pretty much anything else you may want to update.  This is a feature - and optional.
>>>>>>>> 
>>>>>>>> From my perspective - the goal for PWP is to create a package format that makes sense for the future.  PDF has specific use cases where it is amazing - it has had many years to be adopted and honed.  Outside of those use cases, PWP hopes to cover many things that PDF does not do.  That doesn't mean that PDF will be useless, as I imagine businesses will be exporting sales reports in PDF for the next 10 years (the same way people are still using CSV when there is XLSX format...)  But I believe that PWP aims to be a more versatile format than PDF, which is its differentiation.
>>>>>>>> 
>>>>>>>> -Nick 
>>>>>>>> 
>>>>>>>> On Tue, Jan 19, 2016 at 7:29 AM, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>>> On 18 Jan 2016, at 20:42, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>>>>>> 
>>>>>>>> > Actually, Ivan is pointing out that an active work project - called PWP
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hi Leonard,
>>>>>>>> 
>>>>>>>> And yes, good point, I completely mixed up EPUB3 and PWP (Portable Web Publication):
>>>>>>>> 
>>>>>>>> http://www.w3.org/TR/pwp
>>>>>>>> 
>>>>>>>> I've just read through the PWP Working Draft, and have some notes below.
>>>>>>>> 
>>>>>>>> In summary, I think it's a good idea, but I'm not sure it really focuses on the same problem (but please let me know if I've misunderstood).
>>>>>>>> 
>>>>>>>> Craig
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Just to set the tone, people like to receive PDFs for documents (e.g. sales reports) because they can be treated as an atomic document that isn't really editable (unlike an email), and can be saved for archival purposes (with no reliance on a website being available to view it).
>>>>>>>> 
>>>>>>>> Another example is someone who sees a webpage with some useful content, and they want a copy of that content on their local computer (aka "Save Web Page as"), so that they don't need to rely on an internet connection, on the website remaining available (or on being able to find the page again), or on the content of that page not changing.
>>>>>>>> 
>>>>>>>> Now there are definitely some similarities in the problems we are trying to address, with the main focus for me being the archive format:
>>>>>>>> 
>>>>>>>> https://www.w3.org/TR/pwp/#package
>>>>>>>> 
>>>>>>>> But this seems to be a very general spec, with options to have the content unpackaged and delivered over the internet (rather than just a single file):
>>>>>>>> 
>>>>>>>> https://www.w3.org/TR/pwp/#state_definition
>>>>>>>> 
>>>>>>>> In contrast, the spec seems to not really focus on being a file that can be passed around/archived (e.g. emailing a PDF), but instead a central resource which allows for copies of the document to be downloaded.
>>>>>>>> 
>>>>>>>> https://www.w3.org/TR/pwp/#identification
>>>>>>>> 
>>>>>>>> This is useful if you want to have a central location for a document that is kept up to date, but not so good if the primary purpose is really to have a copy that is created at one point in time, where the person who receives a copy will know that it will stay as-is (read only).
>>>>>>>> 
>>>>>>>> This setup seems to be confirmed in the security section:
>>>>>>>> 
>>>>>>>> https://www.w3.org/TR/pwp/#security-models
>>>>>>>> 
>>>>>>>> So if I was to send a report to a manager with sales figures, they will want to open it on their mobile phone (a quick read before bedtime, I assume), then later save it to their desktop computer so they can compare it later to the next month's report.
>>>>>>>> 
>>>>>>>> So when the Working Draft mentions things like JavaScript Service Workers:
>>>>>>>> 
>>>>>>>> https://www.w3.org/TR/pwp/#arch
>>>>>>>> 
>>>>>>>> And the concept of these documents having the ability to do things (presumably allowing the content to change, perform tracking, etc), I don't think it's fundamentally the right approach to this problem.
>>>>>>>> 
>>>>>>>> But don't get me wrong, Portable Web Publications would be very good for Publications... I just don't think many businesses use PDF attachments in that way.
>>>>>>>> 
>>>>>>>> :-)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> > On 18 Jan 2016, at 20:42, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>>>>>> >
>>>>>>>> > Actually, Ivan is pointing out that there is an active work project - called PWP (Portable Web Publication) - to address the need for having a better way to publish content using web technologies both in a packaged and unpackaged form.
>>>>>>>> >
>>>>>>>> > A solution that aligns with EPUB (but would not be EPUB 3.x as we know it today) is certainly something being seriously considered by various folks as part of this work.
>>>>>>>> >
>>>>>>>> > Leonard
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On 1/18/16, 12:26 PM, "Craig Francis" <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>>> >
>>>>>>>> >> On 18 Jan 2016, at 17:13, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>>>>>> >>> So that a user browsing PDFs on the web doesn’t need anything extra.
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> I think Ivan is suggesting that EPUB3 might do the same.
>>>>>>>> >>
>>>>>>>> >> I'm still not 100% convinced how well it will work (as this does depend heavily on the OS, and browsers).
>>>>>>>> >>
>>>>>>>> >> But in both cases (EPUB3, or using a ZIP to wrap up the HTML document+assets) most of the building blocks are already in place.
>>>>>>>> >>
>>>>>>>> >> Craig
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>> On 18 Jan 2016, at 17:13, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>>>>>> >>>
>>>>>>>> >>> While a PDF file does need a “reader”, it should be pointed out that EVERY MAJOR browser (Safari, Chrome, Edge, FireFox) all include PDF viewing natively.  So that a user browsing PDFs on the web doesn’t need anything extra.
>>>>>>>> >>>
>>>>>>>> >>> Leonard
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>>
>>>>>>>> >>> On 1/18/16, 11:43 AM, "Craig Francis" <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>>> >>>
>>>>>>>> >>>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org>> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> Yeah. That will take time. On MacOS (starting from, I believe, Mavericks) the system comes with an epub reader, so files of this kind are automatically opened much like PDF files. Yes, it is an ebook reader on the OS, but that is not much different than using a PDF reader.
>>>>>>>> >>>>>
>>>>>>>> >>>>> To be incorporated into browsers is a big step (and would be a big step forward) which will need additional spec work. We are kept busy:-)
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>> Good to know, and good point about PDF files needing a reader.
>>>>>>>> >>>>
>>>>>>>> >>>> If I could push the format in any way (more so how the software works), I would like to be able to send a document that is opened, read, and closed without it being imported into some kind of library.
>>>>>>>> >>>>
>>>>>>>> >>>> Maybe some ability for email clients to open the file for a "quick look" (as per the OSX term), then optionally import.
>>>>>>>> >>>>
>>>>>>>> >>>> But I realise this is going away from the idea of using this format primarily for books.
>>>>>>>> >>>>
>>>>>>>> >>>> Anyway, thanks for the heads up.
>>>>>>>> >>>>
>>>>>>>> >>>> Craig
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>
>>>>>>>> >>>>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org>> wrote:
>>>>>>>> >>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> On 18 Jan 2016, at 16:58, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Hi Ivan,
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Just to follow up on this, I've been reading the spec at:
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> http://www.idpf.org/epub/30/spec/epub30-overview.html
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> And it does seem pretty much what I'm after.
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> I'm not sure I like the extra meta files, but maybe they are useful (e.g. the possibility of containing multiple HTML documents, one for each language).
>>>>>>>> >>>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>> For example. A book may also consist of many chapters, each in their individual files, and the order is not clear. Etc.
>>>>>>>> >>>>>
>>>>>>>> >>>>>> So really the only remaining problem is getting email clients, browsers, OS'es to be able to open these files quickly/easily... rather than just automatically importing the file into an ebook reader.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Yeah. That will take time. On MacOS (starting from, I believe, Mavericks) the system comes with an epub reader, so files of this kind are automatically opened much like PDF files. Yes, it is an ebook reader on the OS, but that is not much different than using a PDF reader.
>>>>>>>> >>>>>
>>>>>>>> >>>>> To be incorporated into browsers is a big step (and would be a big step forward) which will need additional spec work. We are kept busy:-)
>>>>>>>> >>>>>
>>>>>>>> >>>>> Cheers
>>>>>>>> >>>>>
>>>>>>>> >>>>> Ivan
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>> Craig
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>
>>>>>>>> >>>>>>> On 14 Jan 2016, at 11:17, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org>> wrote:
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>> On 14 Jan 2016, at 12:05, Craig Francis <craig@craigfrancis.co.uk <mailto:craig@craigfrancis.co.uk>> wrote:
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Thanks Ivan,
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> You are right, I normally focus more on the security side of things.
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> But out of interest, EPUB3, is that likely to get the same integration as how PDFs work at the moment?
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> As in, you can email someone an EPUB3 file, and the recipient can click/tap on it to quickly view in their email client?
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Or simply have the web browser open it, rather than needing a dedicated EPUB3 reader?
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> In theory, all this is possible but the infrastructure is not as widespread as for PDF. Eg, you need extensions for Firefox to open an epub directly.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> So far I've really only considered EPUB as more of a format for books (which is probably my lack of understanding of the format), so I've never really thought of its use for reports, leaflets, etc (i.e. things that PDF's tend to be used for).
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> EPUB is perfectly capable of handling that out of the box.
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>> Ivan
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>
>>>>>>>> >>>>>>>> In the mean time I'll have a read up on the PWP group.
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>> Craig
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>
>>>>>>>> >>>>>>>>> On 14 Jan 2016, at 10:52, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org>> wrote:
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Craig,
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> thanks for your note. Two comments:
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> - The format EPUB3, defined by IDPF, already does much of what you describe. On a very high level, it takes a (slightly constrained) Web site and puts it into, essentially, a zip file. For many applications, this is a worthy replacement for PDF. Note that almost all the electronic books you buy today are in EPUB3 or its predecessor...
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> - The DPUB IG also looks further down the line on a stronger integration of digital publishing and the OWP:
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> http://www.w3.org/TR/pwp
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> which may lead to significant changes in the future.
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Bottom line: this evolution is already happening!
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> I understand you come more from the security area; there may be security issues with EPUB3 or PWP which we do not fully appreciate, so any comment is welcome of course!
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Cheers
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> Ivan
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>>> On 14 Jan 2016, at 11:34, Craig Francis <craig@craigfrancis.co.uk <mailto:craig@craigfrancis.co.uk>> wrote:
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Hi,
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Recently I've been thinking of some of the problems with PDF's, which are useful for creating a document that can be archived, emailed, printed, etc.
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> HTML has solutions for many of PDF's problems though, for example structured text (accessibility), ability to change layout depending on screen size (no need for small screen devices to zoom into a fixed A4 layout), can change font size, better indexing support (searching for documents), etc.
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Unfortunately you can't just email a HTML document to someone, as this causes a range of security problems, and including resources can be difficult (you can inline them, or use MHTML, but these are tricky to create).
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> So I was wondering if we could take the approach that Microsoft Word did with the docx format, Java with JAR, PHP with PHAR, etc...
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Have a new file format, associated with the browser, which is just a ZIP/GZIP file that contains an index.html file, and everything else needed for the document.
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Then from a security point of view, it can be locked down to its own little box, so no access to other files on the file system, probably no access to cookies/localstorage, no ability to connect to another host.
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> And from the user's point of view, the document could be protected with a password (a feature that ZIP/GZIP provides already, and the browser can prompt for when opening).
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> So would this help with the security aspects of emailing HTML files to people (e.g. reports), and be better than PDFs?
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> Craig
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> ---
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0063.html
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> https://code.google.com/p/chromium/issues/detail?id=575677
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1237990
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>> https://wpdev.uservoice.com/forums/257854-microsoft-edge-developer/suggestions/11443002-webpage-zip-as-alternative-to-pdf
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>>
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>>
>>>>>>>> >>>>>>>>> ----
>>>>>>>> >>>>>>>>> Ivan Herman, W3C
>>>>>>>> >>>>>>>>> Digital Publishing Lead
>>>>>>>> >>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>> >>>>>>>>> mobile: +31-641044153
>>>>>>>> >>>>>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> - Nick Ruffilo
>>>>>>>> @NickRuffilo
>>>>>>>> Aer.io <http://aer.io/> an INGRAM company


Received on Wednesday, 27 January 2016 11:25:43 UTC
