Re: Proposal: PDF alternative using HTML (ZIP/GZIP)

From: Nick Ruffilo <nickruffilo@gmail.com>
Date: Tue, 26 Jan 2016 15:19:28 -0500
Message-ID: <CA+Dds58ZQeP1m_F5s3oT0GomtHjagNmy45RFZ=af3+6fpjG55w@mail.gmail.com>
To: Mike Perlman <perlmanm@me.com>
Cc: Craig Francis <craig.francis@gmail.com>, Leonard Rosenthol <lrosenth@adobe.com>, Ivan Herman <ivan@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Worth noting - as I understand PWP - we are not forcing anyone to USE
external links for files.  So, if an organization wanted to say: "for all
our PWPs, we must include all resources" they could 100% do that and
everything would work just fine.
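
A rough sketch of how an organization could check that rule before sending
a package, assuming the PWP is a plain ZIP with HTML entry files. The names
`external_resources`, `ResourceCollector` and `build_package` are
hypothetical and not taken from any PWP draft. The check only looks at
`src` attributes and `<link href>` in HTML, so references made from CSS or
JavaScript would slip through:

```python
import io
import zipfile
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_external(url):
    """True for http(s) and protocol-relative URLs, False for relative paths."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        return True
    return not parsed.scheme and bool(parsed.netloc)


class ResourceCollector(HTMLParser):
    """Collect URLs a browser would fetch while rendering one HTML file."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            # src on any element, plus href on <link> (stylesheets, icons).
            # href on <a> is a hyperlink, not a fetched resource, so skip it.
            if name == "src" or (tag == "link" and name == "href"):
                self.urls.append(value)


def external_resources(package_bytes):
    """Return (member name, URL) pairs for externally hosted resources."""
    found = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            if not name.lower().endswith((".html", ".htm", ".xhtml")):
                continue
            collector = ResourceCollector()
            collector.feed(package.read(name).decode("utf-8", errors="replace"))
            collector.close()
            found.extend((name, url) for url in collector.urls if is_external(url))
    return found


def build_package(files):
    """Build an in-memory ZIP from a {name: text} mapping (demo helper only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name, text in files.items():
            package.writestr(name, text)
    return buffer.getvalue()
```

An empty list from `external_resources` would mean the HTML files only
point at resources inside the package, at least as far as this heuristic
can tell.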

Change is hard - change takes time - but I believe it's for the benefit of
all to create a format that will have versatility and great future
potential even if it means some adoption pains and push against the way
things are done today...

-Nick

On Tue, Jan 26, 2016 at 3:03 PM, Mike Perlman <perlmanm@me.com> wrote:

> Hi Craig
>
> Initially I was working on an EPUB3 authoring system and I looked at
> several existing WordPress plugins, so outputting into a zip of separate
> files would just be a matter of re-coding.
> The process itself is just different not more complex. Media wouldn’t have
> to be encoded LOL although encoding is kind of cool and mostly happens on
> the fly - apart from media  in CSS (which could happen on the fly) and
> fonts.
>
> Still, I was looking for a better user experience and users love to double
> click on the download and see the content without any fuss.
>
> The issue of external links to media - video or audio for example - is a
> matter of context. Some folks would hate this while others might need it as
> a requirement.
>
> BTW speaking of presentations - I have a reveal.js sample -
> http://samples.5doc.org/revealjs/modular-front-end/
>
> Cheers
> Mike
>
>
> On 26 Jan 2016, at 07:54, Craig Francis <craig.francis@gmail.com> wrote:
>
> Hi Mike,
>
> Likewise I'm not in the W3C, so I can't speak on their behalf.
>
> I am impressed that you have managed to make something that can inline all
> of the resources like that.
>
> Out of curiosity (and because I seem to be a bit obsessed with putting
> HTML into a ZIP at the moment), would development be made easier if you
> could keep those resources separate?
>
> As in, you could create a ZIP with many files in it, where there is an
> index.html and separate css/js/img/video/html files?
>
> There are also other advantages with keeping them as separate files within
> a ZIP, for example, if the JavaScript file managed to contain the string
> </script>, which hadn't been escaped :-)
>
> This setup would also allow you to have multiple HTML files, which I
> suspect is the same for ePub and PWP.
>
> That said, I personally wouldn't want the document needing to (or being
> able to) download additional resources (well, at least for what I want to
> use this for).
>
> For example, if you sent a presentation to someone, they receive the file,
> put it on a USB stick, but when going to present, they do not have an
> internet connection to download the additional resources.
>
> As to why I wouldn't like it to be *possible*, it's because you could
> track how often the document is opened, which browser is used (User Agent),
> or maybe replace content from an external source (which for my examples,
> such as legal documents, really isn't a good thing).
>
> Craig
>
>
>
>
>
> On 26 Jan 2016, at 17:19, Mike Perlman <perlmanm@me.com> wrote:
>
> Hi all
>
> I am not in the W3C, but Ivan let me know I could follow the list and
> respond directly to recipients.
> I apologize in advance if this is an affront, but there is a world beyond
> the W3C’s virtual walls where people are doing things about PWP.
>
> I am the developer of 5doc.org and I have come up with my own vision of
> PWP - both as a downloadable web page and as an authored document for
> offline use.
>
> I have several different samples on the site. Hit the 5DOC button on a
> sample and the PWP is created in real time and downloaded.
>
> The biggest differences between what I am doing and the W3C - and
> discussed in this thread - are the following:
>
> 1. In a 5DOC, all the content is stuffed into one file -  text, CSS,
> Javascript, SVG, fonts, media. Of course, large media could be sourced from
> the web.
>
> 2. The downloaded document works in a browser. No reader necessary.
>
> 3. My samples work now, today, with a few rough edges, but they work.
>
> 4. I am using WordPress for authoring. Thus the creation of a 5DOC can be
> a group effort.
>
>
> Nevertheless I have similar issues to what I have seen discussed:
>
> 1. It would be great if the browser could open/read compressed documents.
> We need any result possible ASAP and over time the kinks can be worked out.
>
> 2. Internal links. For complex documents I am using fullpage.js which uses
> conventional internal links to navigate to chapters and sections. A TOC to
> subheads doesn’t work. I imagine this problem would appear for other
> Javascript code that creates virtual spaces.
>
> One final comment: there should be few to no restrictions on what a
> PWP can be, and few to no requirements as well.
> Anything goes, especially Javascript.
>
> Cheers
> Mike
>
>
>
> On 26 Jan 2016, at 05:04, Craig Francis <craig.francis@gmail.com> wrote:
>
> Thanks for the reply Nick,
>
> And please don't get me wrong, I certainly like the flexibility of what
> you're proposing.
>
> Hopefully my previous email to Leonard might explain why I think we might
> need two (very similar) formats... maybe.
>
> Craig
>
>
>
>
> On 26 Jan 2016, at 15:15, Nick Ruffilo <nickruffilo@gmail.com> wrote:
>
> Craig,
>
> Great points.  I'm going to dive into the crazy waters here a bit - and
> this is aimed at everyone for us to think a bit uniquely here:
>
> There will always be a struggle between having high-quality packaged
> content and protecting users.  How can you allow an external call without
> the potential for malicious content being injected, so that when an
> unsuspecting user runs it, things go bad?  While this is a very real case,
> I think it is a great detriment to the quality of the format to prevent ALL
> code simply to protect against that case. Cookies, storage, javascript, etc
> are things that can
> create huge value (imagine a textbook where you could save your answers and
> have your quizzes be dynamically graded - and all of that saved with the
> copy of the book!)
>
> We will always have to rely on external security applications to help us
> police.  Otherwise we might as well just print everything out - there's no
> viruses on paper.  Simply because a format has the ability to be mutable
> doesn't mean that every file sent using that format will be edited or
> should be edited.  I don't believe there is a case or need to make PWP
> inflexible and uneditable as PDFs are today (but let's be honest, a pro copy
> of Acrobat Reader makes any PDF editable...)
>
> I believe we also talked about scope - what is PWP's responsibility, and
> what is the responsibility of the reading system.  If I were developing a
> mail app, I would open every PWP in a complete sandbox environment and
> limit access to external files and such.  I'd include a warning that says:
> "Certain features may be disabled in this document.  If you wish to see it
> in full, download it and run it in the app of your choice."  For 99% of
> PWPs, it wouldn't change a thing.  In the few that required storage or
> other files, oh well, the user can download and read it outside of my app.
>
> I just think - especially from the perspective of PWP - doing anything to
> try to stop malicious code - beyond what is already available & part of the
> technologies we're using - will greatly limit the potential of what could
> be created with PWPs.
>
> -Nick
>
>
>
> On Tue, Jan 26, 2016 at 7:42 AM, Craig Francis <craig.francis@gmail.com>
> wrote:
>
>> Hi Nick,
>>
>> Personally I think the ability of PDF and MS Office files to run scripts
>> the way they do is a major problem.
>>
>> This is because it's done in such a way that it can be very
>> dangerous/problematic (e.g. referencing other files on the computer, and
>> making network requests), which is why the user needs to be prompted.
>>
>> And we all know how well users respond to the "are you sure?" prompt (aka
>> the "press [yes] to make this thing I don't understand work").
>>
>> When it comes to these kinds of documents (i.e. reports), I would like it
>> so that virus/spam scanners did not need to understand and inspect every
>> file, just to guess if it's malicious.
>>
>> Which is why IT departments would like to block MS Office documents if
>> they could, but instead have to rely on virus/spam filters.
>>
>> Ideally this HTML+ZIP+Sandbox proposal would result in a file format that
>> can be freely emailed without any concerns about the security implications.
>>
>> Now I realise that JavaScript is typically seen as necessary (e.g.
>> interactive content), but if we take the mechanisms already in place in
>> browsers to support CSP (Content Security Policy), and the work being done
>> to make sub-origins work, we could still allow JavaScript within the
>> document, and block everything else (e.g. outbound requests, cookies, local
>> storage, etc).
>>
>> So in regards to PWP, this security approach seems to be basically the
>> same (which is good to hear). But as mentioned before, there are times when
>> you want to simply receive a document that you know cannot be changed.
>>
>> Which is why I think both of these solutions would work side by side.
>>
>> And yes, they both use open web technology, but shouldn't be given any
>> more access than they actually need :-)
>>
>> Craig
>>
>>
>>
>>
>>
>> On 22 Jan 2016, at 17:58, Nick Ruffilo <nickruffilo@gmail.com> wrote:
>>
>> Craig,
>>
>> Let's nail down exactly why the PWP wouldn't work for that situation.
>> Currently PDF does allow you some "scripting" but before it runs, the user
>> is prompted: "this PDF has scripting, do you wish to turn it on?"  Would
>> something like that (the choice of the reading system) suffice?
>>
>> Additionally, it is my understanding that the HTML and Javascript would
>> be in a sandbox environment, and have limited access (if any) to manipulate
>> external files.  It would be the reading system's responsibility to feed
>> any data that the PWP would require externally.  So the security issues
>> then lie outside of the PWP itself, and more in the reading system -
>> something that PWP could possibly address as a note to implementors...
>>
>> As a note - pretty much any MS Office file can have scripting in it, and
>> can actually manipulate files on the filesystem (there are viruses written
>> in word and excel).  Because of this, Microsoft warns you before you run a
>> script in these formats.  This hasn't stopped business in any way (or IT)
>> from trusting the storage and download of such files.
>>
>> My understanding is that even though the contents are HTML - this is not
>> to be thought of as the "open web" but a package format that uses all of
>> the open web technology.
>>
>> -Nick
>>
>> On Fri, Jan 22, 2016 at 12:12 PM, Craig Francis <craig.francis@gmail.com>
>> wrote:
>>
>>> Hi Nick,
>>>
>>> Yes, I certainly like the ideas behind PWP, and I'm glad to see this is
>>> happening.
>>>
>>> I just don't think it works for the original proposal, which is an
>>> alternative to PDFs, having all the benefits of HTML, but still remaining
>>> read-only files that can be emailed, and IT Departments can trust being on
>>> their computers (ref the security restrictions that can be applied).
>>>
>>> Craig
>>>
>>>
>>>
>>>
>>> On 21 Jan 2016, at 14:16, Nick Ruffilo <nickruffilo@gmail.com> wrote:
>>>
>>> Craig,
>>>
>>> To your point of PWP being a format that has an interaction with a
>>> server - I don't disagree, but I think that's only one of the two main use
>>> cases for PWP.  One of those cases is to be able to be a quality container
>>> for ebooks.  Ebooks are expected to be read in an offline mode on devices
>>> that may not have any connectivity to the internet.  In these cases, online
>>> is simply not an option - therefore the PWP must work in a 100% offline
>>> mode.  The content creator ultimately has the choice to build their PWP the
>>> way they see fit.
>>>
>>> I imagine a significant majority of PWPs created will be "offline"
>>> assuming that popular word processors adopt it as a format.  Mainly because
>>> of the business case you brought up - an employee generating an
>>> offline-mode file for sharing and archival purposes.  But, there will be
>>> many use cases where an updateable, benefiting-from-access-to-the-internet
>>> document format is superior.
>>>
>>> -Nick
>>>
>>>
>>>
>>> On Thu, Jan 21, 2016 at 7:02 AM, Craig Francis <craig.francis@gmail.com>
>>> wrote:
>>>
>>>> Hi Nick,
>>>>
>>>> I'm glad to see that you're not trying to dilute PWP with too many use
>>>> cases.
>>>>
>>>> With your comment about exporting it as a HTML file, and emailing that,
>>>> this is where the problems currently lie, and why I'm making this proposal.
>>>>
>>>> I'm not sure which mailing lists you are subscribed to, but in summary,
>>>> a HTML file on its own is a big security problem, and it's difficult to
>>>> include resources (in terms of development time/tooling)... for more info,
>>>> please see:
>>>>
>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0090.html
>>>>
>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0089.html
>>>>
>>>> In regards to PWP, I feel that it is a good idea, and definitely has its
>>>> use cases.
>>>>
>>>> But I suspect that the file format PWP comes to be known as will be seen
>>>> as something that has an interaction with a server, and allows for the
>>>> document to be updated.
>>>>
>>>> That definitely has its uses, but as with PDFs, there are cases where
>>>> it's good to know that the file sent cannot change, or communicate with an
>>>> external server for any reason (instead it's seen as being locked down, in a
>>>> read-only state, via a sandbox that the browser provides).
>>>>
>>>> So where you see PWP being a more versatile format than PDF, that is
>>>> good, but I believe we also need a second branch which takes some of the
>>>> strengths of PDF, and uses existing technology to fix some of its problems
>>>> (which I hope my previous emails explain, but I am happy to discuss if not).
>>>>
>>>> Craig
>>>>
>>>>
>>>>
>>>>
>>>> On 19 Jan 2016, at 14:39, Nick Ruffilo <nickruffilo@gmail.com> wrote:
>>>>
>>>> Craig,
>>>>
>>>> These are great questions, and I hope I can address some of them.
>>>> First off - PWP - like any potential document format - is not aimed at
>>>> solving all possible use cases, nor should it.  That said, we also realize
>>>> that there is potentially a gap in what software capabilities are today and
>>>> what might be needed for a high-quality PWP to function as smoothly as a
>>>> PDF would today.
>>>>
>>>> To speak to your specific case - the PDF sales report.  Using today's
>>>> technology, you could export that sales report as an HTML file, attach
>>>> that, and open that in your browser.  It can be archived, the local copy
>>>> can only be changed by the user, etc.  What is not yet native in most
>>>> browsers is the ability to have a package of HTML files.
>>>>
>>>> For the case of a completely offline file - something more static - PWP
>>>> completely allows for that, as long as the package is created referencing
>>>> static files that can be grabbed when making the offline package.  That is
>>>> completely within scope and a use case that has been considered. PWP does
>>>> go one step further and let you have files that reference external
>>>> resources.  This would let you keep data charts up-to-date, make quick
>>>> updates to color schemes, or pretty much anything else you may want to
>>>> update.  This is a feature - and optional.
>>>>
>>>> From my perspective - the goal for PWP is to create a package format
>>>> that makes sense for the future.  PDF has specific use cases where it is
>>>> amazing - it has had many years to be adopted and honed.  Outside of those
>>>> use cases,  PWP hopes to cover many things that PDF does not do.  That
>>>> doesn't mean that PDF will be useless, as I imagine businesses will be
>>>> exporting sales reports in PDF for the next 10 years (the same way people
>>>> are still using CSV when there is XLSX format...)  But I believe that PWP
>>>> aims to be a more versatile format than PDF, which is its differentiation.
>>>>
>>>> -Nick
>>>>
>>>> On Tue, Jan 19, 2016 at 7:29 AM, Craig Francis <craig.francis@gmail.com
>>>> > wrote:
>>>>
>>>>> On 18 Jan 2016, at 20:42, Leonard Rosenthol <lrosenth@adobe.com>
>>>>> wrote:
>>>>>
>>>>> > Actually, Ivan is pointing out that an active work project - called
>>>>> PWP
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hi Leonard,
>>>>>
>>>>> And yes, good point, I completely mixed up EPUB3 and PWP (Portable
>>>>> Web Publication):
>>>>>
>>>>> http://www.w3.org/TR/pwp
>>>>>
>>>>> I've just read through the PWP Working Draft, and have some notes below.
>>>>>
>>>>> In summary, I think it's a good idea, but I'm not sure it really
>>>>> focuses on the same problem (but please let me know if I've misunderstood).
>>>>>
>>>>> Craig
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Just to set the tone, people like to receive PDFs for documents (e.g.
>>>>> sales reports) because they can be treated as an atomic document, that
>>>>> isn't really editable (unlike an email), and can be saved for archival
>>>>> purposes (with no reliance on a website to be available to view it).
>>>>>
>>>>> Another example is someone who sees a webpage with some useful
>>>>> content, and they want a copy of that content on their local computer (aka
>>>>> "Save Web Page as"), so that they don't need to rely on an internet
>>>>> connection, for the website to remain available (or being able to find the
>>>>> page again), or the content on that page to change.
>>>>>
>>>>> Now there are definitely some similarities to the problems we are
>>>>> trying to address, with the main focus for me being the archive format:
>>>>>
>>>>> https://www.w3.org/TR/pwp/#package
>>>>>
>>>>> But this seems to be a very general spec, with options to have the
>>>>> content unpackaged and delivered over the internet (rather than just a
>>>>> single file):
>>>>>
>>>>> https://www.w3.org/TR/pwp/#state_definition
>>>>>
>>>>> In contrast, the spec seems to not really focus on being a file that
>>>>> can be passed around/archived (e.g. emailing a PDF), but instead a central
>>>>> resource which allows for copies of the document to be downloaded.
>>>>>
>>>>> https://www.w3.org/TR/pwp/#identification
>>>>>
>>>>> This is useful if you want to have a central location for a document
>>>>> that is kept up to date, but not so good if the primary purpose is really to
>>>>> have a copy that is created at one point in time, where the person who
>>>>> receives a copy will know that it will stay as-is (read only).
>>>>>
>>>>> This setup seems to be confirmed in the security section:
>>>>>
>>>>> https://www.w3.org/TR/pwp/#security-models
>>>>>
>>>>> So if I was to send a report to a manager with sales figures, they
>>>>> will want to open it on their mobile phone (a quick read before bedtime, I
>>>>> assume), then later save it to their desktop computer so they can compare
>>>>> it later to the next month's report.
>>>>>
>>>>> So when the Working Draft mentions things like JavaScript Service
>>>>> Workers:
>>>>>
>>>>> https://www.w3.org/TR/pwp/#arch
>>>>>
>>>>> And the concept of these documents having the ability to do things
>>>>> (presumably allowing the content to change, perform tracking, etc), I don't
>>>>> think it's fundamentally the right approach to this problem.
>>>>>
>>>>> But don't get me wrong, Portable Web Publications would be very good
>>>>> for Publications... I just don't think many businesses use PDF attachments
>>>>> in that way.
>>>>>
>>>>> :-)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> > On 18 Jan 2016, at 20:42, Leonard Rosenthol <lrosenth@adobe.com>
>>>>> wrote:
>>>>> >
>>>>> > Actually, Ivan is pointing out an active work project - called
>>>>> PWP (Portable Web Publication) - to address the need for having a better way
>>>>> to publish content using web technologies both in a packaged and unpackaged
>>>>> form.
>>>>> >
>>>>> > A solution that aligns with EPUB (but would not be EPUB 3.x as we
>>>>> know it today) is certainly something being seriously considered by various
>>>>> folks as part of this work.
>>>>> >
>>>>> > Leonard
>>>>> >
>>>>> >
>>>>> >
>>>>> > On 1/18/16, 12:26 PM, "Craig Francis" <craig.francis@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> >> On 18 Jan 2016, at 17:13, Leonard Rosenthol <lrosenth@adobe.com>
>>>>> wrote:
>>>>> >>> So that a user browsing PDFs on the web doesn’t need anything
>>>>> extra.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> I think Ivan is suggesting that EPUB3 might do the same.
>>>>> >>
>>>>> >> I'm still not 100% convinced how well it will work (as this does
>>>>> depend heavily on the OS, and browsers).
>>>>> >>
>>>>> >> But in both cases (EPUB3, or using a ZIP to wrap up the HTML
>>>>> document+assets) most of the building blocks are already in place.
>>>>> >>
>>>>> >> Craig
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>> On 18 Jan 2016, at 17:13, Leonard Rosenthol <lrosenth@adobe.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>> While a PDF file does need a “reader”, it should be pointed out
>>>>> that EVERY MAJOR browser (Safari, Chrome, Edge, Firefox) includes PDF
>>>>> viewing natively.  So that a user browsing PDFs on the web doesn’t need
>>>>> anything extra.
>>>>> >>>
>>>>> >>> Leonard
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On 1/18/16, 11:43 AM, "Craig Francis" <craig.francis@gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org> wrote:
>>>>> >>>>
>>>>> >>>>> Yeah. That will take time. On MacOS (starting from, I believe,
>>>>> Mavericks) the system comes with an epub reader, so files of this kind are
>>>>> automatically opened much like PDF files. Yes, it is an ebook reader on the
>>>>> OS, but that is not much different than using a PDF reader.
>>>>> >>>>>
>>>>> >>>>> To be incorporated into browsers is a big step (and would be a
>>>>> big step forward) which will need additional spec work. We are kept busy:-)
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> Good to know, and good point about PDF files needing a reader.
>>>>> >>>>
>>>>> >>>> If I could push the format in any way (more so how the software
>>>>> works), I would like to be able to send a document that is opened, read,
>>>>> and closed without it being imported into some kind of library.
>>>>> >>>>
>>>>> >>>> Maybe some ability for email clients to open the file for a
>>>>> "quick look" (as per the OSX term), then optionally import.
>>>>> >>>>
>>>>> >>>> But I realise this is going away from the idea of using this
>>>>> format primarily for books.
>>>>> >>>>
>>>>> >>>> Anyway, thanks for the heads up.
>>>>> >>>>
>>>>> >>>> Craig
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org> wrote:
>>>>> >>>>>
>>>>> >>>>>>
>>>>> >>>>>> On 18 Jan 2016, at 16:58, Craig Francis <
>>>>> craig.francis@gmail.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> Hi Ivan,
>>>>> >>>>>>
>>>>> >>>>>> Just to follow up on this, I've been reading the spec at:
>>>>> >>>>>>
>>>>> >>>>>> http://www.idpf.org/epub/30/spec/epub30-overview.html
>>>>> >>>>>>
>>>>> >>>>>> And it does seem pretty much what I'm after.
>>>>> >>>>>>
>>>>> >>>>>> I'm not sure I like the extra meta files, but maybe they are
>>>>> useful (e.g. the possibility of containing multiple HTML documents, one for
>>>>> each language).
>>>>> >>>>>>
>>>>> >>>>>
>>>>> >>>>> For example. A book may also consist of many chapters each in
>>>>> their individual files and the order is not clear. Etc.
>>>>> >>>>>
>>>>> >>>>>> So really the only remaining problem is getting email clients,
>>>>> browsers, OS'es to be able to open these files quickly/easily... rather
>>>>> than just automatically importing the file into an ebook reader.
>>>>> >>>>>
>>>>> >>>>> Yeah. That will take time. On MacOS (starting from, I believe,
>>>>> Mavericks) the system comes with an epub reader, so files of this kind are
>>>>> automatically opened much like PDF files. Yes, it is an ebook reader on the
>>>>> OS, but that is not much different than using a PDF reader.
>>>>> >>>>>
>>>>> >>>>> To be incorporated into browsers is a big step (and would be a
>>>>> big step forward) which will need additional spec work. We are kept busy:-)
>>>>> >>>>>
>>>>> >>>>> Cheers
>>>>> >>>>>
>>>>> >>>>> Ivan
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>>>
>>>>> >>>>>> Craig
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>>> On 14 Jan 2016, at 11:17, Ivan Herman <ivan@w3.org> wrote:
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>> On 14 Jan 2016, at 12:05, Craig Francis <
>>>>> craig@craigfrancis.co.uk> wrote:
>>>>> >>>>>>>>
>>>>> >>>>>>>> Thanks Ivan,
>>>>> >>>>>>>>
>>>>> >>>>>>>> You are right, I normally focus more on the security side of
>>>>> things.
>>>>> >>>>>>>>
>>>>> >>>>>>>> But out of interest, EPUB3, is that likely to get the same
>>>>> integration as how PDFs work at the moment?
>>>>> >>>>>>>>
>>>>> >>>>>>>> As in, you can email someone an EPUB3 file, and the recipient
>>>>> can click/tap on it to quickly view in their email client?
>>>>> >>>>>>>>
>>>>> >>>>>>>> Or simply have the web browser open it, rather than needing a
>>>>> dedicated EPUB3 reader?
>>>>> >>>>>>>
>>>>> >>>>>>> In theory, all this is possible but the infrastructure is not
>>>>> as widespread as for PDF. Eg, you need extensions for Firefox to open an
>>>>> epub directly.
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>> So far I've really only considered EPUB as more of a format
>>>>> for books (which is probably my lack of understanding of the format), so
>>>>> I've never really thought of its use for reports, leaflets, etc (i.e.
>>>>> things that PDF's tend to be used for).
>>>>> >>>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>> EPUB is perfectly capable of handling that out of the box.
>>>>> >>>>>>>
>>>>> >>>>>>> Ivan
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>> In the mean time I'll have a read up on the PWP group.
>>>>> >>>>>>>>
>>>>> >>>>>>>> Craig
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>>
>>>>> >>>>>>>>> On 14 Jan 2016, at 10:52, Ivan Herman <ivan@w3.org> wrote:
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> Craig,
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> thanks for your note. Two comments:
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> - The format EPUB3, defined by IDPF, already does much of
>>>>> what you describe. On a very high level, it takes a (slightly constrained) Web
>>>>> site and puts it into, essentially, a zip file. For many applications, this
>>>>> is a worthy replacement for PDF. Note that almost all the electronic books
>>>>> you buy today are in EPUB3 or its predecessor...
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> - The DPUB IG also looks further down the line on a stronger
>>>>> integration of digital publishing and the OWP:
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> http://www.w3.org/TR/pwp
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> which may lead to significant changes in the future.
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> Bottom line: this evolution is already happening!
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> I understand you come more from the security area; there may
>>>>> be security issues with EPUB3 or PWP which we do not fully appreciate, so
>>>>> any comment is welcome of course!
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> Cheers
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> Ivan
>>>>> >>>>>>>>>
>>>>> >>>>>>>>>
>>>>> >>>>>>>>>> On 14 Jan 2016, at 11:34, Craig Francis <
>>>>> craig@craigfrancis.co.uk> wrote:
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Hi,
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Recently I've been thinking of some of the problems with
>>>>> PDF's, which are useful for creating a document that can be archived,
>>>>> emailed, printed, etc.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> HTML has solutions for many of PDF's problems though, for
>>>>> example structured text (accessibility), ability to change layout depending
>>>>> on screen size (no need for small screen devices to zoom into a fixed A4
>>>>> layout), can change font size, better indexing support (searching for
>>>>> documents), etc.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Unfortunately you can't just email a HTML document to
>>>>> someone, as this causes a range of security problems, and including
>>>>> resources can be difficult (you can inline them, or use MHTML, but these
>>>>> are tricky to create).
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> So I was wondering if we could take the approach that
>>>>> Microsoft Word did with the docx format, Java with JAR, PHP with PHAR,
>>>>> etc...
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Have a new file format, associated with the browser, which
>>>>> is just a ZIP/GZIP file that contains an index.html file, and everything
>>>>> else needed for the document.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Then from a security point of view, it can be locked down
>>>>> to its own little box, so no access to other files on the file system,
>>>>> probably no access to cookies/localstorage, no ability to connect to
>>>>> another host.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> And from the user's point of view, the document could be
>>>>> protected with a password (a feature that ZIP provides already, and
>>>>> the browser can prompt for when opening).
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> So would this help with the security aspects of emailing
>>>>> HTML files to people (e.g. reports), and be better than PDFs?
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Craig
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> ---
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0063.html
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> https://code.google.com/p/chromium/issues/detail?id=575677
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1237990
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> https://wpdev.uservoice.com/forums/257854-microsoft-edge-developer/suggestions/11443002-webpage-zip-as-alternative-to-pdf
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> ----
>>>>> >>>>>>>>> Ivan Herman, W3C
>>>>> >>>>>>>>> Digital Publishing Lead
>>>>> >>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>> >>>>>>>>> mobile: +31-641044153
>>>>> >>>>>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>> --
>>>> - Nick Ruffilo
>>>> @NickRuffilo
>>>> Aer.io <http://aer.io/> an *INGRAM* company


-- 
- Nick Ruffilo
@NickRuffilo
Aer.io an *INGRAM* company
Received on Tuesday, 26 January 2016 20:20:11 UTC
