
Re: [Moderator Action] [Moderator Action] Proposal: PDF alternative using HTML (ZIP/GZIP)

From: Nick Ruffilo <nickruffilo@gmail.com>
Date: Tue, 19 Jan 2016 09:39:08 -0500
Message-ID: <CA+Dds5-TiscJMUH0dHOgv+x2Pqye6BKK0NA0jKfUwAMP3c8KJw@mail.gmail.com>
To: Craig Francis <craig.francis@gmail.com>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, Ivan Herman <ivan@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>

Craig,

These are great questions, and I hope I can address some of them.  First
off - PWP - like any potential document format - is not aimed at solving
all possible use cases, nor should it be.  That said, we also realize that
there is potentially a gap between what software can do today and what
might be needed for a high-quality PWP to function as smoothly as a PDF
does today.

To speak to your specific case - the PDF sales report.  Using today's
technology, you could export that sales report as an HTML file, attach
that, and open that in your browser.  It can be archived, the local copy
can only be changed by the user, etc.  What is not yet native in most
browsers is the ability to have a package of HTML files.

For the case of a completely offline file - something more static - PWP
completely allows for that, as long as the package is created referencing
static files that can be grabbed when making the offline package.  That is
completely within scope and a use case that has been considered. PWP does
go one step further and let you have files that reference external
resources.  This would let you keep data charts up-to-date, make quick
updates to color schemes, or pretty much anything else you may want to
update.  This is a feature - and optional.
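
As a rough illustration of the static, offline case, here is a minimal
Python sketch.  It is not the PWP packaging format (that is not defined
yet); it assumes a plain ZIP with an index.html entry point, as Craig
proposed, and the helper names build_package and external_references are
made up for this example.  The second helper flags references that would
still need a network connection.

```python
import io
import re
import zipfile


def build_package(files):
    """Pack a mapping of {path: bytes} into an in-memory ZIP archive.

    Assumption for this sketch: the package entry point is index.html.
    """
    if "index.html" not in files:
        raise ValueError("package needs an index.html entry point")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Sort the paths so the archive layout is deterministic.
        for path, data in sorted(files.items()):
            archive.writestr(path, data)
    return buffer.getvalue()


def external_references(html):
    """Return double-quoted src/href values that point at a remote host."""
    pattern = r'(?:src|href)\s*=\s*"((?:https?:)?//[^"]+)"'
    return re.findall(pattern, html)


if __name__ == "__main__":
    page = '<link href="style.css"><img src="https://example.com/chart.png">'
    print(external_references(page))
    package = build_package({
        "index.html": page.encode("utf-8"),
        "style.css": b"body { font-family: serif; }",
    })
    print(zipfile.ZipFile(io.BytesIO(package)).namelist())
```

A package whose external_references list is empty should open the same way
with or without a connection; one that keeps external references is the
optional "live" case described above.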

From my perspective - the goal for PWP is to create a package format that
makes sense for the future.  PDF has specific use cases where it is amazing
- it has had many years to be adopted and honed.  Outside of those use
cases,  PWP hopes to cover many things that PDF does not do.  That doesn't
mean that PDF will be useless, as I imagine businesses will be exporting
sales reports in PDF for the next 10 years (the same way people are still
using CSV when there is XLSX format...)  But I believe that PWP aims to be
a more versatile format than PDF, which is its differentiation.

-Nick

On Tue, Jan 19, 2016 at 7:29 AM, Craig Francis <craig.francis@gmail.com>
wrote:

> On 18 Jan 2016, at 20:42, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
> > Actually, Ivan is pointing out that an active work project - called PWP
>
>
>
>
> Hi Leonard,
>
> And yes, good point, I completely mixed up EPUB3 and PWP (Portable Web
> Publication):
>
> http://www.w3.org/TR/pwp
>
> I've just read through the PWP Working Draft, and have some notes below.
>
> In summary, I think it's a good idea, but I'm not sure it really focuses
> on the same problem (but please let me know if I've misunderstood).
>
> Craig
>
>
>
>
>
> Just to set the tone, people like to receive PDF's for documents (e.g.
> sales reports) because they can be treated as an atomic document, that
> isn't really editable (unlike an email), and can be saved for archivable
> purposes (with no reliance on a website to be available to view it).
>
> Another example is someone who sees a webpage with some useful content,
> and they want a copy of that content on their local computer (aka "Save Web
> Page as"), so that they don't need to rely on an internet connection, for
> the website to remain available (or being able to find the page again), or
> the content on that page to change.
>
> Now there are definitely some similarities to the problems we are trying to
> address, with the main focus for me being the archive format:
>
> https://www.w3.org/TR/pwp/#package
>
> But this seems to be a very general spec, with options to have the content
> unpackaged and delivered over the internet (rather than just a single file):
>
> https://www.w3.org/TR/pwp/#state_definition
>
> In contrast, the spec seems to not really focus on being a file that can
> be passed around/archived (e.g. emailing a PDF), but instead a central
> resource which allows for copies of the document to be downloaded.
>
> https://www.w3.org/TR/pwp/#identification
>
> This is useful if you want to have a central location for a document, and
> is kept up to date, but not so good if the primary purpose is really to
> have a copy that is created at one point in time, where the person who
> receives a copy will know that it will stay as-is (read only).
>
> This setup seems to be confirmed in the security section:
>
> https://www.w3.org/TR/pwp/#security-models
>
> So if I was to send a report to a manager with sales figures, they will
> want to open it on their mobile phone (a quick read before bedtime, I
> assume), then later save it to their desktop computer so they can compare
> it later to the next month's report.
>
> So when the Working Draft mentions things like JavaScript Service Workers:
>
> https://www.w3.org/TR/pwp/#arch
>
> And the concept of these documents having the ability to do things
> (presumably allowing the content to change, perform tracking, etc), I don't
> think it's fundamentally the right approach to this problem.
>
> But don't get me wrong, Portable Web Publications would be very good for
> Publications... I just don't think many businesses use PDF attachments in
> that way.
>
> :-)
>
>
>
>
>
> > On 18 Jan 2016, at 20:42, Leonard Rosenthol <lrosenth@adobe.com> wrote:
> >
> > Actually, Ivan is pointing out that an active work project - called PWP
> (Portable Web Publication) - to address the need for having a better way to
> publish content using web technologies both in a packaged and unpackaged
> form.
> >
> > A solution that aligns with EPUB (but would not be EPUB 3.x as we know
> it today) is certainly something being seriously considered by various folks
> as part of this work.
> >
> > Leonard
> >
> >
> >
> > On 1/18/16, 12:26 PM, "Craig Francis" <craig.francis@gmail.com> wrote:
> >
> >> On 18 Jan 2016, at 17:13, Leonard Rosenthol <lrosenth@adobe.com> wrote:
> >>> So that a user browsing PDFs on the web doesn’t need anything extra.
> >>
> >>
> >>
> >>
> >> I think Ivan is suggesting that EPUB3 might do the same.
> >>
> >> I'm still not 100% convinced how well it will work (as this does depend
> heavily on the OS, and browsers).
> >>
> >> But in both cases (EPUB3, or using a ZIP to wrap up the HTML
> document+assets) most of the building blocks are already in place.
> >>
> >> Craig
> >>
> >>
> >>
> >>
> >>
> >>
> >>> On 18 Jan 2016, at 17:13, Leonard Rosenthol <lrosenth@adobe.com>
> wrote:
> >>>
> >>> While a PDF file does need a “reader”, it should be pointed out that
> EVERY MAJOR browser (Safari, Chrome, Edge, FireFox) all include PDF viewing
> natively.  So that a user browsing PDFs on the web doesn’t need anything
> extra.
> >>>
> >>> Leonard
> >>>
> >>>
> >>>
> >>>
> >>> On 1/18/16, 11:43 AM, "Craig Francis" <craig.francis@gmail.com> wrote:
> >>>
> >>>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org> wrote:
> >>>>
> >>>>> Yeah. That will take time. On MacOS (starting from, I believe,
> Mavericks) the system comes with an epub reader, so files of this kind are
> automatically opened much like PDF files. Yes, it is an ebook reader on the
> OS, but that is not much different than using a PDF reader.
> >>>>>
> >>>>> To be incorporated into browsers is a big step (and would be a big
> step forward) which will need additional spec work. We are kept busy:-)
> >>>>
> >>>>
> >>>>
> >>>> Good to know, and good point about PDF files needing a reader.
> >>>>
> >>>> If I could push the format in any way (more so how the software
> works), I would like to be able to send a document that is opened, read,
> and closed without it being imported into some kind of library.
> >>>>
> >>>> Maybe some ability for email clients to open the file for a "quick
> look" (as per the OSX term), then optionally import.
> >>>>
> >>>> But I realise this is going away from the idea of using this format
> primarily for books.
> >>>>
> >>>> Anyway, thanks for the heads up.
> >>>>
> >>>> Craig
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org> wrote:
> >>>>>
> >>>>>>
> >>>>>> On 18 Jan 2016, at 16:58, Craig Francis <craig.francis@gmail.com>
> wrote:
> >>>>>>
> >>>>>> Hi Ivan,
> >>>>>>
> >>>>>> Just to follow up on this, I've been reading the spec at:
> >>>>>>
> >>>>>> http://www.idpf.org/epub/30/spec/epub30-overview.html
> >>>>>>
> >>>>>> And it does seem pretty much what I'm after.
> >>>>>>
> >>>>>> I'm not sure I like the extra meta files, but maybe they are useful
> (e.g. the possibility of containing multiple HTML documents, one for each
> language).
> >>>>>>
> >>>>>
> >>>>> For example. A book may also consist of many chapters each in their
> individual files and the order is not clear. Etc.
> >>>>>
> >>>>>> So really the only remaining problem is getting email clients,
> browsers, OS'es to be able to open these files quickly/easily... rather
> than just automatically importing the file into an ebook reader.
> >>>>>
> >>>>> Yeah. That will take time. On MacOS (starting from, I believe,
> Mavericks) the system comes with an epub reader, so files of this kind are
> automatically opened much like PDF files. Yes, it is an ebook reader on the
> OS, but that is not much different than using a PDF reader.
> >>>>>
> >>>>> To be incorporated into browsers is a big step (and would be a big
> step forward) which will need additional spec work. We are kept busy:-)
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> Ivan
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Craig
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On 14 Jan 2016, at 11:17, Ivan Herman <ivan@w3.org> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 14 Jan 2016, at 12:05, Craig Francis <craig@craigfrancis.co.uk>
> wrote:
> >>>>>>>>
> >>>>>>>> Thanks Ivan,
> >>>>>>>>
> >>>>>>>> You are right, I normally focus more on security side of things.
> >>>>>>>>
> >>>>>>>> But out of interest, EPUB3, is that likely to get the same
> integration as how PDFs work at the moment?
> >>>>>>>>
> >>>>>>>> As in, you can email someone an EPUB3 file, and the recipient can
> click/tap on it to quickly view in their email client?
> >>>>>>>>
> >>>>>>>> Or simply have the web browser open it, rather than needing a
> dedicated EPUB3 reader?
> >>>>>>>
> >>>>>>> In theory, all this is possible but the infrastructure is not as
> widespread as for PDF. Eg, you need extensions for Firefox to open an epub
> directly.
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> So far I've really only considered EPUB as more of a format for
> books (which is probably my lack of understanding of the format), so I've
> never really thought of its use for reports, leaflets, etc (i.e. things
> that PDF's tend to be used for).
> >>>>>>>>
> >>>>>>>
> >>>>>>> EPUB is perfectly capable of handling that out of the box.
> >>>>>>>
> >>>>>>> Ivan
> >>>>>>>
> >>>>>>>
> >>>>>>>> In the mean time I'll have a read up on the PWP group.
> >>>>>>>>
> >>>>>>>> Craig
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On 14 Jan 2016, at 10:52, Ivan Herman <ivan@w3.org> wrote:
> >>>>>>>>>
> >>>>>>>>> Craig,
> >>>>>>>>>
> >>>>>>>>> thanks for your note. Two comments:
> >>>>>>>>>
> >>>>>>>>> - The format EPUB3, defined by IDPF, already does many of what
> you say. On a very high level, it takes a (slightly constrained) Web site
> and puts it into, essentially, a zip file. For many applications, this is a
> worthy replacement for PDF. Note that almost all the electronic books you
> buy today are in EPUB3 or its predecessor...
> >>>>>>>>>
> >>>>>>>>> - The DPUB IG also looks further down the line on a stronger
> integration of digital publishing and the OWP:
> >>>>>>>>>
> >>>>>>>>> http://www.w3.org/TR/pwp
> >>>>>>>>>
> >>>>>>>>> which may lead to significant changes in the future.
> >>>>>>>>>
> >>>>>>>>> Bottom line: this evolution is already happening!
> >>>>>>>>>
> >>>>>>>>> I understand you come more from the security area; there may be
> security issues with EPUB3 or PWP which we do not fully appreciate, so any
> comment is welcome of course!
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>> Ivan
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On 14 Jan 2016, at 11:34, Craig Francis <
> craig@craigfrancis.co.uk> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> Recently I've been thinking of some of the problems with PDF's,
> which are useful for creating a document that can be archived, emailed,
> printed, etc.
> >>>>>>>>>>
> >>>>>>>>>> HTML has solutions for many of PDF's problems though, for
> example structured text (accessibility), ability to change layout depending
> on screen size (no need for small screen devices to zoom into a fixed A4
> layout), can change font size, better indexing support (searching for
> documents), etc.
> >>>>>>>>>>
> >>>>>>>>>> Unfortunately you can't just email a HTML document to someone,
> as this causes a range of security problems, and including resources can be
> difficult (you can inline them, or use MHTML, but these are tricky to
> create).
> >>>>>>>>>>
> >>>>>>>>>> So I was wondering if we could take the approach that Microsoft
> Word did with the docx format, Java with JAR, PHP with PHAR, etc...
> >>>>>>>>>>
> >>>>>>>>>> Have a new file format, associated with the browser, which is
> just a ZIP/GZIP file that contains an index.html file, and everything else
> needed for the document.
> >>>>>>>>>>
> >>>>>>>>>> Then from a security point of view, it can be locked down to
> its own little box, so no access to other files on the file system,
> probably no access to cookies/localstorage, no ability to connect to
> another host.
> >>>>>>>>>>
> >>>>>>>>>> And from the users point of view, the document could be
> protected with a password (a feature that ZIP/GZIP provides already, and
> the browser can prompt for when opening).
> >>>>>>>>>>
> >>>>>>>>>> So would this help with the security aspects of emailing HTML
> files to people (e.g. reports), and be better than PDFs?
> >>>>>>>>>>
> >>>>>>>>>> Craig
> >>>>>>>>>>
> >>>>>>>>>> ---
> >>>>>>>>>>
> >>>>>>>>>>
> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0063.html
> >>>>>>>>>>
> >>>>>>>>>> https://code.google.com/p/chromium/issues/detail?id=575677
> >>>>>>>>>>
> >>>>>>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1237990
> >>>>>>>>>>
> >>>>>>>>>>
> https://wpdev.uservoice.com/forums/257854-microsoft-edge-developer/suggestions/11443002-webpage-zip-as-alternative-to-pdf
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ----
> >>>>>>>>> Ivan Herman, W3C
> >>>>>>>>> Digital Publishing Lead
> >>>>>>>>> Home: http://www.w3.org/People/Ivan/
> >>>>>>>>> mobile: +31-641044153
> >>>>>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> ----
> >>>>>>> Ivan Herman, W3C
> >>>>>>> Digital Publishing Lead
> >>>>>>> Home: http://www.w3.org/People/Ivan/
> >>>>>>> mobile: +31-641044153
> >>>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> ----
> >>>>> Ivan Herman, W3C
> >>>>> Digital Publishing Lead
> >>>>> Home: http://www.w3.org/People/Ivan/
> >>>>> mobile: +31-641044153
> >>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
> >>>>
> >>>>
> >>
>
>
>


-- 
- Nick Ruffilo
@NickRuffilo
Aer.io an *INGRAM* company
Received on Tuesday, 19 January 2016 14:39:39 UTC
