
Re: Proposal: PDF alternative using HTML (ZIP/GZIP)

From: Nick Ruffilo <nickruffilo@gmail.com>
Date: Wed, 27 Jan 2016 10:18:05 -0500
Message-ID: <CA+Dds5_K_xuyG085xci=kYENQ42RfGQuB0wbknvs6zDB5bwYAg@mail.gmail.com>
To: Craig Francis <craig.francis@gmail.com>
Cc: Leonard Rosenthol <lrosenth@adobe.com>, W3C Digital Publishing IG <public-digipub-ig@w3.org>, Ivan Herman <ivan@w3.org>
It's possible that PWP will not solve the use case of legal documents being
static - and I believe that's OK.  We cannot expect to have one document
format to rule them all (MY PRECIOUS!)

Also - it's probably worth a discussion on the list whether we need to have
something that works immediately or not.  My feeling is that that is too
restrictive.

-Nick

On Wed, Jan 27, 2016 at 6:25 AM, Craig Francis <craig.francis@gmail.com>
wrote:

> On 26 Jan 2016, at 21:51, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
> PDF seems like a much better alternative.  (NOTE: a PDF can be 100%
> identically accessible to HTML – it just happens that authoring accessible
> HTML is easier than accessible PDF, but that’s a tool issue not a format
> issue)
>
>
>
> Hi Leonard,
>
> In regards to accessibility, PDF can be marked up to help assistive
> devices like screen readers, but it's rarely (if ever) used.
>
> Actually, I've found too many PDF documents where the authoring tool has
> placed every character individually, but that can happen in any file format
> (unfortunately more so with PDF, probably because its authoring tools only
> focus on the visual output).
>
> And there are other accessibility problems with PDF's, such as the
> inability to change the font size (simply zooming in is not enough), the
> font cannot be changed (e.g. applying the OpenDyslexic
> <http://opendyslexic.org/get-it-free/> font), the font/background colours
> cannot be changed (e.g. colour contrast), and they cannot be re-formatted
> for different screen sizes (I think this affects anyone who has tried to
> read an A4 document on a 4" screen, needing to use horizontal and vertical
> scrolling)... and in most cases you can't even copy/paste the text content
> (try doing it when the document is formatted with 2 columns of text).
>
> Anyway, accessibility in PDFs aside...
>
> Because the PWP model seems to work with the assumption of a central URL,
> and new versions can be pulled down, it will not work for legal documents.
>
> So I agree with you Leonard, PDF seems to be a better alternative to PWP
> in these situations.
>
> All I'm proposing is a much simpler (but probably similar) file format to
> PWP, using Open Web Technology (not so much the Platform), nearly all of
> which is already available in web browsers.
>
> And I believe this provides a much better alternative to PDF's... the only
> downsides are that CSS cannot currently do the "pixel perfect" recreation of
> the document (like PDF's kind of do), and that there are many existing
> programs that already work with PDF's (this includes converting to a format
> that printers can understand).
>
> This also applies to the other document formats such as MHTML and DOCX (MS
> Office), which introduce security problems, and considerable development
> complexity.
>
> Craig
>
>
>
>
> On 26 Jan 2016, at 21:51, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
> >I do feel that there is a need for a document format, as per my
> understanding of PWP, that has the ability to be updated (e.g. for
> publications).
> >But that is different to files that need to remain as atomic units, that
> remain isolated from everything else.
> >
> There is no requirement that a PWP needs to be updatable – that’s just one
> use case where it could.  At the same time, there are also clear use cases
> (such as your own) where the document/publication is “atomic” or “unique”
> and would never be modified.   And these criteria are also separate from
> others such as self-containment.
>
> Thanks for the info below – but I don’t see any advantage for HTML-based
> publications in those workflows.  You wouldn’t be leveraging anything
> specific to the Open Web Platform and its ecosystem.  PDF seems like a much
> better alternative.  (NOTE: a PDF can be 100% identically accessible to
> HTML – it just happens that authoring accessible HTML is easier than
> accessible PDF, but that’s a tool issue not a format issue)
>
> Leonard
>
> From: Craig Francis <craig.francis@gmail.com>
> Date: Tuesday, January 26, 2016 at 9:53 AM
> To: Leonard Rosenthol <lrosenth@adobe.com>
> Cc: W3C Digital Publishing IG <public-digipub-ig@w3.org>, Ivan Herman <
> ivan@w3.org>, Nick Ruffilo <nickruffilo@gmail.com>
> Subject: Re: Proposal: PDF alternative using HTML (ZIP/GZIP)
>
> On 26 Jan 2016, at 12:47, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
> PWP is designed to cover all of those use cases, as there are many uses
> for publishing content – as seen in the myriad of industries that have
> adopted PDF.
>
>
>
>
> Hi Leonard,
>
> You are probably right, and I'm just thinking about it from a programmer's
> point of view (one who has to send reports).
>
> I do feel that there is a need for a document format, as per my
> understanding of PWP, that has the ability to be updated (e.g. for
> publications).
>
> But that is different to files that need to remain as atomic units, that
> remain isolated from everything else.
>
> We also need to think how these files are consumed. For example, if I send
> you an ePub file today, you will probably want to open and save it in an
> e-reader with other books. Whereas if the email contained a PDF file, it
> would be opened/read, but ultimately closed and not saved (where the email
> can be archived if it needs to be read again later).
>
> I might be going into too many specifics, but I have a few examples below
> if you're interested.
>
> Craig
>
>
>
>
>
>
> I work for a company that assesses students with disabilities who are going
> to university.
>
> In the UK we have a couple of organisations, such as Student Finance
> England (SFE), who provide funding to those students, so they can get
> the equipment or support they need.
>
> So the company I work for meet and do assessments for each student, get
> quotes from suppliers, and make recommendations as to what each student
> should have (e.g. a laptop, and note taking lessons).
>
> The report the assessor writes is currently sent to SFE as a PDF file,
> which introduces a few accessibility issues.
>
> Ideally I would instead create a HTML file, package that into a ZIP (to
> include some extra resources), and send it to SFE.
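
The packaging step described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not part of the
proposal: the function name build_package is hypothetical, and the rule that
index.html must sit at the root of the archive is taken from the original
proposal's wording ("a ZIP/GZIP file that contains an index.html file").

```python
import zipfile
from pathlib import Path


def build_package(source_dir, package_path):
    """Zip every file under source_dir into package_path.

    Returns the list of archive member names. Raises ValueError when
    source_dir has no top-level index.html (assumed entry point).
    """
    root = Path(source_dir)
    if not (root / "index.html").is_file():
        raise ValueError("package needs an index.html at its root")

    # Collect the files first, so the archive itself is never picked up
    # if package_path happens to live inside source_dir.
    files = sorted(p for p in root.rglob("*") if p.is_file())

    names = []
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store paths relative to the root, with forward slashes.
            name = path.relative_to(root).as_posix()
            archive.write(path, arcname=name)
            names.append(name)
    return names
```

Calling build_package("report", "report.zip") on a folder holding index.html
and css/style.css would produce a single file that can be attached to an
email. Nothing here addresses how a browser would open or sandbox it.
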
>
> But they will not open a HTML file due to the security implications (nor
> would any student who we send it to, assuming they know that the HTML file
> attachment can be opened in a web browser).
>
> Then, because SFE are so worried about the students' private information,
> they actually use PGP (the zip kind) and I believe they open the PDF report
> on a computer that has extremely limited access to the internet (as in, can
> only send and receive email).
>
> So when PWP does become available, I doubt they will accept them,
> especially if they know that the report could be updated/changed in any way.
>
> SFE then send out a DSA2 file (which authorises the supplier to dispatch
> the items), and the supplier in turn raises an invoice for SFE to pay...
> neither of these (currently PDF) documents can be editable from a technical
> or legal point of view.
>
> Another example is the Terms and Conditions we send to the student. While
> this is a "living document" that is changed over time, the copy the student
> receives must remain the same for them.
>
> Or when we send some statistics to SFE for the number/type of assessments
> that were completed, even if we later find out that the type of one
> assessment was wrong, and is technically incorrect, that file still needs
> to record what was sent (plus a follow up report to show the corrected
> statistics).
>
> Then, with a couple of my other clients, there are still contracts that
> need to be signed, or invoices that are issued.
>
> All of these better fit the HTML+ZIP proposal, which needs a very strict
> sandbox.
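
One part of that "very strict sandbox" idea, a document that never reaches
out to another host, can be approximated before sending with a simple check.
The sketch below is an assumption-laden illustration, not a real sandbox: the
names ExternalReferenceFinder and find_external_references are hypothetical,
it only inspects src and href attributes, and it is deliberately conservative
(ordinary hyperlinks to other sites are flagged too). Inline data: URLs are
treated as part of the package.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalReferenceFinder(HTMLParser):
    """Collect src/href values that point outside the package."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("src", "href") or not value:
                continue
            parsed = urlparse(value)
            if parsed.scheme == "data":
                continue  # inline content, nothing is fetched
            # A scheme (http:, https:, ...) or a network location
            # (//host/...) means the reference leaves the package.
            if parsed.scheme or parsed.netloc:
                self.external.append((tag, value))


def find_external_references(html_text):
    """Return (tag, url) pairs for every reference leaving the package."""
    finder = ExternalReferenceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.external
```

An empty result suggests the page only refers to files inside its own
package. It says nothing about CSS url() values or script behaviour, which a
reading system would still have to block itself.
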
>
> Whereas PWP better suits:
>
> - A writer publishing a fictional story, which might contain typos to be
> corrected.
>
> - A newspaper which includes corrections, as more information is
> discovered.
>
> - An academic writing a paper, where the document can be referred to by
> others by a URL.
>
> - An educational book that needs to be kept up to date with the latest
> information, and distributed from a central server.
>
> And as Nick has just pointed out, maybe these documents could have their
> own cookie store / local storage, allowing the document to record your
> notes and answers.
>
>
>
>
>
> On 26 Jan 2016, at 12:47, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
> PWP is designed to cover all of those use cases, as there are many uses
> for publishing content – as seen in the myriad of industries that have
> adopted PDF.
>
> Leonard
>
> From: Craig Francis <craig.francis@gmail.com>
> Date: Tuesday, January 26, 2016 at 7:42 AM
> To: Leonard Rosenthol <lrosenth@adobe.com>
> Subject: Re: Proposal: PDF alternative using HTML (ZIP/GZIP)
>
> Thanks for the clarification Leonard,
>
> I can certainly see the use cases for JavaScript, and glad to see you are
> considering them.
>
> Personally I would like to suggest not relying on warnings to the user (as
> they don't really understand what they mean), but I like that you are also
> considering restricting the JavaScript.
>
>
>
> Otherwise I think the proposed HTML+ZIP and PWP documents are similar
> (e.g. using HTML+CSS), but do have slight differences:
>
> PWP: Documents are kept up to date, where (temporary) offline copies can
> be made.
>
> PWP: Published from a central location, so references to it can be made
> (like saying book X from author Y).
>
> HTML+ZIP: Copies of the document can be created, but once those copies are
> made, they remain as their own entity (typically for archival purposes).
>
> HTML+ZIP: Seen as read-only content (in as much as any computer document
> is read-only), representing a document or data at that point in time.
>
> Craig
>
>
>
>
>
>
>
>
> On 22 Jan 2016, at 19:42, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
> Nick – you should be careful to separate the file format from the reader.
> You do it well for PWP and RS, but forgot for PDF.
>
> Yes, a PDF file can contain JavaScript, which is documented (according to
> the spec) to run at specific times during the load and viewing of a PDF.
> This is exactly like what JS can do with HTML, which is then what would
> happen when packaged in a PWP.   Certain subsets of PDF restrict the
> presence of scripts entirely or in limited uses – just as EPUB currently
> does as an example of a PWP.
>
> However, there are ZERO requirements (or even recommendations) in the PDF
> standard about a “conforming reader” (the PDF term for a Reading System/RS)
> providing any type of warnings about the presence (or lack thereof) for
> JavaScript.    So any such UI that might exist in your PDF conforming
> reader of choice is that application’s decision.  Other conforming readers
> can/do things differently vis-a-vis JavaScript – including some (such as
> Apple’s Preview) that completely ignore it.
>
> As for JS in PWP – I think it’s much too early to make any specific
> statements about that. We know that some forms of PWP (such as EPUB x.x)
> might choose to restrict the JS, just as it does today – but that’s a
> specific case not the general one.   Same with sandboxing, I don’t see that
> as a PWP requirement but might well exist for certain specific cases and
> implementations.
>
> Leonard
>
> From: Nick Ruffilo <nickruffilo@gmail.com>
> Date: Friday, January 22, 2016 at 12:58 PM
> To: Craig Francis <craig.francis@gmail.com>
> Cc: Leonard Rosenthol <lrosenth@adobe.com>, Ivan Herman <ivan@w3.org>,
> W3C Digital Publishing IG <public-digipub-ig@w3.org>
> Subject: Re: Proposal: PDF alternative using HTML (ZIP/GZIP)
>
> Craig,
>
> Let's nail down exactly why the PWP wouldn't work for that situation.
> Currently PDF does allow you some "scripting" but before it runs, the user
> is prompted: "this PDF has scripting, do you wish to turn it on?"  Would
> something like that (the choice of the reading system) suffice?
>
> Additionally, it is my understanding that the HTML and Javascript would be
> in a sandbox environment, and have limited access (if any) to manipulate
> external files.  It would be the reading system's responsibility to feed
> any data that the PWP would require externally.  So the security issues
> then lie outside of the PWP itself, and more in the reading system -
> something that PWP could possibly address as a note to implementors...
>
> As a note - pretty much any MS Office file can have scripting in it, and
> can actually manipulate files on the filesystem (there are viruses written
> in word and excel).  Because of this, Microsoft warns you before you run a
> script in these formats.  This hasn't stopped business in any way (or IT)
> from trusting the storage and download of such files.
>
> My understanding is that even though the contents are HTML - this is not
> to be thought of as the "open web" but a package format that uses all of
> the open web technology.
>
> -Nick
>
> On Fri, Jan 22, 2016 at 12:12 PM, Craig Francis <craig.francis@gmail.com>
> wrote:
>
>> Hi Nick,
>>
>> Yes, I certainly like the ideas behind PWP, and I'm glad to see this is
>> happening.
>>
>> I just don't think it works for the original proposal, which is an
>> alternative to PDF's, having all the benefits of HTML, but still remaining
>> read-only files that can be emailed, and IT Departments can trust being on
>> their computers (ref the security restrictions that can be applied).
>>
>> Craig
>>
>>
>>
>>
>> On 21 Jan 2016, at 14:16, Nick Ruffilo <nickruffilo@gmail.com> wrote:
>>
>> Craig,
>>
>> To your point of PWP being a format that has an interaction with a server
>> - I don't disagree, but I think that's only 1 of the two main use cases for
>> PWP.  One of those cases is to be able to be a quality container for
>> ebooks.  Ebooks are expected to be read in an offline mode on devices that
>> may not have any connectivity to the internet.  In these cases, online is
>> simply not an option - therefore the PWP must work in a 100% offline mode.
>> The content creator ultimately has the choice to build their PWP the way
>> they see fit.
>>
>> I imagine a significant majority of PWPs created will be "offline"
>> assuming that popular word processors adopt it as a format.  Mainly because
>> of the business case you brought up - an employee generating an
>> offline-mode file for sharing and archival purposes.  But, there will be
>> many use cases where an updateable, benefiting-from-access-to-the-internet
>> document format is superior.
>>
>> -Nick
>>
>>
>>
>> On Thu, Jan 21, 2016 at 7:02 AM, Craig Francis <craig.francis@gmail.com>
>> wrote:
>>
>>> Hi Nick,
>>>
>>> I'm glad to see that you're not trying to dilute PWP with too many use
>>> cases.
>>>
>>> With your comment about exporting it as a HTML file, and emailing that,
>>> this is where the problems currently lie, and why I'm making this proposal.
>>>
>>> I'm not sure which mailing lists you are subscribed to, but in summary,
>>> a HTML file on its own is a big security problem, and it's difficult to
>>> include resources (in terms of development time/tooling)... for more info,
>>> please see:
>>>
>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0090.html
>>>
>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0089.html
>>>
>>> In regards to PWP, I feel that it is a good idea, and definitely has its
>>> use cases.
>>>
>>> But I suspect that whatever file format PWP comes to be known as will be seen
>>> as something that has an interaction with a server, and allows for the
>>> document to be updated.
>>>
>>> That definitely has its uses, but as with PDF's, there are cases where
>>> it's good to know that the file sent cannot change, or communicate with an
>>> external server for any reason (instead it's seen as being locked down, in a
>>> read only state, via a sand box that the browser provides).
>>>
>>> So where you see PWP being a more versatile format than PDF, that is
>>> good, but I believe we also need a second branch which takes some of the
>>> strengths of PDF, and uses existing technology to fix some of its problems
>>> (which I hope my previous emails explain, but I am happy to discuss if not).
>>>
>>> Craig
>>>
>>>
>>>
>>>
>>> On 19 Jan 2016, at 14:39, Nick Ruffilo <nickruffilo@gmail.com> wrote:
>>>
>>> Craig,
>>>
>>> These are great questions, and I hope I can address some of them.  First
>>> off - PWP - like any potential document format - is not aimed at solving
>>> all possible use cases, nor should it.  That said, we also realize that
>>> there is potentially a gap in what software capabilities are today and what
>>> might be needed for a high-quality PWP to function as smoothly as a PDF
>>> would today.
>>>
>>> To speak to your specific case - the PDF sales report.  Using today's
>>> technology, you could export that sales report as an HTML file, attach
>>> that, and open that in your browser.  It can be archived, the local copy
>>> can only be changed by the user, etc.  What is not yet native in most
>>> browsers is the ability to have a package of HTML files.
>>>
>>> For the case of a completely offline file - something more static - PWP
>>> completely allows for that, as long as the package is created referencing
>>> static files that can be grabbed when making the offline package.  That is
>>> completely within scope and a use case that has been considered. PWP does
>>> go one step further and let you have files that reference external
>>> resources.  This would let you keep data charts up-to-date, make quick
>>> updates to color schemes, or pretty much anything else you may want to
>>> update.  This is a feature - and optional.
>>>
>>> From my perspective - the goal for PWP is to create a package format
>>> that makes sense for the future.  PDF has specific use cases where it is
>>> amazing - it has had many years to be adopted and honed.  Outside of those
>>> use cases,  PWP hopes to cover many things that PDF does not do.  That
>>> doesn't mean that PDF will be useless, as I imagine businesses will be
>>> exporting sales reports in PDF for the next 10 years (the same way people
>>> are still using CSV when there is XLSX format...)  But I believe that PWP
>>> aims to be a more versatile format than PDF, which is its differentiation.
>>>
>>> -Nick
>>>
>>> On Tue, Jan 19, 2016 at 7:29 AM, Craig Francis <craig.francis@gmail.com>
>>> wrote:
>>>
>>>> On 18 Jan 2016, at 20:42, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>>>>
>>>> > Actually, Ivan is pointing out that an active work project - called
>>>> PWP
>>>>
>>>>
>>>>
>>>>
>>>> Hi Leonard,
>>>>
>>>> And yes, good point, I completely mixed up EPUB3 and PWP (Portable
>>>> Web Publication):
>>>>
>>>> http://www.w3.org/TR/pwp
>>>>
>>>> I've just read through the PWP Working Draft, and have some notes below.
>>>>
>>>> In summary, I think it's a good idea, but I'm not sure it really
>>>> focuses on the same problem (but please let me know if I've misunderstood).
>>>>
>>>> Craig
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Just to set the tone, people like to receive PDF's for documents (e.g.
>>>> sales reports) because they can be treated as an atomic document, that
>>>> isn't really editable (unlike an email), and can be saved for archival
>>>> purposes (with no reliance on a website to be available to view it).
>>>>
>>>> Another example is someone who sees a webpage with some useful content,
>>>> and they want a copy of that content on their local computer (aka "Save Web
>>>> Page as"), so that they don't need to rely on an internet connection, for
>>>> the website to remain available (or being able to find the page again), or
>>>> the content on that page to change.
>>>>
>>>> Now there are definitely some similarities to the problems we are trying
>>>> to address, with the main focus for me being the archive format:
>>>>
>>>> https://www.w3.org/TR/pwp/#package
>>>>
>>>> But this seems to be a very general spec, with options to have the
>>>> content unpackaged and delivered over the internet (rather than just a
>>>> single file):
>>>>
>>>> https://www.w3.org/TR/pwp/#state_definition
>>>>
>>>> In contrast, the spec seems to not really focus on being a file that
>>>> can be passed around/archived (e.g. emailing a PDF), but instead a central
>>>> resource which allows for copies of the document to be downloaded.
>>>>
>>>> https://www.w3.org/TR/pwp/#identification
>>>>
>>>> This is useful if you want to have a central location for a document,
>>>> and is kept up to date, but not so good if the primary purpose is really to
>>>> have a copy that is created at one point in time, where the person who
>>>> receives a copy will know that it will stay as-is (read only).
>>>>
>>>> This setup seems to be confirmed in the security section:
>>>>
>>>> https://www.w3.org/TR/pwp/#security-models
>>>>
>>>> So if I was to send a report to a manager with sales figures, they will
>>>> want to open it on their mobile phone (a quick read before bedtime, I
>>>> assume), then later save it to their desktop computer so they can compare
>>>> it later to the next month's report.
>>>>
>>>> So when the Working Draft mentions things like JavaScript Service
>>>> Workers:
>>>>
>>>> https://www.w3.org/TR/pwp/#arch
>>>>
>>>> And the concept of these documents having the ability to do things
>>>> (presumably allowing the content to change, perform tracking, etc), I don't
>>>> think it's fundamentally the right approach to this problem.
>>>>
>>>> But don't get me wrong, Portable Web Publications would be very good
>>>> for Publications... I just don't think many businesses use PDF attachments
>>>> in that way.
>>>>
>>>> :-)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> > On 18 Jan 2016, at 20:42, Leonard Rosenthol <lrosenth@adobe.com>
>>>> wrote:
>>>> >
>>>> > Actually, Ivan is pointing out an active work project - called
>>>> PWP (Portable Web Publication) - to address the need for having a better way
>>>> to publish content using web technologies both in a packaged and unpackaged
>>>> form.
>>>> >
>>>> > A solution that aligns with EPUB (but would not be EPUB 3.x as we
>>>> know it today) is certainly something being seriously considered by various
>>>> folks as part of this work.
>>>> >
>>>> > Leonard
>>>> >
>>>> >
>>>> >
>>>> > On 1/18/16, 12:26 PM, "Craig Francis" <craig.francis@gmail.com>
>>>> wrote:
>>>> >
>>>> >> On 18 Jan 2016, at 17:13, Leonard Rosenthol <lrosenth@adobe.com>
>>>> wrote:
>>>> >>> So that a user browsing PDFs on the web doesn’t need anything extra.
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> I think Ivan is suggesting that EPUB3 might do the same.
>>>> >>
>>>> >> I'm still not 100% convinced how well it will work (as this does
>>>> depend heavily on the OS, and browsers).
>>>> >>
>>>> >> But in both cases (EPUB3, or using a ZIP to wrap up the HTML
>>>> document+assets) most of the building blocks are already in place.
>>>> >>
>>>> >> Craig
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>> On 18 Jan 2016, at 17:13, Leonard Rosenthol <lrosenth@adobe.com>
>>>> wrote:
>>>> >>>
>>>> >>> While a PDF file does need a “reader”, it should be pointed out
>>>> that EVERY MAJOR browser (Safari, Chrome, Edge, Firefox) all include PDF
>>>> viewing natively.  So that a user browsing PDFs on the web doesn’t need
>>>> anything extra.
>>>> >>>
>>>> >>> Leonard
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On 1/18/16, 11:43 AM, "Craig Francis" <craig.francis@gmail.com>
>>>> wrote:
>>>> >>>
>>>> >>>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org> wrote:
>>>> >>>>
>>>> >>>>> Yeah. That will take time. On MacOS (starting from, I believe,
>>>> Mavericks) the system comes with an epub reader, so files of this kind are
>>>> automatically opened much like PDF files. Yes, it is an ebook reader on the
>>>> OS, but that is not much different than using a PDF reader.
>>>> >>>>>
>>>> >>>>> To be incorporated into browsers is a big step (and would be a
>>>> big step forward) which will need additional spec work. We are kept busy:-)
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> Good to know, and good point about PDF files needing a reader.
>>>> >>>>
>>>> >>>> If I could push the format in any way (more so how the software
>>>> works), I would like to be able to send a document that is opened, read,
>>>> and closed without it being imported into some kind of library.
>>>> >>>>
>>>> >>>> Maybe some ability for email clients to open the file for a "quick
>>>> look" (as per the OSX term), then optionally import.
>>>> >>>>
>>>> >>>> But I realise this is going away from the idea of using this
>>>> format primarily for books.
>>>> >>>>
>>>> >>>> Anyway, thanks for the heads up.
>>>> >>>>
>>>> >>>> Craig
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org> wrote:
>>>> >>>>>
>>>> >>>>>>
>>>> >>>>>> On 18 Jan 2016, at 16:58, Craig Francis <craig.francis@gmail.com>
>>>> wrote:
>>>> >>>>>>
>>>> >>>>>> Hi Ivan,
>>>> >>>>>>
>>>> >>>>>> Just to follow up on this, I've been reading the spec at:
>>>> >>>>>>
>>>> >>>>>> http://www.idpf.org/epub/30/spec/epub30-overview.html
>>>> >>>>>>
>>>> >>>>>> And it does seem pretty much what I'm after.
>>>> >>>>>>
>>>> >>>>>> I'm not sure I like the extra meta files, but maybe they are
>>>> useful (e.g. the possibility of containing multiple HTML documents, one for
>>>> each language).
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>> For example. A book may also consist of many chapters each in
>>>> their individual files and the order is not clear. Etc.
>>>> >>>>>
>>>> >>>>>> So really the only remaining problem is getting email clients,
>>>> browsers, OS'es to be able to open these files quickly/easily... rather
>>>> than just automatically importing the file into an ebook reader.
>>>> >>>>>
>>>> >>>>> Yeah. That will take time. On MacOS (starting from, I believe,
>>>> Mavericks) the system comes with an epub reader, so files of this kind are
>>>> automatically opened much like PDF files. Yes, it is an ebook reader on the
>>>> OS, but that is not much different than using a PDF reader.
>>>> >>>>>
>>>> >>>>> To be incorporated into browsers is a big step (and would be a
>>>> big step forward) which will need additional spec work. We are kept busy:-)
>>>> >>>>>
>>>> >>>>> Cheers
>>>> >>>>>
>>>> >>>>> Ivan
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>>
>>>> >>>>>> Craig
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>> On 14 Jan 2016, at 11:17, Ivan Herman <ivan@w3.org> wrote:
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>> On 14 Jan 2016, at 12:05, Craig Francis <
>>>> craig@craigfrancis.co.uk> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> Thanks Ivan,
>>>> >>>>>>>>
>>>> >>>>>>>> You are right, I normally focus more on the security side of
>>>> things.
>>>> >>>>>>>>
>>>> >>>>>>>> But out of interest, EPUB3, is that likely to get the same
>>>> integration as how PDFs work at the moment?
>>>> >>>>>>>>
>>>> >>>>>>>> As in, you can email someone an EPUB3 file, and the recipient
>>>> can click/tap on it to quickly view in their email client?
>>>> >>>>>>>>
>>>> >>>>>>>> Or simply have the web browser open it, rather than needing a
>>>> dedicated EPUB3 reader?
>>>> >>>>>>>
>>>> >>>>>>> In theory, all this is possible but the infrastructure is not
>>>> as widespread as for PDF. Eg, you need extensions for Firefox to open an
>>>> epub directly.
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>> So far I've really only considered EPUB as more of a format
>>>> for books (which is probably my lack of understanding of the format), so
>>>> I've never really thought of its use for reports, leaflets, etc (i.e.
>>>> things that PDF's tend to be used for).
>>>> >>>>>>>>
>>>> >>>>>>>
>>>> >>>>>>> EPUB is perfectly capable of handling that out of the box.
>>>> >>>>>>>
>>>> >>>>>>> Ivan
>>>> >>>>>>>
>>>> >>>>>>>
>>>> >>>>>>>> In the mean time I'll have a read up on the PWP group.
>>>> >>>>>>>>
>>>> >>>>>>>> Craig
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>
>>>> >>>>>>>>> On 14 Jan 2016, at 10:52, Ivan Herman <ivan@w3.org> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>> Craig,
>>>> >>>>>>>>>
>>>> >>>>>>>>> thanks for your note. Two comments:
>>>> >>>>>>>>>
>>>> >>>>>>>>> - The format EPUB3, defined by IDPF, already does much of
>>>> what you say. On a very high level, it takes a (slightly constrained) Web
>>>> site and puts it into, essentially, a zip file. For many applications, this
>>>> is a worthy replacement for PDF. Note that almost all the electronic books
>>>> you buy today are in EPUB3 or its predecessor...
>>>> >>>>>>>>>
>>>> >>>>>>>>> - The DPUB IG also looks further down the line on a stronger
>>>> integration of digital publishing and the OWP:
>>>> >>>>>>>>>
>>>> >>>>>>>>> http://www.w3.org/TR/pwp
>>>> >>>>>>>>>
>>>> >>>>>>>>> which may lead to significant changes in the future.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Bottom line: this evolution is already happening!
>>>> >>>>>>>>>
>>>> >>>>>>>>> I understand you come more from the security area; there may
>>>> be security issues with EPUB3 or PWP which we do not fully appreciate, so
>>>> any comment is welcome of course!
>>>> >>>>>>>>>
>>>> >>>>>>>>> Cheers
>>>> >>>>>>>>>
>>>> >>>>>>>>> Ivan
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>> On 14 Jan 2016, at 11:34, Craig Francis <
>>>> craig@craigfrancis.co.uk> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Hi,
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Recently I've been thinking of some of the problems with
>>>> PDF's, which are useful for creating a document that can be archived,
>>>> emailed, printed, etc.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> HTML has solutions for many of PDF's problems though, for
>>>> example structured text (accessibility), ability to change layout depending
>>>> on screen size (no need for small screen devices to zoom into a fixed A4
>>>> layout), can change font size, better indexing support (searching for
>>>> documents), etc.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Unfortunately you can't just email a HTML document to
>>>> someone, as this causes a range of security problems, and including
>>>> resources can be difficult (you can inline them, or use MHTML, but these
>>>> are tricky to create).
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> So I was wondering if we could take the approach that
>>>> Microsoft Word did with the docx format, Java with JAR, PHP with PHAR,
>>>> etc...
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Have a new file format, associated with the browser, which
>>>> is just a ZIP/GZIP file that contains an index.html file, and everything
>>>> else needed for the document.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Then from a security point of view, it can be locked down to
>>>> its own little box, so no access to other files on the file system,
>>>> probably no access to cookies/localstorage, no ability to connect to
>>>> another host.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> And from the user's point of view, the document could be
>>>> protected with a password (a feature that ZIP/GZIP provides already, and
>>>> the browser can prompt for when opening).
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> So would this help with the security aspects of emailing
>>>> HTML files to people (e.g. reports), and be better than PDFs?
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Craig
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> ---
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0063.html
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> https://code.google.com/p/chromium/issues/detail?id=575677
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1237990
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> https://wpdev.uservoice.com/forums/257854-microsoft-edge-developer/suggestions/11443002-webpage-zip-as-alternative-to-pdf
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> ----
>>>> >>>>>>>>> Ivan Herman, W3C
>>>> >>>>>>>>> Digital Publishing Lead
>>>> >>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>> >>>>>>>>> mobile: +31-641044153
>>>> >>>>>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>> --
>>> - Nick Ruffilo
>>> @NickRuffilo
>>> Aer.io <http://aer.io/> an *INGRAM* company


-- 
- Nick Ruffilo
@NickRuffilo
Aer.io an *INGRAM* company
Received on Wednesday, 27 January 2016 15:18:41 UTC
