Re: Proposal: PDF alternative using HTML (ZIP/GZIP)

Bill

Still, downloading a single web page for offline use has no <nav> requirements.

Interestingly the W3C uses JavaScript to dynamically create a TOC for its single page documents (http://w3c.github.io/dpub-pwp/) so if it’s good for the goose why not for the gander?

Perhaps eventually a browser will natively interpret a <nav> element, but if Javascript can make a contribution today why hold back from implementation? If one considers the BISG grid of supported/unsupported EPUB3 features, waiting for browser implementations will not be worthwhile. 

Also as Javascript is able to help generate <nav> today how is it more difficult than what is basically vaporware? The programming has to be done somewhere and if it can be done in Javascript today, let’s go for it…

I don’t see why a PWP shouldn’t directly reflect the techniques used in online pages. PWPs should be as freewheeling as EPUBs are restrictive.

Mike

> On 01 Feb 2016, at 10:17, Bill McCoy <bmccoy@idpf.org> wrote:
> 
> Hi Mike, well my vision is that the content itself should be declarative and interoperably so (statically, not requiring programmatic execution of any contained JS code within a browser engine context). Having a <nav> (potentially) constructed by JS that's shipped with a publication is not going to get us to something that assistive technology or any other semantic-processing workflow can reliably detect and appropriately handle. For example one use case is to "explode" a publication into parts, such as chapters. That use case would be way more difficult to handle if the navigational structure (aka TOC) is encoded only in programmatic JavaScript.
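
To make the static-detection point above concrete, here is a minimal sketch, assuming only Python's standard html.parser module. The names NavLinkExtractor, extract_toc and SAMPLE are hypothetical and do not come from any EPUB or PWP tooling. It reads the links inside a declarative <nav> without executing any script, which is roughly the first step a tool would need before splitting a publication into chapters.

```python
from html.parser import HTMLParser


class NavLinkExtractor(HTMLParser):
    """Collect (href, text) pairs for links found inside <nav> elements."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0        # greater than 0 while inside a <nav>
        self.current_href = None  # href of the <a> being read, if any
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            self.links.append((self.current_href, title))
            self.current_href = None
        elif tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1


def extract_toc(html_text):
    """Return the navigation entries declared in the markup, in document order."""
    parser = NavLinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical sample document with a declarative <nav>.
SAMPLE = """
<html><body>
<nav><ol>
  <li><a href="ch1.html">Chapter 1</a></li>
  <li><a href="ch2.html">Chapter 2</a></li>
</ol></nav>
<p><a href="elsewhere.html">Not navigation</a></p>
</body></html>
"""

if __name__ == "__main__":
    for href, title in extract_toc(SAMPLE):
        print(href, title)
```

If the <nav> only comes into existence when a script runs in a browser, the same function finds nothing in the shipped markup, which is the interoperability gap being described.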
> 
> Shipping along programmatic JavaScript that does something sensible with that <nav> in the absence of any native implementation (a polyfill, in other words) does make a lot of sense to me.  I think you are unnecessarily conflating packaging though - whether a particular chunk of content is packaged or not is, to me, orthogonal to the question of whether that content has interoperably determinable logical structure. Today's EPUB happens to define structure only in the context of packaged content, but that's overdue to be disentangled, perhaps as soon as EPUB 3.1.
> 
> --Bill
> 
> 
> On Mon, Feb 1, 2016 at 1:03 PM, Mike Perlman <perlmanm@me.com <mailto:perlmanm@me.com>> wrote:
> Hi Bill
> 
> We are probably a long way away from <nav> in a PWP given that there isn’t even anything like .epub that will decompress a directory and open in a browser window without the addition of external code.
> 
> For single page web->PWP (one of the visions of a PWP) <nav> is not needed.
> 
> For multipage 5DOC demos, I use fullpage.js which defines sections of a publication with <section> and individual pages (aka chapters) with <slide>. The <nav> component sits inside the body before the first <section>.
> As 5DOC is delivered in one file - and includes inside the file CSS, Javascript, media, SVG and fonts - a <nav> is not required. Obviously pages are sequenced and fullpage.js creates a <nav> on the fly.
> 
> My point is why make <nav> a requirement? It could be part of best practices for multipage/space docs. There could be some new features/functions to make creating a <nav> easier. Realistically a <nav> will be created by Javascript as it is already. We are there now. 
> 
> I think it would be very helpful for use cases to be accompanied by actual samples. PWP needs a few “flight simulators” to help folks experience it.
> 
> Cheers
> Mike
> 
>> On 01 Feb 2016, at 07:33, Bill McCoy <bmccoy@idpf.org <mailto:bmccoy@idpf.org>> wrote:
>> 
>> Thank you Craig, seems we are generally in agreement.
>> 
>> Two fine points I wanted to mention before dropping this as well, one in reply to your comments below and one in reply to Deborah:
>> 
>> - the "save Web page as" use case is to me a symptom of where I am slightly in disagreement with this group's consensus definition of "publication" as essentially any "aggregated set of Web Resources" [1]. To me the PWP definition is way too broad for the term "publication" or "electronic document" because it encompasses totally interactive content that doesn't have any structure to reliably navigate, as well as arbitrary collections of data items. Such things may be "published" on the Web or otherwise, but I don't consider them "publications" or "documents" (i.e. the verb form is applicable but not the noun form). To me by definition a "publication"/"document" has internal structure - usually linear (a beginning, middle, and end) - but in some case it could be a tree structure where any linearity that spans the whole tree is secondary (a cookbook or dictionary). So to me EPUB's requirement for reliable navigation via a specially designated <nav> element which spans the constituent content in reading order is not only sensible, it is a sine qua non: an aggregated set of Web Resources is a "publication" if & only if such reliable navigation makes sense for that content. But I accept that this group has chosen a broader definition, albeit I think it may be useful to try to at some point tease apart the minimal definition of "publication" as per EPUB (which I think is pretty close to mine) vs. the broader definition that could encompass an interactive application or dataset as well as what we would think of as a "document". I remain unconvinced that we can usefully define anything narrower than "all of OWP" for the broader cases but while I quibble with the terminology I don't object at all to the ambitious goal.... I admit that in the new digital-native world the concept of "publication" probably needs to get broader!
>> 
>> - re: accessibility, Deborah mentioned some things as did you. Prior discussions in this group helped convince me that EPUB has gotten things a bit wrong by going half way to requiring accessibility as a condition of validity. While in EPUB 3.1 we are presently headed even further in this direction I now believe that for proper architectural layering things like requiring structural semantics beyond what HTML5 mandates belongs in a separate accessibility profile not on the base specs (whether for PWP or EPUB), with WCAG 2.0 already being a key building block of this type of modular approach. So even though accessibility is critically important, it would be impractical as well as inelegant for the base specifications of OWP (including in the future EPUB/PWP) to attempt to require it or even to fully define it. Among other benefits of the modular approach is that a separate a11y profile can be much more rigorous regarding MUSTs than would be possible to gain consensus on for base specifications. That is however just a personal opinion and IDPF is still discussing together with other accessibility stakeholders such as DAISY and BISG that are engaged with us in enhancing EPUB what is the best path forward. I mention it only to note that it's one area where discussions in this group are already having a practical impact.
>> 
>> --Bill
>> 
>> [1] https://www.w3.org/TR/2015/WD-pwp-20151126/#pwp_definition <https://www.w3.org/TR/2015/WD-pwp-20151126/#pwp_definition>
>> 
>> 
>> On Sun, Jan 31, 2016 at 10:45 AM, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>> Hi Bill,
>> 
>> I'm going to let my suggestion die, this is just some comments regarding EPUB.
>> 
>> 
>> <snip>
>> if EPUB/PWP succeeds as the next-generation portable document format for OWP then it would be logical for browsers to similarly implement out-of-the-box.
>> </snip>
>> 
>> 
>> If we do go down this route, where browsers provide support for EPUB/PWP, then I agree that it will be easier to implement than PDF.
>> 
>> While I am very impressed in what PDF.js has managed to do, as you pointed out, it is very complicated.
>> 
>> But I would suggest that you give it some thought on how these files would be presented in the browser.
>> 
>> A user opening a document probably needs to have in mind what the view represents.
>> 
>> At the moment I worry that PWP might be seen as the main delivery method of EPUB, and the way it works (magic updating) would undermine EPUB being used for anything that cannot/shouldn't be updated.
>> 
>> At the moment I can show a PDF on a website, and users kind of understand that it is a PDF file, and what that means (e.g. they can download it to their computer, and it will remain there un-changed, always available, and does not perform network requests when it is opened).
>> 
>> 
>> 
>> <snip>
>> Of course there's a thousand ways to implement navigation/menus in a web page, the EPUB requirement that this be via a designated <nav> element simply makes for more interoperability.
>> </snip>
>> 
>> 
>> I'm really glad to see that assistive devices are being considered like this.
>> 
>> And when it comes to multi-page documents, I really do see the point in enforcing a navigation bar to be present, and for it to follow a very strict structure (says he who has created some of those nasty HTML navigation bars before now).
>> 
>> So personally, I want to see this requirement remain in place for EPUB files, as it forces people to build large documents correctly.
>> 
>> But there are times, like with a single page leaflet (with only a few words), or when you are just packaging up an existing webpage (e.g. "Save Web Page As" for later reading), where a navigation bar won't be appropriate.
>> 
>> And in terms of use cases (following on from another thread), a fairly common one I see is the "simple web page".
>> 
>> For example, some of the designers and developers I work with will create the basic website design in Photoshop (yes, I know, not ideal), and then save it out as a PDF for the client to approve. Ideally those designers will learn basic HTML/CSS, working with the medium (media queries please), and package that up to send to the client before it's built "properly".
>> 
>> 
>> <snip>
>> That being said I do think that one major difference with documents/publications vs. simple Web pages is that hand authoring is much less of a consideration.
>> </snip>
>> 
>> 
>> Very true, I have really just been thinking of the cases where the document is being hand created (keeping in mind that I am a programmer that frequently has to create the type of documents I've been focusing on).
>> 
>> 
>> 
>> <snip>
>> In that regard your point about a "Java program" also seemed misleading. There is a normative validator for EPUB (validator.idpf.org <http://validator.idpf.org/>) as there is for HTML5 (validator.w3.org <http://validator.w3.org/>). There is no requirement by either standard that content be validated before being used.
>> </snip>
>> 
>> 
>> I only mentioned the Java program as it looked like it was required to package up the EPUB file.
>> 
>> I didn't really look at the source code, I just simply wanted to take my source files (folder) and compile it into an EPUB file.
>> 
>> If I used zip or gzip on the folder, it didn't seem to work (likewise doing the reverse on an existing EPUB file).
>> 
>> So without doing much more research, I suspect it is something custom (and if it does require Java, I won't be able to run that on my server, because I cannot install Java).
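
One likely reason a plain zip run does not produce a usable EPUB is that the EPUB container format (OCF) expects the "mimetype" file to be the first entry in the ZIP archive and to be stored uncompressed. Gzip is a different format altogether. The following is a minimal sketch using only Python's standard zipfile module, so no Java is involved. The function name package_epub and the folder layout are assumptions for illustration. It only packages and does no validation of the kind epubcheck performs.

```python
import os
import zipfile


def package_epub(source_dir, epub_path):
    """Zip source_dir into epub_path with 'mimetype' first and uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as archive:
        # OCF expects 'mimetype' to be the first entry, stored without compression.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Archive names are relative to source_dir and use forward slashes.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                archive.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    # Hypothetical paths: a folder holding META-INF/container.xml, package.opf, etc.
    package_epub("my-book", "my-book.epub")
```

The output path should sit outside the source folder so the archive does not try to include itself.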
>> 
>> 
>> Craig
>> 
>> 
>> 
>> 
>> 
>>> On 29 Jan 2016, at 17:50, Bill McCoy <bmccoy@idpf.org <mailto:bmccoy@idpf.org>> wrote:
>>> 
>>> Hi Craig, I will just try to reply to a couple of your points:
>>> 
>>> <snip>
>>> Now when you say that browsers will natively support EPUB/PWP, I'm not sure that they will. 
>>> <snip>
>>> 
>>> Of course my statement was aspirational. But given that all modern browsers / operating systems support PDF (either natively as per Chrome or via OS-bundled "helper apps" as per Safari with Preview) then if EPUB/PWP succeeds as the next-generation portable document format for OWP then it would be logical for browsers to similarly implement out-of-the-box. Which of course would be much much lighter-weight than PDF that requires either a totally separate app or a truly monstrous polyfill (in the case of PDF.js). That PDF.js was never workable in practice on devices with FirefoxOS is one datapoint about the gulf between PDF and OWP.
>>> 
>>> <snip>
>>> But not all documents fit this approach (i.e. a single page invoice, with no need of a Table of Contents).
>>> <snip>
>>> 
>>> A one page document (fixed or reflowable) may still have internal subsections so the <nav> element required by EPUB 3 is thus still useful to interoperably communicate this structure to for example assistive technology (without requiring the assistive agent to parse the entire HTML content document). In the case where there is really no internal structure at all then even a stub <nav> element with only the single entry point can still help because it can link to the beginning of content, avoiding any header stuff. Of course there's a thousand ways to implement navigation/menus in a web page, the EPUB requirement that this be via a designated <nav> element simply makes for more interoperability. I don't see this as a huge burden given that this is precisely what <nav> was added to HTML5 to support (pre-HTML5 navigation was, and still usually is, cobbled together through a variety of divs/spans/buttons/tables/JS, which is challenging for assistive technology - and other semantic processing - to "decode").
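
As an illustration of how small such a stub can be, here is a minimal sketch that emits a navigation fragment from a list of (href, title) pairs. The helper name build_nav and the target "content.xhtml#start" are hypothetical. The markup follows the EPUB 3 pattern of a <nav epub:type="toc"> wrapping an ordered list, and it assumes the enclosing XHTML document declares the epub namespace prefix.

```python
from html import escape


def build_nav(entries):
    """Return a <nav epub:type="toc"> fragment for (href, title) pairs."""
    items = "\n".join(
        f'    <li><a href="{escape(href, quote=True)}">{escape(title)}</a></li>'
        for href, title in entries
    )
    return f'<nav epub:type="toc">\n  <ol>\n{items}\n  </ol>\n</nav>'


# A stub navigation with a single entry pointing at the start of the content.
print(build_nav([("content.xhtml#start", "Start of content")]))
```

Because the result is plain markup, a reading system or assistive tool can consume it without running any code shipped with the publication.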
>>> 
>>> It is arguable that in the very simplest case where the content starts from the start of the <body> EPUB 3 should not have required the <nav> element. That's a corner case, when we are discussing publications rather than just web pages but such simpler defaults could be part of a PWP/EPUB4. But this brings me to the last point in your email that I'd like to respond to..
>>> 
>>> <snip>
>>> It was also annoying with the amount of files I needed to include, and keep up to date
>>> <snip>
>>> 
>>> I agree and again I think a more fully Web-aligned PWP/EPUB4 can do better (and IDPF is already taking steps in this direction with EPUB 3.1).
>>> 
>>> That being said I do think that one major difference with documents/publications vs. simple Web pages is that hand authoring is much less of a consideration. The vast majority of documents and publications are generated by authoring tools and content generation systems, not humans. PDF is incredibly widely proliferated yet it's no easy thing to hand-author a valid PDF file (even for folks who wrote parts of the PDF specification). SVG is somewhat more hand-authoring friendly than PDF but in reality it's not going to be common for humans to hand-author vector graphics based on graphics states and affine transforms. Sacrificing features that improve interoperability, accessibility, and semantic processability in exchange for making it easier to hand author doesn't seem like a good tradeoff to me. Arguably this is true for OWP overall, now that most HTML pages are generated by tools like Wordpress and Drupal, even the templates for these sites are often tool-generated. But if we are specifically talking about an alternative to PDFs, which are 100% tool-generated, I don't think hand-authoring is what is going to matter.
>>> 
>>> So basically that you (and others!) may find hand-authoring EPUB "annoying" to me is just that - an annoyance not a fatal flaw. It is at least possible to do (unlike PDF). We can and should make some efforts to improve this (among other things easier hand-authoring may in some respects correlate to simpler machine-generation) but I don't see it as in any way one of the highest priorities even for a future PWP/EPUB4.
>>> 
>>> In that regard your point about a "Java program" also seemed misleading. There is a normative validator for EPUB (validator.idpf.org <http://validator.idpf.org/>) as there is for HTML5 (validator.w3.org <http://validator.w3.org/>). There is no requirement by either standard that content be validated before being used. That in the EPUB ecosystem a number of commercial distribution channels require content to validate is arguably a feature not a bug but it's nothing to do with EPUB as a format.  That PDF doesn't have any normative validation software is definitely (IMO anyway) a bug not a feature. That stems from it being originally a proprietary format (the first versions of the PDF manual made that very clear by opening with "PDF is the file format of Adobe Acrobat") and because PDF's not a modular format built on other standards like ZIP, HTML, and XML so there's no building-block concepts of validity to layer upon.
>>> 
>>> --Bill
>>> 
>>> 
>>>  
>>> 
>>> On Thu, Jan 28, 2016 at 2:23 PM, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>> Hi Bill,
>>> 
>>> Thank you for your reply.
>>> 
>>> I agree that PDFs do a very good job at fixed layout (high visual fidelity, ensuring a consistent visual representation). This is very useful when sending documents to printers.
>>> 
>>> And I agree that PDFs can include some very good accessibility features, but as you mentioned, this is rarely implemented.
>>> 
>>> I also agree that EPUB is an interesting standard (I've just been making a "hello world" document, which I'll talk about in a bit),   and it does a very good job of packaging up a publication (with a focus on books).
>>> 
>>> And I'm glad that PWP is taking that even further on the Open Web Platform (using Open Web Technology), with more of a focus on "unpackaged content" (from my understanding, this allows the updating and referencing of documents from a central URL).
>>> 
>>> I also understand that accessibility can be hard to pin down in a document/specification, but I find that HTML does a very good job of providing accessibility by default anyway (it just goes wrong when developers do weird things).
>>> 
>>> Now when you say that browsers will natively support EPUB/PWP, I'm not sure that they will. They might continue to download these files and hand them over to a program that is dedicated to this task (for example an eBook reader)... the same is true if you email an EPUB file to someone.
>>> 
>>> And I'm glad to see that EPUB does help publishers, so they don't need to implement pagination in every document they create.
>>> 
>>> But not all documents fit this approach (i.e. a single page invoice, with no need of a Table of Contents).
>>> 
>>> And it's because of this that I disagree that EPUB matches a "PDF alternative using HTML (ZIP/GZIP)".
>>> 
>>> -----
>>> 
>>> Ok, so my experimentation with EPUB...
>>> 
>>> I've just created a simple Hello World document:
>>> 
>>> https://github.com/craigfrancis/wdoc/tree/master/alternatives/epub <https://github.com/craigfrancis/hdoc/tree/master/alternatives/epub>
>>> 
>>> I had to do this by using an example document:
>>> 
>>> https://github.com/IDPF/epub3-samples/tree/master/30/accessible_epub_3 <https://github.com/IDPF/epub3-samples/tree/master/30/accessible_epub_3>
>>> 
>>> As the specification was a little difficult to read (as in, there is a lot of it):
>>> 
>>> http://www.idpf.org/epub/30/spec/ <http://www.idpf.org/epub/30/spec/>
>>> http://www.idpf.org/epub/30/spec/epub30-publications.html <http://www.idpf.org/epub/30/spec/epub30-publications.html>
>>> http://www.idpf.org/epub/30/spec/epub30-contentdocs.html <http://www.idpf.org/epub/30/spec/epub30-contentdocs.html>
>>> 
>>> I also found that I needed to run a Java program to do the checking and packaging of the file:
>>> 
>>> https://github.com/IDPF/epub3-samples <https://github.com/IDPF/epub3-samples>
>>> 
>>> https://github.com/IDPF/epub3-samples/tree/master/lib/epubcheck-3.0-RC-2 <https://github.com/IDPF/epub3-samples/tree/master/lib/epubcheck-3.0-RC-2>
>>> 
>>> Which I thought was a bit excessive for the document types I have in mind.
>>> 
>>> It was also annoying with the amount of files I needed to include, and keep up to date (mimetype, container.xml, package.opf).
>>> 
>>> And at the moment (so may change in the future) OSX is configured to just download, open, and import these files into iBooks, which isn't always appropriate... with a similar setup on other platforms.
>>> 
>>> -----
>>> 
>>> In comparison, I don't think we need to create much of a spec for what I'm proposing...
>>> 
>>> As a developer, I can just create a folder with an "index.html" file, include any extra resources, and package everything up with:
>>> 
>>>  zip -r ./my-file.wdoc ./my-file/
>>> 
>>> I can then put that file on a website, or send via email, use a USB drive, etc.
>>> 
>>> Web browsers can then be updated to recognise that file extension, and apply some simple sandboxing rules... that is all :-)
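
To give a feel for what the simplest checks on such a file might look like, here is a minimal sketch using Python's standard zipfile module. The two rules it applies (an index.html entry point must exist, and no entry may point outside the archive root) are assumptions chosen for illustration and are not taken from the wdoc proposal itself. The function name check_wdoc is likewise hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def check_wdoc(path):
    """Return a list of problems found in the archive (empty means none found)."""
    problems = []
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()

    has_index = False
    for name in names:
        parts = PurePosixPath(name).parts
        # Reject absolute paths and parent-directory references.
        if name.startswith("/") or ".." in parts:
            problems.append(f"entry escapes the archive root: {name}")
        # "zip -r ./my-file.wdoc ./my-file/" stores entries as my-file/index.html,
        # so accept index.html at the top level or one folder down.
        if parts and parts[-1] == "index.html" and len(parts) <= 2:
            has_index = True

    if not has_index:
        problems.append("no index.html entry point found")
    return problems


if __name__ == "__main__":
    # Hypothetical file name, matching the zip command above.
    for problem in check_wdoc("my-file.wdoc"):
        print(problem)
```

Real sandboxing in a browser would also need to cover script execution and network access. This only inspects the archive listing.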
>>> 
>>> I've just uploaded a summary document for what I'm trying to propose here:
>>> 
>>> https://github.com/craigfrancis/wdoc <https://github.com/craigfrancis/hdoc>
>>> 
>>> Craig
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On 27 Jan 2016, at 19:44, Bill McCoy <bmccoy@idpf.org <mailto:bmccoy@idpf.org>> wrote:
>>>> 
>>>> Hi, this has been an interesting thread, sorry to chime in late but I wanted to make a couple of points:
>>>> 
>>>> - With regards to "legal" documents, I think some of the discussion is conflating "self-containedness" (/reliability) with visual fidelity. In an N-screen world, with visual representation not the only way to communicate information, the idea that a particular view of content is the *only* way to represent normative information such as contracts, forms, etc. is pretty archaic. Of course it's good to have the option to create "WYSIWYG" documents and the Web does... fixed-layout EPUB 3 documents can use CSS positioning, SVG, or bitmap images. And EPUB already has a multiple renditions specification [1] so that you can combine fixed and reflowable representations into a composite publication, including the means to map between them.
>>>> 
>>>> - The core architecture of PDF is content "typeset at the factory" to ensure a consistent visual representation. PDF evolved from PostScript so at core a PDF is a sequence of page images containing scalable vector graphics, images, and precisely positioned glyphs. The Web analog is a PDF file = an ordered sequence of SVG images (SVG having started life as an XML mapping of the PDF spec). This makes PDF inherently not mobile-ready (in terms of adjustment of content to different sized screens), not very accessible, and not very semantically intelligible in various machine-processing workflows. Computers can drive cars so clearly they can reconstruct text and structure from visual information, but it's a heuristic process. As Leonard indicates, it's possible in theory to create accessible PDFs but since the logical structure features were grafted onto PDF's sequence-of-page-images architecture years after the fact the result is pretty awkward which is one reason that most PDF creation tools (including many from Adobe) don't even attempt it at all, much less to the level needed to meet WCAG 2.0 standards (it is nearly impossible to find PDF content that is actually conformant to the PDF/UA profile). As well the W3C WCAG guidelines were designed to work with Web Standards based content, so don't map so well to PDF. After all if a sequence of SVG images was good enough we could scrap HTML5 and just use SVG everywhere.
>>>> 
>>>> To me it's pretty clear that the evolutionary vector of EPUB/PWP to make portable web publications truly first-class in the Open Web Platform will need to encompass both reliable and packaged content (as per today's EPUB) as well as unpackaged content (which PWP is exploring). By no means should we consider that PDF is better for legal documents or really any documents than a truly Web-based solution. Of course that doesn't mean PDF will go away but logically EPUB 3 already delivers (with reflowable and fixed-layout content) a superset of the expressional capabilities of PDF and with PWP work we will take things even further and make things even more Web-native. 
>>>> 
>>>> But I do think we need to tease apart the key attributes and not conflate "reliable" with "packaged" with "fixed-layout". Portable Web Publications need to support all of these attributes even though individual instances may choose which ones they fully deliver on. I would, with hesitation, even add "accessible" to this list of separable attributes. I would like all content in the next-generation portable document format to be accessible, but as a broad-based part of OWP it's not clear that this is realistic to set as a baseline requirement (hence one thing IDPF is considering in conjunction with our EPUB 3.1 revision is separating accessibility requirements into a layered profile, separate from the base specification).
>>>> 
>>>> Regarding Nick's comment about whether we need "something that works immediately". Ultimately browsers and operating systems will natively support EPUB/PWP, as they already do PDF, so all that content will certainly work "immediately", but I take the comment as whether a "polyfill" is necessary. To me the "polyfilling"/"prolyfilling" content, especially when that content is deployed live to the Web, must be a choice the content publisher can make. It has to be an option as we can't assume specialized user agents will always exist. But such programmatic polyfills must be optional not mandatory. Cleanly separating content representation from implementation mechanism is critical to ensure semantics aren't lost in implementation. If we failed in that we would be worse than PDF, we would be back to PostScript in which the content was only the side effect of interpreting programs. 
>>>> 
>>>> So by way of example, it must be possible to represent content that is designed to be dynamically paginated without having to ship with it a JS implementation to do that pagination, especially since in a number of use cases whatever dynamic pagination that comes along with the content may not be desired. But if content served up online wants to deliver a default pagination implementation, it should be able to do so and have some confidence that it will be utilized where it makes sense and not where it doesn't. Not always an easy problem but the issue of when a polyfill associated with content is used vs. an external native implementation is not unique to publications. What we have to avoid is tying the content itself to particular code expressions. It's great to design something that will work really well with Service Workers. Not so good to make something that requires JS code based on Service Workers, particularly for structured content that needs to be machine-processed in a variety of ways.
>>>> 
>>>> And most of all I don't think we should even consider forking yet another effort on something different. We already have a "PDF alternative using HTML (ZIP/GZIP)", it's called EPUB, it's already widely utilized and is expanding into new segments of content publishing, and with the PWP work we're hopefully going to take that alternative much further towards full convergence with OWP.
>>>> 
>>>> --Bill
>>>> 
>>>>  
>>>> [1] http://www.idpf.org/epub/renditions/multiple/ <http://www.idpf.org/epub/renditions/multiple/>
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Tue, Jan 26, 2016 at 1:51 PM, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>> >I do feel that there is a need for a document format, as per my understanding of PWP, that has the ability to be updated (e.g. for publications).
>>>> >But that is different to files that need to remain as atomic units, that remain isolated from everything else.
>>>> >
>>>> There is no requirement that a PWP needs to be updatable – that’s just one use case where it could.  At the same time, there are also clear use cases (such as your own) where the document/publication is “atomic” or “unique” and would never be modified.   And these criteria are also separate from others such as self-containment.
>>>> 
>>>> Thanks for the info below – but I don’t see any advantage for HTML-based publications in those workflows.  You wouldn’t be leveraging anything specific to the Open Web Platform and its ecosystem.  PDF seems like a much better alternative.  (NOTE: a PDF can be 100% identically accessible to HTML – it just happens that authoring accessible HTML is easier than accessible PDF, but that’s a tool issue not a format issue)
>>>> 
>>>> Leonard
>>>> 
>>>> From: Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>>
>>>> Date: Tuesday, January 26, 2016 at 9:53 AM
>>>> To: Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>>
>>>> Cc: W3C Digital Publishing IG <public-digipub-ig@w3.org <mailto:public-digipub-ig@w3.org>>, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org>>, Nick Ruffilo <nickruffilo@gmail.com <mailto:nickruffilo@gmail.com>>
>>>> 
>>>> Subject: Re: Proposal: PDF alternative using HTML (ZIP/GZIP)
>>>> 
>>>> On 26 Jan 2016, at 12:47, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>> 
>>>>> PWP is designed to cover all of those use cases, as there are many uses for publishing content – as seen in the myriad of industries that have adopted PDF.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Hi Leonard,
>>>> 
>>>> You are probably right, and I'm just thinking about it from a programmer's point of view (one who has to send reports).
>>>> 
>>>> I do feel that there is a need for a document format, as per my understanding of PWP, that has the ability to be updated (e.g. for publications).
>>>> 
>>>> But that is different to files that need to remain as atomic units, that remain isolated from everything else.
>>>> 
>>>> We also need to think how these files are consumed. For example, if I send you an ePub file today, you will probably want to open and save it in an e-reader with other books. Whereas if the email contained a PDF file, it would be opened/read, but ultimately closed and not saved (where the email can be archived if it needs to be read again later).
>>>> 
>>>> I might be going into too many specifics, but I have a few examples below if you're interested.
>>>> 
>>>> Craig
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> I work for a company that assesses students with disabilities who are going to university.
>>>> 
>>>> In the UK we have a couple of organisations, such as Student Finance England (SFE), who provide funding to those students, so they can get the equipment or support they need.
>>>> 
>>>> So the company I work for meet and do assessments for each student, get quotes from suppliers, and make recommendations as to what each student should have (e.g. a laptop, and note taking lessons).
>>>> 
>>>> The report the assessor writes is currently sent to SFE as a PDF file, which introduces a few accessibility issues.
>>>> 
>>>> Ideally I would instead create a HTML file, package that into a ZIP (to include some extra resources), and send it to SFE.
>>>> 
>>>> But they will not open a HTML file due to the security implications (nor would any student who we send it to, assuming they know that the HTML file attachment can be opened in a web browser).
>>>> 
>>>> Then, because SFE are so worried about the students' private information, they actually use PGP (the zip kind) and I believe they open the PDF report on a computer that has extremely limited access to the internet (as in, can only send and receive email).
>>>> 
>>>> So when PWP does become available, I doubt they will accept them, especially if they know that the report could be updated/changed in any way.
>>>> 
>>>> SFE then send out a DSA2 file (which authorises the supplier to dispatch the items), and the supplier in turn raises an invoice for SFE to pay... neither of these (currently PDF) documents can be editable from a technical or legal point of view.
>>>> 
>>>> Another example is the Terms and Conditions we send to the student. While this is a "living document" that is changed over time, the copy the student receives must remain the same for them.
>>>> 
>>>> Or when we send some statistics to SFE for the number/type of assessments that were completed, even if we later find out that the type of one assessment was wrong, and is technically incorrect, that file still needs to record what was sent (plus a follow up report to show the corrected statistics).
>>>> 
>>>> Then, with a couple of my other clients, there are still contracts that need to be signed, or invoices that are issued.
>>>> 
>>>> All of these better fit the HTML+ZIP proposal, which needs a very strict sandbox.
>>>> 
>>>> Whereas PWP better suits:
>>>> 
>>>> - A writer publishing a fictional story, which might contain typos to be corrected.
>>>> 
>>>> - A newspaper which includes corrections, as more information is discovered.
>>>> 
>>>> - An academic writing a paper, where the document can be referred to by others via a URL.
>>>> 
>>>> - An educational book that needs to be kept up to date with the latest information, and distributed from a central server.
>>>> 
>>>> And as Nick has just pointed out, maybe these documents could have their own cookie store / local storage, allowing the document to record your notes and answers.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 26 Jan 2016, at 12:47, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>>> 
>>>>> PWP is designed to cover all of those use cases, as there are many uses for publishing content – as seen in the myriad of industries that have adopted PDF.
>>>>> 
>>>>> Leonard
>>>>> 
>>>>> From: Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>>
>>>>> Date: Tuesday, January 26, 2016 at 7:42 AM
>>>>> To: Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>>
>>>>> Subject: Re: Proposal: PDF alternative using HTML (ZIP/GZIP)
>>>>> 
>>>>> Thanks for the clarification Leonard,
>>>>> 
>>>>> I can certainly see the use cases for JavaScript, and glad to see you are considering them.
>>>>> 
>>>>> Personally I would like to suggest not relying on warnings to the user (as they don't really understand what they mean), but I like that you are also considering restricting the JavaScript.
>>>>> 
>>>>> 
>>>>> 
>>>>> Otherwise I think the proposed HTML+ZIP and PWP documents are similar (e.g. using HTML+CSS), but do have slight differences:
>>>>> 
>>>>> PWP: Documents are kept up to date, where (temporary) offline copies can be made.
>>>>> 
>>>>> PWP: Published from a central location, so references to it can be made (like saying book X from author Y).
>>>>> 
>>>>> HTML+ZIP: Copies of the document can be created, but once those copies are made, they remain as their own entity (typically for archival purposes).
>>>>> 
>>>>> HTML+ZIP: Seen as read-only content (in as much as any computer document is read-only), representing a document or data at that point in time.
>>>>> 
>>>>> Craig
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 22 Jan 2016, at 19:42, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>>>> 
>>>>>> Nick – you should be careful to separate the file format from the reader.  You do it well for PWP and RS, but forgot for PDF.
>>>>>> 
>>>>>> Yes, a PDF file can contain JavaScript, which is documented (according to the spec) to run at specific times during the load and viewing of a PDF.  This is exactly like what JS can do with HTML, which is then what would happen when packaged in a PWP.   Certain subsets of PDF restrict the presence of scripts entirely or in limited uses – just as EPUB currently does as an example of a PWP.
>>>>>> 
>>>>>> However, there are ZERO requirements (or even recommendations) in the PDF standard about a “conforming reader” (the PDF term for a Reading System/RS) providing any type of warnings about the presence (or lack thereof) of JavaScript.    So any such UI that might exist in your PDF conforming reader of choice is that application’s decision.  Other conforming readers can and do handle things differently vis-a-vis JavaScript – including some (such as Apple’s Preview) that completely ignore it.
>>>>>> 
>>>>>> As for JS in PWP – I think it’s much too early to make any specific statements about that. We know that some forms of PWP (such as EPUB x.x) might choose to restrict the JS, just as it does today – but that’s a specific case not the general one.   Same with sandboxing, I don’t see that as a PWP requirement but might well exist for certain specific cases and implementations.
>>>>>> 
>>>>>> Leonard
>>>>>> 
>>>>>> From: Nick Ruffilo <nickruffilo@gmail.com <mailto:nickruffilo@gmail.com>>
>>>>>> Date: Friday, January 22, 2016 at 12:58 PM
>>>>>> To: Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>>
>>>>>> Cc: Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>>, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org>>, W3C Digital Publishing IG <public-digipub-ig@w3.org <mailto:public-digipub-ig@w3.org>>
>>>>>> Subject: Re: Proposal: PDF alternative using HTML (ZIP/GZIP)
>>>>>> 
>>>>>> Craig,
>>>>>> 
>>>>>> Let's nail down exactly why the PWP wouldn't work for that situation.  Currently PDF does allow you some "scripting" but before it runs, the user is prompted: "this PDF has scripting, do you wish to turn it on?"  Would something like that (the choice of the reading system) suffice?
>>>>>> 
>>>>>> Additionally, it is my understanding that the HTML and Javascript would be in a sandbox environment, and have limited access (if any) to manipulate external files.  It would be the reading system's responsibility to feed any data that the PWP would require externally.  So the security issues then lie outside of the PWP itself, and more in the reading system - something that PWP could possibly address as a note to implementors...
>>>>>> 
>>>>>> As a note - pretty much any MS Office file can have scripting in it, and can actually manipulate files on the filesystem (there are viruses written in Word and Excel).  Because of this, Microsoft warns you before you run a script in these formats.  This hasn't stopped business in any way (or IT) from trusting the storage and download of such files.
>>>>>> 
>>>>>> My understanding is that even though the contents are HTML - this is not to be thought of as the "open web" but a package format that uses all of the open web technology.  
>>>>>> 
>>>>>> -Nick
>>>>>> 
>>>>>> On Fri, Jan 22, 2016 at 12:12 PM, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>> Hi Nick,
>>>>>>> 
>>>>>>> Yes, I certainly like the ideas behind PWP, and I'm glad to see this is happening.
>>>>>>> 
>>>>>>> I just don't think it works for the original proposal, which is an alternative to PDFs, having all the benefits of HTML, but still remaining read-only files that can be emailed, and IT Departments can trust being on their computers (ref the security restrictions that can be applied).
>>>>>>> 
>>>>>>> Craig
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 21 Jan 2016, at 14:16, Nick Ruffilo <nickruffilo@gmail.com <mailto:nickruffilo@gmail.com>> wrote:
>>>>>>>> 
>>>>>>>> Craig,
>>>>>>>> 
>>>>>>>> To your point of PWP being a format that has an interaction with a server - I don't disagree, but I think that's only 1 of the two main use cases for PWP.  One of those cases is to be able to be a quality container for ebooks.  Ebooks are expected to be read in an offline mode on devices that may not have any connectivity to the internet.  In these cases, online is simply not an option - therefore the PWP must work in a 100% offline mode.  The content creator ultimately has the choice to build their PWP the way they see fit.
>>>>>>>> 
>>>>>>>> I imagine a significant majority of PWPs created will be "offline" assuming that popular word processors adopt it as a format.  Mainly because of the business case you brought up - an employee generating an offline-mode file for sharing and archival purposes.  But, there will be many use cases where an updateable, benefiting-from-access-to-the-internet document format is superior.
>>>>>>>> 
>>>>>>>> -Nick
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Jan 21, 2016 at 7:02 AM, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>>>> Hi Nick,
>>>>>>>>> 
>>>>>>>>> I'm glad to see that you're not trying to dilute PWP with too many use cases.
>>>>>>>>> 
>>>>>>>>> With your comment about exporting it as a HTML file, and emailing that, this is where the problems currently lie, and why I'm making this proposal.
>>>>>>>>> 
>>>>>>>>> I'm not sure which mailing lists you are subscribed to, but in summary, a HTML file on its own is a big security problem, and it's difficult to include resources (in terms of development time/tooling)... for more info, please see:
>>>>>>>>> 
>>>>>>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0090.html <https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0090.html>
>>>>>>>>> 
>>>>>>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0089.html <https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0089.html>
>>>>>>>>> 
>>>>>>>>> In regards to PWP, I feel that it is a good idea, and definitely has its use cases.
>>>>>>>>> 
>>>>>>>>> But I suspect that whatever file format PWP comes to be known as will be seen as something that has an interaction with a server, and allows for the document to be updated.
>>>>>>>>> 
>>>>>>>>> That definitely has its uses, but as with PDFs, there are cases where it's good to know that the file sent cannot change, or communicate with an external server for any reason (instead it's seen as being locked down, in a read only state, via a sand box that the browser provides).
>>>>>>>>> 
>>>>>>>>> So where you see PWP being a more versatile format than PDF, that is good, but I believe we also need a second branch which takes some of the strengths of PDF, and uses existing technology to fix some of its problems (which I hope my previous emails explain, but I am happy to discuss if not).
>>>>>>>>> 
>>>>>>>>> Craig
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 19 Jan 2016, at 14:39, Nick Ruffilo <nickruffilo@gmail.com <mailto:nickruffilo@gmail.com>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Craig,
>>>>>>>>>> 
>>>>>>>>>> These are great questions, and I hope I can address some of them.  First off - PWP - like any potential document format - is not aimed at solving all possible use cases, nor should it.  That said, we also realize that there is potentially a gap in what software capabilities are today and what might be needed for a high-quality PWP to function as smoothly as a PDF would today.
>>>>>>>>>> 
>>>>>>>>>> To speak to your specific case - the PDF sales report.  Using today's technology, you could export that sales report as an HTML file, attach that, and open that in your browser.  It can be archived, the local copy can only be changed by the user, etc.  What is not yet native in most browsers is the ability to have a package of HTML files.
>>>>>>>>>> 
>>>>>>>>>> For the case of a completely offline file - something more static - PWP completely allows for that, as long as the package is created referencing static files that can be grabbed when making the offline package.  That is completely within scope and a use case that has been considered. PWP does go one step further and let you have files that reference external resources.  This would let you keep data charts up-to-date, make quick updates to color schemes, or pretty much anything else you may want to update.  This is a feature - and optional.
>>>>>>>>>> 
>>>>>>>>>> From my perspective - the goal for PWP is to create a package format that makes sense for the future.  PDF has specific use cases where it is amazing - it has had many years to be adopted and honed.  Outside of those use cases,  PWP hopes to cover many things that PDF does not do.  That doesn't mean that PDF will be useless, as I imagine businesses will be exporting sales reports in PDF for the next 10 years (the same way people are still using CSV when there is XLSX format...)  But I believe that PWP aims to be a more versatile format than PDF, which is its differentiation.
>>>>>>>>>> 
>>>>>>>>>> -Nick 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Jan 19, 2016 at 7:29 AM, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>>>>>> On 18 Jan 2016, at 20:42, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> > Actually, Ivan is pointing out that an active work project - called PWP
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Hi Leonard,
>>>>>>>>>>> 
>>>>>>>>>>> And yes, good point, I completely mixed up EPUB3 and PWP (Portable Web Publication):
>>>>>>>>>>> 
>>>>>>>>>>> http://www.w3.org/TR/pwp <http://www.w3.org/TR/pwp>
>>>>>>>>>>> 
>>>>>>>>>>> I've just read through the PWP Working Draft, and have some notes below.
>>>>>>>>>>> 
>>>>>>>>>>> In summary, I think it's a good idea, but I'm not sure it really focuses on the same problem (but please let me know if I've misunderstood).
>>>>>>>>>>> 
>>>>>>>>>>> Craig
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Just to set the tone, people like to receive PDFs for documents (e.g. sales reports) because they can be treated as an atomic document, that isn't really editable (unlike an email), and can be saved for archival purposes (with no reliance on a website to be available to view it).
>>>>>>>>>>> 
>>>>>>>>>>> Another example is someone who sees a webpage with some useful content, and they want a copy of that content on their local computer (aka "Save Web Page as"), so that they don't need to rely on an internet connection, for the website to remain available (or being able to find the page again), or the content on that page to change.
>>>>>>>>>>> 
>>>>>>>>>>> Now there are definitely some similarities to the problems we are trying to address, with the main focus for me being the archive format:
>>>>>>>>>>> 
>>>>>>>>>>> https://www.w3.org/TR/pwp/#package <https://www.w3.org/TR/pwp/#package>
>>>>>>>>>>> 
>>>>>>>>>>> But this seems to be a very general spec, with options to have the content unpackaged and delivered over the internet (rather than just a single file):
>>>>>>>>>>> 
>>>>>>>>>>> https://www.w3.org/TR/pwp/#state_definition <https://www.w3.org/TR/pwp/#state_definition>
>>>>>>>>>>> 
>>>>>>>>>>> In contrast, the spec seems to not really focus on being a file that can be passed around/archived (e.g. emailing a PDF), but instead a central resource which allows for copies of the document to be downloaded.
>>>>>>>>>>> 
>>>>>>>>>>> https://www.w3.org/TR/pwp/#identification <https://www.w3.org/TR/pwp/#identification>
>>>>>>>>>>> 
>>>>>>>>>>> This is useful if you want to have a central location for a document that is kept up to date, but not so good if the primary purpose is really to have a copy that is created at one point in time, where the person who receives a copy will know that it will stay as-is (read only).
>>>>>>>>>>> 
>>>>>>>>>>> This setup seems to be confirmed in the security section:
>>>>>>>>>>> 
>>>>>>>>>>> https://www.w3.org/TR/pwp/#security-models <https://www.w3.org/TR/pwp/#security-models>
>>>>>>>>>>> 
>>>>>>>>>>> So if I was to send a report to a manager with sales figures, they will want to open it on their mobile phone (a quick read before bedtime, I assume), then later save it to their desktop computer so they can compare it later to the next month's report.
>>>>>>>>>>> 
>>>>>>>>>>> So when the Working Draft mentions things like JavaScript Service Workers:
>>>>>>>>>>> 
>>>>>>>>>>> https://www.w3.org/TR/pwp/#arch <https://www.w3.org/TR/pwp/#arch>
>>>>>>>>>>> 
>>>>>>>>>>> And the concept of these documents having the ability to do things (presumably allowing the content to change, perform tracking, etc), I don't think it's fundamentally the right approach to this problem.
>>>>>>>>>>> 
>>>>>>>>>>> But don't get me wrong, Portable Web Publications would be very good for Publications... I just don't think many businesses use PDF attachments in that way.
>>>>>>>>>>> 
>>>>>>>>>>> :-)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> > On 18 Jan 2016, at 20:42, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Actually, Ivan is pointing out that an active work project - called PWP (Portable Web Publication) - exists to address the need for having a better way to publish content using web technologies both in a packaged and unpackaged form.
>>>>>>>>>>> >
>>>>>>>>>>> > A solution that aligns with EPUB (but would not be EPUB 3.x as we know it today) is certainly something being seriously considered by various folks as part of this work.
>>>>>>>>>>> >
>>>>>>>>>>> > Leonard
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > On 1/18/16, 12:26 PM, "Craig Francis" <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> >> On 18 Jan 2016, at 17:13, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>>>>>>>>> >>> So that a user browsing PDFs on the web doesn’t need anything extra.
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> I think Ivan is suggesting that EPUB3 might do the same.
>>>>>>>>>>> >>
>>>>>>>>>>> >> I'm still not 100% convinced how well it will work (as this does depend heavily on the OS, and browsers).
>>>>>>>>>>> >>
>>>>>>>>>>> >> But in both cases (EPUB3, or using a ZIP to wrap up the HTML document+assets) most of the building blocks are already in place.
>>>>>>>>>>> >>
>>>>>>>>>>> >> Craig
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>> On 18 Jan 2016, at 17:13, Leonard Rosenthol <lrosenth@adobe.com <mailto:lrosenth@adobe.com>> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> While a PDF file does need a “reader”, it should be pointed out that EVERY MAJOR browser (Safari, Chrome, Edge, Firefox) includes PDF viewing natively.  So that a user browsing PDFs on the web doesn’t need anything extra.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Leonard
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On 1/18/16, 11:43 AM, "Craig Francis" <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org>> wrote:
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>> Yeah. That will take time. On MacOS (starting from, I believe, Mavericks) the system comes with an epub reader, so files of this kind are automatically opened much like PDF files. Yes, it is an ebook reader on the OS, but that is not much different than using a PDF reader.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> To be incorporated into browsers is a big step (and would be a big step forward) which will need additional spec work. We are kept busy:-)
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Good to know, and good point about PDF files needing a reader.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> If I could push the format in any way (more so how the software works), I would like to be able to send a document that is opened, read, and closed without it being imported into some kind of library.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Maybe some ability for email clients to open the file for a "quick look" (as per the OSX term), then optionally import.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> But I realise this is going away from the idea of using this format primarily for books.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Anyway, thanks for the heads up.
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Craig
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>> On 18 Jan 2016, at 16:13, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org>> wrote:
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> On 18 Jan 2016, at 16:58, Craig Francis <craig.francis@gmail.com <mailto:craig.francis@gmail.com>> wrote:
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> Hi Ivan,
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> Just to follow up on this, I've been reading the spec at:
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> http://www.idpf.org/epub/30/spec/epub30-overview.html <http://www.idpf.org/epub/30/spec/epub30-overview.html>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> And it does seem pretty much what I'm after.
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> I'm not sure I like the extra meta files, but maybe they are useful (e.g. the possibility of containing multiple HTML documents, one for each language).
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> For example. A book may also consist of many chapters each in their individual files and the order is not clear. Etc.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>> So really the only remaining problem is getting email clients, browsers, OS'es to be able to open these files quickly/easily... rather than just automatically importing the file into an ebook reader.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Yeah. That will take time. On MacOS (starting from, I believe, Mavericks) the system comes with an epub reader, so files of this kind are automatically opened much like PDF files. Yes, it is an ebook reader on the OS, but that is not much different than using a PDF reader.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> To be incorporated into browsers is a big step (and would be a big step forward) which will need additional spec work. We are kept busy:-)
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Cheers
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Ivan
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>> Craig
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>>> On 14 Jan 2016, at 11:17, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org>> wrote:
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>>> On 14 Jan 2016, at 12:05, Craig Francis <craig@craigfrancis.co.uk <mailto:craig@craigfrancis.co.uk>> wrote:
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Thanks Ivan,
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> You are right, I normally focus more on the security side of things.
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> But out of interest, EPUB3, is that likely to get the same integration as how PDFs work at the moment?
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> As in, you can email someone an EPUB3 file, and the recipient can click/tap on it to quickly view in their email client?
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Or simply have the web browser open it, rather than needing a dedicated EPUB3 reader?
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> In theory, all this is possible but the infrastructure is not as widespread as for PDF. Eg, you need extensions for Firefox to open an epub directly.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> So far I've really only considered EPUB as more of a format for books (which is probably my lack of understanding of the format), so I've never really thought of its use for reports, leaflets, etc (i.e. things that PDF's tend to be used for).
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> EPUB is perfectly capable of handling that out of the box.
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>> Ivan
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>>
>>>>>>>>>>> >>>>>>>> In the mean time I'll have a read up on the PWP group.
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>> Craig
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>>> On 14 Jan 2016, at 10:52, Ivan Herman <ivan@w3.org <mailto:ivan@w3.org>> wrote:
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> Craig,
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> thanks for your note. Two comments:
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> - The format EPUB3, defined by IDPF, already does much of what you describe. On a very high level, it takes a (slightly constrained) Web site and puts it into, essentially, a zip file. For many applications, this is a worthy replacement for PDF. Note that almost all the electronic books you buy today are in EPUB3 or its predecessor...
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> - The DPUB IG also looks further down the line on a stronger integration of digital publishing and the OWP:
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> http://www.w3.org/TR/pwp <http://www.w3.org/TR/pwp>
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> which may lead to significant changes in the future.
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> Bottom line: this evolution is already happening!
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> I understand you come more from the security area; there may be security issues with EPUB3 or PWP which we do not fully appreciate, so any comment is welcome of course!
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> Cheers
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> Ivan
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> On 14 Jan 2016, at 11:34, Craig Francis <craig@craigfrancis.co.uk <mailto:craig@craigfrancis.co.uk>> wrote:
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> Hi,
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> Recently I've been thinking of some of the problems with PDF's, which are useful for creating a document that can be archived, emailed, printed, etc.
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> HTML has solutions for many of PDF's problems though, for example structured text (accessibility), ability to change layout depending on screen size (no need for small screen devices to zoom into a fixed A4 layout), can change font size, better indexing support (searching for documents), etc.
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> Unfortunately you can't just email a HTML document to someone, as this causes a range of security problems, and including resources can be difficult (you can inline them, or use MHTML, but these are tricky to create).
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> So I was wondering if we could take the approach that Microsoft Word did with the docx format, Java with JAR, PHP with PHAR, etc...
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> Have a new file format, associated with the browser, which is just a ZIP/GZIP file that contains an index.html file, and everything else needed for the document.
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> Then from a security point of view, it can be locked down to its own little box, so no access to other files on the file system, probably no access to cookies/localstorage, no ability to connect to another host.
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> And from the user's point of view, the document could be protected with a password (a feature that ZIP/GZIP provides already, and the browser can prompt for when opening).
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> So would this help with the security aspects of emailing HTML files to people (e.g. reports), and be better than PDFs?
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> Craig
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> ---
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0063.html <https://lists.w3.org/Archives/Public/public-webappsec/2016Jan/0063.html>
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> https://code.google.com/p/chromium/issues/detail?id=575677 <https://code.google.com/p/chromium/issues/detail?id=575677>
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1237990 <https://bugzilla.mozilla.org/show_bug.cgi?id=1237990>
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>> https://wpdev.uservoice.com/forums/257854-microsoft-edge-developer/suggestions/11443002-webpage-zip-as-alternative-to-pdf <https://wpdev.uservoice.com/forums/257854-microsoft-edge-developer/suggestions/11443002-webpage-zip-as-alternative-to-pdf>
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>>
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>> ----
>>>>>>>>>>> >>>>>>>>> Ivan Herman, W3C
>>>>>>>>>>> >>>>>>>>> Digital Publishing Lead
>>>>>>>>>>> >>>>>>>>> Home: http://www.w3.org/People/Ivan/ <http://www.w3.org/People/Ivan/>
>>>>>>>>>>> >>>>>>>>> mobile: +31-641044153 <tel:%2B31-641044153>
>>>>>>>>>>> >>>>>>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704 <http://orcid.org/0000-0003-0782-2704>
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>>
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -- 
>>>>>>>>>> - Nick Ruffilo
>>>>>>>>>> @NickRuffilo
>>>>>>>>>> Aer.io <http://aer.io/> an INGRAM company
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> 
>>>> Bill McCoy
>>>> Executive Director
>>>> International Digital Publishing Forum (IDPF)
>>>> email: bmccoy@idpf.org <mailto:bmccoy@idpf.org>
>>>> mobile: +1 206 353 0233 <tel:%2B1%20206%20353%200233>
>>>> 

Received on Monday, 1 February 2016 21:37:45 UTC