Re: Security Use Cases - Very rough first draft from Baldur Bjarnason on 2016-08-22 (public-digipub-ig@w3.org from August 2016)

From: Baldur Bjarnason <baldur@rebus.foundation>
Date: Mon, 22 Aug 2016 03:07:54 +0000
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Ivan Herman <ivan@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-Id: <2792B3B6-AEA2-4346-B73B-DB782BF90EE2@rebus.foundation>
Based on the feedback I’ve received on the draft (which was a lot of help, thanks!) I think we can boil the use cases section down to a simple list of four:

1. User Agents must be allowed to limit the capabilities of Portable Web Publications to preserve the security and safety of the user. Examples include security-conscious reading systems (such as browsers) and web-based archiving and reading systems (e.g. a student uploading an annotated document to a learning management web service for a teacher to review).

2. PWP authors should be able to discover what capabilities their documents have access to. Possibilities include, but are not limited to: a permissions API, standardising an "always available" safe subset of the platform's capabilities, or documenting specific methods of progressive enhancement or graceful degradation that authors can use to safely create their documents.

3. PWP authors should be able to embed guidance policies in their documents that inform the User Agent of their preferences as to how the integrity and security of the document itself should be preserved. This mechanism should be based on the pre-existing Content Security Policy specifications.

4. User Agents may provide a method for escalating trust so that publications the UA otherwise would have limited for security reasons may gain access to more capabilities. Examples include the code signing systems for packaged web apps and extensions.
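
As a rough illustration of how the four use cases could fit together, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the capability names, the `UserAgent` class, the `render_quiz` function, and the signer string are hypothetical and are not part of any PWP draft or browser API. Only the Content Security Policy directive names (`default-src`, `script-src`) and source keywords come from the existing CSP specifications.

```python
from dataclasses import dataclass

# Hypothetical capability names; nothing here is defined by any PWP draft.
SAFE_SUBSET = frozenset({"layout", "fonts", "local-storage"})
ALL_CAPABILITIES = SAFE_SUBSET | {"scripting", "network", "geolocation"}


@dataclass
class UserAgent:
    """Toy reading system that limits what a publication may do (use case 1)."""

    trusted_signers: frozenset = frozenset()

    def granted(self, signer=None):
        # Use case 4: a recognised signer escalates trust; every other
        # publication only gets the "always available" safe subset.
        if signer is not None and signer in self.trusted_signers:
            return ALL_CAPABILITIES
        return SAFE_SUBSET

    def query(self, capability, signer=None):
        # Use case 2: a permissions-style lookup the author can call.
        return capability in self.granted(signer)


def render_quiz(ua, signer=None):
    # Graceful degradation: fall back to static content without scripting.
    if ua.query("scripting", signer):
        return "interactive quiz"
    return "static answer key"


def csp_policy(directives):
    # Use case 3: serialise the author's preferences as a CSP policy string.
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


if __name__ == "__main__":
    ua = UserAgent(trusted_signers=frozenset({"example-publisher"}))
    print(render_quiz(ua))
    print(render_quiz(ua, signer="example-publisher"))
    print(csp_policy({"default-src": ["'self'"], "script-src": ["'none'"]}))
```

The point of the sketch is only that the author's code asks before it acts and always has a fallback, while the decision about what to grant stays entirely with the User Agent.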

From what I understand the Use Cases section shouldn't have a discussion of their context or of their feasibility, at least, not to the extent that my half-baked first draft did.

Now, I don't think use case number 4 is at all likely to ever come to pass but it is a valid use case.

I hope that if I rewrite the draft as a less value-laden description of the four use cases above, with less context and opinion, it will be much less controversial among list members.

## On PDFs as a security role model

The problem with referring to PDFs as a solution to the PWP security model issue is that, as far as browsers are concerned, the built-in PDF viewers in Safari, Chrome, and Mozilla only support a subset of the format's JavaScript capabilities. 

Safari and Chrome support a subset of the AcroForms JS API. Mozilla's PDF.js is even more limited even though that viewer is _built_ using JavaScript. When Microsoft switched to their own PDF rendering library in Edge they for the most part seem to have just turned off PDF form filling (the most common use of JS in PDFs).

So if we are at all concerned with browser adoption, browser vendors have pretty consistently voted in favour of sacrificing the dynamic capabilities of local portable documents in the name of security—whether those documents are PDFs or HTML files from the local filesystem.

As with local HTML files, browser vendors _arrived_ at this model after a lot of pain and heartache. When combined, the Acrobat Plugin, NPAPI, and various JavaScript APIs were, for a variety of reasons, a technical and security disaster. 

Despite the limitations now imposed on PDFs when viewed in-browser, the current state of affairs is a substantial improvement over the old plugin system. The built-in PDF viewing components are, on the whole, performant, well tested, and battle-proven. At least two of them are open source (Chrome's PDFium, which was initially based on Foxit as Leonard pointed out, and Mozilla's PDF.js). The other two (Apple's and MS's) are OS-level components that have received extensive testing among security researchers.

In the browser, you get basic PDF features, but if you run into a complex interactive document or one with dynamic forms, you have to open it in a dedicated PDF app to be able to use it. As far as security compromises go for an interactive portable document format, that strikes a balance many will find acceptable. But it does involve accepting that the browser is, at best, a secondary platform for the format. Going with the PDF model for Portable Web Publications would probably mean that, as with PDFs, browsers will only support the format well enough to preview it but not well enough to be first-class reading systems.

## Where do we strike a balance?

If we are to require that Portable Web Publications support the full scope of the web platform's APIs even in a local, untrusted context, then that requires us to specify from scratch a new packaged app platform with a new associated security model. As Bill McCoy pointed out earlier, this has been tried before at the W3C with little success.

Designing a platform from scratch and making it secure is a proper PhD-in-Computer-Science-level research problem. It’s a lot of work and requires very specialised expertise.

Which is to say that the decisions we make here determine what resources we can bring to bear on specifying Portable Web Publications.

If we decide to support all of the features on everybody's checklist, the complexity of the format will skyrocket and this group will need more support from people with very specialised technical expertise to do the job properly. But as the complexity of the format increases, maintaining compatibility with the web platform will become harder, the difficulty of implementing it will increase, and overall tech industry interest will decrease.

Which means that the more complex problems we decide to tackle (and this security one is a doozy!) the more help we will need and the less likely we are to get it.

If the goal is to create a format that's universal and is widely adopted among both browser vendors and the web community, then the tactic likeliest to succeed is to make the format very simple and give it few features. Start with a simple system; add features in future iterations. But that might scare off those who need specific features for their business today and _also_ lead to less help and fewer resources.

Neither path is perfect but I personally favour the second one (start simple; add features later).

Even if going with a 'lite' format for version one loses us buy-in from many industries and companies, a simpler format needs less work to specify and is more likely to be widely adopted. Being more widely adopted in turn would lead to more resources that could help us tackle the really thorny problems in future iterations. Playing a longer game maximises our chances and makes it more likely that Portable Web Publications will actually become a reality.

P.S. This here piece of news is relevant to this discussion:

http://blog.chromium.org/2016/08/from-chrome-apps-to-web.html?m=1

> "We will be removing support for packaged and hosted apps from Chrome on Windows, Mac, and Linux over the next two years"


- best
- Baldur Bjarnason
  baldur@rebus.foundation



> On 22 Aug 2016, at 01:24, Leonard Rosenthol <lrosenth@adobe.com> wrote:
> 
> While I appreciate that Bill +1’d me on my P2P use case – it’s a bit problematic that he then went on to spread a whole lot of INCORRECT FUD (fear, uncertainty and doubt) about the PDF technology and Adobe’s implementation of same.
> Bill – while you certainly had a significant hand in the early days of the technology, you haven’t been part of Adobe or the development of PDF for 6 years now. Things have moved beyond what you knew then and I would appreciate you sticking to things that you have an up-to-date knowledge source for. And since this isn’t the place for me to correct them, I will leave them for now (though if anyone wishes to reach out to me in private, I am happy to answer).
> Now – back to the question/issue at hand – security in PWP.
> There is certainly no question that content with JavaScript (either included or referenced) has a greater security risk than content without it – it is actually just one aspect of issues that need to be considered around PWP security. Even documents that have ZERO scripting can be security risks if they have external references – even to CSS or fonts or other types of content. For that matter, as we’ve actually seen in the field, even a document that has all resources fully embedded can be a security risk if an attacker is able to incorporate a malicious binary, such as a font.
> > I believe that today's EPUB 3 solutions built on OWP with its more well-defined security model and predominantly open source implementations,
> >is very arguably more secure than PDF, or at least there's no clear evidence that it's less secure.
> > 
> I would put forth two comments here:
> 1 – EPUB has such a limited distribution and use in the world today that hackers don’t care about it. Assuming that they did find one (or more) attacks using EPUB – the number of impacted users would be so small as to not make their investment worthwhile. Hacking today is really a business with a standard ROI model. It’s why the (“Big 3”) browsers as well as Adobe products have historically been the best targets – we have the widest distribution. You never see attacks against Opera – because there is no ROI. And only recently have they gone after companies such as FoxIt – and mostly because Chrome uses their engine.
> 2 – EPUB’s current “no scripting” and “fully self-contained” model have indeed served it well to be more secure than other solutions.  However, as we move away from those limitations to the areas that PWP brings us, it also means that those fences/walls will no longer be there to hold back potential attacks.  And worse, every attack vector available on the web is now also one for your publications.
>  
>  
> Leonard
>  
>  
> From: Bill McCoy <bmccoy@idpf.org>
> Date: Saturday, August 20, 2016 at 2:03 PM
> To: Ivan Herman <ivan@w3.org>
> Cc: Bill McCoy <whmccoy@gmail.com>, Baldur Bjarnason <baldur@rebus.foundation>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
> Subject: Re: Security Use Cases - Very rough first draft
> Resent-From: <public-digipub-ig@w3.org>
> Resent-Date: Saturday, August 20, 2016 at 2:04 PM
>  
> Even though it was not specifically a reply to my comment I want to +1 Ivan's take on this.
>  
> And I also want to +1 something Leonard said about ad hoc (P2P) use cases for sharing documents also being important, and to tease out an implication of that.
>  
> PDF is of course the prevalent portable document format today, especially for P2P. And PDFs can contain active content - JavaScript - just like EPUBs and Web pages - and arguably that active content represents a far more dangerous security risk:
>  
> - 99% of PDF rendering (gross estimate) is done via closed source solutions that aren't subject to objective security analysis or security vulnerability correction by third parties (unlike 3 out of the 4 top browsers, including the browser engines built into both top mobile OS's)
>  
> - the most widely used PDF implementation, Adobe Reader, has 3 different instantiations of JavaScript, each with their own set of unique APIs (at one point in the past 4 instantiations and at that time all different versions of the JS interpreter!)... these are for general JS code in the main context, for JS used with XML Forms Architecture (XFA) content, and JS used with 3D content. 
>  
> - the PDF specifications do not formally define the document execution life cycle or security model of executing script code in a rigorous way. For example while in Adobe Reader some (but not all) cases of remote network access from scripting in PDF produce user-visible warnings the behavior depends on preference settings which are application implementation details, not part of the PDF specification itself.
>  
> - in particular in PDF there is no model of encapsulation of less trusted content inside a container of more trusted content, everything operates at the same (implicit) trust level. Whereas for OWP, IFRAME et al. provide an encapsulation model (further elaborated in EPUB with Embedded Scriptable Components).
>  
> - https://www.cvedetails.com/vulnerability-list/vendor_id-53/product_id-497/Adobe-Acrobat-Reader.html (enough said)
>  
> I believe that today's EPUB 3 solutions built on OWP with its more well-defined security model and predominantly open source implementations, is very arguably more secure than PDF, or at least there's no clear evidence that it's less secure. Yet we very rarely hear publishers or end users distributing documents being seriously worried about scripting in PDFs (even if, objectively, they should be).
>  
> Therefore I believe part of this is a marketing issue not a technical issue. As more and more of the online Web seems ridden with sketchy ads and malware, commercial publishers fear similar things happening with their premium paid content. That is not necessarily an irrational fear and I support that we should be taking security very seriously, but since at a technical level PDF sets a very low bar for PWP, which EPUB 3 may already have exceeded, we shouldn't presume that we have a technical disaster already on our hands.
>  
> --Bill
>  
>  
> On Sat, Aug 20, 2016 at 12:22 AM, Ivan Herman <ivan@w3.org> wrote:
>> (This is not a reply to this mail specifically but rather to the resulting thread… the result of being in a different timezone:-)
>>  
>> I think that, at this point and mainly for the purpose of the UCR document, what we have to concentrate on is what extra security requirements PWP-s have over the general Web security aspect. I realize that Baldur's use cases are in this direction (thanks Baldur:-), but I also agree that, at this point, we should not, in this document, go into specific technical solutions; this is not the goal of the UCR (but will be the topic of a future Working Group if and when we get there).
>>  
>> To put it another way: when I talk to my Web Application friends, I do tell them that publishers are more nervous about security than many other content providers on the Web, they are nervous of using JavaScript for these reasons, etc. I think these high(er) level fears and concerns, and their reasons, should be clearly spelled out in this document (and, for example, these are related to issues in this document, like the problem of origin; I guess it also goes for the integrity of the publication as a whole that gets copied around but is also, at the same time, possibly copyrighted, etc). Web Application people should understand that there is a community out there that may be more stringent in this sense. At the same time, Publishing people should understand that the PWP community does take these concerns seriously and it is not the intention to ignore them in a big and happy kumbaya with Web technologies. This is what the UCR document should reflect, without getting into the technical weeds…
>>  
>> My 2 cents…
>>  
>> Ivan 
>>  
>>  
>>  
>>> On 19 Aug 2016, at 18:13, Bill McCoy <whmccoy@gmail.com> wrote:
>>>  
>>> Most if not all of these requirements do not seem to be specific to "Web Publications" as the term is defined by DPUB IG.
>>>  
>>> It is of course true that publications must not compromise the basic security model of the Web. 
>>> 
>>> Unfortunately, the definition of that general security model and the associated runtime life cycle isn't entirely clear, especially when it comes to content and applications stored on / executing from local systems.  And I'm not sure it's the job of DPUB IG to attempt to define with precision that general model. Or, if we do take on the job of fully defining that security model, we should realize we aren't doing it just for "Publications" but really for Web content in general.
>>>  
>>> https://www.w3.org/TR/runtime/ is for example recent work in this area started by the now defunct System Applications WG. Some of this seems very applicable to Web Publications. That it's unfinished orphaned work is perhaps a warning sign that it may not be an easy job to take on but perhaps someone could adopt it (which may be preferable to starting over). Whether that's DPUB IG or a successor vs. say the Web Platform WG is another question... and I guess to me this is all logically part of the Web Platform itself.
>>>  
>>> EPUB specifications to date have clearly punted on this but one reason was that we were hoping that work on Web Applications at W3C would be paving the way in terms of more rigorously defining the Web security model especially for offline/local content. 
>>>  
>>> --Bill 
>>>  
>>>  
>>> On Fri, Aug 19, 2016 at 5:34 AM, Baldur Bjarnason <baldur@rebus.foundation> wrote:
>>>> Security Use Cases - Very rough first draft
>>>> 
>>>> Here it is on Google Docs:
>>>> 
>>>> https://docs.google.com/document/d/1i8vm8cg5iqxWgpPFRR3Qae5loj-DWcrsbBUIf2IeGaU/edit?usp=sharing
>>>> 
>>>> Let me know if you can’t access it and I’ll find another way to share it with the list or fiddle with the sharing settings on the document itself.
>>>> 
>>>> It’s a very rough draft, half-baked, doesn’t conform to spec style or structure etc. etc.
>>>> 
>>>> All of the links included are there more as informative references for context and will have to be turned into proper spec references or removed in a later draft.
>>>> 
>>>> If the scenarios seem paranoid downers then bear in mind that my biggest worry while writing it is that I might not be paranoid enough.
>>>> 
>>>> - best
>>>> - Baldur Bjarnason
>>>>   baldur@rebus.foundation
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>  
>>  
>> 
>> ----
>> Ivan Herman, W3C 
>> Digital Publishing Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>> 
>> 
>> 
>>  
> 
> 
>  
> -- 
>  
> Bill McCoy
> Executive Director
> International Digital Publishing Forum (IDPF)
> email: bmccoy@idpf.org
> mobile: +1 206 353 0233
>
Received on Monday, 22 August 2016 03:08:26 UTC