Re: PDF alternative using HTML (proposal)

On 17 Jan 2016, at 10:20, Anders Rundgren <anders.rundgren.net@gmail.com> wrote:

> What exactly are the security problems you refer to?
> 
> This is a single-page document with in-line resources:
> https://cyberphone.github.io/openkeystore/resources/docs/jcs.html




Hi Anders,

If you download an HTML document to your computer (e.g. because it was sent to you in an email) and open it, it is loaded from the local file system, so it (kind of) has the ability to access any file on your computer that you have access to.

While modern browsers do have some protections in place, the HTML document can reference a private file, and potentially use XMLHttpRequest to send it to a website of the attacker's choosing (ish).

It's because of this setup that pretty much all virus/spam filters will drop the email, strip the attachment, or just block the HTML file from being opened.

---

As for your example of in-line resources, that kind of works (ignoring the security problems), but such documents become very difficult to build (e.g. imagine 10 large inlined images on the page). MHTML, which uses the email (MIME) format to include resources, has very much the same problem.
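
To give a feel for what inlining involves, here is a minimal Python sketch (my own illustration, not taken from your page; to_data_uri is a made-up helper and the 300 KB of zero bytes is a stand-in for a real image). Base64 encoding alone adds roughly a third to the size, and the resulting string then has to be pasted into the HTML by hand or by a build step:

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data: URI that can be used in a src attribute."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Stand-in for a 300 KB image file (hypothetical content, just zero bytes).
image = bytes(300_000)
uri = to_data_uri(image, "image/png")

# Compare the raw size with the size of the inlined form.
print(len(image), len(uri))
```

Multiply that by 10 images and the single HTML file is no longer something you would want to edit in a text editor.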

The solution I'm proposing is to allow a developer to create a folder with an index.html file, include any other resources they need in that folder (as is often done already, allowing development with a simple text editor and browser)... then when they are done, just zip it up and change the extension.
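
As a rough sketch of that last step, assuming Python and its standard zipfile module (package_document is a hypothetical name, and the ".hdoc" extension in the usage note is only a placeholder, since no extension has been agreed):

```python
import zipfile
from pathlib import Path
from typing import List


def package_document(folder: str, output: str) -> List[str]:
    """Zip a folder that contains index.html into a single document file.

    Returns the member names written, relative to the folder root.
    """
    root = Path(folder)
    if not (root / "index.html").is_file():
        raise FileNotFoundError("the folder must contain an index.html file")

    target = Path(output).resolve()
    names: List[str] = []
    with zipfile.ZipFile(output, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories, and the output file itself if it lives in the folder.
            if not path.is_file() or path.resolve() == target:
                continue
            # Store paths relative to the folder, so index.html sits at the archive root.
            name = path.relative_to(root).as_posix()
            archive.write(path, arcname=name)
            names.append(name)
    return names
```

Calling package_document("report", "report.hdoc") would then do the same as zipping the "report" folder by hand and renaming the result.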

Then the OS knows to open that file in a web browser, and the browser knows to lock that document down (using many of the mechanisms that are already in place for CSP and the Suborigins[1] proposal).
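
One small piece of that lock-down could be sketched like this in Python. This is only my assumption of how a browser might map references inside the package; none of it comes from the CSP or Suborigins specs, and resolve_in_archive is a hypothetical name. The idea is that a reference either resolves to a path inside the archive or is refused:

```python
import posixpath
from typing import Optional
from urllib.parse import unquote, urlsplit


def resolve_in_archive(current_member: str, reference: str) -> Optional[str]:
    """Map a reference found in a packaged page to an archive member name.

    Returns None when the reference points outside the archive (another
    host, a file: URL, or a path that climbs above the archive root).
    The caller still has to check that the returned name exists in the archive.
    """
    parts = urlsplit(reference)
    if parts.scheme or parts.netloc:
        return None  # absolute URLs (http:, file:, data:, //host/...) are refused

    path = unquote(parts.path)
    if not path:
        return current_member  # e.g. "#section" stays on the same page

    if path.startswith("/"):
        # Assumption: a leading "/" means the archive root, not the file system root.
        candidate = path.lstrip("/")
    else:
        candidate = posixpath.join(posixpath.dirname(current_member), path)

    normalised = posixpath.normpath(candidate)
    if normalised in (".", "..") or normalised.startswith("../"):
        return None  # a bare directory, or a path that climbs out of the archive
    return normalised
```

So from reports/2016/page.html a link to "../../example/otherfile.html" and a link to "/example/otherfile.html" would both resolve to example/otherfile.html, while "../../../etc/passwd" or "https://example.com/track.gif" would be refused.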

---

And yes, I agree that SVG authoring with fonts still needs more work (compared with PDF), but that will come with time :-)

Craig



[1] https://w3c.github.io/webappsec-suborigins/






> On 17 Jan 2016, at 10:20, Anders Rundgren <anders.rundgren.net@gmail.com> wrote:
> 
> Hi Craig,
> 
> What exactly are the security problems you refer to?
> 
> This is a single-page document with in-line resources:
> https://cyberphone.github.io/openkeystore/resources/docs/jcs.html
> 
> For me it is really SVG authoring that presents the biggest problem.
> It is awfully hard to get near PDF in rendering using fonts.
> I convert everything to paths :-(
> 
> Disclaimer: I'm not a HTML, CSS, or SVG expert...
> 
> Cheers
> Anders
> 
> On 2016-01-17 11:07, Craig Francis wrote:
>> Thanks Crispin, but if you have looked at the docx standard, it really is very difficult to work with.
>> 
>> I was hoping to take the HTML/CSS that we all know and love, and package it into a single file using a technology that we also already know and love, and get the browsers to display it in a way we are all familiar with, in a nice secure way (where the security part of this is the bit that would need most discussion).
>> 
>> That said, Ivan at the Digital Publishing IG believes the EPUB3 standard is the answer, which I need to look at again, but I feel that's falling into the same trap of just being an overly complicated solution for what most developers want (good for ebooks though).
>> 
>> Craig
>> 
>> 
>> 
>>> On 17 Jan 2016, at 06:33, Crispin Cowan <crispin@microsoft.com> wrote:
>>> 
>>> Just FYI, Microsoft .docx is a standard called Office Open XML https://en.wikipedia.org/wiki/Office_Open_XML
>>> 
>>> So if you want to take the approach that Office did, then done!
>>> 
>>> -----Original Message-----
>>> From: Craig Francis [mailto:craig@craigfrancis.co.uk]
>>> Sent: Thursday, January 14, 2016 2:40 AM
>>> To: Wendy Seltzer <wseltzer@w3.org>
>>> Cc: Adrian Hope-Bailie <adrian@hopebailie.com>; public-webappsec@w3.org
>>> Subject: Re: PDF alternative using HTML (proposal)
>>> 
>>> Thanks Wendy,
>>> 
>>> I must confess I didn't look at the other Groups, but have just posted (after trying to get used to the volume of emails in that group).
>>> 
>>> The reason I started the post here was because the current alternatives (HTML with inline resources, or MHTML) already exist, and fail completely at security, so I'm hoping this solution will focus on that.
>>> 
>>> Craig
>>> 
>>> 
>>> https://lists.w3.org/Archives/Public/public-digipub-ig/2016Jan/0089.html
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On 12 Jan 2016, at 14:14, Wendy Seltzer <wseltzer@w3.org> wrote:
>>>> 
>>>> Hi Craig and Adrian,
>>>> 
>>>> You may want to bring this discussion to the Digital Publishing IG,
>>>> https://www.w3.org/dpub/IG/wiki/Main_Page
>>>> 
>>>> While the security considerations of packaged documents could be
>>>> in-scope for WebAppSec, the PDF alternative use cases are probably
>>>> best developed elsewhere.
>>>> 
>>>> --Wendy
>>>> 
>>>>> On 01/12/2016 07:06 AM, Craig Francis wrote:
>>>>> From a web developer's point of view, my replies are below...
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 12 Jan 2016, at 11:33, Adrian Hope-Bailie <adrian@hopebailie.com> wrote:
>>>>>> 
>>>>>> +1 - seems like something worth standardizing if browsers will standardize the security model that is applied to this browsing context.
>>>>>> 
>>>>>> Assumptions:
>>>>>> ALL embedded resources would be packaged in the archive.
>>>>>> The script execution capabilities of this app would be severely limited (no network requests for example).
>>>>> 
>>>>> 
>>>>> Yes to both, I think security/privacy is very important here.
>>>>> 
>>>>> If we start having documents that start reporting on when they are being opened (e.g. via JS or remote image), then people will probably avoid these documents (it needs to be better than PDF in this regard).
>>>>> 
>>>>> 
>>>>>> Observations:
>>>>>> 
>>>>>> "ability to change layout depending on screen size" means embedding resources for all supported screen sizes in the archive - how big could this archive get? Would be useful to try a few examples and see.
>>>>> 
>>>>> 
>>>>> If you are providing images (or dare I say videos), then this may increase the file size a bit, but it's an extra feature that can be used (and probably only in rare cases, like a badly imported image into a PDF).
>>>>> 
>>>>> Generally the strength of HTML/CSS is that it's text, so if anything the file size will probably be very good for the typical document.
>>>>> 
>>>>> 
>>>>>> I can see the tooling for this becoming quite powerful and ultimately allowing you to produce documents and slide decks that are far superior to those from existing proprietary formats.
>>>>> 
>>>>> 
>>>>> I think building of these documents would be excellent.
>>>>> 
>>>>> Developers could create a folder with index.html and style.css files, maybe some images, test locally, then zip up the folder and change the extension (the manual approach, but it works).
>>>>> 
>>>>> Users could also visit a website and do a "save page as" and not have to worry about missing images/resources (either because they only saved the HTML, or because the resources are typically put into a separate folder).
>>>>> 
>>>>> And systems that create documents, well they often use HTML to PDF generators already, and they are all pretty bad from my experience.
>>>>> 
>>>>> 
>>>>>> I would imagine that if I opened the file /tmp/html-document.hta it would open in my browser and the address bar would show file:///tmp/html-document.hta
>>>>>> Can I browse to other HTML files in the archive? And if so what is their URL?
>>>>>> E.g. Would the file example/otherfile.html inside the archive be at the URL file:///tmp/html-document.hta/example/otherfile.html ?
>>>>> 
>>>>> 
>>>>> Personally I wouldn't be using multiple HTML files (I'm currently creating reports that are exported as PDFs, which don't have this ability)... but I don't see why that feature couldn't be included.
>>>>> 
>>>>> I like the idea of just appending onto the base path.
>>>>> 
>>>>> The HTML files themselves can then just do a <a href="../../example/otherfile.html"> to help during development/testing, or just use <a href="/example/otherfile.html">.
>>>>> 
>>>>> 
>>>>>> I stole the .hta extension from Microsoft's HTML Applications (https://en.wikipedia.org/wiki/HTML_Application).
>>>>>> Similar idea with the opposite security principles and very little
>>>>>> success as far as I know
>>>>> 
>>>>> I found that someone else was proposing a "hdoc" extension:
>>>>> 
>>>>> http://hdoc.crzt.fr/www/co/hdoc.html
>>>>> 
>>>>> Although I think their proposal went a bit far including several meta files which I don't think are needed (just have the requirement of one index.html file).
>>>>> 
>>>>> Personally I don't think it matters which extension we choose :-)
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> On 12 January 2016 at 12:54, Craig Francis <craig@craigfrancis.co.uk> wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> Recently I've been thinking of some of the problems with PDFs, which are useful for creating a document that can be archived, emailed, printed, etc.
>>>>>> 
>>>>>> HTML has solutions for many of PDF's problems though, for example structured text (accessibility), ability to change layout depending on screen size (no need for small screen devices to zoom into a fixed A4 layout), can change font size, better indexing support (searching for documents), etc.
>>>>>> 
>>>>>> Unfortunately you can't just email a HTML document to someone, as this causes a range of security problems, and including resources can be difficult (you can inline them, or use MHTML, but these are tricky to create).
>>>>>> 
>>>>>> So I was wondering if we could take the approach that Microsoft Word did with the docx format, Java with JAR, PHP with PHAR, etc...
>>>>>> 
>>>>>> Have a new file format, associated with the browser, which is just a ZIP/GZIP file that contains an index.html file, and everything else needed for the document.
>>>>>> 
>>>>>> Then from a security point of view, it can be locked down to its own little box, so no access to other files on the file system, probably no access to cookies/localstorage, no ability to connect to another host (maybe).
>>>>>> 
>>>>>> And from the user's point of view, the document could be protected with a password (a feature that ZIP provides already, and the browser can prompt for when opening).
>>>>>> 
>>>>>> So would this help with the security aspects of emailing HTML files to people (e.g. reports), and be better than PDFs?
>>>>>> 
>>>>>> Craig
>>>>>> 
>>>>>> 
>>>>>> https://code.google.com/p/chromium/issues/detail?id=575677
>>>>>> 
>>>>>> https://bugzilla.mozilla.org/show_bug.cgi?id=1237990
>>>> 
>>>> 
>>>> --
>>>> Wendy Seltzer -- wseltzer@w3.org +1.617.715.4883 (office) Policy
>>>> Counsel and Domain Lead, World Wide Web Consortium (W3C)
>>>> http://wendy.seltzer.org/        +1.617.863.0613 (mobile)
>>> 
>>> 
>> 
> 

Received on Monday, 18 January 2016 11:40:39 UTC