
[whatwg] HTML resource packages

From: Philip Taylor <excors+whatwg@gmail.com>
Date: Wed, 4 Aug 2010 13:26:02 +0100
Message-ID: <AANLkTik=_70_UXtbpoREiGf44-senVQp+R7EYvMS9LU5@mail.gmail.com>
On Wed, Aug 4, 2010 at 1:31 AM, Justin Lebar <justin.lebar at gmail.com> wrote:
> We at Mozilla are hoping to ship HTML resource packages in Firefox 4,
> and we wanted to get the WhatWG's feedback on the feature.
> For the impatient, the spec is here:
>   http://people.mozilla.org/~jlebar/respkg/

It seems a bit surprising that [pkg.zip img1.png img2.png] provides
more files than [pkg.zip img1.png] but *fewer* files than [pkg.zip]
(which includes all files). I can imagine people writing code like:

  print "<html packages='[cached-image-thumbnails.zip "
      . (join " ", @thumbnails_which_are_not_out_of_date) . "]'>";

(intending the package to be updated infrequently, and used only for
images that haven't been modified since the last package update), and
they would get completely the wrong behaviour when the list is empty.
So maybe "[pkg.zip]" should mean no files (vs "pkg.zip" which still
means all files).
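To make the edge case concrete, here is a minimal Python sketch of how a browser might interpret the attribute. It assumes the syntax implied by the examples above: whitespace-separated entries, each either a bare package URL (meaning all files) or a bracketed `[url file ...]` group. The names `parse_packages`, `provides` and the `empty_list_means_all` switch are hypothetical; the switch toggles between the current reading of `[pkg.zip]` (all files) and the proposed one (no files).

```python
import re
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical entry shape: (package URL, files it may supply).
# None stands for "every file in the zip".
Entry = Tuple[str, Optional[FrozenSet[str]]]

# Either a bracketed group (which may contain spaces) or a bare token.
TOKEN = re.compile(r"\[[^\]]*\]|\S+")


def parse_packages(value: str, empty_list_means_all: bool = True) -> List[Entry]:
    """Parse an assumed packages-attribute syntax into entries."""
    entries: List[Entry] = []
    for tok in TOKEN.findall(value):
        if tok.startswith("[") and tok.endswith("]"):
            parts = tok[1:-1].split()
            if not parts:
                continue  # "[]" names no package at all
            url, files = parts[0], parts[1:]
            if files or not empty_list_means_all:
                entries.append((url, frozenset(files)))  # explicit, possibly empty, list
            else:
                entries.append((url, None))  # "[pkg.zip]" read as "all files"
        else:
            entries.append((tok, None))  # bare "pkg.zip" always means all files
    return entries


def provides(entry: Entry, name: str) -> bool:
    """True if this package entry may supply the file called name."""
    files = entry[1]
    return files is None or name in files


# What the script above generates when no thumbnail is up to date:
thumbnails: List[str] = []
value = "[cached-image-thumbnails.zip " + " ".join(thumbnails) + "]"
```

With the default, `parse_packages(value)` yields `[("cached-image-thumbnails.zip", None)]`, so every thumbnail, stale or not, would come from the zip; with `empty_list_means_all=False` the entry's file set is empty and nothing is served from it.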

Filenames in zips are byte-strings, not Unicode-character-strings.
What should happen with non-ASCII bytes in the zip's list of contents?
People will use standard zip programs and frequently end up with
various random character encodings in their files. Would browsers
guess, decode as CP437, decode as UTF-8, or fail? Would they look
at the zip header's language encoding flag? Etc.
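A minimal sketch of the choices listed above, assuming a browser has the raw name bytes and the entry's general purpose bit flags. Bit 11 (0x800) is the zip "language encoding flag", which declares the name to be UTF-8; without it, names have historically been read as CP437. `decode_zip_name` and its `policy` values are hypothetical names for the options in question.

```python
UTF8_FLAG = 0x800  # general purpose bit 11, the zip "language encoding flag"


def decode_zip_name(raw: bytes, flag_bits: int, policy: str = "cp437") -> str:
    """Decode a raw zip entry name under one of several candidate policies.

    policy is a hypothetical knob: "cp437" (the zip format's historical default),
    "utf-8" (guess that the archiver used UTF-8), or "fail" (reject the name).
    """
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")  # flag set: the name is declared to be UTF-8
    if raw.isascii():
        return raw.decode("ascii")  # every policy agrees on pure ASCII
    if policy == "cp437":
        return raw.decode("cp437")
    if policy == "utf-8":
        return raw.decode("utf-8")  # raises UnicodeDecodeError if it isn't UTF-8
    if policy == "fail":
        raise ValueError(f"non-ASCII zip entry name without UTF-8 flag: {raw!r}")
    raise ValueError(f"unknown policy: {policy}")
```

Python's own `zipfile` module makes the same split by default (UTF-8 when bit 11 is set, CP437 otherwise), so a UTF-8 name written without the flag comes out as mojibake: `café.png` becomes `caf├⌐.png`.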

What happens if the document contains multiple <html> elements (not
all of them the root element), e.g. in XHTML, or when the elements are
added by scripts? The packages spec seems to assume there is only ever one.
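In text/html a stray `<html>` start tag only merges its attributes into the root element, but a well-formed XHTML document really can contain a second html element. A small sketch using Python's `xml.etree.ElementTree` shows two candidate rules giving different answers; both rule names are hypothetical, and neither is taken from the spec.

```python
import xml.etree.ElementTree as ET

XHTML_HTML = "{http://www.w3.org/1999/xhtml}html"

# Well-formed XHTML with a second html element nested in the body.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml" packages="a.zip">
  <body><div><html packages="b.zip"/></div></body>
</html>"""


def packages_from_root(root: ET.Element) -> list:
    """Candidate rule 1: only the root element's packages attribute counts."""
    value = root.get("packages") if root.tag == XHTML_HTML else None
    return [value] if value is not None else []


def packages_from_every_html(root: ET.Element) -> list:
    """Candidate rule 2: every html element in the tree contributes."""
    return [el.get("packages") for el in root.iter(XHTML_HTML)
            if el.get("packages") is not None]


root = ET.fromstring(DOC)
```

Rule 1 sees only `a.zip`; rule 2 sees `a.zip` and `b.zip` in document order.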

The note at the end of 4.1 seems to be about avoiding problems like
http://evil.com/ saying:

    <html packages="eviloverride.zip"> <!-- gets downloaded from evil.com -->
    <base href="http://bank.com/">
    <img src="http://bank.com/logo.png"> <!-- this shouldn't be allowed to come from the .zip -->

Why is this particular example an important problem? If the attacker
wants to insert their own files into their own pages, they can just do
it directly without using packages. Since this is (I assume) only used
for resources like images, scripts and stylesheets, and not for
<a href>s or <iframe src>s, I don't see how it would let the attacker
circumvent any same-origin restrictions or do anything else dangerous.
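Read as a same-origin rule, the note would let a package stand in only for resource URLs with the same origin as the package itself. A minimal sketch of that reading, assuming origins compare as (scheme, host, port) triples; `package_may_supply` is a hypothetical name, not something from the spec.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """(scheme, host, port) triple, filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    return (scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(scheme))


def package_may_supply(package_url: str, resource_url: str) -> bool:
    """Hypothetical rule: a package only stands in for same-origin resource URLs."""
    return origin(package_url) == origin(resource_url)


# The evil.com example: the package URL resolves against the page's own URL,
# the <img> against the <base href>.
package_url = urljoin("http://evil.com/", "eviloverride.zip")
resource_url = urljoin("http://bank.com/", "http://bank.com/logo.png")
```

Under this rule `eviloverride.zip` cannot supply `http://bank.com/logo.png`, while a package at `http://bank.com/pkg.zip` could.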

The opposite way seems more dangerous, where evil.com says:

    <html packages="http://evil.com/redirect.cgi?http://secret-bank-intranet-server/packages.zip">
    <img src="http://evil.com/logo.png">
    <!-- now use canvas to read the pixel data of the secret logo, since it was loaded from the evil.com origin -->

Is anything stopping that?
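One possible defence, sketched here as an assumption rather than anything the spec says, is to give every file taken from a package the origin of the package's final URL after redirects, not the origin of the URL the page asked for, and to taint a canvas whenever that differs from the document's origin. `taints_canvas` and its parameters are hypothetical; `use_package_origin=False` models the naive behaviour the attack depends on.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """(scheme, host, port) triple, filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    return (scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(scheme))


def taints_canvas(document_url: str, resource_url: str,
                  package_final_url: Optional[str] = None,
                  use_package_origin: bool = True) -> bool:
    """Would drawing this image make the canvas origin-unclean?

    package_final_url is where the package actually came from after redirects.
    With use_package_origin=False the image is judged by the URL it was
    requested under, which is what lets the attack read the pixels.
    """
    if package_final_url is not None and use_package_origin:
        effective = origin(package_final_url)
    else:
        effective = origin(resource_url)
    return effective != origin(document_url)


page = "http://evil.com/"
intranet_pkg = "http://secret-bank-intranet-server/packages.zip"  # after redirect.cgi
```

Judged by the requested URL, `http://evil.com/logo.png` looks same-origin and the canvas stays readable; judged by where the package really came from, the canvas is tainted.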

In 4.3 step 2: What is pkg-url initialised to? (The package href of p?)

Philip Taylor
excors at gmail.com
Received on Wednesday, 4 August 2010 05:26:02 UTC
