[whatwg] Persistent storage is critically flawed. from Shannon Baker on 2006-08-28 (public-whatwg-archive@w3.org from August 2006)

From: Shannon Baker <shannon@arc.net.au>
Date: Mon, 28 Aug 2006 16:00:25 +1000
Message-ID: <44F28679.6030200@arc.net.au>
Ian Hickson wrote:
>
> This is mentioned in the "Security and privacy" section; the third
> bullet point here for example suggests blocking access to "public"
> storage areas:
>
>   http://whatwg.org/specs/web-apps/current-work/#user-tracking
>
I did read the suggestions and I know the authors have given these 
issues thought. However, my concern is that the solutions are all 
'suggestions' rather than rules. I believe the standard should be more 
definitive to eliminate the potential for browser inconsistencies.

> Yes, there's an entire section of the spec discussing this in detail,
> with suggested solutions.
>
Again, the key word here is 'suggest'.

> Indeed, the spec suggests blocking such access.
>
Suggest. See where I'm going with this. The spec is too loose.

> There generally is; but for the two cases where there are not, see:
>
>   http://whatwg.org/specs/web-apps/current-work/#storage
>
> ...and:
>
>   http://whatwg.org/specs/web-apps/current-work/#storage0
>
> Basically, for the few cases where an author doesn't control his
> subdomain space, he should be careful. But this goes without saying.
> The same requirement (that authors be responsible) applies to all Web
> technologies, for example CGI script authors must be careful not to
> allow SQL injection attacks, must check Referer headers, must ensure
> POST/GET requests are handled appropriately, and so forth.
>
As I pointed out, this only gives control to the parent domain, not the
child, without regard for the real-world political relationship between
the two. Also, the implication here is that the 'parent' domain is more
trustworthy and important than the child - that it should always be able
to read a subdomain's private user data. The spec doesn't give the
developer a chance to be responsible when it hands out user data to
anybody in the domain hierarchy without regard for whether they are a
single, trusted entity or not. Don't blame the programmer when the spec
dictates who can read and write the data with no regard for the author's
preferences. CGI scripts generally do not have this limitation, so your
analogy is irrelevant.
 
> Indeed; users of geocities.com shouldn't be using this service, and
> geocities themselves should put their data (if any) in a private
> subdomain space.
Geocities and other free-hosting sites generally have a low server-side 
storage allowance. This means these sites have a _greater_ need for 
persistent storage than 'real' domains.

> It doesn't. The solution for mysite.geocities.com is to get their own 
> domain.
That's a bit presumptuous. In fact it's downright offensive. The user
may have valid reasons for not buying a domain. Is it the WHATWG's role
to dictate hosting requirements in a web standard?

> The spec was written in conjunction with UA vendors. It is realistic
> for UA vendors to provide a hardcoded list of TLDs; in fact, there is
> significant work underway to create such a list (and have it be
> regularly updated). That work was originally started for use for HTTP
> Cookie implementations, which have similar problems, but would be very
> useful for Storage API implementations (although, again as noted in
> the draft, not imperative for a secure implementation if the author is
> responsible).
I accept that such a list is probably the answer; however, I believe the
list should itself be standardised before becoming part of a web
standard - otherwise we will see more UA inconsistency.
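
To make the suffix-list idea concrete, here is a minimal sketch of how a
UA might use such a list to decide which domains in a host's hierarchy
may get a shared storage area. The list contents and both function names
are hypothetical; no standardised list or lookup algorithm is assumed.

```python
# Hypothetical, hand-picked entries; a real UA would ship a maintained list.
PUBLIC_SUFFIXES = {"com", "uk", "co.uk", "geocities.com"}


def is_public_suffix(domain: str) -> bool:
    """True if the domain is shared by unrelated sites, per the list above."""
    return domain.lower().rstrip(".") in PUBLIC_SUFFIXES


def shared_storage_domains(host: str) -> list[str]:
    """Domains in the host's hierarchy that may get a shared storage area.

    Walks from the full host name up towards the TLD and drops every
    entry that appears in the public suffix list.
    """
    labels = host.lower().rstrip(".").split(".")
    candidates = [".".join(labels[i:]) for i in range(len(labels))]
    return [d for d in candidates if not is_public_suffix(d)]
```

With this list, mysite.geocities.com would only ever share storage with
itself, while www.example.co.uk could still share with example.co.uk.
Two UAs shipping different lists would give different answers, which is
the inconsistency at issue.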

> One could create much more complex APIs, naturally, but I do not see
> that this would solve the problems. It wouldn't solve the issue of
> authors who don't understand the security implications of their code,
> for instance. It also wouldn't prevent the security issue you
> mentioned -- why couldn't all *.geocities.com sites cooperate to
> violate the user's privacy? Or *.co.uk sites, for that matter? (Note
> that it is already possible today to do such tracking with cookies; in
> fact it's already possible today even without cookies if you use
> Referer tracking, and even without Referer tracking one can use IP and
> User-Agent fingerprinting combined with log analysis to perform quite
> thorough tracking.)
None of those techniques are reliable. My own weblogs show most users
have the referer field turned off. Cookies can be safely deleted after
every session without a major impact on site function (I may have to
log in again). IP tracking is mitigated by proxies and NATs. The trouble
with this proposal is that it would allow important data to get lumped
in with tracking data, when the spec suggests that UAs should only
delete the storage when explicitly asked to do so. I don't have a
solution to this other than to revoke this proposal or prevent the
sharing of storage between sites. I accept tracking is inevitable but we
shouldn't be making it easier either.

> Certainly one could add a .readonly field or some such to storage data
> items, or even fully fledged ACL APIs, but I don't think that should
> be available in a first version, and I'm not sure it's really useful
> in later versions either.
Is it any more complex, or any less useful, than the .secure flag?
Readonly is an essential attribute in any shared data system, from
databases to filesystems. Would you advocate that all websites be
world-writable just to simplify the API? Nor should .readonly be hard
to implement, as we already have metadata with each key.
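
A rough sketch of what a per-item read-only flag could look like,
modelled as a toy in-memory store. The class names, the owner field and
the rule that only the creating domain may overwrite a read-only item
are all assumptions made for illustration; none of this is in the draft.

```python
from dataclasses import dataclass


@dataclass
class StorageItem:
    value: str
    owner: str              # domain that created the item (assumed metadata)
    secure: bool = False    # stands in for the existing .secure flag
    readonly: bool = False  # the proposed flag


class SharedStorage:
    """Toy shared storage area that honours a per-item readonly flag."""

    def __init__(self):
        self._items = {}

    def set_item(self, writer, key, value, readonly=None):
        existing = self._items.get(key)
        if existing is None:
            # First writer becomes the owner and picks the initial flag.
            self._items[key] = StorageItem(value, owner=writer,
                                           readonly=bool(readonly))
            return
        if existing.owner != writer:
            if existing.readonly:
                raise PermissionError(f"{key!r} is read-only for {writer}")
            existing.value = value  # others may change the value, not the flag
            return
        existing.value = value
        if readonly is not None:
            existing.readonly = readonly

    def get_item(self, key):
        item = self._items.get(key)
        return None if item is None else item.value
```

Everyone can still read the item; only writes from other domains are
refused once the owner has set the flag.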

> I don't really understand what this is referring to. Could you show an
> example of the transaction/callback system you refer to? The API is
> intended to be really simple, just specify the item name and there you
> go.
I'm referring to the "storage" event described in 5.9.6, which is fired
in all active pages as data changes. This is an unusual procedure that
needs a better justification than those given in the spec. If the event
pulls me out of my current function, then how am I going to do anything
useful with the application state (without really knowing where
execution was interrupted)?
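
For illustration, here is one delivery model that would avoid the
interruption problem: the UA queues the event and only calls the
handlers once the running function has returned. Whether the draft
intends this is exactly what is unclear to me; the Page and Storage
classes and the run_pending_events() step are hypothetical.

```python
from collections import deque


class Page:
    """Stand-in for an active page with a storage event handler."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def on_storage(self, key, old_value, new_value):
        self.log.append((key, old_value, new_value))


class Storage:
    """Toy storage area that queues change events instead of firing inline."""

    def __init__(self):
        self._data = {}
        self._pages = []
        self._pending = deque()

    def attach(self, page):
        self._pages.append(page)

    def set_item(self, key, value):
        old_value = self._data.get(key)
        self._data[key] = value
        # Record the event for every active page; no handler runs yet.
        for page in self._pages:
            self._pending.append((page, key, old_value, value))

    def run_pending_events(self):
        # Called by the (assumed) event loop after the current script ends.
        while self._pending:
            page, key, old_value, new_value = self._pending.popleft()
            page.on_storage(key, old_value, new_value)
```

Under this model a handler always sees a consistent state, because it
never runs in the middle of another function on the same page.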

> While I agree that there are valid concerns, I believe they are all
> addressed explicitly in the spec, with suggested solutions.
Your points are also quite valid; however, they ignore the root of my
concerns - which is that the spec leaves too much up to the UA to
resolve. I don't see how you can explicitly define something with a
suggestion! The whole spec kind of 'hopes' that many disparate
companies/groups will cooperate to make persistent storage work
consistently across browsers. They might, but given both Microsoft's and
Netscape's track records I think things need to be more concrete in such
an important spec.

> I would be interested in seeing a concrete proposal for a better
> solution; I don't really see what a better solution would be.

I'm not sure myself but I don't think it can stay the way it is. I would 
be happy to offer a better proposal or update the current one given 
enough time to consider it.

As a quick thought, the simplest approach might just be to require that
the site send a secret hash or public key in order to prove it 'owns'
the key. The secret could even be a timestamp of the exact time the key
was set, or just a hash of the user's site login, e.g.:

DOMAIN     KEY     SECRET                      DATA
foo.bar    baz     kj43h545j34h6jk534dfytyf    A string.
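
A minimal sketch of how that check might behave, as a toy in-memory
store. The OwnedStorage class and its method names are hypothetical, and
keeping a SHA-256 digest rather than the secret itself is just one
possible choice.

```python
import hashlib
import hmac


class OwnedStorage:
    """Toy store: each (domain, key) entry remembers a digest of its secret."""

    def __init__(self):
        self._items = {}  # (domain, key) -> (secret_digest, data)

    @staticmethod
    def _digest(secret: str) -> str:
        return hashlib.sha256(secret.encode("utf-8")).hexdigest()

    def _check(self, entry, secret):
        # Constant-time comparison of the stored and presented digests.
        if not hmac.compare_digest(entry[0], self._digest(secret)):
            raise PermissionError("secret does not match the key's owner")

    def set_item(self, domain, key, secret, data):
        entry = self._items.get((domain, key))
        if entry is not None:
            self._check(entry, secret)  # only the owner may overwrite
        self._items[(domain, key)] = (self._digest(secret), data)

    def get_item(self, domain, key, secret):
        entry = self._items.get((domain, key))
        if entry is None:
            return None
        self._check(entry, secret)  # only the owner may read
        return entry[1]
```

A site that cannot present the secret gets neither read nor write
access, regardless of where it sits in the domain hierarchy.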

Just one idea.

Shannon
Web Developer
Received on Sunday, 27 August 2006 23:00:25 UTC