[w3c/IndexedDB] Security: Add note about UAs needing to handle version skew (#139)

Security concern, raised by Jochen Eisinger: (paraphrased) If a compiled WASM module is stored to Indexed DB, what are the requirements around a UA that has updated?

In general, a UA needs to be concerned with version skew. We should add text like the following:

....

In practical implementations of Indexed DB the data will be persisted from memory to a non-volatile storage medium. When data is stored it will be serialized, and when retrieved it will be deserialized. The details of serialization and deserialization are user-agent specific and outside the scope of this specification, as long as there is no observable difference in behavior, i.e. the algorithms from "safe passing of structured data" [HTML] are followed. User agents may change their internal serialization format over time, for example to handle new data types or to improve performance. To satisfy the requirement that data which can be stored in Indexed DB can later be retrieved, user agents need to handle older serialization formats, for example by migrating the entire database or by having the deserialization implementation handle old data.

If done improperly, this can result in security issues. For example, if a data type is serialized which includes data trusted by the user agent, upon deserialization by a later version of the user agent the same data may no longer be trustable because additional requirements have been added by the user agent. Similarly, a type could include an internal representation which is no longer compatible with the user agent. 

A practical example of this is the RegExp type. The [StructuredClone](https://html.spec.whatwg.org/multipage/infrastructure.html#structuredclone) operation allows cloning ECMAScript [RegExp](http://www.ecma-international.org/ecma-262/7.0/index.html#sec-regexp-regular-expression-objects) objects. A typical user agent will accept a regular expression from user script and compile it into an internal representation in native machine instructions which can be executed directly on the CPU, with assumptions about how the input data is passed and results returned. If this internal representation were serialized as part of the data stored to the database, various problems could arise when the internal representation was later deserialized. For example, the means by which data is passed into the code could have changed. Security bugs in how the expression was compiled could have been fixed since the data was stored, so the stored code would still contain them. It is also possible that the native instructions available have changed due to hardware modifications on the user's machine.

User agents must therefore take extreme care when persisting such internal state, for example by including version identifiers in their serialization formats. User agents must be prepared to reconstruct such internal state from script-visible state if incompatibilities are detected, to uphold the requirement that data which is stored can later be retrieved.

....

Does that capture it?
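
For illustration only, here is a rough TypeScript sketch of what the last paragraph of the proposed text asks of an implementation, using the RegExp case. Everything in it is hypothetical: the `StoredRegExp` record, the `compiled` and `compiledBy` fields, `CURRENT_FORMAT_VERSION`, and `ENGINE_BUILD` are made-up names, not anything defined by Indexed DB, HTML, or any real engine. The idea is simply that the script-visible state (`source`, `flags`) is always stored, and any cached internal state is treated as reusable only when both the format version and the engine build match.

```typescript
// Hypothetical on-disk record for a RegExp value (not from any specification).
interface StoredRegExp {
  formatVersion: number;   // version identifier of the serialization format
  source: string;          // script-visible state
  flags: string;           // script-visible state
  compiled?: Uint8Array;   // optional engine-internal cache, never trusted blindly
  compiledBy?: string;     // build that produced `compiled`
}

// Assumptions: stand-ins for whatever identifiers a real user agent would use.
const CURRENT_FORMAT_VERSION = 2;
const ENGINE_BUILD = "engine-build-42";

function serializeRegExp(re: RegExp, compiled?: Uint8Array): StoredRegExp {
  const record: StoredRegExp = {
    formatVersion: CURRENT_FORMAT_VERSION,
    source: re.source,
    flags: re.flags,
  };
  if (compiled !== undefined) {
    record.compiled = compiled;
    record.compiledBy = ENGINE_BUILD;
  }
  return record;
}

interface DeserializedRegExp {
  value: RegExp;
  cacheAccepted: boolean; // whether the stored internal state passed the checks
}

function deserializeRegExp(record: StoredRegExp): DeserializedRegExp {
  if (record.formatVersion > CURRENT_FORMAT_VERSION) {
    // Written by a newer user agent; this sketch refuses rather than guesses.
    throw new Error("unsupported serialization format version");
  }
  // The cached internal state is acceptable only if nothing has changed since
  // it was written. Older formats and other builds fall through to a rebuild.
  const cacheAccepted =
    record.formatVersion === CURRENT_FORMAT_VERSION &&
    record.compiled !== undefined &&
    record.compiledBy === ENGINE_BUILD;
  // Script cannot load native code, so this sketch always rebuilds the object
  // from script-visible state; a real engine would skip recompiling only when
  // cacheAccepted is true.
  return { value: new RegExp(record.source, record.flags), cacheAccepted };
}
```

The point of the design is that the stored value can always be retrieved, because the script-visible state is sufficient on its own, while the internal representation is an optimization that can be discarded whenever the checks fail.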

-- 
View this issue on GitHub:
https://github.com/w3c/IndexedDB/issues/139

Received on Friday, 20 January 2017 18:27:48 UTC