The Structured Clone Wars

In the thread "LazyReadCopy experiment and invariant checking for
[[Extensible]]=false" on es-discuss,
On Wed, Jul 13, 2011 at 10:29 AM, David Bruant <david.bruant@labri.fr> wrote:

> Hi,
>
> Recently, I've been thinking about the structured clone algorithm used in
> postMessage
>

Along with Dave Herman
<http://blog.mozilla.com/dherman/2011/05/25/im-worried-about-structured-clone/>,
I'm worried about structured clone
<http://www.w3.org/TR/html5/common-dom-interfaces.html#safe-passing-of-structured-data>.
In order to understand it better before criticizing it, I tried implementing
it in ES5 + WeakMaps. My code appears below. In writing it, I noticed some
ambiguities in the spec, so I implemented my best guess about what the spec
intended.

Aside: Coding this so that it is successfully defensive against changes to
primordial bindings proved surprisingly tricky, and the resulting coding
patterns quite unpleasant. See the explanatory comment early in the code
below. Separately, we should think about how to better support defensive
programming for code that must operate in the face of mutable primordials.
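
To make the aside concrete, here is a minimal sketch (modern JavaScript for
brevity; the function names are hypothetical) of why sclone captures
uncurried methods such as call.bind(Array.prototype.forEach) during
initialization. Code that looks up arr.forEach at call time follows whatever
a later script installs, while the captured function does not.

```javascript
// Uncurry the original forEach once, at initialization time. The bound
// function holds references to the original call and forEach, so later
// reassignments of those properties do not affect it.
const uncurriedForEach = Function.prototype.call.bind(Array.prototype.forEach);

// Looks up arr.forEach (via Array.prototype) every time it runs.
function sumNaive(arr) {
  let total = 0;
  arr.forEach(function (x) { total += x; });
  return total;
}

// Uses only the function captured above.
function sumDefensive(arr) {
  let total = 0;
  uncurriedForEach(arr, function (x) { total += x; });
  return total;
}

// Simulate later code tampering with a primordial.
const originalForEach = Array.prototype.forEach;
Array.prototype.forEach = function () { /* silently does nothing */ };

const naiveResult = sumNaive([1, 2, 3]);         // the replacement ran
const defensiveResult = sumDefensive([1, 2, 3]); // the original ran

Array.prototype.forEach = originalForEach; // undo the tampering
console.log(naiveResult, defensiveResult);
```

The price is the one noted above: every built-in the algorithm relies on has
to be captured and then called in this uncurried style.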

Ambiguities:

1) When the spec says "If input is an Object object", I assumed it meant 'if
the input's [[Class]] is "Object"'.
2) By "for each enumerable property in input" combined with "Note: This does
not walk the prototype chain.", I assume it meant "for each enumerable own
property of input".
3) By "the value of the property" combined with "Property descriptors,
setters, getters, and analogous features are not copied in this process.", I
assume it meant "the result of calling the [[Get]] internal method of input
with the property name", even if the enumerable own property is an accessor
property.
4) By "corresponding to the same underlying data", I assume it meant to
imply direct sharing of read/write access, leading to shared state
concurrency between otherwise shared-nothing event loops.
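
Under interpretations 2 and 3, copying an object reduces to the sketch below
(the helper name copyOwnEnumerable is hypothetical). Inherited and
non-enumerable properties are skipped, and an enumerable accessor is read
through [[Get]], so its getter runs once and the copy receives a plain data
property holding the getter's result.

```javascript
// Hypothetical helper: copy each enumerable own property by [[Get]],
// defining it on the output as a data property (as sclone does).
function copyOwnEnumerable(input) {
  const output = {};
  Object.keys(input).forEach(function (name) {
    Object.defineProperty(output, name, {
      value: input[name], // [[Get]]: runs the getter if there is one
      writable: true,
      enumerable: true,
      configurable: true
    });
  });
  return output;
}

const proto = { inherited: 1 };
const input = Object.create(proto);
input.own = 2;
let getterCalls = 0;
Object.defineProperty(input, 'computed', {
  get: function () { getterCalls += 1; return 42; },
  enumerable: true,
  configurable: true
});
Object.defineProperty(input, 'hidden', { value: 3, enumerable: false });

const copy = copyOwnEnumerable(input);
console.log(Object.keys(copy)); // 'own' and 'computed' only
console.log(getterCalls);       // the getter ran exactly once
console.log(Object.getOwnPropertyDescriptor(copy, 'computed')); // a data property
```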

Are the above interpretations correct?

Given the access to shared mutability implied by #4, I'm wondering why
MessagePorts are passed separately, rather than simply being another special
case, like File, in the structured clone algorithm.

I've been advising people to avoid the structured clone algorithm, and send
only JSON serializations + MessagePorts through postMessage. It's unclear to
me why structured clone wasn't instead defined to be closer to JSON, or to
a well-chosen subset of JSON. Given that they're going to co-exist, it
behooves us to understand their differences better, so that we know when to
advise JSON serialization/deserialization around postMessage vs. just using
structured clone directly.
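
As a reference point for that comparison, a JSON round trip loses exactly
the cases the sclone code below treats specially. In this sketch (the helper
name jsonRoundTrip is hypothetical), a Date arrives as an ISO string, a
RegExp as an empty object, undefined-valued properties disappear, and a
cyclic graph throws a TypeError instead of being reproduced, whereas sclone's
memory map preserves cycles.

```javascript
// Hypothetical helper: the JSON route advised above for postMessage payloads.
function jsonRoundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

const sample = {
  when: new Date(0),
  pattern: /ab+c/gi,
  nested: [1, 'two', { three: 3 }],
  missing: undefined
};
const out = jsonRoundTrip(sample);

console.log(typeof out.when);  // Date.prototype.toJSON produced an ISO string
console.log(out.pattern);      // a RegExp has no enumerable own properties
console.log('missing' in out); // undefined-valued properties are dropped

// Cycles are rejected outright rather than preserved.
const cyclic = {};
cyclic.self = cyclic;
let cycleError = null;
try { jsonRoundTrip(cyclic); } catch (e) { cycleError = e; }
console.log(cycleError instanceof TypeError);
```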

There is a fixed set of data types recognized here as special cases by this
algorithm. Unlike JSON, there are no extension points for a user-defined
abstraction to cause its own instances to effectively be cloned, with
behavior, across the boundary. But neither do we gain the advantage of
avoiding calls to user code interleaved with the structured clone algorithm,
if my resolution of #3 is correct, since these [[Get]] calls can call
getters.
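
For contrast, JSON does offer such extension points: toJSON on the sending
side and a reviver function on the receiving side. In this minimal sketch,
the Point class and the "$type" tag are hypothetical conventions of the
example, not part of JSON. The receiver rebuilds a Point with its methods,
whereas the structured clone algorithm, seeing only [[Class]] "Object", would
deliver a plain object with x and y and no norm method.

```javascript
// A user-defined abstraction that chooses its own JSON form.
class Point {
  constructor(x, y) { this.x = x; this.y = y; }
  norm() { return Math.hypot(this.x, this.y); }
  toJSON() { return { $type: 'Point', x: this.x, y: this.y }; }
}

// Hypothetical tagging convention: '$type' is this example's choice.
function reviver(key, value) {
  if (value && value.$type === 'Point') { return new Point(value.x, value.y); }
  return value;
}

const wire = JSON.stringify({ p: new Point(3, 4) });
const received = JSON.parse(wire, reviver);
console.log(received.p instanceof Point, received.p.norm());
```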

In ES6 we intend to reform [[Class]]. Allen's ES6 draft <
http://wiki.ecmascript.org/doku.php?id=harmony:specification_drafts> makes a
valiant start at this. How would we revise structured clone to account for
[[Class]] reform?
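
One concrete data point: in ES2015 as published, [[Class]] is gone and
Object.prototype.toString consults the Symbol.toStringTag property, so a
getClass probe like the one in sclone below can be spoofed by an ordinary
object. In this sketch, the hypothetical isRealDate helper instead performs a
brand check by calling Date.prototype.getTime, which throws a TypeError on any
receiver that is not a genuine Date.

```javascript
// The same [[Class]] probe the sclone code uses.
function getClass(obj) {
  return /\[object (.*)\]/.exec(Object.prototype.toString.call(obj))[1];
}

const realDate = new Date(0);
const fakeDate = { [Symbol.toStringTag]: 'Date' }; // a plain object

console.log(getClass(realDate), getClass(fakeDate)); // both report 'Date'

// Hypothetical brand check that ignores Symbol.toStringTag: getTime requires
// a receiver with the internal date value, and throws otherwise.
function isRealDate(obj) {
  try { Date.prototype.getTime.call(obj); return true; } catch (e) { return false; }
}
console.log(isRealDate(realDate), isRealDate(fakeDate));
```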

And finally there's the issue raised by David on the es-discuss thread: What
should the structured clone algorithm do when encountering a proxy? The
algorithm as coded below will successfully "clone" proxies, for some meaning
of clone. Is that the clone behavior we wish for proxies?
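
To see what "clone" means for a proxy under the algorithm as coded, here is a
sketch (modern JavaScript; the logging handler is hypothetical) that repeats
sclone's steps for the "Object" case on a proxy: the class probe,
Object.keys, and one [[Get]] per key. Each step runs a handler trap, i.e.
user code, and the result is an ordinary object holding whatever the get trap
returned at that moment.

```javascript
// Hypothetical logging proxy: records each trap invoked on it.
const log = [];
const target = { a: 1, b: 2 };
const proxy = new Proxy(target, {
  ownKeys(t) { log.push('ownKeys'); return Reflect.ownKeys(t); },
  getOwnPropertyDescriptor(t, k) {
    log.push('getOwnPropertyDescriptor:' + String(k));
    return Reflect.getOwnPropertyDescriptor(t, k);
  },
  get(t, k, r) { log.push('get:' + String(k)); return Reflect.get(t, k, r); }
});

// The same steps sclone performs when the class probe says 'Object'.
const klass = /\[object (.*)\]/.exec(Object.prototype.toString.call(proxy))[1];
const copy = {};
Object.keys(proxy).forEach(function (name) { copy[name] = proxy[name]; });

console.log(klass); // the proxy passes as an ordinary Object
console.log(copy);  // a plain object with a and b, no longer a proxy
console.log(log);   // the traps that ran during the "clone"
```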


------------- sclone.js ------------------------

var sclone;

(function () {
   "use strict";

   // The following initializations are assumed to capture initial
   // bindings, so that sclone is insensitive to changes to these
   // bindings between the creation of the sclone function and calls
   // to it. Note that {@code call.bind} is only called here during
   // initialization, so we are insensitive to whether this changes to
   // something other than the original Function.prototype.bind after
   // initialization.

   var Obj = Object;
   var WM = WeakMap;
   var Bool = Boolean;
   var Num = Number;
   var Str = String;
   var Dat = Date;
   var RE = RegExp;
   var Err = Error;
   var TypeErr = TypeError;

   var call = Function.prototype.call;

   var getValue = call.bind(WeakMap.prototype.get);
   var setValue = call.bind(WeakMap.prototype.set);

   var getClassRE = (/\[object (.*)\]/);
   var exec = call.bind(RegExp.prototype.exec);
   var toClassString = call.bind(Object.prototype.toString);
   function getClass(obj) {
     return exec(getClassRE, toClassString(obj))[1];
   }

   var valueOfBoolean = call.bind(Boolean.prototype.valueOf);
   var valueOfNumber = call.bind(Number.prototype.valueOf);
   var valueOfString = call.bind(String.prototype.valueOf);
   var valueOfDate = call.bind(Date.prototype.valueOf);

   var keys = Object.keys;
   var forEach = call.bind(Array.prototype.forEach);

   var defProp = Object.defineProperty;

   // Below this line, we should no longer be sensitive to the current
   // bindings of built-in services we rely on.

   sclone = function(input) {

     function recur(input, memory) {
       if (input !== Obj(input)) { return input; }
       var output = getValue(memory, input);
       if (output) { return output; }

       var klass = getClass(input);
       switch (klass) {
         case 'Boolean': {
           output = new Bool(valueOfBoolean(input));
           break;
         }
         case 'Number': {
           output = new Num(valueOfNumber(input));
           break;
         }
         case 'String': {
           output = new Str(valueOfString(input));
           break;
         }
         case 'Date': {
           output = new Dat(valueOfDate(input));
           break;
         }
         case 'RegExp': {
           var flags = (input.global ? 'g' : '') +
                       (input.ignoreCase ? 'i' : '') +
                       (input.multiline ? 'm' : '');
           output = new RE(input.source, flags);
           break;
         }
         case 'ImageData':
         case 'File':
         case 'Blob':
         case 'FileList': {
           // TODO: implement
           throw new Err('not yet implemented');
         }
         case 'Array': {
           output = [];
           break;
         }
         case 'Object': {
           output = {};
           break;
         }
         default: {
           throw new TypeErr('Should be DOMException(DATA_CLONE_ERR)');
         }
       }
       setValue(memory, input, output);

       if (klass === 'Object' || klass === 'Array') {
         forEach(keys(input), function(name) {
           defProp(output, name, {
             value: recur(input[name], memory),
             writable: true,
             enumerable: true,
             configurable: true
           });
         });
       }
       return output;
     }

     // WeakMap is a constructor; ES2015+ engines throw if it is called
     // without new.
     return recur(input, new WM());
   };
})();

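The memory WeakMap in recur is what lets sclone reproduce shared references
and cycles. Here is a stripped-down sketch of just that technique (the name
cloneWithMemory is hypothetical; it handles only plain objects and arrays and
does none of sclone's [[Class]] dispatch or defensive captures). Registering
the copy in the map before recursing into its properties is what makes a
cycle terminate.

```javascript
// Hypothetical memoized deep copy: each source object maps to exactly one copy.
function cloneWithMemory(input, memory = new WeakMap()) {
  if (input !== Object(input)) { return input; }      // primitives pass through
  if (memory.has(input)) { return memory.get(input); } // already copied
  const output = Array.isArray(input) ? [] : {};
  memory.set(input, output);                          // record before recursing
  for (const name of Object.keys(input)) {
    output[name] = cloneWithMemory(input[name], memory);
  }
  return output;
}

const shared = { n: 1 };
const original = { left: shared, right: shared };
original.self = original;

const copy = cloneWithMemory(original);
console.log(copy.left === copy.right); // sharing preserved
console.log(copy.self === copy);       // cycle preserved
console.log(copy.left !== shared);     // a fresh object, not the original
```
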
Received on Thursday, 14 July 2011 19:47:25 UTC