
Re: [whatwg] asynchronous JSON.parse and sending large structured data between threads without compromising responsiveness

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 6 Aug 2013 21:58:24 +0000 (UTC)
To: whatwg@whatwg.org
Message-ID: <alpine.DEB.2.00.1308062133290.9685@ps20323.dreamhostps.com>

On Thu, 7 Mar 2013, j@mailb.org wrote:
>
> right now JSON.parse blocks the mainloop, this gets more and more of an 
> issue as JSON documents get bigger and are also used as serialization 
> format to communicate with web workers.

I think it would make sense to have a Promise-based API for JSON parsing. 
This probably belongs either in the JS spec or the DOM spec; Anne, Ms2ger, 
and any JS people, is anyone interested in taking this?


On Thu, 7 Mar 2013, David Rajchenbach-Teller wrote:
> 
> Actually, communicating large JSON objects between threads may cause 
> performance issues. I do not have the means to measure reception speed 
> simply (which would be used to implement asynchronous JSON.parse), but 
> it is easy to measure main thread blocks caused by sending (which would 
> be used to implement asynchronous JSON.stringify).

I don't understand why there'd be any difficulty in sending large objects 
between workers or from a worker to the main thread. It's possible this is 
not well-implemented today, but isn't that just an implementation detail?

One could imagine an implementation strategy where the cloning is done on 
the sending side, or even on a third thread altogether, and just passed 
straight to the receiving side in one go.


On Thu, 7 Mar 2013, Tobie Langel wrote:
> 
> Even if an async API for JSON existed, wouldn't the perf bottleneck then 
> simply fall on whatever processing needs to be done afterwards?

That was my initial reaction as well, I must admit.


On Fri, 8 Mar 2013, David Rajchenbach-Teller wrote:
>
> For the moment, the main use case I see for asynchronous
> serialization of JSON is that of snapshotting the world without stopping
> it, for backup purposes, e.g.:
> a. saving the state of the current region in an open world RPG;
> b. saving the state of an ongoing physics simulation;
> c. saving the state of the browser itself in case of crash/power loss
> (that's assuming a FirefoxOS-style browser implemented as a web
> application);
> d. backing up state and history of the browser itself to a server
> (again, assuming that the browser is a web application).

Serialising is hard to do async, since you fundamentally have to walk the 
data structure, and the actual serialisation at that point is not 
especially more expensive than a copy.


> The natural course of action would be to do the following:
> 1. collect data to a JSON object (possibly a noop);

I'm not sure what you mean by JSON object. JSON is a string format. Do you 
mean a JS object data structure?

> 2. send the object to a worker;
> 3. apply some post-treatment to the object (possibly a noop);
> 4. write/upload the object.
> 
> Having an asynchronous JSON serialization to some Transferable form 
> would considerably simplify the task of implementing step 2. without 
> janking if data ends up very heavy.

I don't understand what JSON has to do with sending data to a worker. You 
can just send the actual JS object; MessagePorts and postMessage() support 
"raw" JS objects.


> So far, I have discussed serializing JSON, not deserializing it, but I 
> believe that the symmetric scenarios also hold.

No, they are quite asymmetric. Serialising requires stalling the code that 
is interacting with the data structure, to guarantee integrity. Parsing is 
easy to do on a separate worker, because it has no dependencies -- you can 
do it all in isolation.
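
As a rough illustration of parsing in isolation, here is a minimal sketch that hands a JSON string to a separate thread and returns a Promise for the result. It assumes Node.js and its worker_threads module so that it runs standalone; in a page the same pattern would use a Web Worker. The names parseInWorker and workerSource are hypothetical, not part of any specification.

```javascript
// Source for the worker thread (hypothetical helper): parse whatever text
// arrives and post back either the value or the error message.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (text) => {
    try {
      parentPort.postMessage({ ok: true, value: JSON.parse(text) });
    } catch (err) {
      parentPort.postMessage({ ok: false, message: err.message });
    }
  });
`;

// Assumption: Node.js runtime. Returns a Promise so the caller never blocks
// on JSON.parse itself.
async function parseInWorker(text) {
  const { Worker } = await import('worker_threads');
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('error', reject);
    worker.once('message', (reply) => {
      worker.terminate(); // one worker per call keeps the sketch simple
      if (reply.ok) resolve(reply.value);
      else reject(new SyntaxError(reply.message));
    });
    worker.postMessage(text);
  });
}

parseInWorker('{"name":"The One","hp":1000}').then((value) => {
  console.log(value.hp); // 1000
});
```

The parsed value is still copied back to the calling thread by postMessage(), so this moves the parsing cost elsewhere but not the cost of receiving the result.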


On Fri, 8 Mar 2013, David Rajchenbach-Teller wrote:
> 
> If I am correct, this means that we need some mechanism to provide 
> efficient serialization of non-Transferable data into something 
> Transferable.

I don't understand what this means. Transferable is about neutering 
objects on one side and creating new versions on the other. It's the 
equivalent of a "move". Your use cases were about making copies, as far as 
I can tell (saving and backing up).
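
The difference between a move and a copy can be sketched as follows, assuming a runtime that provides the global structuredClone() function (current browsers, Node.js 17 or later). structuredClone() is used here only as a stand-in that applies the same clone-and-transfer rules as postMessage() without needing a second thread.

```javascript
// Assumption: global structuredClone() is available.
const original = new ArrayBuffer(8);
new Uint8Array(original)[0] = 42;

// Transfer: the buffer is moved. The sender's reference is neutered.
const moved = structuredClone(original, { transfer: [original] });
console.log(original.byteLength);      // 0, detached on the sending side
console.log(moved.byteLength);         // 8
console.log(new Uint8Array(moved)[0]); // 42

// Copy: both sides keep a usable buffer.
const source = new ArrayBuffer(8);
const copy = structuredClone(source);
console.log(source.byteLength);        // 8, still usable
console.log(copy.byteLength);          // 8
```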

As a general rule, JSON has nothing to do with Transferable objects, as 
far as I can tell.


On Fri, 8 Mar 2013, David Rajchenbach-Teller wrote:
> 
> Intuitively, this sounds like:
> 1. collect data to a JSON;
> 2. serialize JSON (hopefully asynchronously) to a Transferable (or
> several Transferables).

I really don't understand this. Are you asking for a way to move a JS 
object from one thread to another, killing references to it in the first 
thread? What's the use case? (What would this have to do with JSON?)


On Fri, 8 Mar 2013, David Bruant wrote:
>
> Why not collect the data in a Transferable like an ArrayBuffer directly? 
> It skips the additional serialization part. Writing a byte stream 
> directly is a bit hardcore I admit, but an object full of setters can 
> give the impression to create an object while actually filling an 
> ArrayBuffer as a backend. I feel that could work efficiently.

It's not clear to me what the use case is, but if the desire is to move a 
batch of data from one thread to another, then this is certainly one way 
to do it. Another would be to just copy the data in the first place, no 
need to move it -- since you have to pay the cost of reading all the data 
in the first place, why not do it as part of the postMessage(), rather 
than first building a binary data representation that you'll have to 
reparse on the other side?


On Fri, 8 Mar 2013, David Rajchenbach-Teller wrote:
>
> For instance, how would you serialize something as simple as the following?
> 
> {
>   name: "The One",
>   hp: 1000,
>   achievements: ["achiever", "overachiever", "extreme overachiever"]
>    // Length of the list is unpredictable
> }

Why serialise it? If you want to post this across a MessagePort to a 
worker, or back from a worker, why not just post it?

   var a = { ... }; // from above
   port.postMessage(a);
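
A slightly fuller version of that snippet, as a sketch only: it assumes a MessageChannel whose two ports stand in for the page and the worker, so that it runs in one script, and it uses the object literal quoted above.

```javascript
// The object from the quoted example; the list can be any length.
const a = {
  name: "The One",
  hp: 1000,
  achievements: ["achiever", "overachiever", "extreme overachiever"],
};

// Assumption: port1 plays the sender and port2 the receiving worker.
const { port1, port2 } = new MessageChannel();

const received = new Promise((resolve) => {
  port2.onmessage = (event) => {
    resolve(event.data);
    port1.close(); // closing one end shuts down the whole channel
  };
});

// No JSON involved: the object is structured-cloned by postMessage().
port1.postMessage(a);

received.then((copy) => {
  console.log(copy.achievements.length); // 3
  console.log(copy === a);               // false, the receiver gets a copy
});
```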


> > What are the data you want to collect? Is it all at once or are you 
> > building the object little by little? For a backup and for FirefoxOS 
> > specifically, could a FileHandle [3] work? It's an async API to write 
> > in a file.
> 
> Thanks for the suggestion. I am effectively working on refactoring 
> storing browser session data. Not for FirefoxOS, but for Firefox 
> Desktop, which gives me more architectural constraints but frees my hand 
> to extend the platform with additional non-web libraries.

Assuming by "Firefox Desktop" you mean the browser for desktop OSes called 
Firefox, then, why not just do this in C++? I don't understand why you 
would constrain yourself to using Web APIs in JavaScript to write a browser.


On Sat, 9 Mar 2013, David Bruant wrote:
>
> I've once met someone who told me that JSON was bullshit. Since the guy had
> blown my mind during a presentation, I've decided to give him a chance after
> this sentence :-p He explained that in JSON, a lot of characters are double
> quotes and commas and brackets. Also, you have to name fields.
> He said that if you want to share 2 ints (like longitude and latitude), you
> probably have to send the following down the wire:
>     '{"long":12.986,"lat": -98.047}'
> which is about 30 bytes... for 2 numbers. He suggested that a client and
> server could send only 2 floats (4 bytes each, so 8 bytes total) and have a
> convention as to which number is first and you'd just be done with it.
> 30 bytes isn't fully fair because it could be gzipped, but that takes
> additional processing time in both ends.

Hear hear.


> He talked about a technology he was working on that, based on a message 
> description would output both the client and server code (in different 
> languages if necessary) so that whatever message you send, you just 
> write your business code and play with well-abstracted objects and the 
> generated code takes care of the annoying "send/receive a 
> well-compressed message" part.

This isn't a particularly new idea, FWIW. See, for example, protocol 
buffers, or ASN.1's BER. Those are even self-describing to some extent, 
like JSON; protocols like IP, TCP, and UDP don't even do that, they just 
encode their data in a well-defined order with no delimiters at all.
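
The size difference in the longitude/latitude example can be sketched like this. The field order (longitude first), the 32-bit float width, and the little-endian byte order are assumptions chosen for illustration, as are the function names encodePosition and decodePosition.

```javascript
// The JSON form quoted above, measured in UTF-8 bytes.
const json = '{"long":12.986,"lat": -98.047}';
const jsonBytes = new TextEncoder().encode(json).byteLength;

// Assumed convention: longitude, then latitude, each a little-endian float32.
function encodePosition(longitude, latitude) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat32(0, longitude, true);
  view.setFloat32(4, latitude, true);
  return view.buffer;
}

// The receiver relies on the same convention; nothing in the bytes names
// the fields or marks where one value ends.
function decodePosition(buffer) {
  const view = new DataView(buffer);
  return { long: view.getFloat32(0, true), lat: view.getFloat32(4, true) };
}

const packed = encodePosition(12.986, -98.047);
const decoded = decodePosition(packed);

console.log(jsonBytes);         // 30
console.log(packed.byteLength); // 8
console.log(decoded);           // values rounded to float32 precision
```

The 32-bit floats lose some precision compared with the decimal text, which is part of the trade-off such a convention has to settle in advance.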

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 6 August 2013 21:58:49 UTC
