AW: AW: Efficient representation for Web formats from Peintner, Daniel (ext) on 2016-11-17 (public-exi@w3.org from November 2016)

From: Peintner, Daniel (ext) <daniel.peintner.ext@siemens.com>
Date: Thu, 17 Nov 2016 14:08:57 +0000
To: "Stephen D. Williams" <sdw@lig.net>, "public-exi@w3.org" <public-exi@w3.org>
Message-ID: <D94F68A44EB1954A91DE4AE9659C5A980FF6E8BE@DEFTHW99EH1MSX.ww902.siemens.net>

Hi Stephen,

> Thanks for your note and attention.
> By "we do have a solution for that" are you referring to schema
> compression or something else in addition to that?

What I meant was that we already have a solution for JSON (see [1]). In addition, we plan to improve the efficiency of EXI4JSON by allowing a more accurate schema/grammar for the actual data to be used instead of the generic one (see https://www.w3.org/TR/exi-for-json/#schema-exi4json).

This idea is still half-baked and we will likely work on it for the next release. Stay tuned!

-- Daniel

[1] https://www.w3.org/TR/exi-for-json/

________________________________
From: Stephen D. Williams [sdw@lig.net]
Sent: Thursday, 17 November 2016 01:52
To: Peintner, Daniel (ext) (CT RDA NEC EMB-DE); public-exi@w3.org
Subject: Re: AW: Efficient representation for Web formats

Hi Daniel,

Thanks for your note and attention.
By "we do have a solution for that" are you referring to schema compression or something else in addition to that?

A lot of things came to mind when I saw that you were considering extending the application of these methods beyond JSON, which we talked about some months ago and you guys have now done. Below, I hinted a little about whether I thought each area might be fruitful, but I haven't thought it through or put together any serious rationale. I can provide some further thoughts now with more when I can get to it. Apologies if I repeat myself.

Deltas: As you may remember, in addition to schema compression, which is more of a static solution, I saw the need for a more dynamic delta compression. I was advocating for a mechanism that accomplished a similar thing, but based on the previous message and/or an initial exchange. This is somewhat like some versions of TCP/IP header compression, a preloaded gzip compression table, and is just like the OT types that express deltas as changes or patches to a tree or graph. I believe I saw that protocol buffers and/or some other widely-used API transports have similar concatenable incremental update capability.

There are various forms and things to optimize for in this area. The most widely and immediately usable would be in thinking about the best representation of changes between a web / mobile app and the server, particularly for highly interactive, incrementally saved / shared data, probably over websockets: text, fields, sliders, images and drawing, current video location, and, soon, virtual environment changes and game play. Second Life ran physics and the whole model on the server, with positional and texture telemetry streamed to clients. That's been replaced by what amounts to a full game environment at each endpoint coordinating state. A general, efficient mechanism to transmit and maintain that state would be very useful.

To start with a simple case, imagine you have a tree of objects that represent data in an application, Facebook posts perhaps. In each of those objects, you have data and metadata. When something updates, perhaps because of a new publish or because someone is typing or moving or drawing or rotating a 3D model or just changing mouse position, some small subset of data changes in a usually incremental way. How do you transmit the least amount of data to keep models synchronized? How does that change when it is every 10 seconds vs. every 100 ms? This is everything we've talked about in EXI along with ideas similar to an MPEG prediction / compression / representation viewpoint, but generalized and which may or may not have similar lossiness.
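To make the "transmit the least amount of data" question concrete, here is a minimal sketch of a structural delta over a tree of nested dicts. The names diff and patch and the "$removed" marker are assumptions made for illustration; they are not part of EXI or of any format mentioned in this thread.

```python
# Hypothetical sketch: diff(), patch() and the "$removed" marker are
# illustrative assumptions, not part of EXI or any existing format.
_REMOVED = {"$removed": True}  # assumed marker meaning "this key was deleted"


def diff(old, new):
    """Return only the parts of `new` that differ from `old` (nested dicts)."""
    delta = {}
    for key in old.keys() - new.keys():
        delta[key] = _REMOVED
    for key, value in new.items():
        if key not in old:
            delta[key] = value
        elif isinstance(value, dict) and isinstance(old[key], dict):
            sub = diff(old[key], value)
            if sub:  # skip subtrees that did not change
                delta[key] = sub
        elif old[key] != value:
            delta[key] = value
    return delta


def patch(old, delta):
    """Apply a delta produced by diff() and return the updated tree."""
    result = dict(old)
    for key, value in delta.items():
        if value == _REMOVED:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = patch(result[key], value)
        else:
            result[key] = value
    return result


old = {"post": {"text": "Hello", "likes": 3, "draft": True},
       "cursor": {"x": 10, "y": 20}}
new = {"post": {"text": "Hello", "likes": 4},
       "cursor": {"x": 12, "y": 20}}

delta = diff(old, new)      # only "likes", "draft" and "x" are mentioned
restored = patch(old, delta)
```

Only the changed leaves travel in the delta; an efficient binary encoding of such a delta is a separate question that this sketch does not address.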

As an example of where lossiness can make sense: When dragging or resizing or scrubbing something, it is fine to use imprecision like rough position and lower resolution while moving, then compute the full quality when at the final location. Sometimes, a subset of many things may change slightly. Can all of those deltas of an overall object / tree / graph be represented more compactly as a combined message? Can this be compressed, perhaps in a lossy way for the interim mode, so that the final results at "key frames" are perfectly in sync? A similar thing to study would be methods used for remote desktop viewing, some of which are strictly bitmap based and others which are more logical with box, text, and area objects.
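The interim / key-frame idea could be sketched as follows, assuming integer pixel positions and a hypothetical message shape ("kind" and "pos" are made-up field names). Interim updates are snapped to a coarse grid and repeats are dropped; the last position is sent exactly.

```python
# Hypothetical sketch: message fields and the grid size are assumptions.
def interim_update(position, step=8):
    """Lossy update: snap each integer coordinate to the nearest multiple of step."""
    return {"kind": "interim",
            "pos": [step * ((c + step // 2) // step) for c in position]}


def key_frame(position):
    """Exact update, sent when the drag ends so both sides are in sync."""
    return {"kind": "key", "pos": list(position)}


def encode_drag(positions, step=8):
    """Coarse interim updates (unchanged ones dropped), then one exact key frame."""
    messages = []
    last = None
    for position in positions[:-1]:
        update = interim_update(position, step)
        if update["pos"] != last:  # nothing new to say at this resolution
            messages.append(update)
            last = update["pos"]
    messages.append(key_frame(positions[-1]))
    return messages


drag = [(100, 50), (101, 50), (103, 51), (109, 53), (117, 58), (121, 61)]
messages = encode_drag(drag)
```

With these six sample positions, two of the five interim positions collapse onto the same grid cell as their predecessor and are not sent, while the receiver still ends at the exact final position.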

Graphs: Fundamentally, there is one key requirement: A standard, efficient way to represent a reference from one point in a data tree to another. Pick one, perhaps the Falcor convention, and represent it directly and efficiently with bidirectionality. There are maybe two levels to this: Falcor app graph style, where there are occasional pointers, and RDF-style graphs where much of the data is something like a pointer. The relational representation of an RDF triple/quad store is usually to put values in tables with every record being an int reference to a value or another tuple. I had thought quite a bit and begun designing a solution for RDF-style data, ERI I called it, some time ago, although it would take some work to pull that together and make it sharable.
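As a rough illustration of a reference from one point in a data tree to another, the sketch below uses a Falcor-like convention in which a reference is an object of the form {"$type": "ref", "value": [path...]}. The helper names ref, is_ref and get are assumptions for this example, and the lookup follows references even at the end of a path, which is a simplification and may differ from what Falcor itself does.

```python
# Hypothetical sketch: ref(), is_ref() and get() are illustrative helpers.
def ref(*path):
    """Build a reference to another location in the same graph."""
    return {"$type": "ref", "value": list(path)}


def is_ref(node):
    return isinstance(node, dict) and node.get("$type") == "ref"


def get(graph, path, max_hops=32):
    """Walk `path` from the root, jumping to the target whenever a ref is met."""
    node = graph
    remaining = list(path)
    hops = 0
    while True:
        if is_ref(node):
            hops += 1
            if hops > max_hops:
                raise ValueError("too many references (possible cycle)")
            # Restart from the root along the referenced path.
            remaining = node["value"] + remaining
            node = graph
            continue
        if not remaining:
            return node
        node = node[remaining.pop(0)]


graph = {
    "postsById": {"p1": {"title": "Hello", "author": ref("usersById", "u1")}},
    "usersById": {"u1": {"name": "Ann", "latestPost": ref("postsById", "p1")}},
    "feed": {0: ref("postsById", "p1")},
}

author_name = get(graph, ["feed", 0, "author", "name"])
latest_title = get(graph, ["usersById", "u1", "latestPost", "title"])
```

Each entity is stored once and the two entities point at each other, so the graph can be walked in both directions without duplicating data.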

WebAssembly: Determine if EXI methods are a win for JS / AST representations. I'm skeptical this will be competitive, but a quick test or benchmark would help people working on it to have a baseline to compete with.

3D models, animation, etc.: glTF seems well thought out, but somewhat stalled as it isn't widely usable yet. However, solving this is important and about to be widely needed with AR/VR growing rapidly. One interesting difficulty is that the representation of animations is likely to evolve. Traditional animations were canned sequences of bone/IK or vertex movement. I think this will change soon to be more target trajectory + semi-physics based, which will change the data significantly for the better. I would look more closely at glTF and related needs & solutions to see if an EXI-like solution would improve things in some areas.

Thanks,
Stephen

On 11/16/16 1:57 AM, Peintner, Daniel (ext) wrote:

Hi Stephen,

Thanks for your feedback and insights.

I think at first we might want to consider the widely used formats. Next we may want to look at others.

Just by scanning "Delta" it seems it is JSON. We do have a solution for that already, but I can see the rationale for improvements.

With regard to the others, I have limited knowledge, and we might want to take a look at them one by one.

What do you think?

Thanks,

-- Daniel

________________________________
From: Stephen D. Williams [sdw@lig.net]
Sent: Wednesday, 2 November 2016 14:24
To: public-exi@w3.org
Subject: Re: Efficient representation for Web formats

Since we're considering wider application in the realm of web technologies, I'll relate a number of areas that come to mind. I've investigated these deeply for the last couple of years and have been implementing apps and services using many of these.

I've recently noted a number of delta formats in use with Javascript + browsers. Compressing these would be helpful, especially rapid-fire sequences of incremental updates. One common need is to almost continually save/transmit/real-time receive while someone is typing or doing something, which is somewhat like the telnet / IM/chat / MMO problems. This is a good example of a nice design, implementation, and related theory:

https://quilljs.com/docs/delta/
https://en.wikipedia.org/wiki/Operational_transformation
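For readers unfamiliar with the Quill format, a delta is a list of operations, each of which inserts content, retains a number of characters, or deletes a number of characters. The sketch below applies such a list to a plain string. It is a simplification under stated assumptions: apply_delta is a hypothetical helper, not Quill's API, and formatting attributes and embedded (non-text) inserts are ignored.

```python
# Hypothetical sketch: apply_delta() is not Quill's API; plain text only.
def apply_delta(text, ops):
    """Apply insert / retain / delete operations to a plain string."""
    out = []
    pos = 0  # read position in the original text
    for op in ops:
        if "insert" in op:
            out.append(op["insert"])
        elif "retain" in op:
            out.append(text[pos:pos + op["retain"]])
            pos += op["retain"]
        elif "delete" in op:
            pos += op["delete"]  # skip the deleted characters
    out.append(text[pos:])  # anything after the last op is kept as-is
    return "".join(out)


change = [{"retain": 12}, {"insert": "White"}, {"delete": 4}]
updated = apply_delta("Gandalf the Grey", change)
```

A rapid-fire typing session produces many such small operation lists in sequence, which is the kind of stream where a compact encoding would pay off.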

This may seem largely tangentially related, although direct support for graphs of objects, and updates / deltas for graphs of objects is an important area. I feel that thinking in graph API terms, and understanding the desirable app features (caching, only transmitting what is needed, infinite scrolling with just in time retrieval, scalable data / communications / users / UI) directly affects thinking about data formats and API / communication patterns.

For general data, graphs of JSON objects are very popular. Pulling from older relational and REST ideas, avoiding a lot of dead ends, and learning from RDF/semantic web and other graph database and NoSQL systems and methods, there are two interesting conclusions:

1. Data is always a graph; better to explicitly model that in a clean and resilient way. (What is the main thing that is completely absent in a Relational Database? The relations! Those are buried in code that does joins and similar. One of the more ironic computer science instances.)

2. When you consider app logic required to build an efficient, low-latency, maintainable, and scalable app, a number of features and characteristics should be designed for directly in the data model, API, communication pattern, and application models on both front and back end. Microservices and containerization reinforce that. One important type of solution is graph APIs. The leading projects there are Facebook's GraphQL and Netflix's Falcor. The latter is easier to understand and seems easier to use, but there are some gaps and difficulties. GraphQL has now gained a lot more momentum and broad support. Somewhat related are the real-time, pub/sub-like standing queries offered by Firebase and Horizon/RethinkDB, and being added by others (OrientDB etc.). However, it seems easier to get the graph API idea by reading Falcor's description:

https://netflix.github.io/falcor/documentation/model.html

Completely different, yet sharing the same 'everything is one big graph' idea (and being very useful for it):
https://workflowy.com/

WebSockets can be assumed to be the communication link for a large range of latency sensitive applications. Not much to do there as it was tough to overlay bidirectional async communications on the existing protocols, gateways, and other legacy components. It's all deployed and baked in now. Unfortunate that it couldn't be better, but it's not bad for most things. It is worth keeping in mind as an efficient bidirectional communication link, which could make certain strategies seem more broadly feasible. For instance, efficient interleaved async packets can allow for an extra negotiation round trip without much impact for some cases.

There are efforts toward solutions for at least the Emscripten / Asm.js subset of Javascript with WebAssembly. There is a growing need to minimize parse and related startup times for Javascript while still not making many assumptions about implementation. And without any hindrance to new language features; several important additions to core syntax were made not long ago.

https://brendaneich.com/2015/06/from-asm-js-to-webassembly/
https://www.w3.org/community/webassembly/

Handling possibly binary and structured data well is also needed. This ought to be harmonized with supercomputing and emerging ML data formats. While ML training won't be done often in web browsers, running the resulting models, which generally take far less computation and memory, is already somewhat widespread.

http://wiki.ecmascript.org/doku.php?id=harmony:typed_objects

A key subset of non-{image/video/code} binary data is 3D. This is becoming much more important. It is a crucial interest for my company and projects right now in support of simulation and game engine content. Some work has been done, although it isn't necessarily optimal and it isn't implemented completely enough to be very / widely usable yet.

This is the leading and maybe principal effort right now, although we couldn't use it without a lot of completion work. We have to use FBX, DAE (COLLADA), and OBJ. Everything else is too slow, missing features, or not popular:

https://github.com/KhronosGroup/glTF

Example usage:
http://cesiumjs.org/2015/08/10/introducing-3d-tiles/

A random sample of older and/or proprietary attempts to solve this:
http://openctm.sourceforge.net/
http://www.finalmesh.com/index.htm

ThreeJS is the leading base library for 3D on WebGL.

Stephen

On 11/2/16 2:16 AM, Peintner, Daniel (ext) wrote:

Hi,

As Taki recently discussed in [1] the focus of the EXI working group has changed a bit.

Besides the work on XML, the working group is also exploring the idea of applying EXI to other Web formats. The WG is making JSON exchange more efficient by applying EXI to JSON [2]. Recently we have also been exploring the idea of applying EXI to CSS [3] and JavaScript [4].

EXI is a general format that sends an efficient stream of events and can have noticeable, measurable savings in CPU, memory & bandwidth over formats such as minify and/or gzip that require the receiver to reconstitute the original JSON/CSS/JavaScript/... and parse it again.

We encourage you to take a look at our exploration of EXI for CSS [3] and EXI for JavaScript [4] and to provide feedback/comments.

Thank you,

Daniel (for the EXI WG)

[1] https://lists.w3.org/Archives/Public/public-exi/2016Oct/0004.html
[2] EXI for JSON, https://www.w3.org/TR/exi-for-json/
[3] EXI for CSS, https://github.com/EXIficient/exificient-for-css
[4] EXI for JavaScript, https://github.com/EXIficient/exificient-for-javascript/

--
Stephen D. Williams sdw@lig.net stephendwilliams@gmail.com LinkedIn: http://sdw.st/in
V:650-450-UNIX (8649) V:866.SDW.UNIX V:703.371.9362 F:703.995.0407
AIM:sdw Skype:StephenDWilliams Yahoo:sdwlignet Resume: http://sdw.st/gres
Personal: http://sdw.st facebook.com/sdwlig twitter.com/scienteer

Received on Thursday, 17 November 2016 14:09:32 UTC