RE: Changing the name and focus of this group from WebAssembly and binary code to WebCore etc and source code. from Jeff Lewis on 2015-11-27 (public-webassembly@w3.org from November 2015)

From: Jeff Lewis <jlewis@vargr.com>
Date: Thu, 26 Nov 2015 23:41:10 -0800
To: "'JS Stats'" <info@jsstats.com>, <public-webassembly@w3.org>
Message-ID: <001e01d128e6$fe7e8a70$fb7b9f50$@vargr.com>
" The difference seems to just be what intent people want:

- a binary virtual machine code, with a design and language that parallels compilation to native machine code, and might invoke the parallels of 'disassembly' and 'reverse engineering' which might disadvantage web user rights but advance some agendas.

- versus a clear source code story for the web, which is my strong preference."

Why are these mutually exclusive concepts? The idea, as I understood it, was to create the foundations for a bytecode pseudomachine-based system that in the end would deliver a Java- or .Net-like foundation for a future web, one that got us out of the 'script' mindset and into more modern and sophisticated development (for example, isolated libraries that couldn't stomp on each other).

If you take a look at how .Net's CLR works - it really accomplishes both of these goals admirably. You can compile dynamically - which gives you the immediacy of a scripted language - in fact, you can compile IN your code for truly dynamic real time coding, and yet you can reverse compile IL (the bytecode .Net uses) back into source at any time. If you include the symbols, you can debug and even reconstruct variable names.

It's relatively language agnostic. There are .Net versions of Python, Perl, PHP and so on as well as more 'classic' languages like C# and C++ - and they all interop with each other fairly cleanly. Even better still, Microsoft has made the core of .Net an ECMA standard and open sourced it.

Why reinvent the wheel?

The idea of continuing forward with a web infrastructure that still squirts tons of source code directly to a browser to be interpreted AS code seems pointless. It's really just patching the problems with the existing system - sort of. Because as we're seeing with this project - you'll end up still having to support JavaScript pretty much forever anyway, so you won't really ever be able to get much past that. With a .Net-like solution - JavaScript just becomes one of the supported languages.

I don't actually care if it's .Net - any similar technology like Java's bytecode - or even a wholly new such implementation would be a huge improvement... it's just that .Net is there for the taking, is now completely cross-platform (or can be made so relatively easily) and is open source.

Regards,
Jeff Lewis


-----Original Message-----
From: JS Stats [mailto:info@jsstats.com] 
Sent: November 26, 2015 7:48 PM
To: public-webassembly@w3.org
Subject: Re: Changing the name and focus of this group from WebAssembly and binary code to WebCore etc and source code.


The name WebCore seems well taken. Another suggestion: WebBitScript. A quick US trademark search found nothing for bit-script, nor web-bit-script, but web-script was popular.

Expanding on the source compression efficiency, I believe it could be competitive with whatever wasm could achieve. The key would be to have a canonical text source style that compresses most efficiently, and any deviation might increase compressed size to remain lossless. If the producer wanted maximum compression they would firstly canonicalize the style of their source text, and this could just be a compression option to ignore non-canonical styling and text.

There could be a canonical white space source convention. The canonical text style need not be stripped of white space, rather it could use standard indentation etc.

The same principle might be used for some other source elements such as labels, which might have a canonical minified format while the compression could still degrade gracefully if some labels were not in canonical form. For example, the producer might choose not to canonicalize function names but to canonicalize block labels and local variables. The producer might also choose to keep some common local variable names, such as a stack pointer, which might compress very well given their repetition.

The canonical form might include specific support for comments, so that the consumer knows the difference between comments and non-canonical styling.

Multiple text source formats (JS-like, minified JS-like, s-exp, etc) might be supported, and if in canonical style then they could be converted to another canonical format without loss. Source with non-canonical text would obviously not convert without loss but could still convert to a canonical style.
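[Editor's illustration, not part of the original message.] The canonical-style idea above might be sketched roughly as follows. Everything here is an assumption made for illustration: the toy s-expression syntax, the two-space indentation rule, and the function names tokenize, parse, canonical, canonicalize and is_canonical are all hypothetical and do not come from any wasm proposal. The sketch only shows that text already in canonical style round-trips unchanged, while non-canonical text still converts to canonical style but loses its original spacing.

```python
# Hypothetical sketch: a toy s-expression canonicalizer (all names are illustrative).

def tokenize(text):
    """Split source text into parentheses and atoms, ignoring all white space."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Parse one s-expression from a token list, consuming tokens left to right."""
    tok = tokens.pop(0)
    if tok != "(":
        return tok
    node = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return node

def canonical(node, depth=0):
    """Print a tree in one fixed style: leading atoms stay on the opening
    line, remaining children go on their own lines, indented two spaces."""
    pad = "  " * depth
    if isinstance(node, str):
        return pad + node
    if all(isinstance(child, str) for child in node):
        return pad + "(" + " ".join(node) + ")"
    i = 0
    while i < len(node) and isinstance(node[i], str):
        i += 1
    head = pad + "(" + " ".join(node[:i])
    body = [canonical(child, depth + 1) for child in node[i:]]
    return "\n".join([head] + body + [pad + ")"])

def canonicalize(text):
    """Convert any styling of the toy syntax to the canonical style."""
    return canonical(parse(tokenize(text)))

def is_canonical(text):
    """True when the text is already in canonical style, so a compressor
    would need to store nothing beyond the tree itself."""
    return canonicalize(text) == text

messy = "(func   add (param a b)\n\n (return (i32.add a b)))"
print(canonicalize(messy))
```

In this sketch a producer wanting the smallest output would run canonicalize first; a producer wanting to keep its own styling would have to carry the deviations as extra data, which is the graceful degradation described above.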

Anyway, it seems quite possible to focus on a primitive 'source' code without compromising the goals of being a 'portable, size- and load-time-efficient format suitable for compilation to the web'.
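[Editor's illustration, not part of the original message.] The "one-to-one reversible" binary encoding of source described in the quoted message below could look something like this minimal sketch. The byte layout (a tag byte, a length byte, then the payload) and the names encode, decode and _write are assumptions invented for the example, not a proposed or actual wasm format; the source tree is represented as nested Python lists of strings.

```python
# Hypothetical sketch: a reversible binary encoding of a nested-list source tree.
# Layout (an assumption): tag 0 = atom, tag 1 = list; one length byte follows,
# so atoms and lists are limited to 255 bytes / 255 children in this toy.

def _write(node, out):
    """Append the encoding of one node to the bytearray `out`."""
    if isinstance(node, str):
        data = node.encode("utf-8")
        out += bytes([0, len(data)]) + data  # raises ValueError above 255 bytes
    else:
        out += bytes([1, len(node)])
        for child in node:
            _write(child, out)

def encode(node):
    """Encode a source tree (nested lists of strings) into bytes."""
    out = bytearray()
    _write(node, out)
    return bytes(out)

def decode(buf, pos=0):
    """Decode one node starting at `pos`; return (node, next position)."""
    tag, n = buf[pos], buf[pos + 1]
    pos += 2
    if tag == 0:
        return buf[pos:pos + n].decode("utf-8"), pos + n
    items = []
    for _ in range(n):
        item, pos = decode(buf, pos)
        items.append(item)
    return items, pos

tree = ["func", "add", ["param", "a", "b"], ["return", ["i32.add", "a", "b"]]]
blob = encode(tree)
decoded, end = decode(blob)
print(decoded == tree, end == len(blob))
```

Because decoding recovers exactly the tree that was encoded, the binary form in this sketch is just another spelling of the source rather than the output of a compilation step.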

The difference seems to just be what intent people want:

- a binary virtual machine code, with a design and language that parallels compilation to native machine code, and might invoke the parallels of 'disassembly' and 'reverse engineering' which might disadvantage web user rights but advance some agendas.

- versus a clear source code story for the web, which is my strong preference.

Regards
Douglas Crosher

On 11/26/2015 06:25 PM, JS Stats wrote:
> I would like to make the case to the members to consider changing the 
> focus of this group from the development of a binary code to a source 
> code with a binary encoding. The difference might not sound 
> significant at first but it might make a significant difference to the 
> intent of code deployed to the web in binary format.
> 
> In the current case source code is 'compiled' or 'assembled' into the 
> binary format and deployed in binary format. With this focus the 
> developers might be tempted to abandon any claim to the binary 
> encoding being related to the source, and for example move to a linear 
> virtual machine code without expressions or structured flow control etc.
> 
> While it might be possible to 'view-source' the deployed code it might 
> be considered 'disassembly' or 'reverse engineering' which are very 
> loaded terms for IP.
> 
> I believe that although the operators being developed are primitive 
> and close to the hardware, that these can still be used in a 
> structured source code with expressions and local variables etc to 
> make the code more readable and easier to write. A binary encoding 
> would still be developed that would be a one-to-one reversible 
> encoding of the source (basically a lossless compression of the 
> source). I believe this could still be a good target for the use case 
> of a compilation target which seems to be the current focus.
> 
> I have been working away at trying to use type derivation to help 
> eliminate bounds checking, and there has been another recent proposal 
> by sunfish to use some loop analysis to help eliminate bounds checks 
> too, and while I don't have anything concrete I suspect this will be 
> much easier to define in structured code. For example, a common case 
> is to define a local constant variable with a type that can be 
> derived such as masking a value or asserting its bounds.
> 
> The new name would remove 'Assembly' and make it clear this is a 
> source code although primitive. For example WebCore if it is not 
> taken. The Specification language would change its emphasis to being 
> a source code, while still supporting the use case of being a compilation target.
> 
> Would there be any support for such a re-focusing of the group, or are 
> the majority of people wanting a web machine code binary format to 
> compile to?
> 
> Regards
> Douglas Crosher
> 
>
Received on Sunday, 29 November 2015 21:53:08 UTC