Re: IDL: number types from Boris Zbarsky on 2013-03-21 (public-script-coord@w3.org from January to March 2013)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Thu, 21 Mar 2013 00:03:10 -0400
To: Allen Wirfs-Brock <allen@wirfs-brock.com>
CC: Marcos Caceres <w3c@marcosc.com>, Yehuda Katz <wycats@gmail.com>, Anne van Kesteren <annevk@annevk.nl>, public-script-coord@w3.org
Message-ID: <514A867E.4080005@mit.edu>

On 3/20/13 11:02 PM, Allen Wirfs-Brock wrote:
> However whatever technique is used, it doesn't look like idiomatic
> JavaScript code.  If this style of pessimistic validation as part of a
> prolog to every function is a good thing, why don't we see that
> technique widely used in human written JS libraries?

Several reasons:

1)  Human-written JS libraries don't tend to worry about edge-case 
interop.  For example, they feel free to change the order they do 
coercions in, as far as I can tell.

2)  Human-written JS libraries generally don't tend to think too much 
about edge cases, again from what I can tell.  They figure if the caller 
passes in a "weird" object and that causes something to break, that's 
the caller's fault.

This comes back and bites those same human-written JS libraries every 
so often.

But most importantly:

3)  Human-written JS libraries don't have to assume that the caller is 
hostile (because they run with the caller's permissions anyway, so they 
can't do something the caller can't do).  Unfortunately, WebIDL 
implementations most definitely do NOT have this luxury.
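
To make point 1 concrete, here is a minimal sketch of why the order of coercions is observable to a caller. The functions areaA and areaB and the probe helper are hypothetical names invented for this illustration; they are not taken from any real library.

```javascript
// Two hypothetical versions of the same library function.
// They differ only in which argument is coerced first.
function areaA(x, y) {
  const w = Number(x); // coerce x first
  const h = Number(y);
  return w * h;
}

function areaB(x, y) {
  const h = Number(y); // coerce y first
  const w = Number(x);
  return w * h;
}

// A caller-supplied object whose coercion has a visible side effect.
function probe(name, value, log) {
  return {
    valueOf() {
      log.push(name);
      return value;
    },
  };
}

const logA = [];
const resultA = areaA(probe("x", 2, logA), probe("y", 3, logA));

const logB = [];
const resultB = areaB(probe("x", 2, logB), probe("y", 3, logB));

// Same result, but the caller can tell the two versions apart.
console.log(resultA, logA.join(",")); // 6 x,y
console.log(resultB, logB.join(",")); // 6 y,x
```

Both versions return the same number, yet a script that passes objects with side-effecting valueOf methods sees different behavior. Two implementations that have to interoperate on such edge cases cannot reorder these steps freely.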

> Right, they all amount to the same thing.   Statically generated
> validation is done on every function entry.  This is a loss in a
> dynamically typed language.  Imagine a Uint16 value that gets passed as
> a parameter through 4 levels of function calls before it is actually
> used in some manner.  In a statically-typed language that parameter is
> passed with no runtime checking, because the compiler can guarantee at
> each level that it will have a Uint16 value at runtime. In a dynamic
> language the check at each level costs something.

Or gets optimized away by the JIT's range analysis, as the case may be.

But yes, I agree that programming defensively in this way is a 
performance drag in general, and the JIT won't always save you.  All I 
can say is that if I were implementing such a system and had to write 
such defensive code for it, I would make versions of the deeper-in 
callees that do not do validation on arguments and then call them 
internally, while only exposing APIs that perform validation at the 
trust boundary.
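
A minimal sketch of that arrangement follows, assuming a single public entry point that coerces its argument the way WebIDL's default "unsigned short" conversion does (no [Clamp] or [EnforceRange]). The names toUint16, describe, describeInternal and scaleInternal are hypothetical and exist only for this illustration.

```javascript
// Coerce an arbitrary JS value to a uint16: ToNumber, map non-finite
// values to 0, truncate toward zero, then wrap modulo 2^16.
function toUint16(value) {
  const n = Number(value); // may run caller code via valueOf/toString
  if (!Number.isFinite(n)) return 0;
  return ((Math.trunc(n) % 65536) + 65536) % 65536;
}

// Internal helpers: they assume the argument is already a uint16 and
// perform no validation of their own.
function scaleInternal(u16) {
  return u16 * 2;
}

function describeInternal(u16) {
  return "value=" + scaleInternal(u16);
}

// Public entry point: the only place where untrusted input is coerced.
function describe(value) {
  return describeInternal(toUint16(value));
}

console.log(describe("4096")); // value=8192
console.log(describe(-1));     // value=131070
```

The value crosses two internal call levels after the boundary, but it is checked exactly once. Only describe would be exposed to callers; the internal helpers stay private to the implementation.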

> So assume we pass the integer 4096 to the first function. In JS the
> value is passed as a "Number" (a float64).

What it's actually passed as in cases where performance matters (i.e. 
after the JIT has kicked in) depends on what the JIT has inferred about 
that value and how it will be used, for what it's worth.  Maybe it's 
being passed as a float64, maybe an int32.

> In dynamic-language-based
> applications I've analyzed in the past, this sort of redundant
> validation proved to be a real performance bottleneck.

I agree that it can be, for sure.  I wish I had a better answer here...

> Dynamic languages should do optimistic type checking because the type checks all occur at
> runtime and redundant checks are a waste. Optimistic checking means that
> a value is assumed to be of the right type (or meets other  checkable
> preconditions) up until an actual use of the value that requires a
> guarantee that the value meets the constraints.

This general claim has tons of caveats to it.  For example, detecting 
that your duck is actually an elephant with a duckbill mask while in the 
middle of mutating some data structures involves undoing the mutations 
back to a consistent state... or checking before you start mutating.

Now obviously this is something that can be decided on a case-by-case 
basis if you sit down and take the time to analyze all the cases 
carefully and are competent to do so.  Or you can do checking up front 
and not have to do the complex and failure-prone analysis.
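
As a rough illustration of the trade-off, the sketch below compares an "optimistic" method that coerces its argument only at the point of use with one that coerces up front. The Registry class, its fields, and the hostile object are hypothetical and invented for this example.

```javascript
// A hypothetical store whose invariant is: count === byName.size.
class Registry {
  constructor() {
    this.count = 0;
    this.byName = new Map();
  }

  // Optimistic: the name is coerced only when it is actually needed,
  // which happens after mutation has already started.
  addOptimistic(name, item) {
    this.count += 1;          // mutation begins
    const key = String(name); // may throw: runs caller-controlled toString
    this.byName.set(key, item);
  }

  // Up-front: all coercion happens before any state is touched.
  addChecked(name, item) {
    const key = String(name); // may throw, but nothing has changed yet
    this.count += 1;
    this.byName.set(key, item);
  }
}

// An elephant in a duckbill mask: looks like an object, throws when used.
const hostile = {
  toString() {
    throw new Error("not a duck");
  },
};

const optimistic = new Registry();
try { optimistic.addOptimistic(hostile, 1); } catch (e) { /* ignored */ }
console.log(optimistic.count, optimistic.byName.size); // 1 0 (inconsistent)

const checked = new Registry();
try { checked.addChecked(hostile, 1); } catch (e) { /* ignored */ }
console.log(checked.count, checked.byName.size); // 0 0 (consistent)
```

The optimistic version could be repaired with a try/finally that rolls the counter back, but that is exactly the per-case analysis described above. Coercing first avoids having to do it.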

> A function that simply passes a parameter value
> along as an argument to another function usually shouldn't do any
> validity checks on such a value.

Such functions are very rare in WebIDL.  The only case I can think of in 
which a WebIDL method/getter/setter invokes another WebIDL 
method/getter/setter (as opposed to some internal operation that can 
just assume its arguments are sane) is [PutForwards].

So as I see it, in the current WebIDL setup there is trusted 
implementation code and untrusted consumer code and argument 
verification happens at the trust boundary only.  Once behind the trust 
boundary, you just operate on things without doing type checking or 
coercion except as needed, because you control them all and you know 
that they don't do insane things.

> Yes, but people now extensively write mission-critical application
> code and libraries in ES and we don't see these techniques being widely
> used.  What is it about the libraries described using WebIDL that make
> them unique in requiring this sort of auto generated glue code?

See above.  But to expand on that, applications written in ES control 
their own code and sanitize incoming data when it comes in, while 
libraries tend to just punt on "weird" cases.

Put another way, I'm 100% sure that I can pass arguments to jquery that 
will corrupt its internal state in interesting ways.  But the jquery 
authors frankly don't care if I do, because the only consequence is that 
other scripts on that same page won't work right.

> Right, have we done any analysis yet of the systemic cost of those
> auto-generated coercions?

None of the things for which we plan to use JS-implemented WebIDL are 
critical to performance (in the sense of there being lots of calls 
across the boundary).

> Turing complete pseudo-code or prose is needed today to fully specify
> all of these Web APIs.  The WebIDL signature is presumably only a small
> part of the specification of most functions.

Actually, that presumption is somewhat false.  I've been converting a 
lot of things to Gecko's new WebIDL bindings recently, and the WebIDL 
signature is in fact a huge part of the specification of many of them. 
There's tons of stuff in the web platform (especially for elements) that 
just returns or sets a member variable, for example.

> I know I'm getting redundant, but I have to ask again, what is so
> special about most of the web APIs that will be specified using WebIDL.
> If non-WebIDL ES is adequate for complex applications why won't it be
> adequate for web APIs.

See above.

> Do you know of JS libraries that expose the forms of overloads that are
> expressible in WebIDL?  They are awkward to express in pure JS.

jQuery.prototype.init.

jQuery's parseHTML (see the 'context' argument in jQuery 1.9.1).

jQuery's each (see the "arraylike or not" overloading).

jQuery's makeArray (overloads a string and an array, treating a string 
as a single-element array internally).

I'm about 10% into jQuery, and I'm sure I've missed a few above that 
point, and I'm also sure there are tons more further on.

I agree that the resulting code is somewhat awkward, of course.  But if 
that's the API you want to expose....
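
For illustration, a hand-written overload of the "string or array" kind might look roughly like the sketch below. This is not jQuery's code; toArray is a hypothetical helper written only to show the shape of the type dispatch that such an API forces on its author.

```javascript
// Hypothetical helper: accepts a string, an array-like, or any other
// value, and always returns a real array.
function toArray(input) {
  if (input == null) {
    return []; // null or undefined: nothing to wrap
  }
  if (typeof input === "string") {
    return [input]; // a string is treated as a single element
  }
  if (typeof input === "object" && typeof input.length === "number") {
    return Array.prototype.slice.call(input); // array-like: copy elements
  }
  return [input]; // anything else: wrap it
}

console.log(toArray("abc"));                         // [ 'abc' ]
console.log(toArray([1, 2]));                        // [ 1, 2 ]
console.log(toArray({ length: 2, 0: "a", 1: "b" })); // [ 'a', 'b' ]
```

Every branch is a runtime type test written by hand, and the order of the tests matters: a string also has a numeric length, so it has to be handled before the array-like check.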

>> I think the big danger with this approach is that they will occur
>> unintentionally without the algorithm writer bothering to think about
>> them very much.
>
> in contrast to WebIDL, where the algorithm writer probably doesn't
> bother to think about the cost of the coercions that are automatically
> performed.

Yes, but there's a difference, to me, between the severity of a small 
performance hit and an exception thrown while data structures that 
affect the behavior of privileged code are in an inconsistent state...

-Boris
Received on Thursday, 21 March 2013 04:03:42 UTC