RE: algorithmic normative conformance requirements, design principles, etc. from Larry Masinter on 2009-05-27 (public-html@w3.org from May 2009)

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 27 May 2009 16:10:06 -0700
To: Ian Hickson <ian@hixie.ch>
CC: HTML WG <public-html@w3.org>
Message-ID: <8B62A039C620904E92F1233570534C9B0118CD95E70E@nambx04.corp.adobe.com>
I see that the specification has been updated to clarify the
definition of "available", and also that an image can be "available"
at one point in time, and stop being "available" later:

# If the image is in a supported image type and its dimensions
# are known, then the image is said to be available (this affects
# exactly what the element represents, as defined below). This 
# can be true even before the image is completely downloaded, 
# if the user agent supports incremental rendering of images; 
# in such cases, each task that is queued by the networking task 
# source while the image is being fetched must update the
# presentation of the image appropriately. It can also stop
# being true, e.g. if the user agent finds, after obtaining
# the image's dimensions, that the image data is actually 
# fatally corrupted.

So I think it means that whether an image is available or not
can asynchronously change even during the execution of a 
program using the API. 

Is there some transactional interlock which does not allow
an image to become unavailable between the time that
image.width is requested and image.height is requested?
If I write a program that reads image.width and then,
discovering it's non-zero, does a lengthy calculation,
after which it assumes image.height is non-zero... couldn't
that change because of a network error?
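
To make the hazard concrete, here is a minimal JavaScript sketch.
No DOM is assumed: makeFlakyImage is a hypothetical stand-in for an
image element, and its fail() hook is invented to simulate a late
network error. Whether a real user agent lets availability change
between two reads is exactly the open question, so this only
illustrates the failure mode and one defensive reading pattern.

```javascript
// Hypothetical stand-in for an image element; not a real DOM API.
function makeFlakyImage(width, height) {
  let available = true;
  return {
    get width() { return available ? width : 0; },
    get height() { return available ? height : 0; },
    fail() { available = false; }, // simulates a late network error
  };
}

// Read both dimensions back to back and treat the pair as one snapshot,
// instead of assuming a later read agrees with an earlier one.
// This narrows the window; it cannot close it unless the user agent
// itself guarantees that availability does not change between the reads.
function readDimensions(img) {
  const w = img.width;
  const h = img.height;
  return (w > 0 && h > 0) ? { width: w, height: h } : null;
}

const img = makeFlakyImage(640, 480);
const before = readDimensions(img); // snapshot while available
img.fail();                         // image becomes "not available"
const after = readDimensions(img);  // null: nothing usable any more
```

The point of returning null is that the caller has to handle the
"not available" case explicitly rather than carrying a stale width
into a later calculation.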

Can an image go back and forth between being "available" and
"not available", say, on a limited-memory browser that does not
want to cache or keep images? (I'm imagining a 10,000 page
Flickr on a low-memory-footprint browser).

In asking about this:

> It is necessary for the DOM to behave predictably and identically in all 
> user agents if we are to have a coherent application development platform.

But the DOM only behaves "predictably" insofar as the rest
of the network behaves predictably, and APIs that depend
to some degree on external state -- like whether network
resources are or are not available at any particular instant,
cannot be "nailed down" precisely without severe implementation
cost, caching requirements, disallowing pipelining and so forth.
If there are two different images from different sources, you
can't predict which one is "available" before the other.

Some users may have falsely assumed that, once they had
retrieved image.width and found it non-zero, they could
retrieve image.height and find *it* non-zero -- but that
is no good reason for requiring something that, in practice, cannot
be guaranteed without unreasonable cost.

For example, suppose I have a script which is trying to decide
some layout of multiple images by investigating their width
and height, while the images are still being loaded. The order
of completion of retrieval might actually be predictable in
most situations (oh, http://localhost/ URIs are available
immediately, and no image goes from being "available" to
"not available"), and users will write code that "depends"
on this. 

Now, it would be possible to guarantee this behavior by
mandating some ordering, or to nail down consistent DOM
behavior by not executing any script until all of the
images are completely loaded, or running the script as if
no image is "available" except in a predefined order, etc.
But doing so would impose unreasonable implementation requirements
on browsers that are also simultaneously trying to optimize
performance and responsiveness.

So I think there's some tradeoff here. In the case of asynchronous
image loading, the specification chooses performance over
consistency. In the case of image.width vs. image.height,
the specification chooses DOM consistency over (perhaps)
performance.

A simple-minded implementation is one in which image.width
and image.height are independent, not guaranteed to be
simultaneously available, and are obtained, each time,
by interrogating an independent image retrieval agent.
Without a cache, every computation of image.width and
image.height winds up invoking the retrieval mechanism,
and, because availability can vary, may result in those
two values being inconsistently available.
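
A rough JavaScript sketch of such a simple-minded implementation
follows. Everything in it is hypothetical: makeRetrievalAgent and
makeUncachedImage are invented names, the 640x480 size is arbitrary,
and the scripted list of outcomes stands in for a network that
succeeds on one request and fails on the next.

```javascript
// Hypothetical retrieval agent: every query is a fresh retrieval that
// succeeds or fails according to a scripted list of outcomes.
function makeRetrievalAgent(outcomes) {
  let i = 0;
  return {
    // Returns the dimensions, or null if this retrieval attempt failed.
    fetchDimensions() {
      const ok = outcomes[i % outcomes.length];
      i += 1;
      return ok ? { width: 640, height: 480 } : null;
    },
  };
}

// Uncached image: each property read interrogates the agent again.
function makeUncachedImage(agent) {
  return {
    get width() {
      const d = agent.fetchDimensions();
      return d ? d.width : 0;
    },
    get height() {
      const d = agent.fetchDimensions();
      return d ? d.height : 0;
    },
  };
}

const agent = makeRetrievalAgent([true, false]); // succeed, then fail
const image = makeUncachedImage(agent);
const observed = { width: image.width, height: image.height };
// observed is { width: 640, height: 0 }: an inconsistent pair
```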

Writing the specification in terms of constraints on results
(even if those constraints involve calculations) would make
this kind of dependency clearer. Rather than giving an
algorithm for "how to compute width and height", the
specification would give a set of constraints.
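
As an illustration of the constraint style, here is a small
JavaScript predicate. The constraint it encodes (the two dimensions
are either both zero or both positive) is my assumption about what
such a rule might say, not text quoted from the specification, and
satisfiesDimensionConstraint is a made-up name.

```javascript
// Assumed constraint (illustrative only): width and height are either
// both zero or both positive integers. It says nothing about how a
// user agent arrives at the values, only what pairs may be observed.
function satisfiesDimensionConstraint(width, height) {
  const bothZero = width === 0 && height === 0;
  const bothPositive =
    Number.isInteger(width) && Number.isInteger(height) &&
    width > 0 && height > 0;
  return bothZero || bothPositive;
}

const samples = [[0, 0], [640, 480], [640, 0]];
const results = samples.map(([w, h]) => satisfiesDimensionConstraint(w, h));
// results is [true, true, false]
```

A constraint written this way can be checked against any
implementation's observable behavior, whatever algorithm that
implementation uses internally.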

There's a robustness principle involved here as well.
Authors of JavaScript programs SHOULD be given a language
definition that is conservative -- they SHOULD write programs
that assume the fewest invariants.

User Agents interpreting those programs SHOULD be liberal:
they should try to assure that conditions that aren't
actually guaranteed or possible to guarantee are still
likely to be true, if users commonly (and mistakenly)
rely on those conditions.

An algorithmic computation may be precise, but it is often
*too* precise for a useful authoring guideline. 

For an authoring specification, it is better to give
conditions which MUST hold and other conditions which
SHOULD hold, with the user agent interpreting the HTML
responsible for attempting to assure the SHOULDs, while
authors know that they cannot rely on them.

I think this holds for the "image.width" computation,
but I'll continue to do my "random sample of HTML 5
specification pages" review to turn up some more.

I propose to take the "volume" listed after market
close tomorrow for http://finance.yahoo.com/q?s=GOOG
modulo the page count of 
http://www.whatwg.org/specs/web-apps/current-work/html5-letter.pdf
add one, find the first heading on the page, if any,
and review that section. If there is no heading on the
page, I will scan backward (not using the outline
algorithm) until I find a section heading and then
analyze that section.

If I were doing this today (and I'm not, maybe someone
else wants to?): today the volume is 3,034,475, while
the document has 936 pages, so I would analyze
page 414, and so the section to analyze would
be 4.10.4 The "input" element, and the question would be
whether the algorithmic description of what various
"flags" must be set to when various events have fired
could instead be more readily described and validated
by giving constraints on the values of visible DOM
attributes in terms of the effects of events.

But I'll leave the "input" element as an exercise,
and pick a different random page on Thursday.

Regards,

Larry
-- 
http://larry.masinter.net
Received on Wednesday, 27 May 2009 23:10:47 UTC