RE: algorithmic normative conformance requirements, design principles, etc.

On Wed, 27 May 2009, Larry Masinter wrote:
>
> I see that the specification has been updated to clarify the definition 
> of "available", and also that an image can be "available" at one point 
> in time, and stop being "available" later [...]

Yes, I tried to fix the issues you raised in your earlier e-mail on this 
thread, as discussed here:

   http://lists.w3.org/Archives/Public/public-html/2009May/0367.html

(Incidentally, that e-mail contains some questions asking you to 
elaborate on particular issues you raised, which I would very much 
appreciate a reply to if possible; I'd like to fix the issues you 
mention, but I wasn't quite sure what the problems were.)


> So I think it means that whether an image is available or not can 
> asynchronously change even during the execution of a program using the 
> API.

No, it can only change in response to tasks queued by the "fetching" 
algorithm, and those tasks are only processed as part of the event 
loop, which is also the only mechanism by which script executes. 
There is therefore a guarantee that the values of these attributes 
will not change during a script's execution.

For more details on this:

   The fetching algorithm
   http://www.whatwg.org/specs/web-apps/current-work/#fetch
   (Step 5 is the one that fires the incremental steps.)

   The invocation of the fetching algorithm for <img> elements
   http://www.whatwg.org/specs/web-apps/current-work/#fetch
   (Specifically, the paragraph starting "Unless the user agent 
   cannot support images" is the one that calls "fetch".)

   Event loops
   http://www.whatwg.org/specs/web-apps/current-work/#event-loop
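
To illustrate what this guarantee buys authors, here is a quick 
sketch (the image URL is just a placeholder):

   var img = document.createElement("img");
   img.src = "photo.jpg"; // placeholder URL

   // Changes to the image's state are delivered as queued tasks,
   // so they can only be observed between script executions:
   img.onload = function () {
     // This runs as its own task, after the current script returns.
     console.log(img.width, img.height);
   };

   // Within this one script execution, the state is frozen:
   var w1 = img.width;
   var w2 = img.width;
   console.log(w1 === w2); // always true, whatever the network does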


> Is there some transactional interlock which does not allow an image to 
> become unavailable between the time that image.width is requested and 
> image.height is requested?

Yes, the event loop mechanism ensures that nothing from the network can 
have any effect while scripts execute (with the exception of synchronous 
XMLHttpRequest).


> If I write a program that reads image.width and then, discovering it's 
> non zero, does a lengthy calculation, after which assumes image.height 
> was non-zero... couldn't that change because of a network error?

No; the whole script runs to completion as a single task, so no 
network event can be processed until it returns, however long the 
calculation takes.
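
In code terms (the image lookup and expensiveCalculation below are 
placeholders for your scenario, and I'm assuming, as your example 
does, that a non-zero width implies the image data is available):

   var img = document.images[0]; // assumes the page has an <img>

   function expensiveCalculation() {
     var x = 0;
     for (var i = 0; i < 1e8; i++) x += i; // just burns time
     return x;
   }

   if (img.width > 0) {            // image is available here...
     expensiveCalculation();       // ...no tasks can run during this...
     console.log(img.height > 0);  // ...so this still reports true
   }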


> Can an image go back and forth between being "available" and "not 
> available", say, on a limited-memory browser that does not want to cache 
> or keep images? (I'm imagining a 10,000-page Flickr on a 
> low-memory-footprint browser).

Actually, no: the image can only be fetched once, and is required 
(within the limitations of the device) to be kept after that. (There 
is an allowance for not fetching the image immediately, though; for 
example, browsers tend to expose this as a preference to disable 
images.)

Having said that, the spec does have a clause that allows UAs, when 
forced to do so by hardware limitations, to behave in ways that 
would otherwise be considered incorrect, so it is possible that 
certain UAs would in fact act as you describe. Such UAs would likely 
find themselves with a reduced level of interoperability with Web 
content.


> In asking about this:
> 
> > It is necessary for the DOM to behave predictably and identically in 
> > all user agents if we are to have a coherent application development 
> > platform.
> 
> But the DOM only behaves "predictably" insofar as the rest of the 
> network behaves predictably, and APIs that depend to some degree on 
> external state -- like whether network resources are or are not 
> available at any particular instant -- cannot be "nailed down" precisely 
> without severe implementation cost, caching requirements, disallowing 
> pipelining and so forth. If there are two different images from 
> different sources, you can't predict which one is "available" before the 
> other.

It is true that there are practical limits on the reliability of the 
platform.


> Some users may have made false assumptions about believing that, once 
> having retrieved image.width and found it non-zero, they could retrieve 
> image.height and find *it* non-zero -- that is no good reason for 
> requiring something that, in practice, cannot be guaranteed without 
> unreasonable cost.

Assuming they obtain both within the same script execution, their 
assumption is sound. It is true that across multiple script 
executions this can break down. In practice authors do get caught by 
these problems, but the fewer such traps we create, the better.
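
For example (the timeout is just a stand-in for any later script 
execution):

   var img = document.images[0]; // assumes the page has an <img>

   // Within one script execution: sound.
   if (img.width > 0) {
     console.log(img.height); // consistent with the width just read
   }

   // Across script executions: not sound. The callback below runs
   // as a separate task, and tasks from the network can run first:
   if (img.width > 0) {
     setTimeout(function () {
       console.log(img.height); // in principle, may have changed
     }, 1000);
   }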


> For example, suppose I have a script which is trying to decide some 
> layout of multiple images by investigating their width and height, while 
> the images are still being loaded. The order of completion of retrieval 
> might actually be predictable in most situations (oh, http://localhost/ 
> URIs are available immediately, that no image goes from being 
> "available" to "Not available"), and users will write code that 
> "depends" on this.
>
> Now, it would be possible to guarantee this behavior by mandating some 
> ordering, or to nail down consistent DOM behavior by not executing any 
> script until all of the images are completely loaded, or running the 
> script as if no image is "available" except in a predefined order, etc. 
> But doing so would impose unreasonable implementation requirements on 
> browsers that are also simultaneously trying to optimize performance and 
> responsiveness.
> 
> So I think there's some tradeoff here. In the case of asynchronous image 
> loading, the specification chooses performance over consistency. In the 
> case of image.width vs. image.height, the specification chooses DOM 
> consistency over (perhaps) performance.

Yes, there is a judgement call here. Generally speaking, and maybe 
_this_ should be a design principle, the Web platform has leaned 
towards letting intrinsically unpredictable -- and frequently 
varying -- features act in an unpredictable manner, while everything 
else, where behaviour can be defined and enforced, acts in a 
predictable and consistent manner.


> A simple-minded implementation is one in which image.width and 
> image.height are independent, not guaranteed to be simultaneously 
> available, and are obtained, each time, by interrogating an independent 
> image retrieval agent. Without a cache, every computation of image.width 
> and image.height winds up invoking the retrieval mechanism, and, because 
> availability can vary, may result in those two values being 
> inconsistently available.

Such an implementation would be non-conforming.
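
To put it another way, a conforming implementation has to answer 
these questions from state that only changes when the fetching 
algorithm's queued tasks run -- something like this toy model 
(deliberately simplified; not the spec's actual data structures):

   class ImageState {
     width = 0;
     height = 0;
     available = false;

     // Only called from a task queued by the fetching algorithm;
     // this is the only place the state is allowed to change.
     finishFetch(w, h) {
       this.width = w;
       this.height = h;
       this.available = true;
     }

     // Reads never consult the network, so repeated reads within a
     // single script execution are always mutually consistent.
     getWidth()  { return this.available ? this.width  : 0; }
     getHeight() { return this.available ? this.height : 0; }
   }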


> Writing the specification in terms of constraints on results (even if 
> those constraints involve calculations) would make this kind of 
> dependency clearer. Rather than "how to compute width and height" giving 
> an algorithm for it, a set of constraints.

I agree. Generally I prefer to write things out in terms of 
constraints. Where I have failed to do so, I fall back on the 
algorithmic approach.

If you know how to describe the image fetching mechanism in terms of 
constraints in a manner that isn't more confusing than what we have 
now, please tell me how, so that I can do it. So far I have found it 
incredibly hard to describe these constraints at all, let alone in a 
manner that is easier to understand than what the spec has now.


> There's a robustness principle involved here as well. Authors of 
> javascript programs SHOULD be given a language definition that is 
> conservative -- they SHOULD write programs that assume the fewest number 
> of invariants.

Unfortunately, whether they should or not, they don't.


> User Agents interpreting those programs SHOULD be liberal: they should 
> try to assure that conditions that aren't actually guaranteed or 
> possible to guarantee are still likely to be true, if users commonly 
> (and mistakenly) rely on those conditions.

I agree; in fact I would, and do, go further and say that they MUST 
be liberal, and what's more, they MUST all be liberal in exactly the 
same way, with that way described in painful detail in the spec.
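
As a toy illustration of what "liberal in exactly the same way" 
means (the parsing steps here are simplified for the example, not 
quoted from the spec):

   // Spec-style fully-specified error recovery: skip leading
   // whitespace, take the leading digits, ignore trailing junk.
   function parseDimension(input) {
     var s = input.replace(/^[ \t\n\f\r]+/, ""); // strip spaces
     var m = /^[0-9]+/.exec(s);                  // collect digits
     if (m === null) return null;                // error: no digits
     return parseInt(m[0], 10);                  // " 42px" -> 42
   }

   parseDimension(" 42px"); // 42 in every conforming UA
   parseDimension("abc");   // null in every conforming UA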


> An algorithmic computation may be precise, but it is often *too* precise 
> for a useful authoring guideline.

I don't believe any of the authoring guidelines are algorithmic; 
only the UA conformance criteria are described that way, as far as I 
know.

If any of the UA conformance criteria are too precise, please let me know.


> For an authoring specification, it is better to give conditions which 
> MUST hold and other conditions which SHOULD hold, with the user agent 
> interpreting the HTML responsible for attempting to assure the SHOULDs, 
> while authors know that they cannot rely on them.

Maybe. I don't know that I really agree. In practice, though, HTML5 
is both an authoring specification AND an implementation 
specification (for a broad range of conformance classes), and so we 
have both the descriptions of the conditions that authors are 
required to obey _and_ the requirements for implementations that 
describe how to handle all cases, even when the authoring 
requirements are not met.

In certain cases, the two are split apart into separate sections, 
for example section 9.1 Writing HTML documents vs section 9.2 
Parsing HTML documents, or the DOM API descriptions, where the green 
boxes give the APIs without saying what happens with bogus input, 
and separate text then describes the actual requirements for those 
APIs. In other cases, they are much more closely mixed together, for 
example in the descriptions of most elements and attributes.


Anyway, if you have any concrete suggestions of what can be done to 
improve particular sections of the spec, please do let me know. Your 
review of the specification is very much appreciated.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
