Re: Language specification vs. user agent specification from Ian Hickson on 2009-05-25 (public-html@w3.org from May 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 25 May 2009 00:51:47 +0000 (UTC)
To: Larry Masinter <masinter@adobe.com>, Jonas Sicking <jonas@sicking.cc>
Cc: HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0905250024560.28739@hixie.dreamhostps.com>

On Sun, 24 May 2009, Larry Masinter wrote:
>
> I think it is traditional to define a language and its semantics first 
> -- what does an utterance in the language *mean* and separately and 
> independently (even if the same document) define conformance 
> requirements for processors of that language.

As Maciej says, this isn't actually the tradition in most high-quality 
specs. However, based on feedback from the TAG and others, it actually 
_is_ what HTML5 does. For example, taking the image.height attribute you 
mentioned, you'll find that immediately above the formal definition of the 
implementation conformance criteria of image.height, there is a green box 
which gives the semantic _meaning_ of image.height, for the sake of 
authors. In fact the spec goes even further and allows readers to 
completely hide the implementation criteria.

So, as far as I can tell, HTML5 does what you want here.


> The current document frequently does not do that, but defines the 
> processing requirements and then infers the language semantics from 
> that.

Please highlight such examples by filing bugs in the group's Bugzilla 
installation as they are bugs in the spec. Please use this form to file 
such bugs:

    http://www.w3.org/Bugs/Public/enter_bug.cgi?assigned_to=dave.null%40w3.org&blocked=&bug_file_loc=http%3A%2F%2F&bug_severity=minor&bug_status=NEW&comment=The%20following%20feature%20in%20the%20following%20section%20has%20only%20implementation%20conformance%20criteria%20and%20does%20not%20have%20a%20semantic%20description%20first.%0D%0A%0D%0AFeature%3A%20%0D%0ASection%3A%20%0D%0A&component=Spec%20bugs&contenttypeentry=&contenttypemethod=autodetect&contenttypeselection=text%2Fplain&data=&dependson=&description=&form_name=enter_bug&keywords=&maketemplate=Remember%20values%20as%20bookmarkable%20template&op_sys=All&priority=P4&product=HTML%20WG&qa_contact=public-html-bugzilla%40w3.org&rep_platform=All&short_desc=Omitted%20language%20semantic%20for%20&target_milestone=---&version=unspecified


> Defining things as language-first is more useful and easier to read than 
> the current editorial style, especially for end-users and authors of 
> HTML documents, whose priority over implementers is a stated design 
> goal.

The HTML5 spec has a style sheet that hides any implementation criteria, 
leaving it as a purely author specification. How is this not satisfactory?



> I took as an example (randomly selected) image.width and image.height 
> defined in terms of an algorithm for determining them based on whether 
> the image was "available", and that "available" wasn't well defined, and 
> that I would have to guess whether an image defined with "data:" was or 
> wasn't available.
>
> Ian replied: 
> > Why would you have to guess? It is in fact fully-defined: the scheme is 
> > orthogonal to the issue, it only depends on whether the image, once 
> > obtained, is of a supported type and is a valid image of that type.
> 
> I think the notion of "available" isn't precisely defined

I even quoted the definition. Could you elaborate?


> For example, if the representation of an image URI has only been partly 
> retrieved, such that the width and height are known, but the rest of the 
> image data is not known -- is the image "available"?

The spec explicitly covers this case:

# If the image's type is a supported image type, and the image is a valid 
# image of that type, then the image is said to be available (this affects 
# exactly what the element represents, as defined below). This can be true 
# even before the image is completely downloaded, if the user agent 
# supports incremental rendering of images [...]
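
As an illustration only (this is not spec text), the quoted definition can be read as a predicate over the image's type and validity. Every name below (`ImageState`, `supportedTypes`, `isAvailable`) is hypothetical, and treating a partial download as available only when incremental rendering is supported is one reading of the quoted sentence, not a confirmed algorithm. Note that the URL scheme does not appear anywhere in the model.

```typescript
// Hypothetical model of the quoted definition; none of these names come from the spec.
interface ImageState {
  type: string;             // sniffed image type, e.g. "image/png"
  validSoFar: boolean;      // the data received so far is a valid image of that type
  fullyDownloaded: boolean; // whether the download has completed
}

// Assumption: the user agent keeps a mutable set of supported image types.
// Support loaded later (e.g. a downloaded decoder) is modelled by adding to the set.
const supportedTypes = new Set<string>(["image/png", "image/jpeg"]);

function isAvailable(img: ImageState, incrementalRendering: boolean): boolean {
  if (!supportedTypes.has(img.type)) return false; // unsupported type: not available
  if (!img.validSoFar) return false;               // not a valid image of that type
  // A partially downloaded image counts only if the user agent renders incrementally.
  return img.fullyDownloaded || incrementalRendering;
}
```

Under this sketch, an image of a type whose support is loaded later becomes available once the type is added to `supportedTypes`, with no other change to the image itself.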


> If I know the image width and height, but not whether it is of a 
> supported type, is it "available"?

How can you know the dimensions if you don't support it?


> What if I have a processor for which support for image types is 
> dynamically downloadable, and so whether the image is a "supported type" 
> depends on the state of the download mechanism itself?

Then whether the image is available will change as the support is loaded. 
Why is this a problem?


> Is an implementation which returns the correct width and height
> even if the image itself is not "available" -- is it non-compliant?

Yes.


> What if I know with 95% certainty what the image width and height are, 
> but there is some asynchronous process checking the validity of that? 
> Are image.width and image.height allowed to change?

I don't understand what you mean by "change" in this context. The values 
of DOM attributes are only defined at the time that their "getter" is 
invoked. The values returned vary over time, e.g. if the image is not 
available, or if the image is being rendered, etc.
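
To illustrate the point about getters (a sketch, not the spec's algorithm): the attribute has no stored value of its own, so each read is computed from the image's state at that moment. The class `ModelImage` and its fields are hypothetical, and the fallback order shown (rendered width, else intrinsic width when available, else 0) is an assumption about how the cases are prioritised.

```typescript
// Hypothetical model: `width` is not stored; the getter derives it on every read.
class ModelImage {
  available = false;                   // see the definition of "available" quoted above
  intrinsicWidth = 0;                  // width encoded in the image data
  renderedWidth: number | null = null; // null while the image is not being rendered

  get width(): number {
    if (this.renderedWidth !== null) return this.renderedWidth; // being rendered
    if (this.available) return this.intrinsicWidth;             // available, not rendered
    return 0;                                                   // assumed value when not available
  }
}
```

Two reads separated in time can therefore return different values without anything having "changed" the attribute; only the underlying state differs between the two invocations.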


> If I have some other way of inferring the width and height of an image 
> without accessing the image itself, is the image "available"?

No.


> If, for example, I am building a summarizer which wants to know which 
> images are visible and which ones aren't, I might want JavaScript 
> programs to have access to image.width and image.height even if the 
> image isn't retrieved or even retrievable.

It is a design goal that scripts execute identically in all processors, 
so this wouldn't be desirable. We explicitly don't want different user 
agents to return different values in equivalent situations.


> I think defining image.width and image.height in terms of processing 
> requirements is unnecessary, introduces conformance requirements that 
> MUST be followed that at first look may seem to be precise but in fact 
> are not, and don't actually help end-users or authors.

I disagree on all three counts.


> Perhaps no browsers currently implement any of the extensions or 
> pipelined image retrieval, but authors should not be encouraged to 
> create JavaScript applications that depend on the fact that image.width 
> is known to infer that the image is "available" for any other purpose, 
> which is implied by the current spec.

Authors are already writing such applications; that's why it is so 
important to have identical behaviour in all user agents.


> Even if it were possible to nail down exactly what "available" is or 
> isn't, doing so is a bad idea, unnecessary, and the basis for what I was 
> saying was "short-sighted": trading short-term consistency against 
> long-term extensibility.

I disagree completely.


On Sun, 24 May 2009, Jonas Sicking wrote:
> 
> There are advantages and disadvantages with the current writing style 
> IMHO. The main advantage is that it makes it more likely for 
> implementations to agree with each other and have fewer bugs. This is 
> definitely addressing an important aspect, since implementation bugs and 
> implementation differences are among the biggest problems facing web 
> authors today. It's also a big reason why the web is as messy as it 
> is.
> 
> However it mostly helps with these things when the implementation is 
> able to follow the algorithm in the spec. In other cases the 
> implementation has to reverse engineer the semantics from the algorithm 
> and then create an implementation based on those semantics. This 
> introduces the risk of misinterpretations and bugs in both the deducing 
> semantics step, and the implementation step.

The main reason I use algorithms instead of describing the semantics is 
that I can't work out how to describe the semantics in a coherent way. I 
try to describe things in an abstract way whenever possible, but sadly the 
Web platform is so complex now that it is rarely possible to do so with 
any sort of sanity.

If anyone has any concrete suggestions for how to recast particular 
algorithms in abstract terms, please let me know. I'd be happy to do so.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 25 May 2009 00:52:24 UTC