- From: Larry Masinter <masinter@adobe.com>
- Date: Mon, 25 May 2009 11:06:08 -0700
- To: HTML WG <public-html@w3.org>
Sorry this is very long, but I'm trying to be precise. We've been talking about "design principles" and "editorial style", so I thought I would take a deep dive on one little bit to see if we can get some clarity.

Quick summary: The problems with the style of the document are not merely editorial; there is a technical difficulty with many of the conformance requirements in the document.

Missing design principle: avoid MUST conformance requirements unless they are necessary for interoperability.
Missing editorial policy: all technical terms must be defined precisely.
Missing design principle: the temporal relationship between states must either be defined precisely or made explicitly flexible.

================================================

Long version:

This note is long, and I'm at risk of getting responses which take some little point down in the middle of an argument and go off the deep end critiquing it. I suppose that's fair -- that's what I'm doing to the HTML spec -- but I'd like to ask that if you respond, please do respond to the main point, even if you also want to pick on the details. I'm sorry that it's long, but I'm trying to work out what the 'problem' really is, and why some people might complain about the editorial style while others see no problem.

We've been talking about what many people take to be an editorial style issue -- that describing HTML in algorithmic terms ("when computing X, the following steps MUST be taken") rather than in terms of semantics, with the algorithm (either normative or not) given subsequently, is just a choice in how things are written down; that the results are equivalent, or that authoring requirements can be inferred or split out with a style sheet. However, I think that the editorial style masks a much more substantive difficulty that is persistent throughout the specification. This substantive problem sits at the heart of web extensibility -- the ability for new kinds of HTML implementations to evolve realistically.
Because conformance requirements are given in terms of algorithms, otherwise reasonable implementations of the semantics are disallowed, even when they would be compliant with conformance requirements stated in semantic rather than algorithmic terms.

I'll pick on a single issue: image width and height. You might claim that this is a single issue and "please file a bug report in the bugzilla database", but I claim that this issue is endemic, that instances of it are numerous, and that it results from choosing an editorial style of specifying conformance by giving algorithms which MUST be followed. In fact, it's hard to find a part of the specification which doesn't have this problem somewhere.

My question was whether image width and image height MUST simultaneously be known -- whether a script can assume that if image.width is non-zero, the height also must be non-zero, or whether height could be zero at one point in time and later become non-zero, asynchronously.

There is an algorithmic definition of how to compute the DOM attributes width and height, as part of a conformance requirement. Although the conformance requirement is written in prose, it's really a little program, so I will indent it:

    "The DOM attributes width and height MUST return
       the rendered width and height of the image, in CSS pixels,
         if the image is being rendered, and is being rendered to a
         visual medium;
       or else the intrinsic width and height of the image, in CSS
         pixels, if the image is available but not being rendered to
         a visual medium;
       or else 0, if the image is not available or its dimensions
         are not known."

The word "must" is used, so I presume that this is a normative requirement. I interpret this -- is there any other interpretation? -- to mean that any implementation must have a binary state for an image, 'available?', and that an image at any point in time is either 'available' or 'not available'.
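Read literally, the quoted requirement collapses to a single decision over one binary 'available' state. As a hypothetical sketch (the field names -- beingRendered, visualMedium, intrinsicWidth, and so on -- are mine for illustration, not the spec's):

```javascript
// Hypothetical sketch of the quoted width/height requirement as the
// "little program" it really is. Field names are illustrative only.
function domDimensions(img) {
  if (img.beingRendered && img.visualMedium) {
    // "the rendered width and height of the image, in CSS pixels"
    return { width: img.renderedWidth, height: img.renderedHeight };
  }
  if (img.available) {
    // "the intrinsic width and height of the image, in CSS pixels"
    return { width: img.intrinsicWidth, height: img.intrinsicHeight };
  }
  // "or else 0, if the image is not available or its dimensions
  // are not known"
  return { width: 0, height: 0 };
}
```

Notice that a single boolean, available, gates both dimensions at once: there is no branch in which one dimension is known and the other is not.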
There is no provision in this algorithm for an image which is partly available and partly not available. However, the definition of "available" does seem to allow for partial download:

    "If the image was successfully obtained, with no network errors,
     and the image's type is a supported image type, and the image
     is a valid image of that type, then the image is said to be
     available. If this is true before the image is completely
     downloaded, each task that is queued by the networking task
     source while the image is being fetched must update the
     presentation of the image appropriately."

This is the closest I can find to the definition of "available" -- is it defined more formally elsewhere?

So I *think* that a conforming implementation MUST use this single state for deciding whether or not "width" and "height" are known and thus non-zero. I think it is not allowed -- by these conformance requirements -- to have "width" known and "height" not known.

This seems to be an unfortunate tradeoff of precision -- defining behavior precisely -- at the expense of extensibility, and of what is usually an important design goal of most standards: only use MUST, only make conformance requirements, for those requirements that are needed for interoperability, for things to work together correctly. Unnecessary conformance requirements restrict the scope of implementations and need strong justification -- the fact that something less restrictive is harder to write in the chosen editorial style should NOT count as justification.

Perhaps none of the current implementations -- WebKit, Mozilla, Opera and IE -- implement anything other than this algorithm? Maybe they all have implemented "width" and "height" according to the algorithm given. But is this the *right* requirement? Does the current specification development method, in which requirements are vetted by checking them against three or four implementations, bring sufficient review of whether the normative requirements are TOO stringent?
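To make the restriction concrete, here is a hypothetical progressive-decoder state -- the kind of implementation being argued for -- in which the two dimensions are tracked independently and can become known at different times (all names are mine, for illustration only):

```javascript
// Hypothetical progressive decoder that learns width and height
// independently as bytes stream in. Under a single binary
// 'available' state, the mid-download answer below (width known,
// height not yet) appears to be non-conforming.
function progressiveDimensions(decoder) {
  return {
    width: decoder.widthKnown ? decoder.width : 0,
    height: decoder.heightKnown ? decoder.height : 0,
  };
}

// Mid-download: the header bytes carrying the width have arrived,
// but the height has not yet been parsed.
const midDownload = { widthKnown: true, width: 800, heightKnown: false, height: 0 };
```

Here progressiveDimensions(midDownload) would report a non-zero width with a zero height -- an intermediate state the quoted requirement, read as an algorithm, does not seem to permit.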
I think writers of Javascript programs should NOT assume that if image.height is non-zero, then image.width is also non-zero at that moment, and the spec should NOT require simultaneous availability of these values. Doing so is unnecessary, and it precludes otherwise legitimate implementations.

There is another design principle, widely used, which seems not to have been followed: standards should only REQUIRE (only MUST) behavior with sufficient precision to support reasonable interoperability. This SHOULD have been a design principle, but it was not, and the design principle used instead is inappropriate. The current HTML 5 document goes MUCH TOO FAR in "nailing down" behavior (including "error" behavior), to the point where it stifles any future innovation in implementation methods -- it reifies a processing model which is inappropriate for many otherwise legitimate contexts. This difficulty is a direct result of the attempt to specify the conformance requirements of a "Language" in algorithmic terms.

Alas, while "precise" has been a goal, the definition of terms is, in fact, not "precise" in many instances. To take the case here, I asked about the definition of "available":

> If I know the image width and height, but not whether it
> is of a supported type, is it "available"?

and the reply was:

> How can you know the dimensions if you don't support it?

If the term "supported" is defined somewhere, I can't find the definition (the document's liberal use of terms without formal definition in formal conformance requirements is itself a problem), but the usage seems to indicate that the decision of whether a particular image is "supported" depends only on the image "type", and the examples given of "supported types" don't seem to allow for some image types (image/png, image/tiff, image/jpeg) to have profiles or options. That is, an image has a "type", the type is either "known" or "not known", and if the image type is known, then it is known whether the image is "supported".
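If the spec left the temporal relationship flexible, page authors could cope with a small defensive pattern. A hypothetical sketch (the helper name and the polling approach are mine, not from any spec or library):

```javascript
// Hypothetical defensive pattern: treat each dimension
// independently and proceed only once both are known, instead of
// inferring one non-zero dimension from the other.
function whenDimensionsKnown(img, callback, intervalMs = 50) {
  if (img.width > 0 && img.height > 0) {
    callback(img.width, img.height); // both already known
    return;
  }
  const timer = setInterval(() => {
    if (img.width > 0 && img.height > 0) {
      clearInterval(timer);
      callback(img.width, img.height);
    }
  }, intervalMs);
}
```

In a real page one would prefer the image's load event over polling; the point is only that the script never assumes one non-zero dimension implies the other.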
So, an image might be labeled image/jpeg, but the JPEG committee might release a new version of JPEG, and some servers might have images labeled "image/jpeg" which are the new version, and not supported (sic) by older browsers. Is an image of a "supported" type when it cannot be rendered by an older browser? What if the width and height are known, but the rest of the image hasn't been decoded? Does HTML disallow "supporting" TIFF, where images may use arbitrary compression methods? Is an implementation which "supports" TIFF but also allows for dynamic downloading required to make "width" and "height" simultaneously available? If I have an implementation which attempts to dynamically download new decompression algorithms when rendering JPEGs, but I haven't yet tried to run the download, is the image type "supported" or not?

In general, by specifying things in terms of algorithms for implementation, the functional specification tends to require a temporal relationship between factors that are not temporally aligned. So in one place "available" and "supported" are assumed to be static, binary values, but at least in the case of "available" there is some allusion to the temporal nature of the process: downloads can be partial, and additional information can become available asynchronously.

I've beaten this example down, and I haven't yet justified my claim that this kind of problem is endemic. More to follow (alas).

Regards,

Larry
-- http://larry.masinter.net
Received on Monday, 25 May 2009 18:06:55 UTC