- From: Larry Masinter <masinter@adobe.com>
- Date: Mon, 25 May 2009 11:06:08 -0700
- To: HTML WG <public-html@w3.org>
Sorry this is very long, but I'm trying to be precise. We've been talking about "design principles" and "editorial style", so I thought I would take a deep dive on one little bit to see if we can get some clarity.

Quick summary: The problems with the style of the document are not merely editorial; there is a technical difficulty with many of the conformance requirements in the document.

Missing design principle: avoid MUST conformance requirements unless they are necessary for interoperability.
Missing editorial policy: all technical terms must be defined precisely.
Missing design principle: the temporal relationship between states must either be defined precisely or made explicitly flexible.

================================================

Long version:

This note is long, and I'm at risk of getting responses which take some little point down in the middle of an argument and go off the deep end critiquing it. I suppose that's fair -- that's what I'm doing to the HTML spec -- but I'd like to ask that if you respond, please do respond to the main point, even if you also want to pick on the details. I'm sorry that it's long, but I'm trying to work out what the 'problem' really is, and why some people might complain about the editorial style while others see no problem.

We've been talking about what many people take to be an editorial style issue -- that describing HTML in algorithmic terms ("when computing X, the following steps MUST be taken") rather than in terms of semantics, with the algorithm (either normative or not) given subsequently, is just a choice in how things are written down; that the results are equivalent, or that authoring requirements can be inferred or split out with a style sheet. However, I think that the editorial style masks a much more substantive difficulty that is persistent throughout the specification. This substantive problem sits at the heart of web extensibility -- the ability for new kinds of HTML implementations to evolve realistically.
Because conformance requirements are given in terms of algorithms, otherwise reasonable implementations of the semantics are disallowed, even when they would be compliant with conformance requirements stated in semantic rather than algorithmic terms.

I'll pick on a single issue: image width and height. You might claim that this is a single issue and "please file a bug report in the bugzilla database", but I claim that this issue is endemic, that instances of it are numerous, and that it results from choosing an editorial style of specifying conformance by giving algorithms which MUST be followed. In fact, it's hard to find a part of the specification which doesn't have this problem somewhere.

My question was whether image width and image height MUST simultaneously be known -- whether a script can assume that if image.width is non-zero, the height also must be non-zero, or whether height could be zero at one point in time and later become non-zero, asynchronously.

There is an algorithmic definition of how to compute the DOM attributes width and height, as part of a conformance requirement. Although the conformance requirement is written in prose, it's really a little program, so I will indent it:

    "The DOM attributes width and height MUST return
       the rendered width and height of the image, in CSS pixels,
         if the image is being rendered, and is being rendered to a
         visual medium;
       or else the intrinsic width and height of the image, in CSS
         pixels, if the image is available but not being rendered to
         a visual medium;
       or else 0, if the image is not available or its dimensions
         are not known."

The word "must" is used, so I presume that this is a normative requirement. I interpret this -- is there any other interpretation? -- to mean that any implementation must have a binary state for an image, 'available?', and that an image at any point in time is either 'available' or 'not available'.
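Read literally, the quoted requirement collapses to a single decision over one binary 'available' state. As a hypothetical sketch (the field names -- beingRendered, visualMedium, intrinsicWidth, and so on -- are mine for illustration, not the spec's):

```javascript
// Hypothetical sketch of the quoted width/height requirement as the
// "little program" it really is. Field names are illustrative only.
function domDimensions(img) {
  if (img.beingRendered && img.visualMedium) {
    // "the rendered width and height of the image, in CSS pixels"
    return { width: img.renderedWidth, height: img.renderedHeight };
  }
  if (img.available) {
    // "the intrinsic width and height of the image, in CSS pixels"
    return { width: img.intrinsicWidth, height: img.intrinsicHeight };
  }
  // "or else 0, if the image is not available or its dimensions
  // are not known"
  return { width: 0, height: 0 };
}
```

Notice that a single boolean, available, gates both dimensions at once: there is no branch in which one dimension is known and the other is not.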
There is no provision in this algorithm for an image which is partly available and partly not available. However, the definition of "available" does seem to allow for partial download:

    "If the image was successfully obtained, with no network errors,
     and the image's type is a supported image type, and the image
     is a valid image of that type, then the image is said to be
     available. If this is true before the image is completely
     downloaded, each task that is queued by the networking task
     source while the image is being fetched must update the
     presentation of the image appropriately."

This is the closest I can find to the definition of "available" -- is it defined more formally elsewhere?

So I *think* that a conforming implementation MUST use this single state for deciding whether or not "width" and "height" are known and thus non-zero. I think it is not allowed -- by these conformance requirements -- to have "width" known and "height" not known.

This seems to be an unfortunate tradeoff of precision -- defining behavior precisely -- at the expense of extensibility, and of what is usually an important design goal of most standards: only use MUST, only make conformance requirements, for those requirements that are needed for interoperability, for things to work together correctly. Unnecessary conformance requirements restrict the scope of implementations and need strong justification -- the fact that something less restrictive is harder to write in the chosen editorial style should NOT count as justification.

Perhaps none of the current implementations -- WebKit, Mozilla, Opera and IE -- implement anything other than this algorithm? Maybe they all have implemented "width" and "height" according to the algorithm given. But is this the *right* requirement? Does the current specification development method, in which requirements are vetted by checking them against three or four implementations, bring sufficient review of whether the normative requirements are TOO stringent?
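To make the restriction concrete, here is a hypothetical progressive-decoder state -- the kind of implementation being argued for -- in which the two dimensions are tracked independently and can become known at different times (all names are mine, for illustration only):

```javascript
// Hypothetical progressive decoder that learns width and height
// independently as bytes stream in. Under a single binary
// 'available' state, the mid-download answer below (width known,
// height not yet) appears to be non-conforming.
function progressiveDimensions(decoder) {
  return {
    width: decoder.widthKnown ? decoder.width : 0,
    height: decoder.heightKnown ? decoder.height : 0,
  };
}

// Mid-download: the header bytes carrying the width have arrived,
// but the height has not yet been parsed.
const midDownload = { widthKnown: true, width: 800, heightKnown: false, height: 0 };
```

Here progressiveDimensions(midDownload) would report a non-zero width with a zero height -- an intermediate state the quoted requirement, read as an algorithm, does not seem to permit.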
I think writers of Javascript programs should NOT assume that if image.height is non-zero, then image.width is also non-zero at that moment, and the spec should NOT require simultaneous availability of these values. Doing so is unnecessary, and it precludes otherwise legitimate implementations.

There is another design principle, widely used, which seems not to have been followed: standards should only REQUIRE (only MUST) behavior with sufficient precision to support reasonable interoperability. This SHOULD have been a design principle, but it was not, and the design principle used instead is inappropriate. The current HTML 5 document goes MUCH TOO FAR in "nailing down" behavior (including "error" behavior), to the point where it stifles any future innovation in implementation methods -- it reifies a processing model which is inappropriate for many otherwise legitimate contexts. This difficulty is a direct result of the attempt to specify the conformance requirements of a "Language" in algorithmic terms.

Alas, while "precise" has been a goal, the definition of terms is, in fact, not "precise" in many instances. To take the case here, I asked about the definition of "available":

> If I know the image width and height, but not whether it
> is of a supported type, is it "available"?

and the reply was:

> How can you know the dimensions if you don't support it?

If the term "supported" is defined somewhere, I can't find the definition (the document's liberal use of terms without formal definition in formal conformance requirements is itself a problem), but the usage seems to indicate that the decision of whether a particular image is "supported" depends only on the image "type", and the examples given of "supported types" don't seem to allow for some image types (image/png, image/tiff, image/jpeg) to have profiles or options. That is, an image has a "type", the type is either "known" or "not known", and if the image type is known, then it is known whether the image is "supported".
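If the spec left the temporal relationship flexible, page authors could cope with a small defensive pattern. A hypothetical sketch (the helper name and the polling approach are mine, not from any spec or library):

```javascript
// Hypothetical defensive pattern: treat each dimension
// independently and proceed only once both are known, instead of
// inferring one non-zero dimension from the other.
function whenDimensionsKnown(img, callback, intervalMs = 50) {
  if (img.width > 0 && img.height > 0) {
    callback(img.width, img.height); // both already known
    return;
  }
  const timer = setInterval(() => {
    if (img.width > 0 && img.height > 0) {
      clearInterval(timer);
      callback(img.width, img.height);
    }
  }, intervalMs);
}
```

In a real page one would prefer the image's load event over polling; the point is only that the script never assumes one non-zero dimension implies the other.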
So, an image might be labeled image/jpeg, but the JPEG committee might release a new version of JPEG, and some servers might have images labeled "image/jpeg" which are the new version, and not supported (sic) by older browsers. Is an image of a "supported" type when it cannot be rendered by an older browser? What if the width and height are known, but the rest of the image hasn't been decoded? Does HTML disallow "supporting" TIFF, where images may use arbitrary compression methods? Is an implementation which "supports" TIFF but also allows for dynamic downloading required to make "width" and "height" simultaneously available? If I have an implementation which attempts to dynamically download new decompression algorithms when rendering JPEGs, but I haven't yet tried to run the download, is the image type "supported" or not?

In general, by specifying things in terms of algorithms for implementation, the functional specification tends to require a temporal relationship between factors that are not temporally aligned. So in one place "available" and "supported" are assumed to be static, binary values, but at least in the case of "available" there is some allusion to the temporal nature of the process: downloads can be partial, and additional information can become available asynchronously.

I've beaten this example down, and I haven't yet justified my claim that this kind of problem is endemic. More to follow (alas).

Regards,

Larry
-- http://larry.masinter.net
Received on Monday, 25 May 2009 18:06:55 UTC