Versioning re-visited (was : mixed signals on "Writing HTML documents", tutorial, etc.)

Thread re-named at Dan Connolly's request.  CC list has
"public-html" re-instated, as I could find nothing in
Dan's message to suggest it should be migrated to "www-archive".

Ian Hickson wrote:

> It's not really clear to me that this is an interesting question. I mean, 
> once HTML6 is out, who cares what HTML5 says?

Everyone who has written an HTML 5 document.  I have been
creating web pages ever since HTML 2.0 (many on this list
will have been creating them for far longer), and every
page that I have validated in the past will validate today,
because the specification against which it was written is
enshrined in its DOCTYPE directive.  Yes, it might be helpful
if the validator said (e.g.) "Although your document is
valid HTML 2.0, it is not valid HTML 4.01 which is the current
standard", but that is as far as I want it to go : I most
certainly do /not/ want it to say (in effect) "I have ignored
your DOCTYPE directive and am validating against the most
recent version of the standard" (a) because that is not the
validator's job, and (b) because "the most recent version of
the standard" may well be a very contentious concept, particularly
in view of the divergent evolutionary paths that are only too
apparent at the moment.
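
To make the behaviour described above concrete, here is a minimal sketch in Python of a validator front end that takes its target specification from the DOCTYPE and, at most, adds an advisory note. The public identifiers are the real ones for the strict DTDs of each version; the names KNOWN_DOCTYPES, CURRENT, declared_version and advisory are hypothetical, and treating HTML 4.01 as "current" simply follows the example in the paragraph above.

```python
import re

# Public identifiers of a few real HTML DTDs, mapped to a version label.
# This table is illustrative, not exhaustive.
KNOWN_DOCTYPES = {
    "-//IETF//DTD HTML 2.0//EN": "HTML 2.0",
    "-//W3C//DTD HTML 3.2 Final//EN": "HTML 3.2",
    "-//W3C//DTD HTML 4.0//EN": "HTML 4.0",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01",
}

# Assumption: the version treated as "the current standard".
CURRENT = "HTML 4.01"

DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def declared_version(document):
    """Return the version named by the DOCTYPE, or None if unrecognised."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return None
    return KNOWN_DOCTYPES.get(match.group(1))


def advisory(document):
    """Report which specification was used; never override the DOCTYPE."""
    version = declared_version(document)
    if version is None:
        return "No recognised DOCTYPE; cannot choose a specification."
    if version != CURRENT:
        return f"Validated as {version}; note that {CURRENT} is the current standard."
    return f"Validated as {version}."
```

The point of the sketch is that the declared version always selects the rules; the comparison with CURRENT only ever changes the wording of the report.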

> That's one option, possibly not the most useful. There are two more 
> options, one is to simply always validate against the latest version that 
> has that DOCTYPE (this is, IMHO, the most useful option), and the third 
> option is to provide the user with the option to pick which version to 
> validate against (that's e.g., what the CSS validator does).

See above : IMHO, every well-formed (X)HTML document should
contain a pointer to the specification against which it was
written; traditionally this pointer has been the DOCTYPE directive,
and I can see no justification for varying or ditching that now.

> What are the benefits?

The goalposts are fixed, in perpetuity.  A valid document
will remain valid, an invalid document will remain invalid,
and a validator will never give different answers concerning
a particular instantiation of a document.  Furthermore,
a user agent can -- /if it so chooses/ -- make use
of the DOCTYPE in order to vary its parsing, rendering and
so on.  By providing versioning information in the DOCTYPE,
you /allow/ a browser or other user-agent to vary its
behaviour depending on the specification against which
the document was written (but you do not /require/ it); by
removing that information, you prevent browsers/user agents
from adjusting their behaviour to best suit the document
being rendered.
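
The "allow but not require" point can be put as a very small Python sketch. The function pick_profile and the profile names are hypothetical; the only claim is that a declared version gives a user agent a key to look up, which it is free to ignore.

```python
def pick_profile(declared_version, profiles, default):
    """Return the behaviour profile for the declared version, if any.

    profiles maps version labels to whatever a user agent wants to vary
    (parsing, rendering and so on).  An agent that does not care about
    versions passes an empty mapping and always gets the default.
    """
    return profiles.get(declared_version, default)
```

An agent with {"HTML 4.0": "legacy"} can special-case HTML 4.0 documents, and an agent with {} behaves identically for every document. If the document declares no version (None), both are forced onto the default, which is the loss of choice described above.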
> 
> The drawbacks, the cons, the problems caused by versioning, are:
> 
>  * Authors will check their documents against obsolete versions of the 
>    spec (as, e.g., authors today check their documents against HTML4 
>    instead of HTML4.01), meaning that they do not benefit from the fixes
>    that newer versions of the spec have received, and thus that their 
>    pages are not optimally conformant and accessible.

A document which was valid HTML 4.0 (and therefore, for example,
did not include the "name" attribute on an image) will still
be valid today, and an invalid HTML 4.0 document (which, for
example, /did/ include such an attribute) will still be invalid
today.  That is indisputably correct.  As discussed above,
I have no objections to the validator telling me that an
invalid HTML 4.0 document would be valid HTML 4.01 if I were
to change the DOCTYPE, but that goes beyond formal validation
and into the realms of AI and deep document analysis.  I
certainly wouldn't want the validator to tell me this by
default (because of the parsing overheads), but it /might/
want to tell me that my DOCTYPE is not the most recent, and
I might get different results were I to change it.
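
The "name" example can be sketched in Python as a per-version check. This is a toy, not a validator: the attribute sets in IMG_ATTRS are a deliberately tiny subset of what each DTD allows, chosen only to show the one difference mentioned above, and the names ImgAttrChecker and img_errors are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: a tiny illustrative subset of the attributes allowed on IMG.
# The only difference modelled is that HTML 4.01 permits "name".
IMG_ATTRS = {
    "HTML 4.0": {"src", "alt", "width", "height"},
    "HTML 4.01": {"src", "alt", "width", "height", "name"},
}


class ImgAttrChecker(HTMLParser):
    """Collect IMG attributes that the chosen version does not allow."""

    def __init__(self, version):
        super().__init__()
        self.allowed = IMG_ATTRS[version]
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "img":
            for name, _value in attrs:
                if name not in self.allowed:
                    self.errors.append(name)


def img_errors(markup, version):
    """Return the disallowed IMG attribute names for the given version."""
    checker = ImgAttrChecker(version)
    checker.feed(markup)
    checker.close()
    return checker.errors
```

For the same markup, the answer depends only on the version passed in, so a document checked against "HTML 4.0" gets the same result today as it did when it was written.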
> 
>  * An implementor may be tempted to implement a new rendering engine per 
>    version, leaving their old rendering engines with undocumented bugs, 
>    and forcing other implementors to implement a growing number of engines
>    just to compete, eventually leading to the inability for other vendors
>    to compete, and thus to the stagnation of the Web and its failure in 
>    the face of proprietary technologies (as, e.g., is already happening 
>    with IE8, and as has been experienced, in part, with "quirks mode").

There is no "forcing" here.  Yes, of course an implementor may be
tempted to write a new renderer for a new spec., leaving the old,
?buggy?, renderer in situ, but no-one forces other implementors
to do likewise.  Leaving aside the marketing strength of Microsoft
(which is /not/, IMHO, this group's concern), users will elect to
use the browser that best suits their needs : and if one implementor
invents a "better" mousetrap (sorry, browser), then the world will surely
build a path to his/her door.
> 
>  * The editors of future versions of the specification will be tempted to
>    use versioning as a means to fix compatibility problems, instead of 
>    picking the harder, but ultimately better, option of addressing 
>    problems in the language itself (as, e.g., people have suggested 
>    several times already for HTML5).

I'm not sure which of those two options "people have suggested" (it's
not clear from your prose), nor do I follow what you are saying :
can you give an example of "us[ing] versioning as a means to fix
compatibility problems" and compare/contrast this with the idea of
"addressing problems in the language itself" ?

> 
>  * Authors of Web pages merely copy-and-paste the boilerplate text at the 
>    moment, so the shorter and simpler we make it the better; the more we 
>    make the boilerplate change from year to year the more difficulty 
>    authors will have adapting.

As Gregory J. Rosmaita <oedipus@hicom.net> wrote elsewhere :

> 1. poor authoring practices should NOT sway or inform our decisions 

I for one am not the least bit interested in helping to design
a markup language for those too lazy to think : authors who merely
"copy-and-paste the boilerplate text" should /not/ be the target
of our efforts; rather we should be working to design a language that
/can/ be written by hand but which can equally well be written
by well-designed authoring tools, and which results in documents
that are (a) maximally accessible (surely the first criterion),
and (b) can be efficiently rendered.  I would say a lot more about
the design desiderata, but that would be off-topic for this thread.

Philip Taylor

Received on Thursday, 21 June 2007 16:01:23 UTC