Re: versioning, robustness principle, doctypes etc from Henri Sivonen on 2009-09-17 (www-tag@w3.org from September 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Thu, 17 Sep 2009 10:18:04 +0300
To: noah_mendelsohn@us.ibm.com
Cc: Larry Masinter <masinter@adobe.com>, "www-tag@w3.org" <www-tag@w3.org>
Message-Id: <53AC7196-84DB-46EE-A25E-503EAF5ABA40@iki.fi>

On Sep 15, 2009, at 04:52, noah_mendelsohn@us.ibm.com wrote:

> Henri Sivonen writes:
>
>> I'm not assuming that. What I'm assuming is that if the newer language
>> makes old language features non-conforming, the old language features
>> are bad right away.
>
> Why is this true in all interesting cases?

Presumably specs shouldn't make old language features non-conforming  
unless the old features are bad. I can see a point that <font> has  
always been bad but it couldn't be made non-conforming until CSS was  
available to authors. However, I'd argue that when something is made  
non-conforming because it's bad and there's a less bad replacement,  
the worse thing shouldn't be made non-conforming until the replacement  
has been deployed.

> Example:  someone deploys a language for which keywords are treated as
> case-insensitive, so xxx, Xxx, xXx, and XXX are the same.  Over time,
> for whatever reason, it's decided that this variability is a bad
> thing.  So, users are warned, and eventually a new version of the
> language is introduced that mandates, say, lowercase only.  Per this
> new version of the language, all except the first of the above
> identifiers are illegal, or if you prefer, undefined.

(In the case of Web languages, you almost never have the liberty to
update a language in a way that makes a previously defined thing be
processed as undefined.)

> Let's imagine, though, that in filesystems, databases, and/or web
> servers around the world there live lots of documents written to the
> old specification.  Why is their use of the old feature (in this
> example the uppercase or mixed case keywords) "bad right away"?

If variability is bad and non-variability was already allowed under  
the old spec, the old documents could have been authored in a way that  
is now declared the non-bad way. Why wasn't it bad back when the old  
documents were authored to have the variability? Or the other way: If  
variability is bad, why would it be less bad depending on when the  
content was authored?

On the other hand, if that variability isn't bad enough to warrant an  
error, the new spec shouldn't say variability is bad to the point of  
being an error.

There are cases when old practices are very, very mildly bad so that
it's wasteful both to edit old content to remove the practice and to
keep using the practice in new documents. For example, specifying the
type="text/css" attribute on the HTML <style> element is this kind of
practice--specifying the attribute is just a waste of typing, network
bandwidth, parsing time and maintainers' reading attention. I think
in cases like this, avoiding the waste of fixing old but maintained
stuff should outweigh fighting the prospective waste, so the practice
shouldn't be proclaimed an error, although it would be appropriate to
communicate the prospective wastefulness to authors in other ways.
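
To make the "other ways" concrete, here is a rough sketch of a checker
that reports the redundant attribute as a warning rather than as a
conformance error. This is only an illustration under my own
assumptions: the class and function names and the "warning" severity
label are hypothetical and do not come from any real validator. It uses
Python's standard html.parser module.

```python
from html.parser import HTMLParser


class StyleTypeChecker(HTMLParser):
    """Hypothetical checker: flags type="text/css" on <style> as a warning."""

    def __init__(self):
        super().__init__()
        self.messages = []  # list of (severity, text) tuples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        type_value = dict(attrs).get("type") or ""
        if tag == "style" and type_value.lower() == "text/css":
            self.messages.append(
                ("warning", 'type="text/css" on <style> is redundant')
            )


def check(markup):
    """Run the checker over a markup string and return its messages."""
    checker = StyleTypeChecker()
    checker.feed(markup)
    checker.close()
    return checker.messages


def is_conforming(messages):
    """Only messages of severity "error" make a document non-conforming."""
    return not any(severity == "error" for severity, _ in messages)


old_page = '<style type="text/css">p { color: green }</style>'
messages = check(old_page)
print(messages)
print("conforming:", is_conforming(messages))
```

With this split, the old but maintained page still counts as
conforming, and the author of a new page still hears that the attribute
is a waste.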

> Whether or not inband version identifiers are used to help sort
> things out, I don't think those old documents need immediately be
> treated as "bad".  Software can still process them and indicate:  you
> seem to have used a feature that was discontinued as of version N; I
> can continue or choke as you prefer.

In practice, Web language consumers have an incentive not to choke if  
they maintain a non-choking code path anyway, so it's useless for  
specs to give them an opportunity to continue or choke depending on  
version identifier. Thus, consuming software using a version  
identifier to sabotage itself is not a useful motivation for having  
version identifiers.

(The incentive is this: UA A implements the language as of version N-1
and doesn't choke on feature F. UA B implements language versions N-1
and N; it doesn't choke on feature F if content specifies N-1 but
chokes when content specifies N. When content specifies N but uses
feature F nonetheless, UA A seems to work better from the user's point
of view than UA B. Thus, UA B disadvantages itself by implementing
conditional choking.)
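
The incentive can be sketched as a toy model. Everything here is an
assumption made for illustration: the function names, the feature label
"F" and the concrete number standing in for version N are hypothetical
and don't describe any real UA.

```python
# Illustrative model only: the function names, the feature label "F" and
# the concrete version number are assumptions, not part of any real UA.

VERSION_N = 5  # hypothetical stand-in for "version N"


def ua_a_render(declared_version, features):
    """UA A knows only version N-1 and never chokes on feature F."""
    return "rendered"


def ua_b_render(declared_version, features):
    """UA B chokes on feature F only when the content declares version N."""
    if declared_version == VERSION_N and "F" in features:
        return "choked"
    return "rendered"


# Content that declares version N but uses feature F anyway.
declared_version, features = VERSION_N, {"F"}
print("UA A:", ua_a_render(declared_version, features))
print("UA B:", ua_b_render(declared_version, features))
```

For the same content, UA A reports "rendered" and UA B reports "choked",
so the only thing the version check buys UA B is a worse result for its
own users.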

> FWIW, my own thinking on in-band version ids is:  if the same content
> is ever going to mean two incompatible things per different
> definitions of the language, e.g. if version one says {0=false,
> 1=true} and version two says {1=false; 0=true}, then some sort of
> version indicator in the document is useful to disambiguate the
> intended interpretation.

This situation is contrived and doesn't arise in well-maintained Web  
languages. If a working group tried to update a Web language in such a  
gratuitously incompatible way, the community would flip the bozo bit  
on the spec and the WG and ignore them, so the specified  
incompatibility wouldn't have a bearing on practical matters. Thus,  
gratuitously incompatible changes aren't a useful motivation for  
having version identifiers.

(Besides, we could leave it to the hypothetical reckless future WG to
define a version identifier if/when they make such an incompatible
change. After all, consuming software developed now wouldn't be able
to guess what incompatibility a new version identifier means, and it
would keep on processing the content anyway due to the self-sabotage
disincentive outlined above.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Thursday, 17 September 2009 07:18:48 UTC