Re: Version indicators (was Re: [public-webapps] Comment on Widget URI (7)) from Robin Berjon on 2009-12-22 (www-tag@w3.org from December 2009)

From: Robin Berjon <robin@berjon.com>
Date: Tue, 22 Dec 2009 13:13:06 +0100
To: Larry Masinter <masinter@adobe.com>
Cc: Marcos Caceres <marcosc@opera.com>, Marcin Hanclik <Marcin.Hanclik@access-company.com>, "www-tag@w3.org" <www-tag@w3.org>
Message-Id: <04C95A3C-4DD4-4CAF-A43A-AE13587A55E7@berjon.com>

Hi Larry,

On Dec 22, 2009, at 03:42 , Larry Masinter wrote:
> I'd be interested in your review of:
>      http://larry.masinter.net/tag-versioning.html

The first thing that jumps out at me is:

"Right now, new HTML features seem to be deployed on the web by HTTP servers "sniffing" the User Agent identifier (consumer capability) and using it to determine which edition of a HTML page should be generated for that consumer."

While UA sniffing is done in places (perhaps most systematically in mobile content adaptation), it is not the most common path for the introduction of new features and I don't believe I have seen many advocate it as best practice. New HTML features tend to be deployed using various techniques for (more or less) graceful degradation such as shims or fallback. That is to say: the exact same content is sent from the producer to all consumers, but the consumers exhibit different behaviours (not using the font but still showing the text, losing interactivity or decorations, emulating the intended behaviour but with strongly diminished performance, etc.).

This is important because it alleviates the need for "new or otherwise unrecognized consumers" to indicate that they support a given set of features. Any form of negotiation introduces a new axis of complexity in the production of content. I think that a sound architectural principle is that interoperability should require as little negotiation between agents as possible, ideally none.

Now I will readily admit that the techniques I mention above are also a form of negotiation; shims say "Ooh! I see you don't support this, let me add it for you", fallback says "you don't understand this? Use that". But compared to out-of-band negotiation which applies to the entire document and deals in broad feature-sets such as "versions" or "agents", these negotiation techniques can be characterised as being "in situ" in that they happen in the proximity of what they affect, and "scoped" in that they (typically) apply to a single feature — both of these properties make them far less complex and far more manageable than global negotiation.
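
To make "in situ" and "scoped" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration (the Node type, the feature names, the render function); it is not drawn from any specification. Each feature carries its own fallback chain right next to it, and the consumer negotiates one node at a time rather than over the whole document.

```python
from dataclasses import dataclass
from typing import AbstractSet, List, Optional


@dataclass
class Node:
    feature: str                       # hypothetical feature name, e.g. "video"
    content: str
    fallback: Optional["Node"] = None  # alternative sitting right next to the feature


def render(node: Node, supported: AbstractSet[str]) -> str:
    """Render one node, negotiating only over this node's own feature."""
    current: Optional[Node] = node
    while current is not None:
        if current.feature in supported:
            return current.content
        current = current.fallback     # "you don't understand this? Use that"
    return ""                          # drop this node only, never the whole document


def render_document(nodes: List[Node], supported: AbstractSet[str]) -> List[str]:
    return [render(n, supported) for n in nodes]


# The same content is sent to every consumer; only the behaviour differs.
document = [
    Node("text", "Hello"),
    Node("video", "<video>",
         fallback=Node("image", "<img>",
                       fallback=Node("text", "[a clip]"))),
]
```

A consumer that supports only "text" and "image" would show the image in place of the video, and nothing in the producer had to know about that consumer in advance.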

Therefore I would further define the architectural principle outlined above: the "little" in "as little negotiation as possible" is most usefully measured in terms of proximity to the effect (i.e. avoid "Action At A Distance") and affected feature surface area.

Of course there is a limit: at some point if you start having to rely on too many small negotiations you are better off with a single larger one.

For our purposes we can posit that all languages will have a language indicator (media type, root element local name, DOCTYPE, namespace, magic number, file extension, sniffing — it doesn't matter). I think that it is a sound principle that a language indicator should apply to all variants of that language which are "mutually intelligible" (in the linguistics sense) but only to those.

If {Unicorn, 1.0} and {Unicorn, 1.1} are mutually intelligible, then negotiation on the delta of their feature sets can (and should) be localised. In this case a version indicator is too broad a negotiation tool to be useful and can be dropped. It is just the Unicorn language.

If {Unicorn, 1.0} and {Unicorn, 2.0} are not mutually intelligible, then they have no business sharing a name. They could be just Unicorn1 and Unicorn2, or Unicorn and ChupaCabra.

That's in essence what I'd convey to working groups when it comes to version indicators: either you don't need them, or they're the wrong solution. However, do define a strategy that allows for mutual intelligibility across versions (the truly hard part, which the TAG would IMHO be most helpful with if it were to tackle it), and if it comes to incompatibility then do change the language indicator.

Note that the "Unicorn" language indicators I use above are the technical indicators: for marketing purposes companies (and groups) could of course keep calling their products "Unicorn" while the indicator moved from application/cryptid-1 to application/cryptid-2.
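
As a rough illustration of dispatching on the language indicator alone, consider the Python sketch below. The two media types are the made-up ones used above; the parser functions and the dispatch helper are hypothetical names introduced only for this example. The point is that no version parameter is ever consulted: mutually unintelligible languages simply get different indicators.

```python
from typing import Callable, Dict


def parse_cryptid_1(body: str) -> str:
    # Hypothetical parser for the first language.
    return f"cryptid-1 document: {body}"


def parse_cryptid_2(body: str) -> str:
    # Hypothetical parser for the second, incompatible language.
    return f"cryptid-2 document: {body}"


# One entry per language; variants that are mutually intelligible share an entry.
PARSERS: Dict[str, Callable[[str], str]] = {
    "application/cryptid-1": parse_cryptid_1,
    "application/cryptid-2": parse_cryptid_2,
}


def dispatch(media_type: str, body: str) -> str:
    """Pick a parser from the language indicator only, ignoring any parameters."""
    indicator = media_type.split(";", 1)[0].strip().lower()
    parser = PARSERS.get(indicator)
    if parser is None:
        raise ValueError(f"unknown language indicator: {indicator}")
    return parser(body)
```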

> Languages evolve in different ways. For languages which
> satisfy preconditions:
>  
> 1. evolve slowly and carefully, so as to minimize the number of versions
> 2. there is a single specification which is first updated in standards groups, implemented carefully against the specification, with the specification finalized and implementations deployed only after testing against the specification
> 3. where the language has few or no dependencies on other specifications that are also evolving

In all honesty, I am very much unconvinced that we can produce sound architectural advice based on the above preconditions. The problem I see here is that all of these are backwards-looking (i.e. inductive) whereas the primary goal of a versioning architecture is to provide future-proofing.

It may be that a language has evolved slowly and carefully over the past decade, but that all of a sudden an innovation in another part of the stack opens up a whole swathe of previously impossible or impractical yet justified and valuable features. There may have been a single specification for a length of time, but a rift in the industry causes work on different aspects to be split; or groups at the edges become frustrated with the sluggish progress at the core and produce splinter extensions. The language may have dependencies on things that have long seemed stable and reliable, but that are now brutally changing under one's feet (e.g. the introduction of Unicode). Also, group membership changes over time. The group may have started off as a measured consensus of wise people, but as people have moved on and new ones been drawn in it is now comprised mostly of rowdy fourteen-year-olds eager to change the world.

I contend that a solid versioning strategy (or in fact, any architectural advice) has to be resilient in the face of such changes. If version indicators are predicated on groups and specifications being slow and careful, then they shouldn't be used. If anything, we standards-makers should distrust ourselves — that's how strong constitutions are built.

You list some interesting use cases:

> Then single global embedded version indicator can be useful:  
>  
> 1. validators to know how to validate content against the intended version

I think that looking at this through version indicators is doing it wrong: this is a limitation of schema technology that altogether too often fails to take versioning into account as a founding parameter. I would like a schema language which can be decorated with information about the seventeen specification versions of my language and which, instead of returning a boolean valid/invalid for a given document, would say:

  - this document is feasibly valid for specification versions 1 through 8 through the application of the ignore-unknown rule to features (and be able to list which features would be ignored at which version);
  - this document is strictly valid from version 9 on.

Or state that it is invalid at any version level.
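
The kind of report described above could look roughly like the following Python sketch. It is a toy under stated assumptions: the feature table, the version numbers, and the Report type are all invented for the example, and a document is reduced to the list of features it uses. It is not an existing schema language.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical table: feature name -> first specification version defining it.
INTRODUCED_IN: Dict[str, int] = {"circle": 1, "gradient": 4, "filter": 9}


@dataclass
class Report:
    strictly_valid_from: int
    # For each earlier version, the features that the ignore-unknown rule would skip.
    feasibly_valid: Dict[int, List[str]]


def validate(features_used: Iterable[str]) -> Optional[Report]:
    """Return a per-version report, or None if invalid at every version level."""
    used = set(features_used)
    if any(f not in INTRODUCED_IN for f in used):
        return None  # uses something no version of the language defines
    strict_from = max((INTRODUCED_IN[f] for f in used), default=1)
    feasible = {
        version: sorted(f for f in used if INTRODUCED_IN[f] > version)
        for version in range(1, strict_from)
    }
    return Report(strict_from, feasible)


report = validate(["circle", "gradient", "filter"])
```

With this made-up table, the sample document comes out feasibly valid for versions 1 through 8 and strictly valid from version 9 on, and the report lists what each older version would ignore.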

> 2. in the unfortunate case of incompatibilities between versions, consumers can provide different behavior

What I am missing here is the reason why one would not just change the language indicator in such a case.

> 3. for user agents to warn their users when receiving unknown versions

I think that this is probably best addressed through a must-understand flag. Sometimes I may use a new element and if you ignore it I don't care. Sometimes I may use the same element and provide a fallback. And at yet other times I might be using that element and the document is meaningless without it. I only want the agent to warn the user in that last case (e.g. I'm sending a legal document and interpretation of the whole is necessary). Version indicators don't have the granularity to handle this.
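
A minimal Python sketch of the three cases, assuming a hypothetical Element type with a must_understand flag and an optional fallback (none of these names come from an existing format):

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Element:
    name: str
    must_understand: bool = False   # hypothetical per-element flag
    fallback: Optional[str] = None  # name of a simpler element to use instead


def process(elements: Iterable[Element], known: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (rendered element names, warnings to show the user)."""
    rendered: List[str] = []
    warnings: List[str] = []
    for el in elements:
        if el.name in known:
            rendered.append(el.name)
        elif el.fallback is not None:
            rendered.append(el.fallback)
        elif el.must_understand:
            warnings.append(f"cannot interpret required element '{el.name}'")
        # otherwise: unknown and optional, so it is silently ignored
    return rendered, warnings


doc = [
    Element("para"),
    Element("sparkle"),                          # new, ignorable
    Element("chart", fallback="table"),          # new, with a fallback
    Element("signature", must_understand=True),  # new, document is meaningless without it
]
rendered, warnings = process(doc, known={"para", "table"})
```

Only the last element produces a warning; the decision is made per element, which a document-wide version indicator cannot express.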

> > So I don't think that it's a question of whether implementations drift
> > compared to specifications — even though in practice that's a factor.
>  
> But if implementations are deployed before there are agreed upon
> specifications, there is a serious problem, well beyond of "versioning".
> This is a problem of chaos in the marketplace. Nothing a standards
> group can do in advance can prevent multiple, incompatible implementations
> from being deployed and content providers being eventually forced
> to guess what the implementations are doing and send different content.
> Certainly this isn't a situation to be celebrated.

Indeed it isn't. I completely agree that if there is outright chaos then there is nothing that we can do. However, I'm sure we both agree that interoperability is only rarely near perfect, that specifications are never perfectly clear, and therefore that there will be some amount of drift, and with it a small amount of chaos.

I believe that strategies to handle these (hopefully) small deltas amongst implementations (both in content and in future revisions to the specification that take the drift into account) are the same as those that make for a good versioning strategy (localised negotiation) — and again that version indicators are too crude an implement to address these.

> > but as soon as you start having a versioning strategy you cease
> > needing a version indicator.
>  
> I'm not sure how "having" a versioning "strategy" helps, much less
> if you "start to have" one. I think part of what I was trying to
> distinguish (which I don't see in your notes) is the difference
> between "version of specification" and "version of implementation".
> When you talk about "versioning strategy", I think most of the
> things you talk about apply primarily to "specifications", but that
> in some cases they also apply to "implementations", but if you
> are careful to distinguish between them, a "versioning strategy"
> might be an agreement of specification writers or organizations,
> while what's needed is actually a binding commitment from
> implementers, which is much harder to get.

I do not make the distinction between specification and implementation when discussing versioning strategies because I do believe that such a strategy is so fundamental to the design of a language that it has to be accepted in full by both parties — without that you cannot usefully evolve your language, or at the very least it becomes baroque as happened with quirksmode. This further entails that the adopted strategy should be as simple as possible.

I don't disagree that it's much harder to get — but it is so important that if the specifiers cannot obtain that from implementers then that community is in trouble.

The compact, as it were, is that:

  - specifiers define a versioning strategy (i.e. an algorithm for the processing of unknown language constructs) that implementers commit to (preferably as early as possible), and which is thoroughly tested in the test suite;
  - specifiers commit to producing new features in such a manner that the versioning strategy guarantees graceful degradation (or to clearly create a new language if they don't).
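
As a sketch of what both halves of that compact might look like in practice, the Python below pairs a tiny processing algorithm for unknown constructs with the sort of check a test suite could run against it. The vocabulary and the rule (skip an unknown element together with its subtree) are assumptions chosen for the example; a real language would pick its own rule.

```python
import xml.etree.ElementTree as ET

KNOWN = {"doc", "para", "emph"}  # hypothetical vocabulary of the deployed version


def process(element: ET.Element) -> str:
    """Return the text a consumer would show, skipping unknown subtrees."""
    if element.tag not in KNOWN:
        return ""  # assumed rule: an unknown element and its content are ignored
    parts = [element.text or ""]
    for child in element:
        parts.append(process(child))
        parts.append(child.tail or "")  # text after the child belongs to the parent
    return "".join(parts)


# A document in today's language, and one using a feature added later.
old = ET.fromstring("<doc><para>Hello <emph>world</emph></para></doc>")
new = ET.fromstring(
    "<doc><para>Hello <emph>world</emph><sparkle>!!!</sparkle></para></doc>"
)

# Test-suite style check: the newer document degrades gracefully on an old consumer.
assert process(new) == process(old) == "Hello world"
```

Implementers commit to the process rule; specifiers commit to adding features such as the hypothetical sparkle element only in ways that keep the final assertion true.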


This discussion reminded me of a specification specimen that I believe is interesting in this context:

  SVG Tiny 1.2 — Appendix C. Implementation Requirements
      http://www.w3.org/TR/SVGMobile12/implnote.html

If we look at this appendix and compare it to what happens in effect:

  - Section C.2 defines a simple versioning strategy that doesn't necessitate version indicators. This part works, and is implemented.
  - Section C.4 defines a number of behaviours based on version and profile indicators; to the best of my knowledge no one has implemented that, or even cares.

I do not necessarily intend to use the fact that one is implemented and not the other as an argument here (though it is a data point); rather, I think that this wild specimen is a good example in that C.4 is not actually useful in the presence of (the much simpler, clearer) C.2.

[Full disclosure: I wrote the original proposal that became C.2, and fought a losing battle against keeping C.4 in after it had proven useless in 1.1; other (former) members of the SVG WG may therefore have different analyses.]

> Versioning in the web is much more complicated, because
> there are many, many different technical specifications that
> are brought together to create a single platform, including
> not just HTML but CSS and JPEG and PNG and HTTP and URI and
> JavaScript and etc and etc. Maybe 100 specifications all together.

Certainly. That's why I think that the more localised the versioning/negotiation toolset the better. Just imagine having a version indicator for the compound language that is the whole web stack!

-- 
Robin Berjon - http://berjon.com/
Received on Tuesday, 22 December 2009 12:13:37 UTC