Re: Versioning and HTML from Jonathan Rees on 2009-04-26 (www-tag@w3.org from April 2009)

From: Jonathan Rees <jar@creativecommons.org>
Date: Sun, 26 Apr 2009 10:38:10 -0400
To: Larry Masinter <masinter@adobe.com>
Cc: "www-tag@w3.org WG" <www-tag@w3.org>
Message-Id: <2AE932AE-9A33-4B4F-A40F-6DCE589808BC@creativecommons.org>
On Apr 18, 2009, at 2:14 PM, Larry Masinter wrote:
> I wrote:
>
> The general idea of 'versioning' is that you include some indicator  
> of version in the current language that will allow current  
> processors to deal appropriately with future languages and recognize  
> that they don't understand or can process appropriately this future  
> content. The main thing is to categorize or predict the kinds of  
> future content that current implementations should avoid or react to  
> in some appropriate way. What are those categories?

I think version indicators should be approached with some amount of  
skepticism.

You are making an assumption. It is empirically true that one can  
version a language without having version indicators. For example,  
Algol 60 and Algol 68 do not have version indicators. You are making a  
design choice, or perhaps attempting to establish 'versioning' as a  
term of art, not articulating a given.

Expanding on what you said (you use "or" very significantly): Version  
indicators can communicate various kinds of information to consumers,  
depending on the design of the versioning regime itself. In particular:

1. To syntactically characterize the text in question.  The same  
information could in principle be determined by scanning the text to  
see whether it syntactically conforms to the corresponding language  
specification; the purpose of the indicator is to make it unnecessary  
for the consumer to do this.

2. To modulate the interpretation of the text in question. That is,  
depending on what the version indicator is, interpreting agents might  
have to interpret the same text in two different ways.

This choice has profound consequences for the design of future
versions. Suppose that an A (old) text is marked with indicator A.
Sense 1 does not in itself imply that a text interpreted by an
A-interpreter will lead to the desired payoff for a B producer, for
any text. That will only be true if, when we designed the versioning
regime, we stipulated that all future versions will have this
property (new producers "must be" happy with what old consumers do
with all texts). If we stipulate only sense 1, then future version
designers do not have the freedom to transition a given
interpretation of a given text from acceptable (in A) to less
acceptable (in B) - or vice versa.

Given forwards and backwards compatibility as a language series  
policy, there is no real need for a version indicator, other than as a  
convenience (so that agents who care don't have to scan the document  
to see if it contains constructs they don't understand).

So in any discussion, you need to be clear about the sense of the  
version indicator. Sense 1 is economical in that a consumer can always  
just use a B-interpreter to interpret according to language A. There  
is strong incentive for a consumer to assume it even when doing so  
isn't in spec. Sense 2 is harder to implement since the consumer needs  
two interpreters or two interpretation modes, one for A texts and one  
for B texts.
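
As a rough illustration of the two senses (not a claim about any real
language), here is a minimal Python sketch. The toy languages A and B,
the '!' construct, and all function names are invented for this
example. Under the sense 1 regime, A texts are assumed to be exactly
those without '!' words, and B agrees with A on all of them; under the
sense 2 regime, A is assumed to accept '!' words and read them
literally.

```python
from __future__ import annotations

# Hypothetical toy languages, invented for this sketch only.
# A text is a string of whitespace-separated words.

def interpret_a(text: str) -> list[str]:
    """A-interpreter: every word is taken literally."""
    return text.split()

def interpret_b(text: str) -> list[str]:
    """B-interpreter: a word starting with '!' is upper-cased, minus the '!'."""
    return [w[1:].upper() if w.startswith("!") else w for w in text.split()]

def conforms_to_a(text: str) -> bool:
    """Sense 1 regime: A texts are exactly those with no '!' words."""
    return not any(w.startswith("!") for w in text.split())

def consume_sense1(indicator: str, text: str) -> list[str]:
    # The indicator only saves a scan of the text. One B-interpreter is
    # enough, because B agrees with A on every text that conforms to A.
    return interpret_b(text)

def consume_sense2(indicator: str, text: str) -> list[str]:
    # The indicator changes the meaning of the same text, so the
    # consumer has to carry two interpreters (or two modes).
    return interpret_a(text) if indicator == "A" else interpret_b(text)
```

In this sketch the text "foo !bar" comes out differently from
consume_sense2 depending on whether it is marked A or B, while
consume_sense1 never looks at the indicator at all.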

Version indicators can be helpful, but they just push off the problem  
one level - they are really part of the language(s) in question, so  
they have to be evaluated according to exactly the same criteria that  
one would apply to a language series that doesn't have them. Suppose  
you have language versions A and B, and then a "sum" language C = A +  
B whose texts consist of a version indicator followed by a text of  
either A or B. (If A and B both already have version indicators you  
*may* be able to take C texts = A texts union B texts.) You still have  
to agree ahead of time - before language B is invented - on how to  
interpret texts of C - that is, everyone concerned needs a priori  
knowledge of how to parse and understand version indicators, even if  
it's just to say that rejecting unknown versions, or unknown texts, is  
OK. When you design a language series initially, you may set aside a  
place for version indicators, and specify that the indicator  
"sublanguage" is extensible (i.e. new indicators may come along). If  
you get the indicator language wrong in the first place, e.g. if you  
define it to specify sense 1 instead of sense 2 or vice versa, then  
you may find yourself stuck, either underconstraining the series (so  
that old consumers can't consume new content with confidence) or  
overconstraining the series (so that new content will be rejected by
conforming old consumers).
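
The "sum" language C can be sketched in a few lines of Python. This is
only one possible regime, under assumptions of my own choosing: the
indicator is the first line of the text, and the a priori agreement is
that consumers reject indicators they do not know. The names
make_consumer and UnknownVersion are hypothetical.

```python
from __future__ import annotations
from typing import Callable

class UnknownVersion(Exception):
    """Raised when a consumer meets an indicator it was not built for."""

def make_consumer(interpreters: dict[str, Callable[[str], object]]):
    """Build a consumer of C texts from the interpreters it knows about."""
    def consume(c_text: str):
        # Agreed in advance: the first line is the version indicator,
        # everything after it is a text of the indicated language.
        indicator, _, body = c_text.partition("\n")
        if indicator not in interpreters:
            # Also agreed in advance: rejecting unknown versions is OK.
            raise UnknownVersion(indicator)
        return interpreters[indicator](body)
    return consume

# An old consumer, built before language B was invented.
old_consumer = make_consumer({"A": str.split})

# A new consumer, built after B (here: upper-cased words) was invented.
new_consumer = make_consumer({
    "A": str.split,
    "B": lambda body: body.upper().split(),
})
```

The point of the sketch is that the parsing rule and the rejection
policy live in make_consumer, which has to be fixed before anyone
knows what the "B" entry will be.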

So version indicators only support extensibility (or whatever other  
goal you're after) if the future consequences for both old and new  
consumers are articulated and documented before the whole process gets  
started.

Best
Jonathan

---Footnotes---

1. Saying that C = A + B where B is not yet invented is not as
nonsensical as it sounds. An extension may be thought of as a secret  
that is somehow known in principle, but not revealed to producers and  
consumers until some future date. I think of versioning and extension  
as being similar to the concept of single assignment or "future" in  
programming languages.
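
A small Python sketch of that analogy, using the standard library's
concurrent.futures.Future as the single-assignment placeholder.
Creating a Future by hand like this is meant only for illustration;
the names language_b and interpret_c, and the toy interpreters, are
hypothetical.

```python
from concurrent.futures import Future

# Placeholder for language B's interpreter: it can be referenced now,
# but it is assigned exactly once, at some later date.
language_b = Future()

def interpret_c(indicator, body):
    """Interpret a text of C = A + B, where B may not be revealed yet."""
    if indicator == "A":
        return body.split()
    if indicator == "B":
        if not language_b.done():
            raise LookupError("language B has not been revealed yet")
        return language_b.result()(body)
    raise LookupError("unknown version indicator: " + indicator)

# Before the reveal, B texts cannot be interpreted.
try:
    interpret_c("B", "hello world")
    early = "interpreted"
except LookupError as exc:
    early = str(exc)

# The "secret" is revealed: B is assigned once and for all.
language_b.set_result(lambda body: body.upper().split())
late = interpret_c("B", "hello world")
```

In Python 3.8 and later, a second set_result on the same Future raises
an error, which matches the single-assignment reading.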

2. For those of you who read my formal stuff, I used a different  
definition of "language" there... I think that language (or language  
version) as class/predicate of interpreters, or equivalently  
requirements/specification/constraints on interpreters, is probably a  
more useful definition than either language as set of strings or
language as single interpretation function on set of strings.
Received on Sunday, 26 April 2009 14:38:48 UTC