Re: Formulate erratum text on versioning for the web architecture document

John Kemp wrote:

> Was it the presence of the 'version' attribute in the specification of 
> XML, or the fact that it must say '1.1' in the case that an XML 1.1 
> instance was being exchanged?

First, I think it's worth observing that at best the attribute did not 
solve a compatibility problem, since (unless I'm misremembering some detail) 
no document that was legal in both 1.0 and 1.1 had differing 
interpretations under the two specifications.  So, labeling a document 1.1 
really was just a signal that "yes, I know I'm using new characters in the 
document below, and I meant it".

So, given that for whatever reason one might still want to consider having 
the attribute, what specifically caused trouble for XML 1.1 adoption?  My 
opinion is that it was a combination of:

* The XML Recommendation did not provide any incremental forward 
compatibility with later versions.  If a document is labeled anything 
other than 1.0 (or no label, which defaults to 1.0), then the document 
must be rejected.  Labeling a document 1.1 thus provided insurance that it 
wouldn't be processable at all by the tens or hundreds of millions of 
already deployed XML processors.

* The XML 1.1 Recommendation did suggest that the 1.1 version marker be 
used only if some 1.0-incompatible content was in the document.  Turns 
out, that's easy advice to give, but hard to implement.  It almost ensures 
that a general purpose application will have to make two passes in 
creating a document:  one pass to look for new characters, and the second 
to output it.  You can have a gigabyte of 1.0 compatible output and 
discover that, at the very end, some character you wanted to write is 
legal only in 1.1.  Well, you'd better not have written the very first line 
of the file, because that now has to have its version attribute changed. 
In practice, lots of 1.1-capable software (well, I'm not sure there was 
much 1.1-capable software, but a high percentage of what there was...) 
unconditionally applied the new label.
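
To make the two-pass problem in the second bullet concrete, here is a minimal 
sketch.  It assumes the only 1.1-specific content is the set of C0 control 
characters that the XML 1.1 Char production admits and XML 1.0 does not 
(differences in name characters are ignored).  The function names and the 
<doc> wrapper element are hypothetical, chosen only for illustration.

```python
import re

# C0 controls other than tab, LF and CR: allowed by the XML 1.1 Char
# production (as character references) but not by XML 1.0.
# Simplifying assumption: name-character differences are ignored.
ONLY_IN_11 = re.compile(r"[\x01-\x08\x0b\x0c\x0e-\x1f]")


def pick_version(chunks):
    """First pass: scan every chunk before the declaration can be chosen."""
    for chunk in chunks:
        if ONLY_IN_11.search(chunk):
            return "1.1"
    return "1.0"


def escape(text):
    """Escape markup and write 1.1-only controls as character references."""
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    return ONLY_IN_11.sub(lambda m: f"&#x{ord(m.group()):X};", text)


def serialize(chunks):
    """Second pass: only now can the first line of the output be written."""
    chunks = list(chunks)  # everything is buffered; the label comes first
    version = pick_version(chunks)
    body = "".join(escape(chunk) for chunk in chunks)
    return f'<?xml version="{version}"?><doc>{body}</doc>'
```

The call to list() is the cost being described:  a streaming writer cannot 
emit the declaration until it has seen the last chunk, which is presumably 
why much 1.1-capable software simply wrote "1.1" unconditionally.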

Obviously, to answer your question directly, the attribute would have 
caused no trouble if it had merely been treated as a comment.  Some less 
draconian compatibility rules might have been applied, and the attribute 
might or might not have been helpful in implementing them.  That's at best 
undemonstrated, IMO.

> I read this line as suggesting that a format specification should 
> provide a mechanism for instances to indicate a version of the 
> specification to which the author of the instance believes the 
> instance complies.

Me too.  For reasons such as the ones given above, I'm not convinced 
that's good advice in general.  Furthermore, I think it's only defensible 
if one can answer the sorts of questions raised in the recipe example in 
the TAG blog entry [1].  The AWWW suggests that a mechanism should be 
provided in the instance, without pointing out such sources of confusion.  
I'm not saying that providing for version information in the instance is 
always a mistake.  I do think it only makes sense when:

* One can answer questions such as:  is an author responsible for naming 
any one version with which the document is compatible?  The newest?  The 
oldest?  More than one? 

* The rules for accepting, rejecting and interpreting the content of a 
document are shown to be (helpfully) influenced by the presence of the 
version information.

The one case I'm convinced of is the one I mentioned earlier:  if you 
introduce incompatible changes, such that the same document is legal in 
more than one version but means different things in each, then labeling 
the instance is essential.  For example, if an early version of a data 
format referenced arrays with one-based indices, and a later version 
changed to zero-based, it would be essential to label the instance in a 
way that would allow the intended interpretation to be discovered.
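
A small sketch of that array-index case, under assumed names:  the instance 
layout (a dict with "version" and "index" keys), the version labels "1" and 
"2", and the resolve function are all hypothetical and stand in for whatever 
a real format would define.

```python
def resolve(instance, items):
    """Return the item an instance refers to, honoring its version label.

    Hypothetical format: version "1" uses one-based indices and
    version "2" uses zero-based indices.
    """
    version = instance["version"]
    index = instance["index"]
    if version == "1":
        return items[index - 1]  # early version: one-based
    if version == "2":
        return items[index]      # later version: zero-based
    raise ValueError(f"unknown version label: {version!r}")


items = ["a", "b", "c"]
same_index_v1 = resolve({"version": "1", "index": 2}, items)
same_index_v2 = resolve({"version": "2", "index": 2}, items)
```

Here the same index value, 2, is legal under both versions but selects a 
different element in each, so without the label a receiver has no way to 
recover the interpretation the author intended.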

Noah

[1] http://www.w3.org/QA/2007/12/version_identifiers_reconsider.html



--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

John Kemp <john.kemp@nokia.com>
02/18/2009 09:58 AM
 
        To:     "ext noah_mendelsohn@us.ibm.com" 
<noah_mendelsohn@us.ibm.com>
        cc:     Larry Masinter <masinter@adobe.com>, David Orchard 
<orchard@pacificspirit.com>, "www-tag@w3.org WG" <www-tag@w3.org>, 
www-tag-request@w3.org
        Subject:        Re: Formulate erratum text on versioning for the 
web architecture document


Hi Noah,

On Feb 17, 2009, at 6:59 PM, ext noah_mendelsohn@us.ibm.com wrote:

> John Kemp writes:
>
>> Section 4.2 on versioning and extensibility thus seems intended to
>> relate specifically to "data format specifications" and to a
>> specified agreement regarding representation data.
>>
>> As such, I don't feel that this text casts aspersions on languages
>> such as PHP (which as far as I know has no language specification) or
>> Java. I am not sure that they have the same needs with respect to
>> "agreement on the correct interpretation of representation data". I
>> do find your comments, however, to be instructive. I'm just not sure
>> how to best use them yet in this specific case.
>
> You're right that AWWW applies more obviously to data formats.  I used
> programming languages as an example just because they're so widely
> known.  I think that my blog posting makes clear that the tradeoffs
> and concerns apply equally to data format specifications.

As yet, I don't have enough knowledge to say whether AWWW data 
formats and general-purpose programming languages have the same 
requirements. I do not think that anything I've read so far 
definitively answers that question for me. That being the case, I was 
simply trying to deal with the text that I found in AWWW. Which is not 
to say that there isn't something more widely applicable to be said, 
or that general-purpose programming languages don't supply us with 
something instructive.

>  For example, I believe that
> the version identifier in XML proved more of a hindrance than a help
> to adoption of XML 1.1.

Was it the presence of the 'version' attribute in the specification of 
XML, or the fact that it must say '1.1' in the case that an XML 1.1 
instance was being exchanged?

As a possible counter-example - SOAP messages carrying the v1.2 
namespace version indicator are not widely used (in any interoperable 
sense anyway). But many of the "version-neutral" changes proposed in 
1.2 made their way into SOAP 1.1 messages via compliance with the WS-I 
Basic Profile. Was SOAP 1.2 then a success or a failure, in that many 
(most?) of its useful changes were "back-ported" to 1.1, but messages 
carrying the explicit 1.2 version identifier don't seem to have caught 
on so much?
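
For illustration, a minimal sketch of how a receiver might read the SOAP 
version off the envelope namespace. The two namespace URIs are the ones 
published for SOAP 1.1 and SOAP 1.2; the function name and the table are 
hypothetical, and this is not how any particular SOAP stack is implemented.

```python
import xml.etree.ElementTree as ET

# Envelope namespaces acting as the version indicator.
SOAP_NAMESPACES = {
    "http://schemas.xmlsoap.org/soap/envelope/": "1.1",
    "http://www.w3.org/2003/05/soap-envelope": "1.2",
}


def soap_version(message):
    """Return "1.1", "1.2", or None if the root is not a known Envelope."""
    root = ET.fromstring(message)
    if not root.tag.startswith("{"):
        return None  # root element is not namespace-qualified
    namespace, _, local_name = root.tag[1:].partition("}")
    if local_name != "Envelope":
        return None
    return SOAP_NAMESPACES.get(namespace)


v12_message = (
    '<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">'
    "<env:Body/></env:Envelope>"
)
detected = soap_version(v12_message)
```

In this sketch the namespace is the only thing distinguishing the two 
versions, which is the sense in which it acts as a version indicator.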

>
>
>> Are you saying that a new version identifier should not always be
>> minted just because a new version of the language has been? That at
>> least is the intent I had in writing the additional best practice 
>> text.
>>
>> But I would like to separate the idea of creating a mechanism for
>> allowing version indications, from the practice of assigning and
>> using new version indicators.
>
> You seem to be suggesting that specifications should assign
> identifiers to successive variants of a language, regardless of
> whether those identifiers are to appear in instances.  Maybe that's a
> good thing in general, e.g. so people can discuss one version or
> another, or maybe it's something to do on a case by case basis.
> Still, the particular AWWW text we're discussing says:
>
>        Good practice: Version information
>
>        A data format specification SHOULD
>        provide for version information.
>
> I read that as intending to say that it should provide for version
> information in the instance documents, and that's what I feel is
> clearly not always true.

I read this line as suggesting that a format specification should 
provide a mechanism for instances to indicate a version of the 
specification to which the author of the instance believes the 
instance complies.

>  Accordingly, I would like to see this replaced with
> advice that is more along the lines of what was in my previous
> email.  I believe that including such a revision in the AWWW errata
> document [1] would be a good step.
>
> Perhaps you're reading the above Good Practice Note (GPN) as not
> talking about the instance at all, but just the specification?

That is correct. I don't see this best practice as mandating that an 
instance indicate its version, and I don't believe that the practice 
mandates that the author of a data format create a new version 
identifier when they create a new version of the language. And that 
sounds right.

>   In that case, I
> think it should be clarified. I'm fairly sure that the >intention<
> was to recommend version fields in the instances,

The phrase "data format specification" seems quite specific - not 
mentioning the word 'instance' for example.

One could say something like:

"A data format specification should provide a mechanism by which an 
instance conforming to the specification may indicate a version of the 
specification to which the author of the instance believes the 
instance complies", to be absolutely clear, and if that is what was 
originally meant by AWWW.

Regards,

- johnk

> but I believe this part of AWWW
> was "baked" before my time on the TAG.
>
> Thank you.
>
> Noah
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
>
>
>
>
>
>
>
> John Kemp <john.kemp@nokia.com>
> Sent by: www-tag-request@w3.org
> 02/17/2009 05:22 PM
>
>        To:     "ext noah_mendelsohn@us.ibm.com"
> <noah_mendelsohn@us.ibm.com>
>        cc:     David Orchard <orchard@pacificspirit.com>, Larry Masinter
> <masinter@adobe.com>, "www-tag@w3.org WG" <www-tag@w3.org>
>        Subject:        Re: Formulate erratum text on versioning for the
> web architecture document
>
>
> Hello Noah,
>
> Thanks very much for the comments. Addressing only the current action
> for now. More on the general issue later:
>
> On Feb 17, 2009, at 4:42 PM, ext noah_mendelsohn@us.ibm.com wrote:
>
>> John proposes:
>>
>>> I believe that the best practice is still correct and important -
>>> data format specifications should provide a mechanism (where that
>>> mechanism might simply be "use XML namespaces") allowing instances
>>> to indicate version information. Authors will likely not know
>>> whether they will later have to create a new, incompatible version
>>> of a format a priori, but should likely assume that they will.
>>
>> Well, I still respectfully disagree.  This suggests that a big
>> subset of the programming languages we use are poorly designed
>> because they don't invite us to say things like:
>>
>>       <?php PHPVersion="4.1"  ...  ?>
>>
>> or to put Java version="2.0"  in our Java source files.
>
> I was addressing specifically the documentation produced in AWWW
> Section 4 [1], which states:
>
>> "A data format specification (for example, for XHTML, RDF/XML, SMIL,
>> XLink, CSS, and PNG) embodies an agreement on the correct
>> interpretation of representation data."
>
> Section 4.2 on versioning and extensibility thus seems intended to
> relate specifically to "data format specifications" and to a specified
> agreement regarding representation data.
>
> As such, I don't feel that this text casts aspersions on languages
> such as PHP (which as far as I know has no language specification) or
> Java. I am not sure that they have the same needs with respect to
> "agreement on the correct interpretation of representation data". I do
> find your comments, however, to be instructive. I'm just not sure how
> to best use them yet in this specific case.
>
>> I gave my reasons
>> in the blog posting, and I won't repeat them here.
>>
>>> I would suggest, however, that perhaps an additional best practice
>>> might be warranted, along the lines of Noah's suggestion in [3]:
>>>
>>> "If a language, or data format, changes in incompatible ways, a new
>>> version identifier should be assigned to the updated data format,
>>> and allowed in document instances."
>>
>> Thank you.  I do think that bit is worth saying.  Overall, I might
>> go with something like this:
>>
>> "In cases where the same instance document has incompatible meanings
>> per two or more versions of the language specification, provision
>> MUST be made for indicating the version(s) used to encode each
>> instance.  Use of explicit version identifiers in other languages is
>> optional, and in some cases such explicit identifiers can actually
>> inhibit the adoption of new language versions, or can inhibit
>> interoperability between systems implementing differing versions of
>> the language."
>
> Are you saying that a new version identifier should not always be
> minted just because a new version of the language has been? That at
> least is the intent I had in writing the additional best practice 
> text.
>
> But I would like to separate the idea of creating a mechanism for
> allowing version indications, from the practice of assigning and using
> new version indicators.
>
>>
>> ...or words to that effect.
>>
>> As an example of that last admonition, one can argue that XML 1.1
>> might have been deployed much more successfully if no version
>> attribute were provided in the XML declaration.
>
> The point I have been trying to make is that this issue doesn't seem
> to be about whether a version attribute is _provided_ in the format
> specification; it is about whether a new version identifier is created
> when a new version of a language is created.
>
>> I don't believe it's the case that the
>> same document ever had two different legal meanings in XML 1.0 and
>> XML 1.1;  it's just that some documents are legal in one version and
>> some legal in the other.  XML 1.0 processors would have rejected
>> content using new XML 1.1 characters just as surely (if not just as
>> early) if no version identifier were provided.  The ID is really just
>> a cross check or early warning in such cases.  The only time it's
>> really crucial is if the same document can mean different things as
>> the specification changes.
>
> I don't think this is a problem caused merely by the existence of a
> 'version' attribute.
>
> Regards,
>
> - johnk
>
> [1] http://www.w3.org/TR/webarch/#formats
>
>
>

Received on Thursday, 19 February 2009 03:23:37 UTC