Re: Formulate erratum text on versioning for the web architecture document from David Orchard on 2009-02-18 (www-tag@w3.org from February 2009)

From: David Orchard <orchard@pacificspirit.com>
Date: Wed, 18 Feb 2009 09:19:38 -0800
To: John Kemp <john.kemp@nokia.com>
Cc: "ext noah_mendelsohn@us.ibm.com" <noah_mendelsohn@us.ibm.com>, Larry Masinter <masinter@adobe.com>, "www-tag@w3.org WG" <www-tag@w3.org>, www-tag-request@w3.org
Message-ID: <2d509b1b0902180919n513bbf7dkaf15302d687b6838@mail.gmail.com>
John,

The unfinished versioning finding, at
http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies, goes
through a few reasons why XML 1.1 adoption was difficult.  I'll quote some
sections:

"The decision as to which type of error handling is to be used is
application dependent. XML 1.0 is an example of where an error results in
failure. HTML and CSS specify that most errors are ignored silently
resulting in the equivalent of compatible behaviour."
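To make the two error-handling policies concrete, here is a minimal Python sketch for a hypothetical toy format of "name: value" lines. This is not real XML or CSS. KNOWN_PROPERTIES, parse, and UnknownPropertyError are illustrative names. The strict branch loosely corresponds to XML 1.0's fail-on-error handling, and the lenient branch to the CSS habit of skipping what is not understood.

```python
# Toy consumer for a hypothetical "name: value" format (not real XML or CSS).
KNOWN_PROPERTIES = {"color", "width"}  # hypothetical vocabulary


class UnknownPropertyError(ValueError):
    """Raised by the strict policy when it meets a name it does not know."""


def parse(text: str, strict: bool) -> dict:
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        name = name.strip()
        if name not in KNOWN_PROPERTIES:
            if strict:
                # Error-results-in-failure policy (loosely, XML 1.0): reject the text.
                raise UnknownPropertyError(name)
            # Ignore-silently policy (loosely, CSS): skip what is not understood.
            continue
        result[name] = value.strip()
    return result


doc = "color: red\nborder: 1px solid\nwidth: 10"
print(parse(doc, strict=False))  # {'color': 'red', 'width': '10'}
```

The lenient consumer keeps working when a newer producer adds properties. The strict consumer surfaces the change immediately, but it does so by rejecting the whole text.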

"The other design of putting version identifiers in the text is very common.
It may be beneficial and necessary for a receiver to determine which
version(s) of a language specification can be used to correctly interpret a
given text and know this early in the text. This is often done by providing
an identifier of the version early in a text, such as a version number or
some other structure such as an XML Namespace."
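As an illustration of reading the version early, the following Python sketch extracts the version number from an XML declaration with a regular expression before anything else in the document is examined. declared_xml_version is a hypothetical helper and not a conforming XML parser. It also assumes version 1.0 when no declaration is present.

```python
import re

# VersionInfo at the start of an XML declaration, e.g. <?xml version="1.0"?>.
# Only the leading part of the declaration is matched; this is a sketch, not a parser.
_XML_DECL = re.compile(r"""<\?xml\s+version\s*=\s*(["'])(?P<num>1\.[0-9]+)\1""")


def declared_xml_version(text: str) -> str:
    """Read the version early, before the rest of the document is examined.

    Assumption of this sketch: a document without a declaration is treated as 1.0.
    """
    match = _XML_DECL.match(text)  # a declaration, if present, must come first
    return match.group("num") if match else "1.0"


print(declared_xml_version('<?xml version="1.1" encoding="UTF-8"?><doc/>'))  # 1.1
print(declared_xml_version("<doc/>"))  # 1.0
```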

"A good example of an incompatible change that used the traditional minor
version identifier change is XML 1.1. XML 1.0 processors cannot process all
XML 1.1 documents because XML 1.1 extended XML 1.0 where XML 1.0 does not
allow such extension."
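The following Python sketch uses a hypothetical two-version toy language, not the real XML name grammar. It shows the two ways a version-1 consumer can fail on version-2 text. The first is early failure, based on the version identifier. The second is later failure, on content the consumer cannot handle. The sketch also shows the cost discussed further on: a text labeled version 2 is refused even when it uses no version-2 features. V1_NAME, Rejected, and v1_consume are illustrative names.

```python
import re
from typing import List, Optional

# Hypothetical toy rule: version 1 names are ASCII only, and version 2 would also
# allow non-ASCII letters. This is NOT the real XML 1.0 / XML 1.1 name grammar.
V1_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class Rejected(Exception):
    """Raised when the version-1 consumer refuses a text."""


def v1_consume(names: List[str], version: Optional[str] = None) -> List[str]:
    """A version-1 consumer; version is the optional in-text identifier."""
    if version is not None and version != "1":
        # Early rejection: decided from the identifier alone, before any content.
        raise Rejected(f"unsupported version {version}")
    for name in names:
        if not V1_NAME.fullmatch(name):
            # Late rejection: version-2-only content fails here with or without an identifier.
            raise Rejected(f"illegal name {name!r}")
    return names


cases = [(["caf\u00e9"], None), (["caf\u00e9"], "2"), (["cafe"], "2")]
for names, version in cases:
    try:
        v1_consume(names, version)
    except Rejected as err:
        print(err)
```

The first case is rejected late, on its content. The second is rejected early, on its identifier. The third shows the hazard: a text that uses no version-2 feature is still refused because of its label.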

"Perhaps the opposite example of extensibility is XML 1.0. As XML is a
meta-language, it can be used to create an almost infinite variety of
languages made up of elements and attributes. But XML allows for very little
extensibility in the XML language. It does not allow for new characters in
element or attribute names, or for new punctuation such as different quote
characters. This made it very difficult to move to XML 1.1 and add new
characters to element names for internationalization and other purposes.

The first rule introduced in this Finding relating to extensibility is:

Good Practice: To facilitate independent evolution of producers and
consumers, languages in distributed systems should be
Extensible <http://www.w3.org/2001/tag/doc/versioning#dt-extensible>."

"On the other hand, XML 1.0 did not specify what an XML 1.0 processor should
do when an XML document identified as XML 1.1 or XML 2.0 was encountered. As
such, many XML 1.0 processors faulted when encountering XML 1.1 documents,
whether those documents contained XML 1.1 content or not. Note that XML 1.1
specified that documents should only be specified as XML 1.1 documents if
they had XML 1.1 content. Perhaps if XML 1.0 had specified that any document
marked XML 1.X should be processable as an XML 1.0 document, in conjunction
with other forwards-compatible versioning techniques, then the migration to
XML 1.1 would have been easier."
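The hypothetical rule in the paragraph above ("any document marked XML 1.X should be processable as an XML 1.0 document") can be sketched as an acceptance check and contrasted with strict matching. accepts_document is an illustrative name. The sketch only decides whether processing is attempted; it does not parse anything.

```python
import re

_ONE_DOT_ANY = re.compile(r"1\.[0-9]+")


def accepts_document(version: str, forwards_compatible: bool) -> bool:
    """Whether a hypothetical XML 1.0 consumer attempts to process a document."""
    if forwards_compatible:
        # The rule the finding suggests XML 1.0 could have stated: treat any
        # "1.X" as processable as 1.0. Unsupported content still fails later.
        return _ONE_DOT_ANY.fullmatch(version) is not None
    # Strict behavior: any identifier other than "1.0" is a fault.
    return version == "1.0"


for v in ["1.0", "1.1", "2.0"]:
    print(v, accepts_document(v, False), accepts_document(v, True))
```

Under the forwards-compatible rule, an XML 1.1-labeled document that contains only XML 1.0 content would be processed normally. A change of major version would still be refused.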

"Good Practice: When producers use the lowest possible version identifiers,
languages should only change the version identifier for incompatible
changes.

This design has the benefit that the consumer does not have to understand
unknown identifiers, but the cost is that the producer must be able to
generate a potentially large number of different version identifiers based
upon the features used and their optionality.
XML 1.1 used this design in an attempt at forwards compatibility when it
specified that XML documents that did not use any XML 1.1 features must be
identified as XML 1.0 documents. "
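On the producer side, the "lowest possible version identifier" design might look like the following Python sketch. The feature table and lowest_version are hypothetical and illustrative only. The table is not a complete list of XML 1.1 changes.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical table of the version in which each feature first appeared.
# The entries are illustrative; they are not a complete list of XML 1.1 changes.
FEATURE_INTRODUCED_IN: Dict[str, Tuple[int, int]] = {
    "elements": (1, 0),
    "attributes": (1, 0),
    "extended-name-characters": (1, 1),
}


def lowest_version(features_used: Iterable[str]) -> str:
    """Label output with the lowest version whose features cover what is used."""
    major, minor = max(
        (FEATURE_INTRODUCED_IN[f] for f in features_used), default=(1, 0)
    )
    return f"{major}.{minor}"


print(lowest_version({"elements", "attributes"}))  # 1.0
print(lowest_version({"elements", "extended-name-characters"}))  # 1.1
```

This matches the tradeoff the finding describes. Consumers see a new identifier only when new features are actually present. The producer, however, must track which features each output uses.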

Cheers,
Dave
On Wed, Feb 18, 2009 at 6:58 AM, John Kemp <john.kemp@nokia.com> wrote:

> Hi Noah,
>
> On Feb 17, 2009, at 6:59 PM, ext noah_mendelsohn@us.ibm.com wrote:
>
>> John Kemp writes:
>>
>>> Section 4.2 on versioning and extensibility thus seems intended to
>>> relate specifically to "data format specifications" and to a specified
>>> agreement regarding representation data.
>>>
>>> As such, I don't feel that this text casts aspersions on languages
>>> such as PHP (which as far as I know has no language specification) or
>>> Java. I am not sure that they have the same needs with respect to
>>> "agreement on the correct interpretation of representation data". I do
>>> find your comments, however, to be instructive. I'm just not sure how
>>> to best use them yet in this specific case.
>>>
>>
>> You're right that AWWW applies more obviously to data formats. I used
>> programming languages as an example just because they're so widely known.
>> I think that my blog posting makes clear that the tradeoffs and concerns
>> apply equally to data format specifications.
>>
>
> As yet, I don't have enough knowledge to say either that AWWW data formats
> and general-purpose programming languages have the same requirements or not.
> I do not think that anything I've read so far definitively answers that
> question for me. That being the case, I was simply trying to deal with the
> text that I found in AWWW. Which is not to say that there isn't something
> more widely applicable to be said, or that general-purpose programming
> languages don't supply us with something instructive.
>
>> For example, I believe that the version identifier in XML proved more of
>> a hindrance than a help to adoption of XML 1.1.
>>
>
> Was it the presence of the 'version' attribute in the specification of XML,
> or the fact that it must say '1.1' in the case that an XML 1.1 instance was
> being exchanged?
>
> As a possible counter-example - SOAP messages carrying the v1.2 namespace
> version indicator are not widely used (in any interoperable sense anyway).
> But many of the "version-neutral" changes proposed in 1.2 made their way
> into SOAP 1.1 messages via compliance with the WS-I Basic Profile. Was SOAP
> 1.2 then a success or a failure, in that many (most?) of its useful changes
> were "back-ported" to 1.1, but messages carrying the explicit 1.2 version
> identifier don't seem to have caught on so much?
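In SOAP, the envelope namespace acts as the version indicator. The following minimal sketch uses Python's standard-library ElementTree to show how a receiver might distinguish the two versions. soap_version is a hypothetical helper; a real SOAP stack does considerably more.

```python
import xml.etree.ElementTree as ET

# Envelope namespace names defined by SOAP 1.1 and SOAP 1.2.
SOAP_ENVELOPE_NAMESPACES = {
    "http://schemas.xmlsoap.org/soap/envelope/": "1.1",
    "http://www.w3.org/2003/05/soap-envelope": "1.2",
}


def soap_version(message: str) -> str:
    """Return "1.1" or "1.2" based on the namespace of the root Envelope element."""
    tag = ET.fromstring(message).tag
    # ElementTree reports a namespace-qualified tag as "{namespace}localname".
    if not tag.startswith("{"):
        raise ValueError("root element is not namespace-qualified")
    namespace, _, local_name = tag[1:].partition("}")
    if local_name != "Envelope" or namespace not in SOAP_ENVELOPE_NAMESPACES:
        raise ValueError(f"not a recognized SOAP envelope: {tag}")
    return SOAP_ENVELOPE_NAMESPACES[namespace]


msg = ('<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">'
       "<env:Body/></env:Envelope>")
print(soap_version(msg))  # 1.2
```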
>
>
>>
>>> Are you saying that a new version identifier should not always be
>>> minted just because a new version of the language has been? That at
>>> least is the intent I had in writing the additional best practice text.
>>>
>>> But I would like to separate the idea of creating a mechanism for
>>> allowing version indications, from the practice of assigning and
>>> using new version indicators.
>>>
>>
>> You seem to be suggesting that specifications should assign identifiers to
>> successive variants of a language, regardless of whether those identifiers
>> are to appear in instances.  Maybe that's a good thing in general, e.g. so
>> people can discuss one version or another, or maybe it's something to do
>> on a case by case basis.  Still, the particular AWWW text we're discussing
>> says:
>>
>>       Good practice: Version information
>>
>>       A data format specification SHOULD
>>       provide for version information.
>>
>> I read that as intending to say that it should provide for version
>> information in the instance documents, and that's what I feel is clearly
>> not always true.
>>
>
> I read this line as suggesting that a format specification should provide a
> mechanism for instances to indicate a version of the specification to which
> the author of the instance believes the instance complies.
>
>> Accordingly, I would like to see this replaced with advice that is more
>> along the lines of what was in my previous email.  I believe that
>> including such a revision in the AWWW errata document [1] would be a good
>> step.
>>
>> Perhaps you're reading the above Good Practice Note (GPN) as not talking
>> about the instance at all, but just the specification?
>>
>
> That is correct. I don't see this best practice as mandating that an
> instance indicates its version and I don't believe that the practice
> mandates that the author of a data format create a new version identifier
> when they create a new version of the language. And that sounds right.
>
>> In that case, I think it should be clarified. I'm fairly sure that the >intention<
>> was to recommend version fields in the instances,
>>
>
> The phrase "data format specification" seems quite specific - not
> mentioning the word 'instance' for example.
>
> One could say something like:
>
> "A data format specification should provide a mechanism by which an
> instance conforming to the specification may indicate a version of the
> specification to which the author of the instance believes the instance
> complies", to be absolutely clear, and if that is what was originally meant
> by AWWW.
>
> Regards,
>
> - johnk
>
>
>> but I believe this part of AWWW was "baked" before my time on the TAG.
>>
>> Thank you.
>>
>> Noah
>>
>> --------------------------------------
>> Noah Mendelsohn
>> IBM Corporation
>> One Rogers Street
>> Cambridge, MA 02142
>> 1-617-693-4036
>> --------------------------------------
>>
>> John Kemp <john.kemp@nokia.com>
>> Sent by: www-tag-request@w3.org
>> 02/17/2009 05:22 PM
>>
>>       To:      "ext noah_mendelsohn@us.ibm.com" <noah_mendelsohn@us.ibm.com>
>>       cc:      David Orchard <orchard@pacificspirit.com>, Larry Masinter <masinter@adobe.com>, "www-tag@w3.org WG" <www-tag@w3.org>
>>       Subject: Re: Formulate erratum text on versioning for the web architecture document
>>
>>
>> Hello Noah,
>>
>> Thanks very much for the comments. Addressing only the current action
>> for now. More on the general issue later:
>>
>> On Feb 17, 2009, at 4:42 PM, ext noah_mendelsohn@us.ibm.com wrote:
>>
>>> John proposes:
>>>
>>>> I believe that the best practice is still correct and important - data
>>>> format specifications should provide a mechanism (where that mechanism
>>>> might simply be "use XML namespaces") allowing instances to indicate
>>>> version information. Authors will likely not know whether they will
>>>> later have to create a new, incompatible version of a format a priori,
>>>> but should likely assume that they will.
>>>>
>>>
>>> Well, I still respectfully disagree.  This suggests that a big subset of
>>> the programming languages we use are poorly designed because they don't
>>> invite us to say things like:
>>>
>>>      <?php PHPVersion="4.1"  ...  ?>
>>>
>>> or to put Java version="2.0"  in our Java source files.
>>>
>>
>> I was addressing specifically the documentation produced in AWWW
>> Section 4 [1], which states:
>>
>>> "A data format specification (for example, for XHTML, RDF/XML, SMIL,
>>> XLink, CSS, and PNG) embodies an agreement on the correct
>>> interpretation of representation data."
>>>
>>
>> Section 4.2 on versioning and extensibility thus seems intended to
>> relate specifically to "data format specifications" and to a specified
>> agreement regarding representation data.
>>
>> As such, I don't feel that this text casts aspersions on languages
>> such as PHP (which as far as I know has no language specification) or
>> Java. I am not sure that they have the same needs with respect to
>> "agreement on the correct interpretation of representation data". I do
>> find your comments, however, to be instructive. I'm just not sure how
>> to best use them yet in this specific case.
>>
>>> I gave my reasons in the blog posting, and I won't repeat them here.
>>>
>>>> I would suggest, however, that perhaps an additional best practice
>>>> might be warranted, along the lines of Noah's suggestion in [3]:
>>>>
>>>> "If a language, or data format, changes in incompatible ways, a new
>>>> version identifier should be assigned to the updated data format, and
>>>> allowed in document instances."
>>>>
>>>
>>> Thank you.  I do think that bit is worth saying.  Overall, I might go with
>>> something like this:
>>>
>>> "In cases where the same instance document has incompatible meanings per
>>> two or more versions of the language specification, provision MUST be
>>> made for indicating the version(s) used to encode each instance.  Use of
>>> explicit version identifiers in other languages is optional, and in some
>>> cases such explicit identifiers can actually inhibit the adoption of new
>>> language versions, or can inhibit interoperability between systems
>>> implementing differing versions of the language."
>>>
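Noah's criterion, that an identifier becomes mandatory when the same instance means different things under different versions, can be illustrated with a hypothetical field whose syntax stays the same while its meaning changes. Nothing in the text "03/04" indicates which reading is intended, so a receiver can choose correctly only if the instance carries a version indicator. The format and interpret_date are invented for illustration.

```python
from typing import Tuple


def interpret_date(text: str, version: str) -> Tuple[int, int]:
    """Return (month, day) for a hypothetical "NN/NN" field under a given version."""
    first, second = (int(part) for part in text.split("/"))
    if version == "1":
        return first, second  # version 1 reads the field as month/day
    if version == "2":
        return second, first  # version 2 reads it as day/month: an incompatible change
    raise ValueError(f"unknown version {version}")


print(interpret_date("03/04", "1"))  # (3, 4)
print(interpret_date("03/04", "2"))  # (4, 3)
```

This contrasts with the XML 1.0 and 1.1 case discussed below. There, a document is legal in one version or the other, but it does not carry two different meanings.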
>>
>> Are you saying that a new version identifier should not always be
>> minted just because a new version of the language has been? That at
>> least is the intent I had in writing the additional best practice text.
>>
>> But I would like to separate the idea of creating a mechanism for
>> allowing version indications, from the practice of assigning and using
>> new version indicators.
>>
>>
>>> ...or words to that effect.
>>>
>>> As an example of that last admonition, one can argue that XML 1.1 might
>>> have been deployed much more successfully if no version attribute were
>>> provided in the XML declaration.
>>>
>>
>> The point I have been trying to make is that this issue doesn't seem
>> to be about whether a version attribute is _provided_ in the format
>> specification; it is about whether a new version identifier is created
>> when a new version of a language is created.
>>
>>> I don't believe it's the case that the same document ever had two
>>> different legal meanings in XML 1.0 and XML 1.1;  it's just that some
>>> documents are legal in one version and some legal in the other.  XML 1.0
>>> processors would have rejected content using new XML 1.1 characters just
>>> as surely (if not just as early) if no version identifier were provided.
>>> The ID is really just a cross check or early warning in such cases.  The
>>> only time it's really crucial is if the same document can mean different
>>> things as the specification changes.
>>>
>>
>> I don't think this is a problem caused merely by the existence of a
>> 'version' attribute.
>>
>> Regards,
>>
>> - johnk
>>
>> [1] http://www.w3.org/TR/webarch/#formats
>>
>
Received on Wednesday, 18 February 2009 17:20:19 UTC