Re: Formulate erratum text on versioning for the web architecture document

Getting back to this after a long time:

John Kemp wrote:

> [Noah Mendelsohn wrote:]
> 
> > * The XML Recommendation did not provide any incremental forward
> > compatibility with later versions.  If a document is labeled anything
> > other than 1.0 (or no label, which defaults to 1.0), then the document
> > must be rejected.  Labeling a document 1.1 thus provided insurance
> > that it wouldn't be processable at all by the tens or hundreds of
> > millions of already deployed XML processors.
> 
> So was XML 1.0 technically guilty of violating the AWWW best practice?:
> 
>    "A specification SHOULD provide mechanisms that allow any party to
>    create extensions."

I think the advice in AWWW is too vague to be useful.  Let's say that XML 
had allowed an extension hook that could be used to create, in later 
versions, structured attributes.  That clearly would have been an 
extension point, but likely would not have helped with the emerging 
requirement to support element names with new characters.

I think AWWW oversimplifies what is ultimately a complex tradeoff:  you 
can't typically anticipate all possible reasonable requirements. Extension 
mechanisms tend to have some cost in complexity, or in interoperability. 
On the other hand, as AWWW signals, it can be tremendously valuable to 
correctly anticipate the sorts of extensions that may be needed, and to 
define from the start a model for forwards compatibility.

So, rather than saying that XML "violated" the good practice note (and I'm 
not sure you can violate a SHOULD anyway), I think it's more appropriate to 
say that XML didn't provide extension mechanisms that have (so far) proven 
sufficient to support new Unicode characters while preserving 
interoperability when only the old characters are used.  There are many, 
many other sorts of possible extensions one might contemplate for XML, and 
it's not clear that any one or two hooks would anticipate even the majority 
of them.

> > Obviously, to answer your question directly, the attribute would have
> > caused no trouble if it was merely treated as a comment.  Maybe or
> > maybe not some less draconian compatibility rules could have been
> > applied, and maybe or maybe not the attribute would have been helpful
> > in implementing them.  That's at best undemonstrated, IMO.
> 
> I think we're now bordering on talking about error-handling, with 
> respect to the presence of "non-conforming" content. Related to how 
> the version attribute was used, but not to the presence of the version 
> attribute in the format specification.

I don't think that's a direction I want to go.  We should not be using the 
word "error" if we're going to proceed to act on the data as if it were 
correct.  The term error-handling might apply if we were building a tool 
that processed as correct only XML 1.0, but that to help with debugging, 
highlighted the specific elements in a non-XML 1.0 document that in fact 
used new Unicode characters.  That seems like error handling.  If what 
we're doing is to process as correct a document labeled XML 1.1, then 
documents with such a label become part of the correct input to our 
application, and thus not an error. 

The key question, I think, is whether the data you extract from the 
document is trustworthy:  if your input is conforming per the pertinent 
specifications, then you are not doing error-recovery, and the 
specifications tell you what can be inferred from the input.  If the 
document does not conform to the specification, then you can't use the 
specification to tell you how to interpret the document.  That's the 
situation we have when an XML 1.1 document is interpreted per the XML 1.0 
Recommendation.
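
To make the distinction concrete, here is a minimal sketch in Python.  It 
is only an illustration under stated assumptions:  the function names are 
made up for this note, the strict rule models the "anything other than 1.0 
is rejected" behavior described above, and is_1_1_only is a hypothetical 
stand-in for the real character rules, which live in the productions of 
the two Recommendations.

```python
import re
from typing import Callable, List, Tuple

# Matches the version label in a leading XML declaration, if there is one.
_VERSION = re.compile(r"""^<\?xml\s+version\s*=\s*["']([^"']+)["']""")


def declared_version(document: str) -> str:
    """Return the version label; an unlabeled document defaults to 1.0."""
    match = _VERSION.match(document)
    return match.group(1) if match else "1.0"


def strict_1_0_accepts(document: str) -> bool:
    """Model a processor that rejects any label other than 1.0.

    Only the label is examined here; well-formedness is not checked.
    """
    return declared_version(document) == "1.0"


def diagnose(document: str,
             is_1_1_only: Callable[[str], bool]) -> List[Tuple[int, str]]:
    """Debugging aid: list (offset, character) pairs flagged by the
    caller-supplied predicate.  This is error handling: the document is
    still treated as incorrect and nothing is inferred from its content.
    """
    return [(i, ch) for i, ch in enumerate(document) if is_1_1_only(ch)]
```

The first two functions describe what counts as correct input;  only the 
third is error handling in the sense I mean, because it reports on a 
document without acting on its data as if it were correct.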

Noah


--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

John Kemp <john.kemp@nokia.com>
02/19/2009 09:47 AM
 
        To:     "ext noah_mendelsohn@us.ibm.com" <noah_mendelsohn@us.ibm.com>
        cc:     Larry Masinter <masinter@adobe.com>, David Orchard <orchard@pacificspirit.com>, "www-tag@w3.org WG" <www-tag@w3.org>
        Subject:        Re: Formulate erratum text on versioning for the web architecture document


Hello Noah,

On Feb 18, 2009, at 10:22 PM, ext noah_mendelsohn@us.ibm.com wrote:

> John Kemp wrote:
>
>> Was it the presence of the 'version' attribute in the specification of
>> XML, or the fact that it must say '1.1' in the case that an XML 1.1
>> instance was being exchanged?
>
> First, I think it's worth observing that at best the attribute did not
> solve a compatibility problem, since (unless I'm misremembering some
> detail) no document that was otherwise legal in 1.0 and 1.1 had differing
> interpretations per the two specifications.  So, labeling a document 1.1
> really was a signal that "yes, I know I'm using new characters in the
> document below, and I meant it".

Yes, agreed.

> * The XML Recommendation did not provide any incremental forward
> compatibility with later versions.  If a document is labeled anything
> other than 1.0 (or no label, which defaults to 1.0), then the document
> must be rejected.  Labeling a document 1.1 thus provided insurance
> that it wouldn't be processable at all by the tens or hundreds of
> millions of already deployed XML processors.

So was XML 1.0 technically guilty of violating the AWWW best practice?:

   "A specification SHOULD provide mechanisms that allow any party to
   create extensions."

>
> * The XML 1.1 Recommendation did suggest that the 1.1 version marker be
> used only if some 1.0-incompatible content was in the document.  Turns
> out, that's easy advice to give, but hard to implement.  It almost ensures
> that a general purpose application will have to make two passes in
> creating a document:  one pass to look for new characters, and the second
> to output it.  You can have a gigabyte of 1.0 compatible output and
> discover that, at the very end, some character you wanted to write is
> legal only in 1.1.  Well, you better not have written the very first line
> of the file, because that now has to have its version attribute changed.
> In practice, lots of 1.1-capable software (well, I'm not sure there was
> much 1.1-capable software, but a high percentage of what there was...)
> unconditionally applied the new label.
>
> Obviously, to answer your question directly, the attribute would have
> caused no trouble if it was merely treated as a comment.  Maybe or maybe
> not some less draconian compatibility rules could have been applied, and
> maybe or maybe not the attribute would have been helpful in implementing
> them.  That's at best undemonstrated, IMO.

I think we're now bordering on talking about error-handling, with 
respect to the presence of "non-conforming" content. Related to how 
the version attribute was used, but not to the presence of the version 
attribute in the format specification.
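
As an aside on the two-pass point quoted above, a rough Python sketch may 
help show why the advice is hard to follow.  The function names and the 
is_1_1_only predicate are hypothetical, introduced only for illustration; 
the real test for "new" characters is defined by the two Recommendations.

```python
from typing import Callable, Iterable, Iterator


def write_two_pass(chunks: Iterable[str],
                   is_1_1_only: Callable[[str], bool]) -> str:
    """Label the output 1.1 only if some character requires it."""
    buffered = list(chunks)  # pass 1 needs all content before any output
    needs_1_1 = any(is_1_1_only(ch) for chunk in buffered for ch in chunk)
    version = "1.1" if needs_1_1 else "1.0"
    # Pass 2: only now can the first line of the file be written.
    return f'<?xml version="{version}"?>\n' + "".join(buffered)


def write_streaming(chunks: Iterable[str]) -> Iterator[str]:
    """Single pass: the label goes out first, so it is always 1.1."""
    yield '<?xml version="1.1"?>\n'
    yield from chunks
```

The first function has to hold the whole document before it can emit the 
declaration;  the second can stream, but only by applying the 1.1 label 
unconditionally, which is the behavior Noah describes.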

>
>
>> I read this line as suggesting that a format specification should
>> provide a mechanism for instances to indicate a version of the
>> specification to which the author of the instance believes the
>> instance complies.
>
> Me too.  For reasons such as the ones given above, I'm not convinced
> that's in general good advice.

If the author of a specification fails to provide an EXPLICIT 
mechanism for indicating format version in instances of the format 
then what will happen is that IMPLICIT versioning will occur. For 
example, an XML element will contain a certain XML attribute in one 
version of the language, and will not contain that attribute in 
another version of the language. Of course, that may happen when 
people author instances anyway, so the intent, I think, of this best 
practice is to ensure that the authors of a format specification think 
about version indications (somewhat) separately from the actual 
changes that occur in different versions of the language.
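
A small Python sketch of what implicit versioning tends to look like on 
the consuming side.  Everything in it is a made-up illustration:  the 
"order" element, the "currency" attribute assumed to have been introduced 
in a second version, and the function name are not taken from any real 
format.

```python
import xml.etree.ElementTree as ET


def infer_version(xml_text: str) -> str:
    """Guess which version of a hypothetical format an instance uses."""
    root = ET.fromstring(xml_text)
    explicit = root.get("version")
    if explicit is not None:
        # Explicit indication: the author states the intended version.
        return explicit
    # Implicit versioning: infer the version from which attributes appear.
    # Here "currency" is assumed to exist only from version 2 onward.
    return "2" if root.get("currency") is not None else "1"
```

Note the weakness of the implicit branch:  if "currency" is optional in 
version 2, an instance that omits it cannot be told apart from a 
version 1 instance.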

It is right to question the use of version indications given the 
failure in adoption of so many 1.X versions that did provide explicit 
methods of versioning. We might ask "what is an explicit versioning 
mechanism good for?" and attempt to document that. Or provide specific 
examples (as you have started below) where an explicit version 
indication is useful.

>  Furthermore, I think it's only defensible
> if one can answer the sorts of questions raised in the recipe example in
> the TAG blog entry.  The AWWW suggests that a mechanism should be provided
> in the instance, without pointing out such points of confusion.  I'm not
> saying that providing for version information in the instance is always a
> mistake.  I do think it only makes sense when:
>
> * One can answer questions such as:  is an author responsible for naming
> any one version with which the document is compatible?  The newest?  The
> oldest?  More than one?
>
> * The rules for accepting, rejecting and interpreting the content of a
> document are shown to be (helpfully) influenced by the presence of the
> version information.
>
> The one case I'm convinced of is the one I mentioned earlier:  if you
> introduce incompatible changes, such that the same document is legal in
> more than one version, but that it means different things, then labeling
> the instance is essential.  For example, if an early version of a data
> format referenced arrays with one-based indices, and a later version
> changed to zero-based, it would be essential to label the instance in a
> way that would allow the intended interpretation to be discovered.

I agree with your points.
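
The array-index case is easy to sketch in Python.  The instance layout 
(a dict with "version" and "items" keys), the version labels "1" and "2", 
and the function name are assumptions made up for this illustration.

```python
from typing import Any, Dict


def element_at(instance: Dict[str, Any], index: int) -> Any:
    """Resolve an index according to the version the instance declares."""
    version = instance["version"]
    items = instance["items"]
    if version == "1":
        # Hypothetical early version: indices are one-based.
        if index < 1:
            raise IndexError("version 1 indices start at 1")
        return items[index - 1]
    if version == "2":
        # Hypothetical later version: indices are zero-based.
        return items[index]
    raise ValueError(f"unknown version label: {version!r}")
```

The same payload and the same index yield different elements under the 
two versions, so without the label a consumer has no way to recover the 
intended interpretation.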

I would suggest that none of these points invalidates the best practice 
"that a format specification should provide a mechanism for instances 
to indicate a version of the specification to which the author of the 
instance believes the instance complies."

However, they do raise interesting issues that, in my opinion, should 
be documented somewhere (if not already) as they provide an additional 
level of detail based on actual experience in performing language 
versioning on the Web.

Regards,

- johnk

Received on Monday, 30 March 2009 21:12:33 UTC