- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 30 Mar 2009 17:12:58 -0400
- To: John Kemp <john.kemp@nokia.com>
- Cc: Larry Masinter <masinter@adobe.com>, David Orchard <orchard@pacificspirit.com>, "www-tag@w3.org WG" <www-tag@w3.org>
Getting back to this after a long time:

John Kemp wrote:

> [Noah Mendelsohn wrote:]
>
> > * The XML Recommendation did not provide any incremental forward
> > compatibility with later versions. If a document is labeled anything
> > other than 1.0 (or no label, which defaults to 1.0), then the document
> > must be rejected. Labeling a document 1.1 thus provided insurance
> > that it wouldn't be processable at all by the tens or hundreds of
> > millions of already deployed XML processors.
>
> So was XML 1.0 technically guilty of violating the AWWW best practice?:
>
> "A specification SHOULD provide mechanisms that allow any party to
> create extensions."

I think the advice in AWWW is too vague to be useful. Let's say that XML
had allowed an extension hook that could be used to create, in later
versions, structured attributes. That clearly would have been an extension
point, but likely would not have helped with the emerging requirement to
support element names with new characters.

I think AWWW oversimplifies what is ultimately a complex tradeoff: you
can't typically anticipate all possible reasonable requirements. Extension
mechanisms tend to have some cost in complexity, or in interoperability.
On the other hand, as AWWW signals, it can be tremendously valuable to
correctly anticipate the sorts of extensions that may be needed, and to
define from the start a model for forwards compatibility.

So, rather than saying that XML "violated" the GPN (and I'm not sure you
can violate a SHOULD anyway), I think it's more appropriate to say that
XML didn't provide extension mechanisms that have (so far) proven
sufficient to provide for use of new Unicode characters, while maintaining
an ability to interoperate when only the old characters are used. There
are many, many other sorts of possible extensions one might contemplate
for XML, and it's not clear that any one or two hooks would anticipate
even the majority of them.

> > Obviously, to answer your question directly, the attribute would have
> > caused no trouble if it was merely treated as a comment. Maybe or maybe
> > not some less draconian compatibility rules could have been applied, and
> > maybe or maybe not the attribute would have been helpful in implementing
> > them. That's at best undemonstrated, IMO.
>
> I think we're now bordering on talking about error-handling, with
> respect to the presence of "non-conforming" content. Related to how
> the version attribute was used, but not to the presence of the version
> attribute in the format specification.

I don't think that's a direction I want to go. We should not be using the
word "error" if we're going to proceed to act on the data as if it were
correct. The term error-handling might apply if we were building a tool
that processed as correct only XML 1.0, but that to help with debugging,
highlighted the specific elements in a non-XML 1.0 document that in fact
used new Unicode characters. That seems like error handling.

If what we're doing is to process as correct a document labeled XML 1.1,
then documents with such a label become part of the correct input to our
application, and thus not an error.

The key question, I think, is whether the data you extract from the
document is trustworthy: if your input is conforming per the pertinent
specifications, then you are not doing error-recovery, and the
specifications tell you what can be inferred from the input. If the
document does not conform to the specification, then you can't use the
specification to tell you how to interpret the document. That's the
situation we have when an XML 1.1 document is interpreted per the XML 1.0
Recommendation.
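
As a rough illustration of the rejection behaviour discussed above, here is a minimal Python sketch of a processor that accepts only documents labeled 1.0 (or unlabeled). It is an assumption-laden model, not the code of any real XML parser: the names `declared_version` and `accepts` are hypothetical, and the regular expression only recognizes a leading XML declaration.

```python
import re

# Hypothetical helper names; the pattern only recognizes a leading XML
# declaration and captures its version label.
_XML_DECL = re.compile(r'^<\?xml\s+version\s*=\s*(["\'])([^"\']+)\1')


def declared_version(document: str) -> str:
    """Return the version label, defaulting to "1.0" when there is none."""
    match = _XML_DECL.match(document)
    return match.group(2) if match else "1.0"


def accepts(document: str) -> bool:
    """Model a processor that treats any label other than 1.0 as unusable."""
    return declared_version(document) == "1.0"
```

Under this model a document labeled 1.1 is turned away even when its body uses nothing but 1.0 characters, which is the "insurance that it wouldn't be processable" effect described in the quoted text.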

Noah

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

John Kemp <john.kemp@nokia.com>
02/19/2009 09:47 AM

To: "ext noah_mendelsohn@us.ibm.com" <noah_mendelsohn@us.ibm.com>
cc: Larry Masinter <masinter@adobe.com>, David Orchard <orchard@pacificspirit.com>, "www-tag@w3.org WG" <www-tag@w3.org>
Subject: Re: Formulate erratum text on versioning for the web architecture document

Hello Noah,

On Feb 18, 2009, at 10:22 PM, ext noah_mendelsohn@us.ibm.com wrote:

> John Kemp wrote:
>
>> Was it the presence of the 'version' attribute in the specification of
>> XML, or the fact that it must say '1.1' in the case that an XML 1.1
>> instance was being exchanged?
>
> First, I think it's worth observing that at best the attribute did not
> solve a compatibility problem, since (unless I'm remembering some detail)
> no document that was otherwise legal in 1.0 and 1.1 had differing
> interpretations per the two specifications. So, labeling a document 1.1
> really was a signal that "yes, I know I'm using new characters in the
> document below, and I meant it".

Yes, agreed.

> * The XML Recommendation did not provide any incremental forward
> compatibility with later versions. If a document is labeled anything
> other than 1.0 (or no label, which defaults to 1.0), then the document
> must be rejected. Labeling a document 1.1 thus provided insurance that it
> wouldn't be processable at all by the tens or hundreds of millions of
> already deployed XML processors.

So was XML 1.0 technically guilty of violating the AWWW best practice?:

"A specification SHOULD provide mechanisms that allow any party to
create extensions."

> * The XML 1.1 Recommendation did suggest that the 1.1 version marker be
> used only if some 1.0-incompatible content was in the document. Turns
> out, that's easy advice to give, but hard to implement.
> It almost ensures that a general purpose application will have to make
> two passes in creating a document: one pass to look for new characters,
> and the second to output it. You can have a gigabyte of 1.0 compatible
> output and discover that, at the very end, some character you wanted to
> write is legal only in 1.1. Well, you better not have written the very
> first line of the file, because that now has to have its version
> attribute changed. In practice, lots of 1.1-capable software (well, I'm
> not sure there was much 1.1-capable software, but a high percentage of
> what there was...) unconditionally applied the new label.
>
> Obviously, to answer your question directly, the attribute would have
> caused no trouble if it was merely treated as a comment. Maybe or maybe
> not some less draconian compatibility rules could have been applied, and
> maybe or maybe not the attribute would have been helpful in implementing
> them. That's at best undemonstrated, IMO.

I think we're now bordering on talking about error-handling, with
respect to the presence of "non-conforming" content. Related to how
the version attribute was used, but not to the presence of the version
attribute in the format specification.

>> I read this line as suggesting that a format specification should
>> provide a mechanism for instances to indicate a version of the
>> specification to which the author of the instance believes the
>> instance complies.
>
> Me too. For reasons such as the ones given above, I'm not convinced
> that's in general good advice.

If the author of a specification fails to provide an EXPLICIT mechanism
for indicating format version in instances of the format then what will
happen is that IMPLICIT versioning will occur. For example, an XML
element will contain a certain XML attribute in one version of the
language, and will not contain that attribute in another version of the
language.
Of course, that may happen when people author instances anyway, so
the intent, I think, of this best practice is to ensure that the
authors of a format specification think about version indications
(somewhat) separately from the actual changes that occur in different
versions of the language.

It is right to question the use of version indications given the
failure in adoption of so many 1.X versions that did provide explicit
methods of versioning. We might ask "what is an explicit versioning
mechanism good for?" and attempt to document that. Or provide specific
examples (as you have started below) where an explicit version
indication is useful.

> Furthermore, I think it's only defensible if one can answer the sorts
> of questions raised in the recipe example in the TAG blog entry. The
> AWWW suggests that a mechanism should be provided in the instance,
> without pointing out such points of confusion. I'm not saying that
> providing for version information in the instance is always a mistake.
> I do think it only makes sense when:
>
> * One can answer questions such as: is an author responsible for naming
> any one version with which the document is compatible? The newest? The
> oldest? More than one?
>
> * The rules for accepting, rejecting and interpreting the content of a
> document are shown to be (helpfully) influenced by the presence of the
> version information.
>
> The one case I'm convinced of is the one I mentioned earlier: if you
> introduce incompatible changes, such that the same document is legal in
> more than one version, but that it means different things, then labeling
> the instance is essential. For example, if an early version of a data
> format referenced arrays with one-based indices, and a later version
> changed to zero-based, it would be essential to label the instance in a
> way that would allow the intended interpretation to be discovered.

I agree with your points.
I would suggest that none of these points invalidate the best practice
"that a format specification should provide a mechanism for instances to
indicate a version of the specification to which the author of the
instance believes the instance complies." However, they do raise
interesting issues that, in my opinion, should be documented somewhere
(if not already) as they provide an additional level of detail based on
actual experience in performing language versioning on the Web.

Regards,

- johnk
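
The two-pass problem raised in the quoted message (the version label sits on the first line, but whether 1.1 is needed is only known once all content has been seen) can be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular serializer works: `serialize` and `is_not_ascii` are hypothetical names, and `is_not_ascii` is a deliberately crude stand-in for the real test of whether a name is legal only in XML 1.1.

```python
from typing import Callable, Iterable


def serialize(names: Iterable[str], needs_1_1: Callable[[str], bool]) -> str:
    """Emit a toy document, choosing the version label from the whole input."""
    buffered = list(names)  # pass one requires holding everything back
    # Pass one: scan all content before a single line can be written.
    version = "1.1" if any(needs_1_1(name) for name in buffered) else "1.0"
    # Pass two: only now is it safe to write the first line.
    lines = [f'<?xml version="{version}"?>', "<root>"]
    lines += [f"  <{name}/>" for name in buffered]
    lines.append("</root>")
    return "\n".join(lines)


def is_not_ascii(name: str) -> bool:
    # Placeholder predicate (an assumption): NOT the actual XML 1.0 vs 1.1
    # name rule, just something that flags "new" characters for the demo.
    return not name.isascii()
```

A streaming writer cannot do this without buffering or rewriting its first line, which is consistent with the observation that much 1.1-capable software simply applied the 1.1 label unconditionally.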
Received on Monday, 30 March 2009 21:12:33 UTC