- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 30 Mar 2009 17:12:58 -0400
- To: John Kemp <john.kemp@nokia.com>
- Cc: Larry Masinter <masinter@adobe.com>, David Orchard <orchard@pacificspirit.com>, "www-tag@w3.org WG" <www-tag@w3.org>
Getting back to this after a long time:
John Kemp wrote:
> [Noah Mendelsohn wrote:]
>
> > * The XML Recommendation did not provide any incremental forward
> > compatibility with later versions. If a document is labeled anything
> > other than 1.0 (or no label, which defaults to 1.0), then the document
> > must be rejected. Labeling a document 1.1 thus provided insurance that
> > it wouldn't be processable at all by the tens or hundreds of millions
> > of already deployed XML processors.
>
> So was XML 1.0 technically guilty of violating the AWWW best practice?:
>
> "A specification SHOULD provide mechanisms that allow any party to
> create extensions."
I think the advice in AWWW is too vague to be useful. Let's say that XML
had allowed an extension hook that could be used to create, in later
versions, structured attributes. That clearly would have been an
extension point, but likely would not have helped with the emerging
requirement to support element names with new characters.
I think AWWW oversimplifies what is ultimately a complex tradeoff: you
can't typically anticipate all possible reasonable requirements. Extension
mechanisms tend to have some cost in complexity, or in interoperability.
On the other hand, as AWWW signals, it can be tremendously valuable to
correctly anticipate the sorts of extensions that may be needed, and to
define from the start a model for forwards compatibility.
So, rather than saying that XML "violated" the GPN (and I'm not sure you
can violate a SHOULD anyway), I think it's more appropriate to say that
XML didn't provide extension mechanisms that have (so far) proven
sufficient to provide for use of new Unicode characters, while maintaining
an ability to interoperate when only the old characters are used. There
are many, many other sorts of possible extensions one might contemplate
for XML, and it's not clear that any one or two hooks would anticipate
even the majority of them.
> > Obviously, to answer your question directly, the attribute would have
> > caused no trouble if it was merely treated as a comment. Maybe or maybe
> > not some less draconian compatibility rules could have been applied, and
> > maybe or maybe not the attribute would have been helpful in implementing
> > them. That's at best undemonstrated, IMO.
>
> I think we're now bordering on talking about error-handling, with
> respect to the presence of "non-conforming" content. Related to how
> the version attribute was used, but not to the presence of the version
> attribute in the format specification.
I don't think that's a direction I want to go. We should not be using the
word "error" if we're going to proceed to act on the data as if it were
correct. The term error-handling might apply if we were building a tool
that processed as correct only XML 1.0, but that to help with debugging,
highlighted the specific elements in a non-XML 1.0 document that in fact
used new Unicode characters. That seems like error handling. If what
we're doing is to process as correct a document labeled XML 1.1, then
documents with such a label become part of the correct input to our
application, and thus not an error.
The key question, I think, is whether the data you extract from the
document is trustworthy: if your input is conforming per the pertinent
specifications, then you are not doing error-recovery, and the
specifications tell you what can be inferred from the input. If the
document does not conform to the specification, then you can't use the
specification to tell you how to interpret the document. That's the
situation we have when an XML 1.1 document is interpreted per the XML 1.0
Recommendation.
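To make that concrete, here is a minimal sketch of the strict behavior described in the quoted text above: a processor that reads the version label and refuses anything other than 1.0. The helper names (declared_version, strict_1_0_accept, UnsupportedVersion) are hypothetical, and the regular expression is a simplification for illustration, not the grammar from the Recommendation.

```python
import re

# Simplified pattern for the version label in an XML declaration.
# This is an illustrative assumption, not the full XMLDecl production.
_XML_DECL = re.compile(r"""^<\?xml\s+version\s*=\s*["']([^"']*)["']""")


class UnsupportedVersion(Exception):
    """Raised when the document carries a label the processor rejects."""


def declared_version(document: str) -> str:
    """Return the version label, defaulting to 1.0 when none is present."""
    match = _XML_DECL.match(document)
    return match.group(1) if match else "1.0"


def strict_1_0_accept(document: str) -> str:
    """Accept only documents labeled 1.0 (explicitly or by default)."""
    version = declared_version(document)
    if version != "1.0":
        # The label alone causes rejection, even if every character in
        # the body would have been legal under 1.0.
        raise UnsupportedVersion(f"cannot process version {version!r}")
    return version
```

Under this sketch a document such as '<?xml version="1.1"?><a/>' is refused outright, although its body uses nothing beyond 1.0, which is the "insurance" effect mentioned above.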
Noah
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
John Kemp <john.kemp@nokia.com>
02/19/2009 09:47 AM
To: "ext noah_mendelsohn@us.ibm.com" <noah_mendelsohn@us.ibm.com>
cc: Larry Masinter <masinter@adobe.com>, David Orchard <orchard@pacificspirit.com>, "www-tag@w3.org WG" <www-tag@w3.org>
Subject: Re: Formulate erratum text on versioning for the web architecture document
Hello Noah,
On Feb 18, 2009, at 10:22 PM, ext noah_mendelsohn@us.ibm.com wrote:
> John Kemp wrote:
>
>> Was it the presence of the 'version' attribute in the specification of
>> XML, or the fact that it must say '1.1' in the case that an XML 1.1
>> instance was being exchanged?
>
> First, I think it's worth observing that at best the attribute did not
> solve a compatibility problem, since (unless I'm misremembering some
> detail) no document that was otherwise legal in 1.0 and 1.1 had differing
> interpretations per the two specifications. So, labeling a document 1.1
> really was a signal that "yes, I know I'm using new characters in the
> document below, and I meant it".
Yes, agreed.
> * The XML Recommendation did not provide any incremental forward
> compatibility with later versions. If a document is labeled anything
> other than 1.0 (or no label, which defaults to 1.0), then the document
> must be rejected. Labeling a document 1.1 thus provided insurance that
> it wouldn't be processable at all by the tens or hundreds of millions of
> already deployed XML processors.
So was XML 1.0 technically guilty of violating the AWWW best practice?:
"A specification SHOULD provide mechanisms that allow any party to
create extensions."
>
> * The XML 1.1 Recommendation did suggest that the 1.1 version marker be
> used only if some 1.0-incompatible content was in the document. Turns
> out, that's easy advice to give, but hard to implement. It almost ensures
> that a general purpose application will have to make two passes in
> creating a document: one pass to look for new characters, and the second
> to output it. You can have a gigabyte of 1.0 compatible output and
> discover that, at the very end, some character you wanted to write is
> legal only in 1.1. Well, you better not have written the very first line
> of the file, because that now has to have its version attribute changed.
> In practice, lots of 1.1-capable software (well, I'm not sure there was
> much 1.1-capable software, but a high percentage of what there was...)
> unconditionally applied the new label.
>
> Obviously, to answer your question directly, the attribute would have
> caused no trouble if it was merely treated as a comment. Maybe or maybe
> not some less draconian compatibility rules could have been applied, and
> maybe or maybe not the attribute would have been helpful in implementing
> them. That's at best undemonstrated, IMO.
I think we're now bordering on talking about error-handling, with
respect to the presence of "non-conforming" content. Related to how
the version attribute was used, but not to the presence of the version
attribute in the format specification.
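As an aside on the two-pass problem quoted above, a rough sketch of why a writer that follows the "label 1.1 only when needed" advice cannot stream its output. The function names are hypothetical, and looks_1_1_only is a toy stand-in predicate: it only flags the character reference &#x1;, which as far as I know is allowed by the XML 1.1 Char production but not by 1.0. A real check would have to cover names and the other new characters as well.

```python
from typing import Callable, Iterable, List


def looks_1_1_only(chunk: str) -> bool:
    """Toy stand-in for 'this chunk needs XML 1.1' (an assumption)."""
    return "&#x1;" in chunk


def serialize(
    chunks: Iterable[str],
    needs_1_1: Callable[[str], bool] = looks_1_1_only,
) -> str:
    """Choose the version label only after every chunk has been seen."""
    buffered: List[str] = []
    version = "1.0"
    # Pass one: inspect (and hold on to) the entire body.
    for chunk in chunks:
        buffered.append(chunk)
        if needs_1_1(chunk):
            version = "1.1"
    # Pass two: only now can the very first line be written.
    return '<?xml version="%s"?>\n' % version + "".join(buffered)
```

The whole body sits in memory (or in a temporary file) before the declaration is written. A writer that labels everything 1.1 unconditionally avoids the buffering, at the price of being rejected by 1.0-only processors.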
>
>
>> I read this line as suggesting that a format specification should
>> provide a mechanism for instances to indicate a version of the
>> specification to which the author of the instance believes the
>> instance complies.
>
> Me too. For reasons such as the ones given above, I'm not convinced
> that's in general good advice.
If the author of a specification fails to provide an EXPLICIT
mechanism for indicating format version in instances of the format
then what will happen is that IMPLICIT versioning will occur. For
example, an XML element will contain a certain XML attribute in one
version of the language, and will not contain that attribute in
another version of the language. Of course, that may happen when
people author instances anyway, so the intent, I think, of this best
practice is to ensure that the authors of a format specification think
about version indications (somewhat) separately from the actual
changes that occur in different versions of the language.
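A small sketch of what implicit versioning tends to look like on the consuming side, assuming a hypothetical recipe format in which a "servings" attribute first appeared in version 2 and an optional "version" attribute may be present. None of these names come from a real specification.

```python
import xml.etree.ElementTree as ET


def infer_version(xml_text: str) -> str:
    """Guess which version of the hypothetical format an instance uses."""
    root = ET.fromstring(xml_text)
    explicit = root.get("version")
    if explicit is not None:
        # Explicit indication: take the author's word for it.
        return explicit
    # Implicit indication: the consumer has to know which constructs
    # were introduced in which version and infer from their presence.
    return "2" if "servings" in root.attrib else "1"
```

Without the explicit attribute, the inference logic grows with every construct added in a later version, and an instance that happens to use none of the new constructs is indistinguishable from an old one.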
It is right to question the use of version indications given the
failure in adoption of so many 1.X versions that did provide explicit
methods of versioning. We might ask "what is an explicit versioning
mechanism good for?" and attempt to document that. Or provide specific
examples (as you have started below) where an explicit version
indication is useful.
> Furthermore, I think it's only defensible if one can answer the sorts of
> questions raised in the recipe example in the TAG blog entry. The AWWW
> suggests that a mechanism should be provided in the instance, without
> pointing out such points of confusion. I'm not saying that providing for
> version information in the instance is always a mistake. I do think it
> only makes sense when:
>
> * One can answer questions such as: is an author responsible for naming
> any one version with which the document is compatible? The newest? The
> oldest? More than one?
>
> * The rules for accepting, rejecting and interpreting the content of a
> document are shown to be (helpfully) influenced by the presence of the
> version information.
>
> The one case I'm convinced of is the one I mentioned earlier: if you
> introduce incompatible changes, such that the same document is legal in
> more than one version, but that it means different things, then labeling
> the instance is essential. For example, if an early version of a data
> format referenced arrays with one-based indices, and a later version
> changed to zero-based, it would be essential to label the instance in a
> way that would allow the intended interpretation to be discovered.
I agree with your points.
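The one-based versus zero-based case quoted above can be sketched roughly as follows. The version labels "1" and "2" and the function name are hypothetical, chosen only to show that the same index value is legal in both versions yet selects a different element.

```python
from typing import List


def resolve_index(declared_version: str, index: int) -> int:
    """Map an index from the instance onto a zero-based list position."""
    if declared_version == "1":
        # Hypothetical early version: indices are one-based.
        if index < 1:
            raise ValueError("version 1 indices start at 1")
        return index - 1
    if declared_version == "2":
        # Hypothetical later version: indices are zero-based.
        if index < 0:
            raise ValueError("version 2 indices start at 0")
        return index
    raise ValueError(f"unknown version {declared_version!r}")


def lookup(items: List[str], declared_version: str, index: int) -> str:
    """Fetch the element the author intended, given the version label."""
    return items[resolve_index(declared_version, index)]
```

Here the label is the only thing that lets the consumer recover the intended meaning of the index, since the instance is well-formed under both readings.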
I would suggest that none of these points invalidate the best practice
"that a format specification should provide a mechanism for instances
to indicate a version of the specification to which the author of the
instance believes the instance complies."
However, they do raise interesting issues that, in my opinion, should
be documented somewhere (if not already) as they provide an additional
level of detail based on actual experience in performing language
versioning on the Web.
Regards,
- johnk
Received on Monday, 30 March 2009 21:12:33 UTC