Some thoughts about the compatibility between XML 1.0 and XML 1.1

(copy of a post on xml-dev)

I don't feel like entering into the arena of discussing the need for the
modification proposed by the first XML 1.1 WG since I don't feel
qualified to speak about a problem which I have never personally felt.

I would rather note that it can be an opportunity to test the versioning
of XML on a limited change and that there are probably lots of things to
learn from this first version change.

Let's first list all the impacts on applications using XML:

a) Some documents which are well formed per XML 1.0 may not be well
formed per XML 1.1 (as far as I can tell):

per http://www.w3.org/TR/2001/WD-xml11-20011213/#sec2.13 :

"2.13 W3C Normalization Checking [NEW]

XML processors must/should/may check whether their input documents are
in W3C normalized form, as defined by [Charmod]. XML processors must not
transform the input to be in normalized form. It is a fatal
error/error/not an error for the document not to be in normalized form."

and http://www.w3.org/TR/charmod/#sec-TextNormalization gives an example
of a snippet which is not normalized yet is valid XML 1.0.

b) Some documents which are well formed per XML 1.1 may not be well
formed per XML 1.0 (this is the already well discussed consequence of
allowing more characters in names).

c) The "same" text within an element of an XML file may be different
depending on whether the file is an XML 1.0 or an XML 1.1 document
(since the EOL handling has been changed).

d) The "same" attribute value in an XML file may be different depending
on whether the file is an XML 1.0 or an XML 1.1 document (since the
attribute value normalization has been changed).
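
To make (c) concrete, here is a minimal Python sketch of the two
end-of-line rules. It assumes the line-end set of the final XML 1.1
Recommendation (CR LF, CR NEL, NEL, U+2028 and a lone CR all become LF),
which may differ from the December 2001 draft. The function and variable
names are mine, not the WG's.

```python
import re

# XML 1.0, section 2.11: CR LF and a lone CR both become LF.
_EOL_10 = re.compile("\r\n|\r")
# Assumed XML 1.1 rule: also CR NEL, NEL (U+0085) and U+2028.
# Two-character sequences come first so they are replaced as a unit.
_EOL_11 = re.compile("\r\n|\r\u0085|\u0085|\u2028|\r")

def normalize_eol(text: str, version: str) -> str:
    """Apply the end-of-line normalization for the given XML version."""
    if version not in ("1.0", "1.1"):
        raise ValueError(f"unknown XML version: {version!r}")
    pattern = _EOL_11 if version == "1.1" else _EOL_10
    return pattern.sub("\n", text)

raw = "line one\u0085line two\r\nline three"
as_10 = normalize_eol(raw, "1.0")   # NEL stays an ordinary character
as_11 = normalize_eol(raw, "1.1")   # NEL becomes a line feed
```

The same input therefore reaches the application as two different
strings, depending only on the version applied.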

Note: the WG also says that "each entity, including the document entity,
can be separately declared as XML 1.0 or XML 1.1." which *seems* to
allow a single document to mix elements and attributes processed with
both the new and the old EOL and attribute value handling, and this
seems like a weird thing to do.

Having listed these 4 differences, I'd like to assert that most of the
XML 1.0 well formed documents are XML 1.1 well formed and that I would
expect that for a while, most of the XML 1.1 well formed documents will
also be XML 1.0 well formed (people will probably use the new version
number for their new documents even if they don't use extended names).

Let's now have a look at what the WD says about the versioning:

http://www.w3.org/TR/2001/WD-xml11-20011213/#sec2.8

"2.8 Prolog and Document Type Declaration

Change "1.0" everywhere to "1.1"

Add the following paragraph:

XML 1.1 processors should accept XML 1.0 documents as well. If a
document is well-formed or valid XML 1.0, it may be made well-formed or
valid XML 1.1 respectively simply by changing the version number."

Per (a), I *think* that the above statement is not true.
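
A small illustration of the concern, using NFC from Python's unicodedata
module as a stand-in for the W3C normalization defined by Charmod (which
asks for more than NFC, so this is only an approximation). The sample
document and the helper name are my own.

```python
import unicodedata
import xml.etree.ElementTree as ET

def is_nfc(text: str) -> bool:
    """True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text

# "e" followed by U+0301 COMBINING ACUTE ACCENT: legal XML 1.0 content,
# but not NFC (the composed form is the single character U+00E9).
doc = "<p>caf\u0065\u0301</p>"

root = ET.fromstring(doc)        # accepted by an XML 1.0 parser
normalized = is_nfc(root.text)   # False: a checking processor would object
```

If the check were made a fatal error, changing the version number of
such a document would not be enough to make it well formed.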

I also think that it's far from being sufficient and that using a
version 1.1 is serving two different purposes:

e) declare that the names may contain a bunch of new characters (note
that it's a "may", not a "must").

f) specify that the parser must use the new EOL and attribute value
handling methods.

For these two purposes, it seems to me that it would be very useful to
let applications override the version declared in an instance document
(exactly like it is useful to be able to override a schema location) and
let them say: process this 1.1 document as 1.0 (report errors if it's
not 1.0 well formed and use the "old" handling), or: process this 1.0
document as 1.1 (report errors if it's not 1.1 well formed and use the
"new" handling).
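
The kind of override I have in mind could be sketched like this in
Python. It is only an illustration: the function name, its parameters
and the SUPPORTED tuple are hypothetical and do not belong to any
existing parser API.

```python
from typing import Optional

SUPPORTED = ("1.0", "1.1")

def effective_version(declared: Optional[str],
                      override: Optional[str] = None) -> str:
    """Pick the XML version whose rules the parser should apply.

    declared: version found in the XML declaration, or None if absent
    override: version forced by the application, or None to trust
              the document
    """
    # A document without an XML declaration is treated as XML 1.0.
    chosen = override if override is not None else (declared or "1.0")
    if chosen not in SUPPORTED:
        raise ValueError(f"unsupported XML version: {chosen!r}")
    return chosen
```

With this, effective_version("1.1", override="1.0") asks for a 1.1
document to be checked and processed with the 1.0 rules, and
effective_version("1.0", override="1.1") asks for the reverse.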

I have then looked at a random set of specifications which could be
affected by the change.

Since John is an editor of both XML 1.1 and the XML Infoset, it is no
surprise that the Infoset should not be impacted.

C14N contains a list of whitespaces "whitespace characters #x9, #xA, and
#xD" which would need to be updated.

XSLT also contains a list of whitespaces, but XSLT would be affected
more deeply than that: the version can be specified in its XML output
method, and the behaviour would need to be defined for the case where
the version of the XSLT stylesheet and/or of a source document differs
from the version of the output document. What should an XSLT processor
do when an element or attribute name which is XML 1.1 well formed but
not XML 1.0 well formed is inserted in an output tree serialized by the
XML 1.0 output method? Or when a text which is XML 1.0 well formed but
not XML 1.1 well formed is inserted in an output tree serialized by the
XML 1.1 output method?

The XPath specification has been wise enough to reference the whitespace
definition of XML rather than redefining it. However, if an XPath
processor wanted to give a different result for the normalize-space()
function depending on the version of XML which is used, it would surely
be a problem for it, since reporting the XML version to applications is
a feature which is being introduced in DOM Level 3 and is missing from
both SAX 2.0 and DOM Level 2 (how can the XPath processor guess the
version of the document, then?).
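
As one example of a lower-level API that does report it, Python's expat
binding passes the XML declaration to an XmlDeclHandler callback. The
sketch below only reads the declared version (expat itself is an XML 1.0
parser), and the helper name is mine.

```python
import xml.parsers.expat
from typing import Optional

def declared_version(data: bytes) -> Optional[str]:
    """Return the version from the XML declaration, or None if absent."""
    seen = {}

    def on_xml_decl(version, encoding, standalone):
        # Called once by expat if the document starts with <?xml ...?>.
        seen["version"] = version

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)
    return seen.get("version")

with_decl = declared_version(b'<?xml version="1.0"?><doc/>')
without_decl = declared_version(b"<doc/>")
```

A processor sitting on top of an API without such a callback has no
comparable way to find out which rules were applied.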

Finally, W3C XML Schema is also defining the list of whitespaces. Beyond
the editorial change, changing the list would have strange effects
similar to those mentioned for XSLT.

The effect of a number of facets would be modified (facets working on
XML 1.0 documents may not work on XML 1.1 documents and vice versa,
enumerations could be affected, the length of strings would give
different results, ...). The effect of derivations by list would also
be affected (a list element in an XML 1.0 document might for instance be
considered as two different list elements in XML 1.1).
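
The list case can be sketched in a few lines of Python. This assumes
that the set of whitespace characters would be extended with NEL
(U+0085) under XML 1.1, which is my reading of the change and not a
quote from the draft. WS_11_ASSUMED and list_items are hypothetical
names.

```python
WS_10 = " \t\n\r"                  # whitespace as listed by W3C XML Schema
WS_11_ASSUMED = WS_10 + "\u0085"   # hypothetical extended set (adds NEL)

def list_items(value: str, whitespace: str) -> list:
    """Split a list-typed value on runs of the given whitespace characters."""
    items, current = [], []
    for ch in value:
        if ch in whitespace:
            if current:                      # close the item in progress
                items.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        items.append("".join(current))
    return items

value = "12\u008534 56"
items_10 = list_items(value, WS_10)           # ['12\x8534', '56']
items_11 = list_items(value, WS_11_ASSUMED)   # ['12', '34', '56']
```

The splitting is done by hand because Python's own str.split() already
treats U+0085 as whitespace, which would hide the difference. The same
value has two items under one set and three under the other, so a
length facet or an enumeration would not give the same result.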

Even the content models would be affected (content models considered as
complex by XML 1.1 would be considered as mixed by XML 1.0).

I am sure that the lists (of affected specifications and of effects on
the ones I have mentioned) are much longer, but I thought that it might
be useful to give this partial list to illustrate what I meant!

Hope this helps.

Eric
-- 
See you in Paris for the Electronic Business Days 2002.
                                    http://www.edifrance.org/ebd/index.htm
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
http://xsltunit.org      http://4xt.org           http://examplotron.org
------------------------------------------------------------------------

Received on Saturday, 15 December 2001 12:03:27 UTC