Re: The version number in XML documents

From: Richard Tobin <richard@cogsci.ed.ac.uk>
Date: Wed, 31 Jul 2002 23:21:43 +0100 (BST)
Message-Id: <200207312221.XAA19086@rhymer.cogsci.ed.ac.uk>
To: Dave Peterson <davep@acm.org>, w3c-xml-plenary@w3.org, xml-editor@w3.org

> If I understand this right, a 1.0 parser already has "a way to reject
> XML 1.1 documents up front".  You're only mandating that it *must* use
> this way.  It appears that you're asserting that the intention of this
> proposal is "to give XML 1.0 parsers a way to" do something they already
> can do.  What's the point?

My view - and I only came to this conclusion recently - is that what's
important is not whether XML 1.0 parsers may or must reject 1.1
documents - as you point out, they already may - but whether a
document labelled 1.1 can be a well-formed 1.0 document.

I'd much prefer that it can't be, so that for any document you can
unambiguously say what version it is.

If a document labelled 1.1 can be a well-formed 1.0 document, then it
has an infoset as a 1.0 document, and that infoset may be different
from its infoset as a 1.1 document.  For example, a document with a
NEL in an NMTOKENS attribute will be normalized differently in 1.1.
That a document has two infosets seems very undesirable.  It would be
better to be able to say "this is a 1.1 document; its infoset is as
defined by XML 1.1; it may be accepted by a 1.0 parser but there is
no guarantee that it can accurately produce the true infoset of the
document".
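The NEL case can be sketched in a few lines of Python. This is only an illustration, assuming the line-end rules as they appear in the published XML 1.1 specification (NEL, U+0085, and U+2028 are translated to a line feed) and ignoring character and entity references; the function names are hypothetical and not taken from any parser.

```python
import re


def normalize_line_ends(text: str, version: str) -> str:
    """Apply end-of-line handling for the given XML version (simplified)."""
    if version == "1.1":
        # Assumed 1.1 rule: CR LF, CR NEL, NEL, LS and lone CR all become LF.
        return re.sub("\r\n|\r\u0085|\u0085|\u2028|\r", "\n", text)
    # 1.0 rule: only CR LF and lone CR become LF; NEL is left untouched.
    return re.sub("\r\n|\r", "\n", text)


def normalize_tokenized_attribute(raw: str, version: str) -> str:
    """Normalize a non-CDATA attribute value such as one of type NMTOKENS."""
    text = normalize_line_ends(raw, version)
    # Attribute-value normalization: each white space character becomes a space.
    text = re.sub("[\x20\x0d\x0a\x09]", " ", text)
    # Tokenized types: collapse runs of spaces and trim both ends.
    return re.sub(" +", " ", text).strip(" ")


raw = "alpha\u0085beta"  # two name tokens separated only by NEL
as_10 = normalize_tokenized_attribute(raw, "1.0")
as_11 = normalize_tokenized_attribute(raw, "1.1")

print(repr(as_10))  # NEL survives inside a single value
print(repr(as_11))  # NEL has become a separating space
```

Under these assumptions the same bytes yield one value containing a NEL when read as 1.0 and two space-separated tokens when read as 1.1, which is the "two infosets" problem described above.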

Another undesirable consequence (of allowing 1.1-labelled documents to
be well-formed 1.0) would be that there could be documents labelled
1.1 that were not well-formed 1.1 documents but were well-formed 1.0
documents.  For example, a document containing unnormalized Unicode.
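
As a rough illustration of what "unnormalized" means here, the following Python sketch tests whether a string is already in Unicode Normalization Form C. It is a simplification: it assumes NFC is the form in question and does not attempt the stricter "fully normalized" checks an XML 1.1 processor might apply. The helper name is hypothetical.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


composed = "caf\u00e9"     # e-acute as a single precomposed code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent

print(is_nfc(composed))
print(is_nfc(decomposed))
```

Both strings render identically, but only the first passes the check; a document containing the second would be the kind of 1.1-labelled document that could fail a 1.1 normalization requirement while raising no objection under 1.0 rules.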

-- Richard
Received on Wednesday, 31 July 2002 18:21:47 GMT
