xml11Names-46

/ "Ian B. Jacobs" <ij@w3.org> was heard to say:
| Minutes of the TAG's 7 June 2004 teleconf are available
| as HTML [1] and as text below.
[...]
|           Action NW: Write up the issue for the TAG. If there are no
|           objections to formulation, forward to the XML CG on behalf of
|           TAG.

If we examine the intersection of XML 1.0, XML 1.1, XML Schema 1.0, 
and evolving XSL, XML Query, and XML Protocol specifications, we find
a small but unquestionably thorny issue.

This message attempts to describe that issue, identified by the TAG
as xml11Names-46.

Michael Sperberg-McQueen provided his own summary[1] of the same issue
several days before I had a chance to work on this item. This note was
certainly informed by that analysis and I thank Michael for it.

XML 1.1 makes essentially four changes to XML 1.0:

 1. Increases the number of characters that may legally appear in Names.
 2. Adds several new characters that may appear in text if they are
    encoded as numeric character references (C0 controls except NUL).
 3. Removes several characters so that they may not appear in text if
    they are not encoded as numeric character references (C1 controls).
 4. Adds NEL (U+0085) as a line-end character.

Of these, points 3 and 4 have no effect beyond the parser. An XML 1.0
document may contain unescaped C1 controls where an XML 1.1 document
must escape them, but in either case the application sees the same
Unicode characters. In an XML 1.1 document, NEL in text will be
replaced with a line feed, but that won't generally have any effect on
the application.
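
As a rough illustration of point 4, here is a minimal sketch of
line-end handling as I read section 2.11 of the two Recommendations.
The function and pattern names are hypothetical, and the treatment of
U+2028 comes from the XML 1.1 Recommendation rather than from the
summary above.

```python
import re

# XML 1.0, section 2.11: CR LF and a lone CR both become LF.
_XML10_EOL = re.compile("\r\n|\r")

# XML 1.1, section 2.11: additionally CR NEL, a lone NEL (U+0085) and
# LINE SEPARATOR (U+2028) become LF. Two-character sequences are listed
# first so that they are consumed as a unit.
_XML11_EOL = re.compile("\r\n|\r\x85|\r|\x85|\u2028")


def normalize_line_ends(text, version="1.1"):
    """Return text with line ends normalized as a parser of the given
    XML version would do before handing it to the application."""
    pattern = _XML11_EOL if version == "1.1" else _XML10_EOL
    return pattern.sub("\n", text)
```

Under these assumptions, a NEL reaches the application as a line feed
when the document is XML 1.1 and is passed through unchanged when the
document is XML 1.0.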

Point 2 might be important to an application, but given the enormous
range of Unicode characters that are already valid in text, I expect
it's going to be a rare application that cares, or even notices, that
the parser has slipped a few new characters in there.

Point 1 is the real problem. (And this is a problem that simply could
not be avoided if XML was going to continue to provide
non-discriminatory I18N support. Allowing element and attribute names
to contain characters from new scripts as Unicode evolves was
necessary.)

In a nutshell, the problem is this: XML Schema 1.0 normatively refers
to XML Namespaces 1.0 for the definition of QName, XML Namespaces 1.0
normatively refers to XML 1.0 for the definition of Name, and XML 1.0
has fewer Name characters than XML 1.1.

That means that by a strict interpretation of the Recommendations, it
is impossible to write an XML Schema for a document that uses the
"new" Name characters. And by extension, it is impossible for an
XPath expression or a protocol document to use XML 1.1.
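
To make the Name character difference concrete, here is a minimal
sketch of a Name check against the XML 1.1 productions [4]
NameStartChar and [4a] NameChar. The helper names are hypothetical. I
have not reproduced the XML 1.0 tables (Appendix B), which are far
longer; to the best of my knowledge they do not include the Ethiopic
block, so a name like the one in the usage note below is a Name in
XML 1.1 but not in XML 1.0.

```python
# Code point ranges for NameStartChar, XML 1.1 production [4].
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed after the first character, production [4a]:
# "-" and ".", digits, U+00B7, combining marks, U+203F-U+2040.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    return any(lo <= code_point <= hi for lo, hi in ranges)


def is_xml11_name(s):
    """Return True if s matches the XML 1.1 Name production [5]."""
    if not s:
        return False
    if not _in_ranges(ord(s[0]), NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ord(ch), NAME_START_RANGES)
        or _in_ranges(ord(ch), NAME_EXTRA_RANGES)
        for ch in s[1:]
    )
```

For example, is_xml11_name("\u12A0\u121B\u122D\u129B") accepts a name
written in Ethiopic script. A schema processor that takes its
definition of Name from XML 1.0, via Namespaces 1.0, has no way to
declare an element with that name.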

This is a problem that must be overcome, and overcome quickly before
new specs are completed, in order to provide any reasonable hope of
providing XML 1.1 support to those communities that are relying on it.

I observe that QNames are now used not only in the XML Activity, but
also in areas beyond XML. Addressing this problem will benefit not
just XML, but all users of QNames.

                                        Be seeing you,
                                          norm

[1] http://lists.w3.org/Archives/Member/w3c-xml-cg/2004Jun/0015.html

-- 
Norman.Walsh@Sun.COM / XML Standards Architect / Sun Microsystems, Inc.

Received on Friday, 11 June 2004 16:44:11 UTC