XML 1.0 and XML 1.1


Please feel free to forward this email to your WGs.  (Public W3C
mailing lists are fine--this is not intended to be member-only.)



Since XML 1.1 became a W3C Recommendation in February 2004 (with
a Second Edition in August 2006), there has been a substantial
uptake of it as a peer of XML 1.0 in new and ongoing W3C work.
This is appropriate, as XML 1.1
was explicitly not designed to replace XML 1.0, but to supplement
it for the benefit of various groups against which XML 1.0 had
unjustly, but unintentionally, discriminated.

However, there are very few XML 1.1 documents in the wild.
The XML Core WG believes this to be the result of a vicious cycle,
in which widely distributed XML parsers do not support 1.1 because
the parser authors believe that few document authors will use it.
This becomes a self-fulfilling prophecy, as those who would
benefit from XML 1.1 are rightfully concerned that documents
written in it will not be widely acceptable.

After considering various other ideas, the XML Core WG wants
to suggest the possibility of changing XML 1.0 to relax the
restrictions on element and attribute names, thereby providing
in XML 1.0 the major end-user benefit currently achievable
only by using XML 1.1.

To quote the XML 1.1 Recommendation:

  The W3C's XML 1.0 Recommendation was first issued in 1998,
  and despite the issuance of many errata culminating in a
  Third Edition of 2004, has remained (by intention) unchanged
  with respect to what is well-formed XML and what is not.
  This stability has been extremely useful for interoperability.
  However, the Unicode Standard on which XML 1.0 relies for
  character specifications has not remained static, evolving from
  version 2.0 to version 4.0 and beyond.  Characters not present
  in Unicode 2.0 may already be used in XML 1.0 character data.
  However, they are not allowed in XML names such as element type
  names, attribute names, enumerated attribute values, processing
  instruction targets, and so on.  In addition, some characters
  that should have been permitted in XML names were not, due to
  oversights and inconsistencies in Unicode 2.0.

  The overall philosophy of names has changed since XML 1.0.
  Whereas XML 1.0 provided a rigid definition of names, wherein
  everything that was not permitted was forbidden, XML 1.1 names are
  designed so that everything that is not forbidden (for a specific
  reason) is permitted.  Since Unicode will continue to grow past
  version 4.0, further changes to XML can be avoided by allowing
  almost any character, including those not yet assigned, in names.

Since then, Unicode has expanded further to reach 5.0, and it is
nowhere near complete with respect to the world's minority languages
and writing systems.  If XML 1.0 relaxed the restrictions on element
and attribute names, those who preferred to retain the Appendix B
constraints in their documents would be free to do so, but those
who wished to use element and attribute names in languages normally
written in any of the Ethiopic, Cherokee, Canadian Syllabics, Khmer,
Mongolian, Yi, Philippine, New Tai Lue, Buginese, Syloti Nagri,
N'Ko, and Tifinagh scripts would be able to do so, as would users
of minority languages whose scripts appeared in Unicode 2.0 but
lacked letters essential for writing those languages.

Of course, older parsers would still reject such documents, but
there would be no need for a strict XML 1.0/1.1 dichotomy.  The
XML Core WG has heard evidence tending to indicate that implementing
such a relaxation would be technically straightforward in essentially
all XML parsers:  it is a matter of replacing a rather large
"permitted" table with a much smaller "forbidden" table.
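
As a rough illustration of the table swap described above, here is
a minimal Python sketch of a relaxed name check.  The ranges are
taken from the NameStartChar and NameChar productions of the XML 1.1
Recommendation; the function and constant names are hypothetical and
do not come from any particular parser.  Note that XML 1.1 states the
rule as a short list of broad allowed ranges, which amounts to
excluding a small set of forbidden characters.

```python
# Sketch only: ranges follow the XML 1.1 NameStartChar production.
# All identifiers below are illustrative, not taken from a real parser.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional characters allowed after the first one (XML 1.1 NameChar):
# "-", ".", digits, the middle dot, and two combining/connector ranges.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    """Return True if code_point falls inside any (low, high) pair."""
    return any(low <= code_point <= high for low, high in ranges)


def is_name_start_char(ch):
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ord(ch), NAME_EXTRA_RANGES)


def is_relaxed_name(name):
    """Check a candidate element or attribute name against the relaxed rules."""
    if not name:
        return False
    return is_name_start_char(name[0]) and all(is_name_char(c) for c in name[1:])
```

Under these rules a name written in Ethiopic or Cherokee letters is
accepted, while a name that begins with a digit or contains a space
is still rejected.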

The XML Core WG assumes that if such an erratum were to be passed
into XML 1.0, the XML 1.1 Recommendation would eventually be deprecated
by the W3C.

Comments on all aspects of this possibility are earnestly solicited;
please send them to www-xml-blueberry-comments@w3.org (publicly
archived).

Paul Grosso
for the XML Core WG

Received on Monday, 29 October 2007 16:06:20 UTC