- From: John Cowan <cowan@mercury.ccil.org>
- Date: Fri, 11 Apr 2014 12:34:07 -0400
- To: Jan.Petvalsky@tieto.com
- Cc: xml-editor@w3.org, tbray@textuality.com, jeanpa@microsoft.com, cmsmcq@w3.org, elm@east.sun.com, cowan@ccil.org
Note: I am speaking for myself only, and not for the XML Core WG, never mind anyone else.

Jan.Petvalsky@tieto.com scripsit:

> I can see that there are differences in XML versions specification
> regarding to character data:
>
> http://www.w3.org/TR/xml11/#NT-Char
> http://www.w3.org/TR/REC-xml/#NT-Char
>
> This unclear definition make that issue that one XML document could
> be valid for one XML processor, but not for others.

Rather, there are two different kinds of XML documents, XML 1.0 and XML 1.1. An XML processor may accept XML 1.0 only, or XML 1.1 only, or both. (For that matter, it might accept JSON or any other format as well.)

> It should be fixed that at least from specification definition that
> any UNICODE character is valid.

The character U+0000 was intentionally rejected for XML 1.1 character content. This is unlikely to change in future.

> It is possible to rewrite that by "&#" for some processors, but
> this not accepted by others.

It is acceptable in XML 1.1 documents, but not in XML 1.0 documents.

> I hope that you read and not put to bin. I hope that you also mark
> that XML version that is obsolete as obsolete.

XML 1.0 is not obsolete. XML 1.1 is intended only for specific use cases that XML 1.0 cannot handle.

> See also: http://stackoverflow.com/questions/9526951/xml-and-unicode-specifications-whats-a-legal-character

There is indeed a problem with Section 2.2, which reads "Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646." Obviously, TAB, CR, and LF are already legal characters. As I just noted on that page, the First Edition of XML (1998) read "the legal graphic characters of Unicode." For whatever reason, the word "graphic" was removed from the Second Edition of 2000, perhaps because it is inaccurate: XML allows many characters that are not graphic characters. A correct, if not necessarily clear, revision would be to add the words "except those in general category Cc".
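To make the 1.0 versus 1.1 difference concrete, here is a minimal Python sketch that transcribes the two Char productions linked above, plus the XML 1.1 RestrictedChar production (characters that XML 1.1 permits only as character references). The helper names are hypothetical, not part of any XML library, and the sketch checks code-point legality only: it ignores markup-significant characters such as `<` and `&`, line-end normalization, and the "discouraged" ranges.

```python
# Minimal sketch: code-point legality under the XML 1.0 (Fifth Edition)
# and XML 1.1 Char productions. Function names are hypothetical helpers,
# not part of any XML library.

def is_xml10_char(cp: int) -> bool:
    # XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    #                   | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(cp: int) -> bool:
    # XML 1.1: Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    # U+0000 is excluded, as are surrogates, U+FFFE and U+FFFF.
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(cp: int) -> bool:
    # XML 1.1: RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F]
    #                             | [#x7F-#x84] | [#x86-#x9F]
    # These may appear in a document only as character references.
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def how_to_write(cp: int, version: str) -> str:
    """Classify how a code point may appear in character data."""
    if version == "1.0":
        # In XML 1.0 a character reference must also match Char,
        # so &#1; is as illegal as a literal U+0001.
        return "literal or char-ref" if is_xml10_char(cp) else "forbidden"
    if version == "1.1":
        if not is_xml11_char(cp):
            return "forbidden"  # e.g. U+0000, even written as &#0;
        if is_xml11_restricted(cp):
            return "char-ref only"
        return "literal or char-ref"
    raise ValueError(f"unknown XML version: {version!r}")


if __name__ == "__main__":
    for cp in (0x0, 0x1, 0x9, 0x7F, 0x41):
        print(f"U+{cp:04X}", how_to_write(cp, "1.0"), "|", how_to_write(cp, "1.1"))
```

Under these productions U+0000 is forbidden in both versions, while a control character such as U+0001 is forbidden in XML 1.0 but may be written as `&#1;` in XML 1.1, which matches the "&#" point above.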
> PS: Frankly speaking I would like to have XML 2.0 that it will be
> called short-xml, so pair tag will be possible to write in short form
> (e.g. <tag>…</tag> is same as <tag>…</>).

This also is unlikely to happen. The failure of XML 1.1 has made us very unwilling to work on any successor format that does not have *major* advantages over XML 1.0.

--
John Cowan    http://www.ccil.org/~cowan    cowan@ccil.org
At times of peril or dubitation,
Perform swift circular ambulation,
With loud and high-pitched ululation.
Received on Friday, 11 April 2014 16:34:40 UTC