Request/Question: XML specification - unclear character data definition

Good afternoon.
 
I can see that the XML versions differ in how their specifications define character data (the Char production):
 
XML 1.1: http://www.w3.org/TR/xml11/#NT-Char

XML 1.0: http://www.w3.org/TR/REC-xml/#NT-Char
 
Because the two definitions differ, the same XML document can be accepted by one XML processor but rejected by another.
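
To make the difference concrete, here is a minimal Python sketch of the Char productions linked above, transcribed from the two specifications. The function names are illustrative only and not part of any library. Note that #x0 is excluded by both versions, while XML 1.1 admits #x1 to #x1F but classifies most of them as RestrictedChar, which may appear only as character references.

```python
# Sketch of the Char productions from the XML 1.0 and XML 1.1 specifications.
# The function names are hypothetical; the ranges are transcribed from the specs.


def is_char_xml10(cp: int) -> bool:
    """XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_char_xml11(cp: int) -> bool:
    """XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_restricted_xml11(cp: int) -> bool:
    """XML 1.1 RestrictedChar: legal only when written as a character reference."""
    return (
        0x1 <= cp <= 0x8
        or 0xB <= cp <= 0xC
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


if __name__ == "__main__":
    # NUL, tab, vertical tab, unit separator, next line (NEL)
    for cp in (0x0, 0x9, 0xB, 0x1F, 0x85):
        print(f"U+{cp:04X}: XML 1.0 Char={is_char_xml10(cp)}, "
              f"XML 1.1 Char={is_char_xml11(cp)}, "
              f"XML 1.1 RestrictedChar={is_restricted_xml11(cp)}")
```

For example, the vertical tab U+000B fails the XML 1.0 test but passes both XML 1.1 tests, which is exactly the kind of character on which processors for the two versions disagree.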

The specification should be fixed so that, at least in its definitions, any Unicode character is allowed.
 
If you think this is already the case, it is not, especially for the control characters “#x0000–#x001F”, which are handled differently.

Some processors accept these characters when they are written as character references (“&#”), but others do not. What is worse, I think the same rule is applied to CDATA sections, where character references are not recognized: a processor may accept “&#” but not allow the character itself inside a CDATA section.

If you think these characters are rarely used, that may be wrong: for example, the vertical tab (#x000B) seems to be used in Microsoft Office.
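
As a rough illustration of the character-reference and CDATA behaviour described above, the sketch below uses Python's standard xml.etree.ElementTree, which is built on the Expat parser and, as far as I know, applies the XML 1.0 character rules. The helper name parses is hypothetical. Under these rules a reference to the vertical tab is rejected, and so is the raw character inside a CDATA section, where a reference such as &#11; would only be literal text.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Hypothetical helper: True if ElementTree (Expat) accepts the document."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# Tab (#x9) is an XML 1.0 Char, so a character reference to it is accepted.
tab_ref = "<a>&#9;</a>"
# Vertical tab (#xB) is not an XML 1.0 Char, so even a reference to it is rejected.
vt_ref = "<a>&#11;</a>"
# CDATA sections do not recognise references, so the raw character is needed,
# and Expat rejects it there as well.
vt_cdata = "<a><![CDATA[\x0b]]></a>"
# Inside CDATA, "&#11;" is just five literal characters of text.
ref_in_cdata = "<a><![CDATA[&#11;]]></a>"

if __name__ == "__main__":
    for name, doc in [("tab_ref", tab_ref), ("vt_ref", vt_ref),
                      ("vt_cdata", vt_cdata), ("ref_in_cdata", ref_in_cdata)]:
        print(name, parses(doc))
```

Under XML 1.1 the reference &#11; would be allowed, because #xB is a RestrictedChar, but the raw character would still be rejected inside a CDATA section, since restricted characters may not appear literally anywhere in an XML 1.1 document.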

 
 
I hope you will read this rather than throw it in the bin. I also hope you will mark whichever XML version is obsolete as obsolete.
 
See also: http://stackoverflow.com/questions/9526951/xml-and-unicode-specifications-whats-a-legal-character
 
Thank you in advance for correcting the specification.
 
PS: Frankly speaking, I would like to have an XML 2.0, which could be called short-xml, in which paired tags could be written in a short form (e.g. <tag>…</tag> would be the same as <tag>…</>).
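
The proposed </> end tag should be unambiguous, because it can only close the most recently opened element. The following is a hypothetical Python sketch (short-xml is only a proposal, so nothing here is standard) that expands </> into a named end tag using a stack of open element names. It assumes the input contains no comments, CDATA sections, processing instructions, or ">" characters inside attribute values.

```python
import re

# Matches a start tag, end tag, empty-element tag, or the proposed "</>".
# Simplification (assumption): no comments, CDATA, PIs, or ">" in attribute values.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)?([^>]*?)(/?)>")


def expand_short_end_tags(text: str) -> str:
    """Hypothetical: rewrite each "</>" as the end tag of the innermost open element."""
    open_elements = []  # stack of names of currently open elements
    out = []
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        closing, name, _attrs, self_closing = m.groups()
        if closing:
            if not open_elements:
                raise ValueError(f"end tag with no open element at {m.start()}")
            innermost = open_elements.pop()
            if name is None:
                out.append(f"</{innermost}>")  # short form: fill in the name
            elif name == innermost:
                out.append(m.group(0))  # ordinary end tag, kept as-is
            else:
                raise ValueError(f"</{name}> does not match <{innermost}>")
        elif name is None:
            raise ValueError(f"unsupported markup {m.group(0)!r}")
        else:
            if not self_closing:
                open_elements.append(name)
            out.append(m.group(0))
    out.append(text[pos:])
    if open_elements:
        raise ValueError(f"unclosed elements: {open_elements}")
    return "".join(out)


if __name__ == "__main__":
    print(expand_short_end_tags("<tag>text</>"))
```

An ordinary end tag is still checked against the innermost open element, so a document that mixes </tag> and </> stays properly nested.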

 

Pětvalský Jan 

 

 

 

 

Received on Friday, 11 April 2014 19:42:05 UTC