Christian Ottosson <christian.ottosson@kurir.net> wrote:

> At least the Unicode line (and paragraph) separators should be
> recognized as "white space", I think, shouldn't they?

No. "2.3 Common Syntactic Constructs" of XML 1.0 says:

    S (white space) consists of one or more space (#x20) characters,
    carriage returns, line feeds, or tabs.

cf. http://www.w3.org/TR/REC-xml#sec-common-syn

And Production 3 formally defines this as:

    [3] S ::= (#x20 | #x9 | #xD | #xA)+

cf. http://www.w3.org/TR/REC-xml#NT-S

So, neither LINE SEPARATOR (U+2028) nor PARAGRAPH SEPARATOR (U+2029)
is white space - those are just treated as character data. That's
why the validator correctly reports errors (apart from BOM).

Moreover, "Unicode in XML and other Markup Languages" specification
explicitly discourages the use of line and paragraph separators
(U+2028 .. U+2029) as "not suitable for use with markup".

cf. http://www.w3.org/TR/unicode-xml/#Charlist

So I'd recommend not to use them even as character data.

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium

Received on Tuesday, 17 October 2000 12:20:51 GMT
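
For readers who want to check this themselves, here is a minimal Python sketch of Production 3 quoted in the message. The names XML_WHITESPACE and is_xml_whitespace are hypothetical, chosen only for illustration; they do not come from the XML specification or from the validator. The sketch simply tests whether a string consists of one or more of the four characters listed in S.

```python
# Sketch of XML 1.0 production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
# XML_WHITESPACE and is_xml_whitespace are illustrative names only.
XML_WHITESPACE = frozenset("\x20\x09\x0D\x0A")


def is_xml_whitespace(text: str) -> bool:
    """Return True if text is non-empty and every character is in S."""
    return len(text) > 0 and all(ch in XML_WHITESPACE for ch in text)


if __name__ == "__main__":
    samples = {
        "space, tab, CR, LF": " \t\r\n",
        "LINE SEPARATOR (U+2028)": "\u2028",
        "PARAGRAPH SEPARATOR (U+2029)": "\u2029",
    }
    for label, value in samples.items():
        # Python's str.isspace() follows Unicode rules, not production S,
        # so the two columns can disagree.
        print(label, is_xml_whitespace(value), value.isspace())
```

Note that Python's own str.isspace() uses Unicode character properties rather than Production 3, so it is not a substitute for a check like this when the question is what XML 1.0 treats as white space.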