
[Bug 13712] Line ends within xs:token

From: <bugzilla@jessica.w3.org>
Date: Sun, 21 Aug 2011 21:57:54 +0000
To: www-xml-schema-comments@w3.org
Message-Id: <E1QvG1i-00042C-0Q@jessica.w3.org>

--- Comment #3 from saasha@acc.umu.se 2011-08-21 21:57:52 UTC ---
(In reply to comment #2)


In light of Unicode's recommendations (2011) explaining that:

"U+2029 paragraph separator (PS) and U+2028 line separator (LS). [...] should
be used wherever the desired function is unambiguous."

http://www.unicode.org/versions/Unicode6.0.0/ch05.pdf (page 150)

one may wonder what happens if a system of any kind (operating system, DBMS,
etc.) that uses XML begins to apply Unicode's recommendations.

Being aware that

"Conforming implementations of this specification may provide either the
1.1-based datatypes or the 1.0-based datatypes, or both. If both are supported,
the choice of which datatypes to use in a particular assessment episode should
be under user control."


and that according to

http://www.w3.org/TR/xml11/#sec-line-ends and
http://www.w3.org/TR/xml/#sec-line-ends the character U+2029 paragraph
separator (PS) may be present within XML data, including within an xs:token
(both in XML 1.0-based and XML 1.1-based contexts), I will try to formulate
three possibilities for an addition to the specification of XML Schema 1.1
(part 2).
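
To make the concern concrete, here is a minimal sketch of whitespace
collapsing as the whiteSpace facet describes it for xs:token, where only
TAB, LF, CR and SPACE are treated as whitespace. The function name
collapse_xsd_whitespace and the sample string are illustrative assumptions,
not taken from any specification or library.

```python
import re


def collapse_xsd_whitespace(value: str) -> str:
    """Sketch of whiteSpace="collapse": only TAB, LF, CR and SPACE count.

    Hypothetical helper for illustration. Note that str.split() is avoided
    on purpose: Python treats U+0085, U+2028 and U+2029 as whitespace there.
    """
    # Step 1 ("replace"): TAB, LF and CR each become a single SPACE.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # Step 2 ("collapse"): runs of SPACE shrink to one, ends are trimmed.
    return re.sub(r" +", " ", replaced).strip(" ")


# Made-up sample mixing an ordinary line feed with U+2029 and U+2028.
sample = "first\u2029second \n third\u2028fourth"
collapsed = collapse_xsd_whitespace(sample)

# ascii() makes the surviving separators visible as escapes.
print(ascii(collapsed))
```

Under these assumptions the line feed disappears during collapsing, while
U+2029 and U+2028 pass through untouched and remain inside the token.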

(1) One (in my opinion acceptable) possibility would be to add two new
datatypes (xs:paragraph and xs:line) to XML Schema 1.1 for portability. Keeping
xs:token unchanged would ensure backward compatibility. These two additions
would be:

(1a) The datatype xs:paragraph could be defined as an xs:token containing no
U+2029 (paragraph separator) and, in XML 1.0-based contexts, no U+0085 (NEL)
either. In an XML 1.1-based context, no U+0085 (NEL) would be present anyway.

(1b) Within an XML 1.0-based context: The datatype xs:line could be defined as
an xs:paragraph containing no U+2028 (line separator).
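
The definitions in (1a) and (1b) can be sketched as simple predicates. This
is only an illustration of the proposal: the names is_token, is_paragraph
and is_line, as well as the xml_version parameter, are hypothetical, and the
checks are applied to values that have already been whitespace-normalized.

```python
def is_token(value: str) -> bool:
    """xs:token shape: no TAB/LF/CR, no leading, trailing or doubled SPACE."""
    if any(ch in value for ch in "\t\n\r"):
        return False
    return not (value.startswith(" ") or value.endswith(" ") or "  " in value)


def is_paragraph(value: str, xml_version: str = "1.0") -> bool:
    """Proposed xs:paragraph (1a): a token without U+2029.

    In an XML 1.0-based context U+0085 (NEL) is excluded as well; for
    XML 1.1 the proposal assumes NEL does not occur, so it is not checked.
    """
    if not is_token(value) or "\u2029" in value:
        return False
    if xml_version == "1.0" and "\u0085" in value:
        return False
    return True


def is_line(value: str) -> bool:
    """Proposed xs:line (1b), XML 1.0-based context only:
    a paragraph without U+2028."""
    return is_paragraph(value, "1.0") and "\u2028" not in value


print(is_token("a\u2029b"), is_paragraph("a\u2029b"))
print(is_paragraph("a\u2028b"), is_line("a\u2028b"))
```

Each type narrows the previous one, so every xs:line value would also be a
valid xs:paragraph, and every xs:paragraph value a valid xs:token.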

(2) Another (in my opinion problematic) possibility would be to redefine
xs:token to take U+2029 and U+2028 into account. This would compromise
backward compatibility, though.

(3) A short and honest, but in my opinion not really satisfying, possibility
would be to add a note clarifying that: "Neither xs:token nor any other XML
Schema 1.1 datatype supports unambiguous use of U+2029 paragraph separator and
U+2028 line separator as recommended by Unicode."



Received on Sunday, 21 August 2011 21:57:58 UTC
