
Re: Platform-independent way of specifying line separator

From: <winkowski@mitre.org>
Date: Fri, 5 Sep 2003 16:28:06 -0400
Message-ID: <408A6ADBB8EBD511B27700508BB0CDE80C0539@lang06.mitre.org>
To: xmlschema-dev@w3.org

If end-of-line sequences are normalized to a single newline character (#xA),
then I am confused by XML Schema regular expressions
(http://www.w3.org/TR/xmlschema-2/#regexs). Section F.1.1, Character Class
Escapes, includes regular expressions for \n, the newline character line-feed
(#xA), and \r, the return character (#xD), as well as the Unicode separator
category escapes (Z, Zs, Zl, Zp). Why even have these if the end-of-line
handling in XML 1.0 or 1.1 normalizes them to line-feed (#xA)?
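
To make the normalization in question concrete, here is a minimal sketch,
assuming Python's standard xml.etree.ElementTree module (backed by expat, an
XML 1.0 parser) is representative of a conforming processor. The helper name
parsed_text and the sample documents are hypothetical, made up for this
illustration. It compares literal CR/CRLF characters in the input with a
carriage return written as the character reference &#xD;.

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_bytes: bytes) -> str:
    """Return the character data of the root element after parsing."""
    return ET.fromstring(xml_bytes).text


# Literal CRLF and a lone CR in the raw input are normalized to #xA.
literal = parsed_text(b"<a>one\r\ntwo\rthree</a>")
print(repr(literal))   # 'one\ntwo\nthree'

# A carriage return written as a character reference is not normalized.
escaped = parsed_text(b"<a>one&#xD;two</a>")
print(repr(escaped))   # 'one\rtwo'
```

If this behaviour holds for other processors as well, a #xD can still reach
the schema validator through a character reference, which may be one reason a
\r escape is useful even with end-of-line normalization in place.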

Also, in http://www.unicode.org/reports/tr13/tr13-9.html, section 4
(Recommendations), it states that "the Unicode Standard defines two unambiguous
separator characters, Paragraph Separator (PS = U+2029) and Line Separator
(LS = U+2028). In Unicode text, the PS and LS characters should be used
wherever the desired function is unambiguous. Otherwise, the following
specifies how to cope with an NLF [new line function] when converting from
other character sets to Unicode, when interpreting characters in text, and
when converting from Unicode to other character sets.... If you do know the
exact usage of any NLF, then convert it to LS or PS."

So the Unicode Newline Guidelines recommend using the line separator
(LS, #x2028), but as we have seen, XML 1.0 uses the line-feed (#xA).
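
The gap between the two conventions can be seen in a short sketch, again
assuming Python's xml.etree.ElementTree (an XML 1.0 parser via expat) and a
made-up sample document. The LS character passes through the parser
untouched, so code that looks only for #xA sees one line, while a
Unicode-aware line splitter (here Python's str.splitlines) sees two.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two "lines" separated by LINE SEPARATOR (U+2028).
doc = "<a>one\u2028two</a>".encode("utf-8")
text = ET.fromstring(doc).text

# An XML 1.0 parser does not turn U+2028 into #xA.
has_ls = "\u2028" in text
has_lf = "\n" in text
print(has_ls, has_lf)          # True False

# Splitting on #xA alone finds one line; splitlines() treats U+2028
# as a line boundary and finds two.
lf_lines = text.split("\n")
unicode_lines = text.splitlines()
print(len(lf_lines), len(unicode_lines))   # 1 2
```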

In reviewing XML 1.0, XML Schema, and the Unicode Newline Guidelines
together, there seems to be a mismatch. Can someone rationalize these
discrepancies? Does the schema regular expression \n indeed match an
end-of-line sequence on all platforms?

- Dan Winkowski
Received on Friday, 5 September 2003 16:33:41 GMT
