Re: Platform-independent way of specifying line separator from winkowski@mitre.org on 2003-09-05 (xmlschema-dev@w3.org from September 2003)

From: <winkowski@mitre.org>
Date: Fri, 5 Sep 2003 16:28:06 -0400
To: xmlschema-dev@w3.org
Message-ID: <408A6ADBB8EBD511B27700508BB0CDE80C0539@lang06.mitre.org>

If end-of-line sequences are normalized to a single newline character (#xA),
then I am confused by the XML Schema regular expressions in
http://www.w3.org/TR/xmlschema-2/#regexs. [Section F.1.1, Character Class
Escapes, includes the escapes \n for the newline character, line feed
(#xA), and \r for the return character (#xD), as well as the Unicode
separator category escapes (Z, Zs, Zl, Zp). Why have these at all if
end-of-line handling in XML 1.0 or 1.1 normalizes line breaks to line feed (#xA)?]
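For reference, the end-of-line handling mentioned above (section 2.11 of XML 1.0 and of XML 1.1) can be sketched in Python. This is a minimal illustration, not a parser: the function names are hypothetical, and Python's `re` module stands in for the XML Schema regex dialect on the assumption that both read `\n` as #xA.

```python
import re

def normalize_eol_xml10(text: str) -> str:
    """XML 1.0, section 2.11: CR LF pairs and lone CRs each become one LF (#xA)."""
    # "\r\n?" consumes a CR together with a following LF, so a CR LF pair
    # yields a single LF rather than two.
    return re.sub(r"\r\n?", "\n", text)

def normalize_eol_xml11(text: str) -> str:
    """XML 1.1, section 2.11: additionally folds NEL (#x85) and LS (#x2028)."""
    # Two-character sequences (CR LF, CR NEL) are tried before the single
    # characters CR, NEL and LS, so each sequence becomes exactly one LF.
    return re.sub(r"\r\n|\r\x85|[\r\x85\u2028]", "\n", text)

# Hypothetical inputs: Windows (CR LF), classic Mac OS (CR), Unix (LF).
windows, old_mac, unix = "a\r\nb", "a\rb", "a\nb"

# After XML 1.0 normalization all three contain the same #xA character,
# so a \n pattern (here in Python's re) matches each of them.
for sample in (windows, old_mac, unix):
    assert re.fullmatch(r"a\nb", normalize_eol_xml10(sample))
```

In this sketch, XML 1.0 leaves LS (#x2028) untouched, while XML 1.1 folds it into #xA along with NEL.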

Also, section 4 (Recommendations) of http://www.unicode.org/reports/tr13/tr13-9.html
states that "the Unicode Standard defines two unambiguous separator characters,
Paragraph Separator (PS = 2029₁₆) and Line Separator (LS = 2028₁₆). In Unicode
text, the PS and LS characters should be used wherever the desired function is
unambiguous. Otherwise, the following specifies how to cope with an NLF [new
line function] when converting from other character sets to Unicode, when
interpreting characters in text, and when converting from Unicode to other
character sets.... If you do know the exact usage of any NLF, then convert it
to LS or PS."

So the Unicode Newline Guidelines recommend using the Line Separator
(LS, #x2028), but as we have seen, XML 1.0 uses line feed (#xA).
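To make the mismatch concrete: LS and PS are code points distinct from LF, with their own Unicode general categories, so a pattern for \n does not match them unless something has already folded them into #xA. The sketch below uses Python's `unicodedata` and `re` modules; treating Python's `\n` as equivalent to the XML Schema escape is an assumption, and the variable names are illustrative only.

```python
import re
import unicodedata

LF, LS, PS = "\n", "\u2028", "\u2029"

# General categories: LF is a control character (Cc), while LS and PS fall
# in the separator categories Zl and Zp, which a \p{Z} category escape covers.
categories = {name: unicodedata.category(ch)
              for name, ch in (("LF", LF), ("LS", LS), ("PS", PS))}

# A \n pattern matches only U+000A, so text that marks line ends with LS
# (as the Unicode Newline Guidelines suggest) is not matched by it.
matches_lf = bool(re.search(r"\n", "one" + LF + "two"))
matches_ls = bool(re.search(r"\n", "one" + LS + "two"))

print(categories)  # prints {'LF': 'Cc', 'LS': 'Zl', 'PS': 'Zp'}
print(matches_lf, matches_ls)  # prints True False
```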

Reviewing XML 1.0, XML Schema, and the Unicode Newline Guidelines
together, there seems to be a mismatch. Can someone rationalize these
discrepancies? Does the schema regular expression \n indeed match an
end-of-line sequence on all platforms?

- Dan Winkowski

Received on Friday, 5 September 2003 16:33:41 UTC