- From: <winkowski@mitre.org>
- Date: Fri, 5 Sep 2003 16:28:06 -0400
- To: xmlschema-dev@w3.org
If end-of-line sequences are normalized to a single newline character (#xA), then I am confused by XML Schema regular expressions (http://www.w3.org/TR/xmlschema-2/#regexs). Section F.1.1, Character Class Escapes, includes regular expressions for \n, the newline character line-feed (#xA), and \r, the return character (#xD), as well as the Unicode separator category escapes (Z, Zs, Zl, Zp). Why even have these if the end-of-line handling in XML 1.0 or 1.1 normalizes these to line-feed (#xA)?

Also, section 4 (Recommendations) of http://www.unicode.org/reports/tr13/tr13-9.html states that "the Unicode Standard defines two unambiguous separator characters, Paragraph Separator (PS = 2029₁₆) and Line Separator (LS = 2028₁₆). In Unicode text, the PS and LS characters should be used wherever the desired function is unambiguous. Otherwise, the following specifies how to cope with an NLF [new line function] when converting from other character sets to Unicode, when interpreting characters in text, and when converting from Unicode to other character sets.... If you do know the exact usage of any NLF, then convert it to LS or PS."

So the Unicode Newline Guidelines recommend using the line separator (LS, #x2028), but as we have seen, XML 1.0 uses the line-feed (#xA). Reviewing XML 1.0, XML Schema, and the Unicode Newline Guidelines together, there seems to be a mismatch. Can someone rationalize these discrepancies? Does the schema regex \n indeed match an end-of-line sequence on all platforms?

- Dan Winkowski
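
To make the normalization in question concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser as a stand-in for a conforming XML 1.0 processor. The sample document is hypothetical, and Python's re module is not an XML Schema regex engine; it only illustrates which characters are left for \n and \r to match once the parser has done its end-of-line handling.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a literal CRLF, a lone literal CR, and a CR written
# as a character reference (&#xD;), all inside element content.
raw = b"<doc>one\r\ntwo\rthree&#xD;four</doc>"

# The parser applies XML 1.0 end-of-line handling before the application
# (or a schema validator) sees the text.
text = ET.fromstring(raw).text

# Count what remains for \n and \r to match after parsing.
# Note: these are Python regexes, used here only as an approximation.
newline_count = len(re.findall(r"\n", text))
return_count = len(re.findall(r"\r", text))

print(repr(text))
print("line feeds:", newline_count, "carriage returns:", return_count)
```

In this sketch the literal CRLF and the lone CR both arrive as #xA, while the CR written as a character reference is not subject to end-of-line handling and arrives as #xD.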
Received on Friday, 5 September 2003 16:33:41 UTC