RE: Platform-independent way of specifying line separator

From: Anli Shundi <ashundi@tibco.com>
Date: Fri, 05 Sep 2003 17:21:06 -0400
To: winkowski@mitre.org, xmlschema-dev@w3.org
Message-id: <JGEJICKDMCIHCMNBOCOAEEFCCBAA.ashundi@tibco.com>

The parser normalizes line endings to #xA, but you can still include a
literal #xD character by escaping it as a character reference: &#xD;

So \n matches every 'normal' line ending once the parser has normalized
it. If the data still contains an #xD, it is presumably because the author
escaped it and did not want it normalized or treated as a line ending.
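
A minimal sketch of the behavior described above, assuming Python's
standard xml.etree.ElementTree (backed by expat) as the parser. The helper
name element_text is hypothetical, and Python's re module is only a
stand-in for an XML Schema regular expression engine, not the same dialect.
It shows literal CR LF and lone CR collapsing to #xA, while an escaped
&#xD; survives as #xD.

```python
import re
import xml.etree.ElementTree as ET


def element_text(xml_source: str) -> str:
    """Parse a document and return the text content of its root element."""
    return ET.fromstring(xml_source).text or ""


# Literal CR LF and a lone CR in the input are normalized by the parser.
normalized = element_text("<a>one\r\ntwo\rthree</a>")

# A character reference is resolved after normalization, so #xD is kept.
escaped = element_text("<a>one&#xD;two</a>")

# Stand-in checks for what \n and \r would see after parsing.
has_cr_after_normalization = re.search(r"\r", normalized) is not None
has_cr_after_escaping = re.search(r"\r", escaped) is not None

print(repr(normalized))
print(repr(escaped))
print(has_cr_after_normalization, has_cr_after_escaping)
```

Under these assumptions, \r can only ever match data whose author
deliberately escaped the carriage return.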

Anli Shundi
TIBCO Software Inc.
www.tibco.com

> -----Original Message-----
> From: xmlschema-dev-request@w3.org
> [mailto:xmlschema-dev-request@w3.org]On Behalf Of winkowski@mitre.org
> Sent: Friday, September 05, 2003 4:28 PM
> To: xmlschema-dev@w3.org
> Subject: Re: Platform-independent way of specifying line separator
> 
> 
> 
> If end-of-line sequences are normalized to a single newline character
> (#xA), then I am confused by XML Schema regular expressions
> (http://www.w3.org/TR/xmlschema-2/#regexs). Section F.1.1, Character Class
> Escapes, defines \n for the newline character line-feed (#xA) and \r for
> the return character (#xD), as well as the Unicode separator category
> escapes (Z, Zs, Zl, Zp). Why have these at all if the end-of-line handling
> in XML 1.0 or 1.1 normalizes line endings to line-feed (#xA)?
> 
> Also, http://www.unicode.org/reports/tr13/tr13-9.html, section 4
> (Recommendations), states that "the Unicode Standard defines two
> unambiguous separator characters, Paragraph Separator (PS = 2029₁₆) and
> Line Separator (LS = 2028₁₆). In Unicode text, the PS and LS characters
> should be used wherever the desired function is unambiguous. Otherwise, the
> following specifies how to cope with an NLF [new line function] when
> converting from other character sets to Unicode, when interpreting
> characters in text, and when converting from Unicode to other character
> sets.... If you do know the exact usage of any NLF, then convert it to LS
> or PS."
> 
> So the Unicode Newline Guidelines recommend using the line separator
> (LS, #x2028), but as we have seen, XML 1.0 uses the line-feed (#xA).
> 
> Reviewing XML 1.0, XML Schema, and the Unicode Newline Guidelines
> together, there seems to be a mismatch. Can someone rationalize these
> discrepancies? Does the schema regex \n indeed match an end-of-line
> sequence on all platforms?
> 
> - Dan Winkowski
> 
> 
> 
> 
Received on Friday, 5 September 2003 17:58:35 GMT
