- From: Henry S. Thompson <ht@cogsci.ed.ac.uk>
- Date: Tue, 01 Jul 2003 17:38:08 +0100
- To: Robin Berjon <robin.berjon@expway.fr>
- Cc: Jeni Tennison <jeni@jenitennison.com>, Colin Mackenzie <colin@elecmc.com>, xmlschema-dev@w3.org
Robin Berjon <robin.berjon@expway.fr> writes:

<snip/>

> Similarly, I couldn't find anything in the spec to control
> case-sensitivity. Did I miss it or has it been overlooked? Without it
> it is a true pain matching case-insensitive values (barbaz becoming
> [bB][aA][rR][bB][aA][rR]).

Case insensitivity is somewhere between very difficult and incoherent
for Unicode, as I understand it. Different languages have different
opinions about what the uppercase/lowercase correspondences are,
e.g. (again, allegedly -- I'm not a writing system expert) the
upper-case of Montréal, Canada is MONTREAL, but the upper case of
Montréal, France is MONTRÉAL.

> While on this topic, I'd like to point out that a lot of literature
> out there states that XML Schema borrowed Perl's patterns, sometimes
> saying that it added Unicode support. That's fairly untrue: 1) Perl's
> patterns include full Unicode support, and 2) XML Schema uses a small
> subset of them.

Um, I _believe_ that the fact is that we took the regexps directly
from Unicode -- the REC says:

  "The regular expression language defined here does not attempt to
   provide a general solution to "regular expressions" over UCS
   character sequences. In particular, it does not easily provide for
   matching sequences of base characters and combining marks. The
   language is targeted at support of "Level 1" features as defined in
   _Unicode Regular Expression Guidelines_ [1]. It is hoped that future
   versions of this specification will provide support for "Level 2"
   features."

ht

[1] http://www.unicode.org/unicode/reports/tr18/

-- 
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
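
For anyone stuck with the workaround Robin describes, here is a rough sketch (not anything from the REC) of generating the per-letter character classes mechanically instead of typing them by hand. The function name case_insensitive_pattern is made up for illustration, and it relies on Python's default, locale-independent case mappings, which is exactly the simplification questioned above. Characters without a one-to-one lower/upper pair are left as literals, and re.escape applies Python's escaping rules, which are assumed here to be close enough to the schema pattern syntax for plain text.

```python
import re


def case_insensitive_pattern(literal: str) -> str:
    """Expand a literal into per-character classes, e.g. 'ab' -> '[aA][bB]'.

    Hypothetical helper: uses Python's default Unicode case mappings,
    which ignore language-specific conventions.
    """
    parts = []
    for ch in literal:
        lower, upper = ch.lower(), ch.upper()
        # Only build a class when there is a simple one-to-one case pair.
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append(f"[{lower}{upper}]")
        else:
            # Uncased characters, or ones like 'ß' whose upper case is
            # two characters, stay as (escaped) literals.
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = case_insensitive_pattern("barbaz")
print(pattern)  # [bB][aA][rR][bB][aA][zZ]

# Schema patterns match the whole value, so fullmatch is the closest analogue.
print(re.fullmatch(pattern, "BarBAZ") is not None)  # True
```

Note that this only ever pairs é with É, so a value written MONTREAL would not match a pattern generated from Montréal; the expansion bakes in one opinion about case correspondence.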
Received on Tuesday, 1 July 2003 12:38:28 UTC