Re: schema pattern matching (negate)

Robin Berjon <robin.berjon@expway.fr> writes:

<snip/>

> Similarly, I couldn't find anything in the spec to control
> case-sensitivity. Did I miss it or has it been overlooked? Without it
> it is a true pain matching case-insensitive values (barbaz becoming
> [bB][aA][rR][bB][aA][zZ]).

Case insensitivity is somewhere between very difficult and incoherent
for Unicode, as I understand it.  Different languages, and even regional
variants of one language, have different opinions about what the
uppercase/lowercase correspondences are, e.g. (again, allegedly -- I'm not
a writing system expert) the uppercase form of Montréal, Canada is
MONTREAL, but the uppercase form of Montréal, France is MONTRÉAL.
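Below is a minimal Python sketch of the hand-expanded workaround from the quoted message. It assumes Python's built-in, language-independent case mappings (`str.lower`/`str.upper`), and the helper name `case_insensitive_pattern` is hypothetical. The sketch turns a literal into per-character classes and also shows where that approach breaks down. 'ß' uppercases to the two-character 'SS', which no single-character class can express. 'é' only ever pairs with 'É', so the expansion cannot know about a regional convention that drops the accent on capitals.

```python
import re


def case_insensitive_pattern(literal: str) -> str:
    """Hypothetical helper: expand a literal into per-character [Xx] classes.

    Relies on Python's default, language-independent case mappings.  When a
    character's lower- or uppercase form is more than one character long
    (e.g. 'ß'.upper() == 'SS'), that mapping is skipped, because one
    bracketed class cannot match a multi-character sequence.
    """
    parts = []
    for ch in literal:
        variants = {ch}
        for mapped in (ch.lower(), ch.upper()):
            if len(mapped) == 1:  # skip multi-character mappings like 'SS'
                variants.add(mapped)
        if len(variants) == 1:
            parts.append(re.escape(ch))  # no case variants: match literally
        else:
            parts.append("[" + "".join(sorted(variants)) + "]")
    return "".join(parts)


if __name__ == "__main__":
    pat = case_insensitive_pattern("barbaz")
    print(pat)                                # [Bb][Aa][Rr][Bb][Aa][Zz]
    print(bool(re.fullmatch(pat, "BarBaz")))  # True

    # 'ß' has no one-character uppercase, so "STRASSE" is not accepted.
    print(re.fullmatch(case_insensitive_pattern("straße"), "STRASSE"))  # None

    # 'é' only pairs with 'É', so an unaccented capital is rejected.
    print(re.fullmatch(case_insensitive_pattern("Montréal"), "MONTREAL"))  # None
```

For plain letters such as barbaz, the generated pattern is also a legal XML Schema pattern. The `re.escape` fallback for other characters follows Python's escaping rules, which are not guaranteed to coincide with XML Schema's.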

> While on this topic, I'd like to point out that a lot of literature
> out there states that XML Schema borrowed Perl's patterns, sometimes
> saying that it added Unicode support. That's fairly untrue: 1) Perl's
> patterns include full Unicode support, and 2) XML Schema uses a small
> subset of them.

Um, I _believe_ we took the regexps directly from the Unicode regular
expression guidelines -- the REC says:

  "The regular expression language defined here does not attempt to
  provide a general solution to "regular expressions" over UCS
  character sequences. In particular, it does not easily provide for
  matching sequences of base characters and combining marks. The
  language is targeted at support of "Level 1" features as defined in
  _Unicode Regular Expression Guidelines_ [1]. It is hoped that future
  versions of this specification will provide support for "Level 2"
  features."
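The Perl comparison in the quoted message can be made concrete with one difference between the two languages. XML Schema patterns are implicitly anchored at both ends, so the whole value must match. Perl's m// (like Python's re.search) accepts a match anywhere in the string. The minimal sketch below assumes the pattern uses only syntax common to Python's re and XML Schema, so no \i, \c or character-class subtraction. Both helper names are hypothetical, and this is not a conforming XML Schema validator.

```python
import re


def xsd_pattern_facet_ok(pattern: str, value: str) -> bool:
    """Hypothetical emulation of an XML Schema pattern facet check.

    XML Schema anchors every pattern at both ends, so the entire value must
    match; re.fullmatch gives that behaviour.  Only meaningful for patterns
    written in syntax shared by Python's re and XML Schema.
    """
    return re.fullmatch(pattern, value) is not None


def perl_style_match(pattern: str, value: str) -> bool:
    """Unanchored search, as with Perl's m// or Python's re.search."""
    return re.search(pattern, value) is not None


if __name__ == "__main__":
    for value in ("123", "abc123def"):
        print(value,
              perl_style_match("[0-9]{3}", value),
              xsd_pattern_facet_ok("[0-9]{3}", value))
    # "123" matches under both rules; "abc123def" only under the Perl-style one.
```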

ht

[1] http://www.unicode.org/unicode/reports/tr18/
-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                      Half-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]

Received on Tuesday, 1 July 2003 12:38:28 UTC