Re: schema pattern matching (negate) from Robin Berjon on 2003-07-02 (xmlschema-dev@w3.org from July 2003)

From: Robin Berjon <robin.berjon@expway.fr>
Date: Wed, 02 Jul 2003 18:18:14 +0200
To: "Henry S. Thompson" <ht@cogsci.ed.ac.uk>
Cc: Jeni Tennison <jeni@jenitennison.com>, xmlschema-dev@w3.org
Message-ID: <3F0305C6.3080509@expway.fr>

Henry S. Thompson wrote:
> Robin Berjon <robin.berjon@expway.fr> writes:
>>Similarly, I couldn't find anything in the spec to control
>>case-sensitivity. Did I miss it or has it been overlooked? Without it
>>it is a true pain matching case-insensitive values (barbaz becoming
>>[bB][aA][rR][bB][aA][zZ]).
> 
> Case insensitivity is somewhere between very difficult and incoherent
> for Unicode, as I understand it.  Different languages have different
> opinions about what the uppercase/lowercase correspondences are,
> e.g. (again, allegedly -- I'm not a writing system expert) the
> upper-case of Montréal, Canada is MONTREAL, but the upper case of
> Montréal, France is MONTRÉAL.

Case insensitivity is certainly difficult. However, Unicode seems to have defined
a behaviour, which XSLT/XPath/XQuery have apparently adopted:

   http://www.w3.org/TR/xpath-functions/#func-upper-case
   http://www.w3.org/TR/xpath-functions/#func-lower-case
   http://www.unicode.org/unicode/reports/tr21/
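
In the meantime, the expansion I complained about can at least be generated
rather than typed by hand. The following is only a rough sketch: the function
name and the XSD_META constant are made up for illustration, it relies on
Python's default (locale-independent) Unicode case mappings, and it gives up on
any character whose mapping is not one-to-one instead of trying to handle it.

```python
# Characters with special meaning in XML Schema patterns; they are
# backslash-escaped when they occur in the literal.
XSD_META = set("\\|.-^?*+{}()[]")


def case_insensitive_pattern(literal: str) -> str:
    """Expand a literal into a pattern that matches it in any letter case.

    Hypothetical helper. Only simple one-to-one case pairs are expanded;
    everything else is emitted as a (possibly escaped) literal.
    """
    parts = []
    for ch in literal:
        # Collect the distinct case variants, lower case first.
        variants = list(dict.fromkeys([ch.lower(), ch.upper(), ch]))
        if len(variants) > 1 and all(len(v) == 1 for v in variants):
            # Simple case pair: emit a character class such as [bB].
            parts.append("[" + "".join(variants) + "]")
        elif ch in XSD_META:
            # Metacharacter: escape it so that it matches literally.
            parts.append("\\" + ch)
        else:
            # Caseless character, or a mapping that is not one-to-one
            # (for instance "ß", which upper-cases to "SS"): keep as is.
            parts.append(ch)
    return "".join(parts)


print(case_insensitive_pattern("barbaz"))
```

The result is a string that could be dropped into a pattern facet. Note that it
does nothing for the language-specific cases Henry mentions; those would need
tailored mappings that this sketch does not attempt.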

>>While on this topic, I'd like to point out that a lot of literature
>>out there states that XML Schema borrowed Perl's patterns, sometimes
>>saying that it added Unicode support. That's fairly untrue: 1) Perl's
>>patterns include full Unicode support, and 2) XML Schema uses a small
>>subset of them.
> 
> Um, I _believe_ that the fact is that we took the regexps directly
> from Unicode -- the REC says:

That was my understanding as well, thanks for clarifying. I wish the material
written about XML Schema were more precise (not that it's easy, but still).

-- 
Robin Berjon <robin.berjon@expway.fr>
Research Engineer, Expway        http://expway.fr/
7FC0 6F5F D864 EFB8 08CE  8E74 58E6 D5DB 4889 2488

Received on Wednesday, 2 July 2003 12:19:04 UTC