- From: Greg Hill <GHill@intl.com>
- Date: Thu, 27 Apr 2000 12:58:35 -0600
- To: "'www-international@w3.org'" <www-international@w3.org>
I asked this of the JPython interest group, but thought it might be worth asking here too, since it's an i18n question:

> Hello JPython developers,
>
> I'm trying to use JPython's (1.5.2, on jdk1.1.2) locale-dependent regular
> expression capability (the L flag in the re module, used in conjunction
> with \w or \W in an expression). The problem seems to be setting the
> locale. When I install JPython, including all the libraries in
> pylib152e.jar, I don't get the locale module, hence no setlocale call. But
> I need to be able to change the locale, i.e. accepting the system default
> isn't an option.
>
> So I tried subclassing a locale-aware Java class, MessageFormat in
> java.text. Then I can call setLocale and compile an re with the re.L flag
> set, but I see no locale-sensitive behavior when I try to match a
> multibyte string a character at a time using \w and \W. For example, if I
> execute the following commands from the interpreter, using the code at
> the end of this message,
>
>     import PyTest
>     x=PyTest.PyTest()
>     x.set("ja","JP")
>     x.test()
>     x.findDelim()
>
> and (as is seen in the code) the test string for the match is
> "?@?A?B?C?D" (a cut and paste of multibyte Japanese jis0208-1990 2-byte
> chars; the '?' characters, written as '%' below, are actually \211 in
> octal), then the first \w matches 'A' and the first \W matches the first
> '%' (actually \211 octal). In fact, neither \w nor \W should match a
> single byte (the test string has 5 double-byte chars, '%@', '%A', etc.).
>
> Here's the actual output (matching \W in the first group):
>
>     >>> x.findDelim()
>     ('\211', '@\211A\211B\211C\211D')
>
> Here's the code:
>
>     from java.text import MessageFormat
>     from java.util import Locale
>     from java.lang import String
>     import re
>
>     class PyTest( MessageFormat ):
>         def __init__(self):
>             "@sig public PyTest()"
>             jstr = String("{0} {0} {0}")
>             MessageFormat.__init__(self,jstr)
>
>         def get(self):
>             "@sig public java.util.Locale get()"
>             return self.getLocale()
>
>         def set(self, language, country):
>             "@sig public void set(java.lang.String language, java.lang.String country)"
>             # Build a java.util.Locale and install it on the MessageFormat
>             loc = Locale( String(language), String(country) )
>             self.setLocale( loc )
>
>         def test(self):
>             "@sig public void test()"
>             # First group: one non-word character; second group: the rest
>             self.delim = r"(?P<ch>\W)(?P<rem>.*)$"
>             print self.delim
>             self._delim = re.compile( self.delim, re.L )
>             self._data = "?@?A?B?C?D"
>
>         def findDelim(self):
>             "@sig public void findDelim()"
>             self.m = self._delim.search( self._data )
>             print self.m.groups()
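For readers who want to reproduce the byte-versus-character distinction described above, here is a minimal sketch in present-day CPython 3, not JPython 1.5.2, and without the re.L flag. It assumes the \211 bytes are Shift_JIS (the post only says "jis0208-1990"). The names DATA, ENCODING, first_nonword_bytes and first_word_char are made up for illustration. It shows that a byte-wise search splits a two-byte character, while decoding to text first lets \w consume a whole character.

```python
import re

# Bytes taken from the output quoted in the post: five two-byte
# characters, each with the lead byte \211 (0x89).
DATA = b"\x89@\x89A\x89B\x89C\x89D"

# ASSUMPTION: the bytes are Shift_JIS. The post only says "jis0208-1990".
ENCODING = "shift_jis"


def first_nonword_bytes(data):
    """Byte-wise search: every byte is treated as its own character."""
    m = re.search(rb"(?P<ch>\W)(?P<rem>.*)$", data)
    return m.groups() if m else None


def first_word_char(data, encoding=ENCODING):
    """Decode first, so that \\w and \\W operate on whole characters."""
    text = data.decode(encoding)
    m = re.search(r"(?P<ch>\w)(?P<rem>.*)$", text)
    return m.groups() if m else None


byte_groups = first_nonword_bytes(DATA)
char_groups = first_word_char(DATA)

# The byte-wise match reproduces the split reported in the post.
print(byte_groups)
# The decoded match is one character that encodes back to two bytes.
print(len(char_groups[0]), char_groups[0].encode(ENCODING))
```

Whether an equivalent decode step was available in JPython 1.5.2 is not something this sketch establishes. It only illustrates why matching on raw bytes cannot respect double-byte character boundaries.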
Received on Thursday, 27 April 2000 14:59:01 UTC