- From: Martin J. Duerst <duerst@w3.org>
- Date: Fri, 28 Apr 2000 15:35:02 +0900
- To: Greg Hill <GHill@intl.com>, "'www-international@w3.org'" <www-international@w3.org>
Hello Greg,

This is a list dedicated to the internationalization of the WWW.
I'm not sure your question fits here very well.

Regards, Martin.

At 00/04/27 12:58 -0600, Greg Hill wrote:
>I asked this to the jpython interest group, but thought it might be asked
>here too,
>since it's an i18n question:
>
> > Hello JPython developers....
> >
> > I'm trying to use JPython's (1.5.2, on jdk1.1.2) locale-dependent regular
> > expression capability (the L flag in the re module, used in conjunction
> > with \w or \W in an expression). The problem seems to be setting the
> > locale. When I install JPython, including installing all the libraries in
> > pylib152e.jar, I don't get the locale module, hence no setLocale call. But
> > I need to be able to change the locale, i.e. accepting the system default
> > isn't an option. So I tried subclassing a locale-aware Java class,
> > MessageFormat in java.text. Then I can do setLocale, and compile an re
> > with the re.L flag set, but I see no locale sensitive behavior when I try
> > to match a multibyte string a character at a time using \w and \W. For
> > example, if I execute the following commands from the interpreter using
> > the code at the end of this message,
> >
> > import PyTest
> > x=PyTest.PyTest()
> > x.set("ja","JP")
> > x.test()
> > x.findDelim()
> >
> > and (as is seen in the code), the test string for the match is
> > "?@?A?B?C?D" (a cut and paste of multibyte Japanese jis0208-1990 2-byte
> > chars, the %'s are
> > actually \211 in octal),
> >
> > the first \w matches 'A' and the first \W matches the first '%' (actually
> > \211 octal).
> > In fact, neither \w nor \W should match a single byte (the test string
> > has 5 double-byte
> > chars, '%@', '%A', etc.). Here's the actual output (matching \W in the
> > first group):
> >
> > >>> x.findDelim()
> > ('\211', '@\211A\211B\211C\211D')
> >
> > Here's the code:
> >
> > from java.text import MessageFormat
> > from java.util import Locale
> > from java.lang import String
> > import re
> >
> > class PyTest( MessageFormat ):
> >     def __init__(self):
> >         "@sig public PyTest()"
> >         jstr = String("{0} {0} {0}")
> >         MessageFormat.__init__(self,jstr)
> >
> >     def get(self):
> >         "@sig public java.lang.String get()"
> >         return self.getLocale()
> >
> >     def set(self, str, str2 ):
> >         "@sig public get( java.lang.String )"
> >         jstr = String(str)
> >         jstr2 = String(str2)
> >         loc = Locale(jstr, jstr2)
> >         self.setLocale( loc )
> >
> >     def test(self):
> >         "@sig public test()"
> >         self.delim = r"(?P<ch>\W)(?P<rem>.*)$"
> >         print self.delim
> >         self._delim = re.compile( self.delim, re.L )
> >         self._data = "?@?A?B?C?D"
> >
> >     def findDelim(self):
> >         "@sig public java.lang.String findDelim()"
> >         self.m = self._delim.search( self._data )
> >         print self.m.groups()
> >
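The output reported in the quoted message is what one would expect if the pattern were applied byte by byte rather than character by character. The sketch below illustrates that difference in present-day Python 3, not in JPython 1.5.2. It assumes, which the original message does not state, that the test data is Shift_JIS encoded, so that each \211 byte is the lead byte of a two-byte character. The names `data`, `text` and `byte_match` are hypothetical.

```python
import re

# Assumption: five two-byte Shift_JIS characters, each with lead byte \211 (0x89).
data = b"\x89@\x89A\x89B\x89C\x89D"

# Byte-oriented matching: \W sees the lone lead byte, as in the reported output.
byte_match = re.search(rb"(?P<ch>\W)(?P<rem>.*)$", data)
print(byte_match.groups())

# Character-oriented matching: decode first so the pattern sees whole characters.
text = data.decode("shift_jis")
print(len(text))                                      # number of decoded characters
print(re.search(r"(?P<ch>\W)(?P<rem>.*)$", text))     # is there any non-word character?
print(len(re.findall(r"\w", text)))                   # how many word characters?
```

Under these assumptions the byte pattern returns the single lead byte as its first group, while the decoded string is matched one whole character at a time.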
Received on Friday, 28 April 2000 02:56:39 UTC