- From: Greg Hill <GHill@intl.com>
- Date: Thu, 27 Apr 2000 12:58:35 -0600
- To: "'www-international@w3.org'" <www-international@w3.org>
I asked this on the JPython interest group, but thought I would ask it here
too, since it's an i18n question:
> Hello JPython developers....
>
> I'm trying to use JPython's (1.5.2, on jdk1.1.2) locale-dependent regular
> expression capability (the L flag in the re module, used in conjunction
> with \w or \W in an expression). The problem seems to be setting the
> locale. When I install JPython, including all the libraries in
> pylib152e.jar, I don't get the locale module, hence no setlocale call. But
> I need to be able to change the locale; accepting the system default
> isn't an option. So I tried subclassing a locale-aware Java class,
> MessageFormat in java.text. Then I can call setLocale and compile an re
> with the re.L flag set, but I see no locale-sensitive behavior when I try
> to match a multibyte string a character at a time using \w and \W. For
> example, if I execute the following commands from the interpreter using
> the code at the end of this message,
>
> import PyTest
> x=PyTest.PyTest()
> x.set("ja","JP")
> x.test()
> x.findDelim()
>
> and (as is seen in the code) the test string for the match is
> "%@%A%B%C%D" (a cut and paste of multibyte Japanese jis0208-1990 2-byte
> chars; the %'s are actually \211 in octal),
>
> the first \w matches 'A' and the first \W matches the first '%' (actually
> \211 octal).
> In fact, neither \w nor \W should match a single byte (the test string
> has 5 double-byte chars, '%@', '%A', etc.). Here's the actual output
> (matching \W in the first group):
>
> >>> x.findDelim()
> ('\211', '@\211A\211B\211C\211D')
>
> Here's the code:
>
> from java.text import MessageFormat
> from java.util import Locale
> from java.lang import String
> import re
>
> class PyTest( MessageFormat ):
>     def __init__(self):
>         "@sig public PyTest()"
>         jstr = String("{0} {0} {0}")
>         MessageFormat.__init__(self, jstr)
>
>     def get(self):
>         "@sig public java.util.Locale get()"
>         return self.getLocale()
>
>     def set(self, language, country):
>         "@sig public void set(java.lang.String language, java.lang.String country)"
>         loc = Locale(String(language), String(country))
>         self.setLocale( loc )
>
>     def test(self):
>         "@sig public void test()"
>         # \W is expected to match one whole character, not one byte
>         self.delim = r"(?P<ch>\W)(?P<rem>.*)$"
>         print self.delim
>         self._delim = re.compile( self.delim, re.L )
>         # five double-byte characters; each lead byte is \211 octal
>         self._data = "\211@\211A\211B\211C\211D"
>
>     def findDelim(self):
>         "@sig public void findDelim()"
>         self.m = self._delim.search( self._data )
>         print self.m.groups()
>
>
>
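For comparison, here is a minimal sketch of the same search in present-day Python 3 rather than JPython 1.5.2, so it only illustrates the byte-versus-character distinction and says nothing about how JPython's re module treats the L flag. It assumes the test bytes are Shift_JIS (a guess based on the \211 lead byte followed by '@' through 'D'). The function names find_delim_bytes and find_delim_text are hypothetical helpers, not part of the code above.

```python
import re

# The test string from the message as explicit bytes (octal 211 == 0x89).
DATA = b"\x89@\x89A\x89B\x89C\x89D"

# Same pattern as in PyTest.test(), once for bytes and once for text.
BYTE_DELIM = re.compile(rb"(?P<ch>\W)(?P<rem>.*)$")
TEXT_DELIM = re.compile(r"(?P<ch>\W)(?P<rem>.*)$")


def find_delim_bytes(data):
    """Search the raw bytes; each byte is classified on its own."""
    m = BYTE_DELIM.search(data)
    return m.groups() if m else None


def find_delim_text(data, encoding="shift_jis"):
    """Decode first (the encoding is an assumption), then search characters."""
    m = TEXT_DELIM.search(data.decode(encoding))
    return m.groups() if m else None


byte_groups = find_delim_bytes(DATA)   # the lone 0x89 byte is taken as a non-word byte
text = DATA.decode("shift_jis")        # assumed encoding; gives five characters
text_groups = find_delim_text(DATA)    # no non-word character once decoded

print(byte_groups)
print(len(text))
print(text_groups)
```

On raw bytes the search reproduces the split reported above, with the lone \211 byte in the first group. After decoding, the string holds five characters and the non-word search finds nothing, which is the behavior the message expected from a locale-aware match.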
Received on Thursday, 27 April 2000 14:59:01 UTC