JPython i18n question

From: Greg Hill <GHill@intl.com>
Date: Thu, 27 Apr 2000 12:58:35 -0600
Message-ID: <19B8452E57A0D111B04100805F6F312C03E3493F@smtp.bou.intl.com>
To: "'www-international@w3.org'" <www-international@w3.org>
I asked this on the JPython interest list, but thought I'd ask it here too,
since it's an i18n question:


> Hello JPython developers....
> 
> I'm trying to use JPython's (1.5.2, on jdk1.1.2) locale-dependent regular
> expression capability (the L flag in the re module, used in conjunction
> with \w or \W in an expression). The problem seems to be setting the
> locale. When I install JPython, including installing all the libraries in
> pylib152e.jar, I don't get the locale module, hence no setlocale() call. But
> I need to be able to change the locale, i.e. accepting the system default
> isn't an option. So I tried subclassing a locale-aware Java class,
> MessageFormat in java.text. Then I can do setLocale, and compile an re
> with the re.L flag set, but I see no locale sensitive behavior when I try
> to match a multibyte string a character at a time using \w and \W. For
> example, if I execute the following commands from the interpreter using
> the code at the end of this message,
> 
> import PyTest
> x=PyTest.PyTest()
> x.set("ja","JP")
> x.test()
> x.findDelim()
> 
> and (as is seen in the code) the test string for the match is
> "%@%A%B%C%D" (a cut and paste of multibyte Japanese jis0208-1990 2-byte
> chars; each % is actually \211 in octal),
> 
> the first \w matches 'A' and the first \W matches the first '%' (actually
> \211 octal).
> In fact, neither \w nor \W should match a single byte (the test string
> has 5 double-byte chars, '%@', '%A', etc.). Here's the actual output
> (matching \W in the first group):
> 
> >>> x.findDelim()
> ('\211', '@\211A\211B\211C\211D')
> 
> Here's the code:
> 
> from java.text import MessageFormat
> from java.util import Locale
> from java.lang import String
> import re
> 
> class PyTest( MessageFormat ):
>   def __init__(self):
>     "@sig public PyTest()"
>     jstr = String("{0} {0} {0}")
>     MessageFormat.__init__(self,jstr)
>     
>   def get(self):
>     "@sig public java.util.Locale get()"
>     return self.getLocale()
> 
>   def set(self, lang, country):
>     "@sig public void set(java.lang.String lang, java.lang.String country)"
>     jlang = String(lang)
>     jcountry = String(country)
>     loc = Locale(jlang, jcountry)
>     self.setLocale( loc )
> 
>   def test(self):
>     "@sig public void test()"
>     self.delim = r"(?P<ch>\W)(?P<rem>.*)$"
>     print self.delim
>     self._delim = re.compile( self.delim, re.L )
>     # five double-byte chars; \211 (octal) is the lead byte of each
>     self._data = "\211@\211A\211B\211C\211D"
> 
>   def findDelim(self):
>     "@sig public void findDelim()"
>     self.m = self._delim.search( self._data )
>     print self.m.groups()
>     
> 
> 
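
For anyone who wants to see the two behaviours side by side, here is a
minimal sketch. It is not JPython 1.5.2 code: it assumes a current CPython 3
interpreter, where a bytes pattern matches byte by byte and a str pattern
matches character by character. It also assumes the five characters are
Shift_JIS encoded (lead byte \211 octal, i.e. 0x89), which is a guess from
the byte values and not something stated above. The names RAW, match_bytes
and match_text are made up for illustration.

```python
import re

# The five double-byte characters from the message, written as raw bytes
# (\211 octal == 0x89).
RAW = b"\x89@\x89A\x89B\x89C\x89D"

# Same expression as in PyTest.test(), compiled as a bytes pattern, plus a
# \w variant compiled as a text pattern for the decoded string.
BYTE_DELIM = re.compile(rb"(?P<ch>\W)(?P<rem>.*)$")
TEXT_WORD = re.compile(r"(?P<ch>\w)(?P<rem>.*)$")


def match_bytes(raw):
    """Match against the undecoded bytes; \\W sees one byte at a time."""
    return BYTE_DELIM.search(raw).groups()


def match_text(raw, encoding="shift_jis"):
    """Decode first (the encoding is an assumption), then match per character."""
    text = raw.decode(encoding)
    return TEXT_WORD.search(text).groups()


if __name__ == "__main__":
    # Byte-wise: the lone lead byte is split off, as in the output above.
    print(match_bytes(RAW))
    # Character-wise: the first group is one whole character, four remain.
    ch, rem = match_text(RAW)
    print(len(ch), len(rem))
```

The point of the contrast is that the split of '\211' from '@' happens because
the expression is applied to undecoded bytes; once the data is decoded to
characters, \w consumes a whole double-byte character in one step.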
Received on Thursday, 27 April 2000 14:59:01 GMT
