
Re: JPython i18n question

From: Martin J. Duerst <duerst@w3.org>
Date: Fri, 28 Apr 2000 15:35:02 +0900
Message-Id: <4.2.0.58.J.20000428153424.036f52f0@sh.w3.mag.keio.ac.jp>
To: Greg Hill <GHill@intl.com>, "'www-international@w3.org'" <www-international@w3.org>
Hello Greg,

This is a list dedicated to the internationalization of the WWW.
I'm not sure your question fits here very well.

Regards,   Martin.

At 00/04/27 12:58 -0600, Greg Hill wrote:
>I asked this on the JPython interest group, but thought I might ask it here
>too, since it's an i18n question:
>
>
> > Hello JPython developers....
> >
> > I'm trying to use JPython's (1.5.2, on jdk1.1.2) locale-dependent regular
> > expression capability (the L flag in the re module, used in conjunction
> > with \w or \W in an expression). The problem seems to be setting the
> > locale. When I install JPython, including installing all the libraries in
> > pylib152e.jar, I don't get the locale module, hence no setLocale call. But
> > I need to be able to change the locale, i.e. accepting the system default
> > isn't an option. So I tried subclassing a locale-aware Java class,
> > MessageFormat in java.text. Then I can do setLocale, and compile an re
> > with the re.L flag set, but I see no locale sensitive behavior when I try
> > to match a multibyte string a character at a time using \w and \W. For
> > example, if I execute the following commands from the interpreter using
> > the code at the end of this message,
> >
> > import PyTest
> > x=PyTest.PyTest()
> > x.set("ja","JP")
> > x.test()
> > x.findDelim()
> >
> > and (as is seen in the code), the test string for the match is
> > "?@?A?B?C?D" (a cut and paste of multibyte Japanese jis0208-1990 2-byte
> > chars; the %'s are actually \211 in octal),
> >
> > the first \w matches 'A' and the first \W matches the first '%' (actually
> > \211 octal). In fact, neither \w nor \W should match a single byte (the
> > test string has 5 double-byte chars, '%@', '%A', etc.). Here's the actual
> > output (matching \W in the first group):
> >
> > >>> x.findDelim()
> > ('\211', '@\211A\211B\211C\211D')
> >
> > Here's the code:
> >
> > from java.text import MessageFormat
> > from java.util import Locale
> > from java.lang import String
> > import re
> >
> > class PyTest( MessageFormat ):
> >   def __init__(self):
> >     "@sig public PyTest()"
> >     jstr = String("{0} {0} {0}")
> >     MessageFormat.__init__(self,jstr)
> >
> >   def get(self):
> >     "@sig public java.lang.String get()"
> >     return self.getLocale()
> >
> >   def set(self, str, str2 ):
> >     "@sig public void set(java.lang.String str, java.lang.String str2)"
> >     jstr = String(str)
> >     jstr2 = String(str2)
> >     loc = Locale(jstr, jstr2)
> >     self.setLocale( loc )
> >
> >   def test(self):
> >     "@sig public test()"
> >     self.delim = r"(?P<ch>\W)(?P<rem>.*)$"
> >     print self.delim
> >     self._delim = re.compile( self.delim, re.L )
> >     self._data = "?@?A?B?C?D"
> >
> >   def findDelim(self):
> >     "@sig public java.lang.String findDelim()"
> >     self.m = self._delim.search( self._data )
> >     print self.m.groups()
> >
> >
> >
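
The output quoted above is what one would expect when the regular expression engine sees the test string as a sequence of single bytes rather than as characters. The following is a minimal sketch of that difference in present-day Python 3, not in JPython 1.5.2. It assumes the test data is Shift_JIS encoded; the message only says the characters are 2-byte JIS X 0208 characters with the lead byte \211, so the encoding name and the constant ASSUMED_ENCODING are guesses made for illustration.

```python
import re

# Same pattern as in the message: first non-word character, then the rest.
PATTERN = r"(?P<ch>\W)(?P<rem>.*)$"

# The test data as raw bytes: five 2-byte characters whose lead byte is
# \211 octal (0x89). Treating them as Shift_JIS is an assumption.
RAW = b"\x89@\x89A\x89B\x89C\x89D"
ASSUMED_ENCODING = "shift_jis"

# Byte-oriented matching: the engine sees ten separate bytes, so \W
# matches the lone lead byte, as in the output quoted in the message.
byte_match = re.search(PATTERN.encode("ascii"), RAW)
print(byte_match.groups())

# Character-oriented matching: decode first, then match on the text.
text = RAW.decode(ASSUMED_ENCODING)
print(len(text))  # counts characters, not bytes

# Search for a non-word character in the decoded text.
char_match = re.search(PATTERN, text)
print(char_match)

# The first word character, re-encoded to show how many bytes it spans.
first_word_char = re.match(r"\w", text).group()
print(first_word_char.encode(ASSUMED_ENCODING))
```

In this sketch the locale plays no role: whether \w and \W operate on whole characters depends on the bytes being decoded into text before matching. Whether the same holds for the re.L flag in JPython 1.5.2 is not established by the message.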
Received on Friday, 28 April 2000 02:56:39 GMT
