Re: Introductions

Dear All,


Hi, I'm Lucy Park, a data analyst at Team POPONG (http://en.popong.com), a
nonprofit, nonpartisan group in Seoul that aims to make Korean legislative
data available in open formats.

Over the past few months, we've collected data from various government
sources (http://en.popong.com/sources), and currently hold data on
approximately 12,000 politicians (election candidates from the past 60
years), 46,000 bill texts spanning 20 years, and many more PDF documents
(which will probably need to be OCR-ed).
We have run into several difficulties along the way, including:

1. Irregular structures, and data dispersed across many government websites.
2. Machine-*un*readable formats: PDF, HWP (a format created by a word
processor named "Hangul", which the government uses extensively), ...
3. Different people with the same name: most Korean names consist of only
three syllables, and family names are widely shared.
4. Multilingual texts: Korean bill texts are a mixture of Korean, English
and Chinese characters.
5. Encodings: encodings must be detected and converted to Unicode;
otherwise, Korean characters often cannot be read.
6. Internationalization (i18n): Korean text is unreadable to many people
outside the country.
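To make item 5 concrete, here is a minimal sketch of the "detect and convert to Unicode" step. It is not our actual pipeline: it assumes the raw bytes are in one of a few candidate encodings (the list and the helper name `to_unicode` are hypothetical), trying strict UTF-8 first and then cp949, Microsoft's superset of EUC-KR that is common for Korean text.

```python
# Minimal sketch (hypothetical helper, not the POPONG pipeline):
# decode raw bytes by trying candidate encodings in order.
# The candidate list is an assumption; UTF-8 goes first because it is strict
# and rarely decodes non-UTF-8 Korean bytes by accident.
CANDIDATES = ("utf-8", "cp949", "euc-kr")


def to_unicode(raw: bytes, candidates=CANDIDATES) -> tuple:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not this encoding; try the next candidate
    # Nothing decoded cleanly: fall back to UTF-8 with replacement characters.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


if __name__ == "__main__":
    sample = "국회 의안".encode("cp949")  # bytes as a legacy site might serve them
    text, enc = to_unicode(sample)
    print(enc, text)
```

Ordering matters here: a fixed candidate list is simple but can guess wrong for short or mixed inputs, so a statistical detector may be a better fit for large, heterogeneous crawls.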

We have tackled, or learned to work around, many of these problems (1, 3, 4
and 5), but are still struggling with others (6, and especially 2).
My team would welcome discussing these topics with other organizations and
individuals at any time.

Ten days ago we also launched a website, "Pokr" (http://en.pokr.kr/), which
presents information based on the data above (source code:
http://github.com/teampopong/pokr).
While we continue to improve the service, we would also like to provide the
raw data to the public.

This is why we are especially interested in global standards for open
government data.
My team was pointed to Popolo and this mailing list during a chat with the
Sunlight Foundation, and we were soon impressed by the project.
We hope to exchange experiences with you, and are interested in
contributing to a robust, global government data specification.


Thanks,
Lucy Park

Received on Tuesday, 2 July 2013 06:45:51 UTC