
Re: New article published: Introducing Character Sets and Encodings

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Mon, 23 Jan 2006 12:28:36 +0000
Message-ID: <43D4CBF4.7040309@hpl.hp.com>
To: Richard Ishida <ishida@w3.org>
CC: www-international@w3.org


When I first hit the encodings issue, I, many of my colleagues, and the
users of our software were, I think, more ignorant than this article
seems to presuppose.

I think a key part of our problem was understanding that our own compute
platforms use an encoding that is generally transparent to us, but that on
the Web we are communicating with systems using different encodings, and
so the transparency we generally expect on our own platforms ceases to work.
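
A minimal sketch of that loss of transparency, assuming a modern Java
runtime (java.nio.charset.StandardCharsets postdates this message). The
class and method names are hypothetical, chosen only for illustration. The
same two bytes are decoded once with the encoding the sender used and once
with a different one, standing in for a mismatched platform default:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decode raw bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) as sent in UTF-8: bytes 0xC3 0xA9.
        byte[] wire = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // What this platform would silently assume if nothing is specified.
        System.out.println("platform default: " + Charset.defaultCharset());

        // Decoding with the sender's encoding recovers one character.
        String right = decode(wire, StandardCharsets.UTF_8);
        // Decoding with a different encoding yields two unrelated characters.
        String wrong = decode(wire, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 length:      " + right.length());
        System.out.println("ISO-8859-1 length: " + wrong.length());
    }
}
```

Locally the mismatch rarely shows, because the writer and the reader usually
share the same default. It only becomes visible once the bytes come from
somewhere else.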

I think I posted a few months ago about how we experienced this in terms
of the Java FileReader class. Initially we were recommending that our users
use this for Web data; in the second release of our software, we
changed to recommending against the use of Readers (which tend to decode
bytes using the platform default encoding into Java's UTF-16 chars) and
instead recommending InputStreams (bytes), so that our software could address
the encoding issues correctly, in accordance with the relevant standards.
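
As an illustration only, and not the code from the software described
above, the InputStream approach might look like the sketch below. It assumes
a modern Java runtime; the class name, the readAll helper, and the idea that
the caller already knows the declared charset (for example from an HTTP
header or an XML declaration) are all assumptions made for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamDecodeDemo {
    // Hypothetical helper: the caller hands over raw bytes plus the charset
    // declared for them, so the platform default is never consulted.
    static String readAll(InputStream in, Charset declared) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, declared)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for bytes fetched from the Web, encoded as UTF-8.
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        System.out.println(text.length()); // 4 characters decoded from 5 bytes
    }
}
```

By contrast, the single-argument FileReader constructor takes no charset,
which is why it falls back on whatever the local platform uses.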

In summary, I think this could be improved by having part of the
introductory material address the "encodings on the Web" vs. "encoding on
my local platform" issue.

Jeremy

Richard Ishida wrote:
> 
> 
> 
> The GEO Working Group has published the article:
> 
> 	Introducing Character Sets and Encodings
> 	http://www.w3.org/International/getting-started/characters
> 	By Richard Ishida
> 
> 
> 
> This is the first in a series of articles aimed at those who are new to internationalization. These pages will introduce people to key internationalization topics and tasks, and direct them towards articles or resources on the W3C Internationalization subsite that will take them to the next level of understanding.
> 
> This document introduces topics in the general area of character sets, encodings, escapes, etc.
> 
> The document is linked from a new 'Getting Started' page that also explains various ways to find information on the W3C Internationalization subsite, and points to some key definitions.
> 
> The "Getting Started" page and these "Introducing..." pages aim to provide newcomers with a gentle pathway into the many and varied resources on the site, rather than expecting them to work out for themselves how to get an overview of the topic and decide which of the resources to read first. These pages do not go through the review stages typical of technical articles, and will be modified and improved over the weeks following their publication.
> 
> 
> 
> You can find various news filters and RSS feeds relating to the work of the Internationalization Activity at http://www.w3.org/International/log/description
> 
> 
> 
> ============
> Richard Ishida
> Internationalization Lead
> W3C (World Wide Web Consortium)
> 
> http://www.w3.org/People/Ishida/
> http://www.w3.org/International/
> http://people.w3.org/rishida/blog/
> http://www.flickr.com/photos/ishida/
> 
> 
> 
Received on Monday, 23 January 2006 12:30:04 GMT
