
[Fwd: Q&A: Initial considerations for international web sites]

From: Tex Texin <tex@i18nguy.com>
Date: Tue, 17 Jun 2003 00:38:35 -0400
Message-ID: <3EEE9B4B.10B575EF@I18nGuy.com>
To: Phil Arko <phil.arko@scr.siemens.com>, GEO <public-i18n-geo@w3.org>

Phil, looks good!

1) I wonder whether beginners really write for a demographic, or whether they
simply write, making mostly unconscious choices and assumptions about what
their audience knows and understands. Usually they presume their audience is
like the author, and this is of course wrong internationally.

2) Web pages can easily accommodate languages besides English, Germanic, and
Romance. We should watch for assumptions that the audience is Western
European. ;-)
Perhaps eliminate that reference altogether and just talk about introducing
foreign words.

3) It might be good if each paragraph had a topic heading to make it easy for
people to identify the subtopic, for example:
character encoding and display
reading direction
number formats
date and time formats

Then, we can also add topics to this question easily if each para has its own
heading and text.

4) Perhaps the "By the way" would be better as a separate Q&A. We certainly
need to cover the terms and abbreviations.
Consider creating a "What is internationalization, localization, i18n, l10n?"
Q&A with essentially the same text.

5) Perhaps add a reference to the already published date-time question from
the date-time para.

6) On reading direction: I don't know if I would say "many" languages. It is
certainly more than a few, but it is not a substantial percentage of
languages. (I think.) Sorry if this seems picky. I am just looking to avoid
misleading people about the relative importance of RTL.

Also, reading direction is more than text processing, since it affects the
overall layout and organization of the pages (graphics, etc. change too). This
bears commenting.

7) We might offer a list at the end, without further explanation, of the other
items to consider for a global web site:
calendars, work hours, holidays
currency, taxes and financial rules
graphics and images significance
color significance
sound significance
measurement systems
page and form sizes
titles and address formats
laws and legal references

8) By leaving off HTML in the standards list, are we implying it is
I think this list of standards is not helpful as it is. It leaves off RFC 3066
and perhaps others that are relevant to web i18n.
I think it might be a good idea once we have a more complete list.

However, I like the idea of providing some ideas for the reader as to where to
go to get more information.
If you want, I have a list of websites with checklists and relevant info from
which you can pick some pages to cite.
I would suggest using an entry for each of the major software vendors
(Microsoft, Sun, IBM, Oracle, Netscape, etc.) as well as some of the
tutorial/educational sites (e.g. multilingual webmaster).
Look at http://www.i18nguy.com/guidelines.html and pick some you like.
Also reference the web i18n tutorial and some of the other links elsewhere on
the w3c/international site.

I think the question is a good intro to the topic and sets the stage well for
the other questions.
Hope the comments are helpful.


Questions & Answers:  Initial considerations for international web sites


What are some topics to consider when creating websites for an international


People from around the world can view the content of your web site. Because
much of what we find on the web is written with a specific demographic in
mind, people outside that demographic often misunderstand what was actually
intended. The formatting and presentation of text have specific regional and
cultural requirements that need to be addressed if the content is to be
properly understood.


A typical challenge is to ensure that characters display correctly for the
end user. Web pages can easily accommodate English, Germanic, and Romance
languages, but what happens when an occasional foreign word or name is used?
In the past, a quick solution was to use an inline graphic to display the
character. Another method was to copy and paste the desired character from
another program into the web page. While the result might look correct for
one user, there is no guarantee that every user will see the same text.
There are many variables that might need to be considered, such as the font,
operating system, browser software, etc. These concerns are becoming
increasingly important as users move toward mobile and other non-standard
browsing devices.
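As a small illustration of why pasting characters into a page is fragile, the Python sketch below (Python is our choice here, since the draft contains no code, and the name "Dvořák" is just an arbitrary example) encodes a name as UTF-8 and then decodes the same bytes two ways: correctly, as when the page declares its encoding, and as Windows-1252, as when a browser has to guess. It also shows numeric character references, a fallback that survives even on an ASCII-only page.

```python
name = "Dvořák"  # arbitrary example name containing non-ASCII letters

# The page is saved as UTF-8; "ř" becomes the two bytes C5 99.
utf8_bytes = name.encode("utf-8")

# If the browser is told the encoding (e.g. charset=utf-8), the text round-trips.
round_trip = utf8_bytes.decode("utf-8")

# If it guesses Windows-1252 instead, each multi-byte sequence is misread.
garbled = utf8_bytes.decode("cp1252")

# Fallback for a page limited to ASCII: numeric character references,
# which the browser resolves to the intended Unicode characters.
refs = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(round_trip)  # Dvořák
print(garbled)     # DvoÅ™Ã¡k
print(refs)        # Dvo&#345;&#225;k
```

Even with the bytes decoded correctly, a font containing the character must still be available on the user's system, which is one of the variables mentioned above.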

As many languages read from right to left, the ability to include such
content becomes an even greater challenge. In addition to identifying the
proper characters, there also needs to be a method of properly handling this

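One small piece of handling right-to-left text is deciding the base direction of a string so that the page can mark it, for example with the HTML dir attribute. The Python sketch below is a simplified version of the "first strong character" rule (P2) of the Unicode Bidirectional Algorithm, UAX #9; the function name base_direction is hypothetical. It ignores isolates and embeddings, and it does nothing about mirroring the page layout and graphics, which also changes for right-to-left audiences.

```python
import unicodedata

# Bidi classes of "strong" right-to-left characters (Hebrew is R, Arabic is AL).
RTL_CLASSES = {"R", "AL"}


def base_direction(text: str) -> str:
    """Return "rtl" or "ltr" based on the first strongly directional character.

    Simplified from rule P2 of UAX #9: weak and neutral characters such as
    digits, punctuation and spaces are skipped.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"  # no strong character found: assume left-to-right


for sample in ["Hello", "שלום", "123 مرحبا"]:
    print(f'<p dir="{base_direction(sample)}">{sample}</p>')
```
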
Some cultures use a comma as a thousands separator and a period as a decimal
point, while other cultures do the reverse. For example, 1,547 in Germany and
1.547 in the United States are actually the same number (one and 547
thousandths). Conversely, the string "1,547" means one thousand five hundred
forty-seven to a reader in the United States but a number between one and two
to a reader in Germany: a single character changes the value by a factor of a
thousand.
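The Python sketch below formats and parses numbers under the two conventions just described. The SEPARATORS table and the function names are hypothetical illustrations covering only en-US and de-DE; a real site would take these conventions from locale data (for example the Unicode CLDR) rather than from a hand-written table.

```python
# Hypothetical, hand-written table covering only two conventions; other
# locales differ (some group digits with spaces, for instance).
SEPARATORS = {
    "en-US": {"thousands": ",", "decimal": "."},
    "de-DE": {"thousands": ".", "decimal": ","},
}


def format_number(value: float, locale_tag: str, places: int = 2) -> str:
    """Format value with the grouping and decimal marks of locale_tag."""
    seps = SEPARATORS[locale_tag]
    # Python's format spec always emits "," for grouping and "." for decimals,
    # so both characters are swapped in a single translate() pass.
    text = f"{value:,.{places}f}"
    return text.translate(str.maketrans({",": seps["thousands"], ".": seps["decimal"]}))


def parse_number(text: str, locale_tag: str) -> float:
    """Read a number written with the conventions of locale_tag."""
    seps = SEPARATORS[locale_tag]
    normalized = text.replace(seps["thousands"], "").replace(seps["decimal"], ".")
    return float(normalized)


print(format_number(1547.25, "en-US"))  # 1,547.25
print(format_number(1547.25, "de-DE"))  # 1.547,25
print(parse_number("1,547", "en-US"))   # 1547.0
print(parse_number("1,547", "de-DE"))   # 1.547
```

The last two lines show the ambiguity from the paragraph above: the same string yields values a thousand times apart depending on the convention assumed.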

The presentation of dates and times is a typical source of confusion for the
user. When two digits each are used to represent year, month, and day, the
actual date might not be obvious. Conventions in different cultures include
DD/MM/YY, MM/DD/YY, and YY/MM/DD, so a single date in the format "xx/xx/xx"
could be interpreted as three different dates.
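To make the ambiguity concrete, the Python sketch below parses one string with each of the three conventions listed above. The sample string "03/06/17" and the function name interpretations are illustrative assumptions. Writing the result in ISO 8601 form (YYYY-MM-DD), which is what isoformat() produces, is one way to present a date unambiguously.

```python
from datetime import date, datetime

# The three conventions from the text, as strptime patterns.
PATTERNS = {
    "DD/MM/YY": "%d/%m/%y",
    "MM/DD/YY": "%m/%d/%y",
    "YY/MM/DD": "%y/%m/%d",
}


def interpretations(text: str) -> dict[str, date]:
    """Return every valid reading of text, keyed by convention."""
    results = {}
    for label, pattern in PATTERNS.items():
        try:
            results[label] = datetime.strptime(text, pattern).date()
        except ValueError:
            pass  # not a valid date under this convention
    return results


for label, day in interpretations("03/06/17").items():
    print(label, day.isoformat())
# DD/MM/YY 2017-06-03
# MM/DD/YY 2017-03-06
# YY/MM/DD 2003-06-17
```
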

There are many other concerns to address when creating an
international-friendly site; these are only a sample.

By the way...

In its simplest definition, "internationalization" refers to creating a site
framework that allows for content to be presented in a way that is
consistent with regional styles and cultural customs. "Localization" refers
to the actual implementation of each specific region's content into the
international framework. Internationalization is commonly referred to as
"i18n" because there are 18 letters between the initial "i" and the final
"n". Similarly, localization is commonly referred to as "l10n".
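The abbreviation rule is simple arithmetic on the length of the word. A hypothetical Python helper (the name numeronym is ours) that applies it:

```python
def numeronym(word: str) -> str:
    """Abbreviate as first letter + number of letters in between + last letter."""
    if len(word) < 3:
        return word  # nothing in between to count
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
```
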

When starting to create an internationalized site, one must first identify
the locales that the site needs to support. This will help to define the
requirements for the international framework. It is highly recommended to work
with native speakers who are very familiar with the regions and cultures that
are part of your user demographic.

Most importantly, the end user must understand that a page has been
localized. It is good practice to indicate or imply that the content has
been formatted according to their local conventions. This avoids questions and possible

Further information

This Q&A provides only a few introductory points on this topic. There are
many books devoted to the topics of internationalization and localization.
Becoming familiar with the styles and customs of other regions and properly
implementing them in a web site helps ensure that content is available to,
and truly understandable by, a larger audience.

Some of the standards typically used to create internationalized web sites
include the following:

- XML [ www.w3.org/XML ] is a markup language for defining structured
content. In addition to identifying the actual content, it can carry
attributes that describe aspects of that content, most notably xml:lang,
which identifies the language of an element and its contents. Other web
languages built on XML (such as XHTML) can use such attributes when
delivering a localized page to the current user.

- XHTML [ www.w3.org/MarkUp ] is the successor to HTML, reformulated as an
XML application, and is the markup language used to define the web pages
themselves.

- Unicode [ www.unicode.org ] is a character set that assigns a unique number
to each character of the world's writing systems. Using it, with the page's
encoding (such as UTF-8) declared correctly, allows the correct characters to
be displayed regardless of the browser or system, provided a suitable font is
available.
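As a rough illustration of a language attribute on XML content, the Python sketch below uses the standard library's ElementTree to read the language of each paragraph in a small made-up document. In XML the standard attribute for this is xml:lang, which ElementTree exposes under the XML namespace URI. The document and the function name are hypothetical, and the sketch only falls back to the root element's language instead of walking every ancestor, as full xml:lang inheritance would.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical page with one default language and two overrides.
DOC = """<page xml:lang="en">
  <p>Welcome</p>
  <p xml:lang="de">Willkommen</p>
  <p xml:lang="fr">Bienvenue</p>
</page>"""


def paragraph_languages(xml_text: str) -> list[tuple[str, str]]:
    """Return (text, language) for each <p>, defaulting to the root's xml:lang."""
    root = ET.fromstring(xml_text)
    default = root.get(XML_LANG)
    return [(p.text, p.get(XML_LANG, default)) for p in root.iter("p")]


for text, lang in paragraph_languages(DOC):
    print(lang, text)
# en Welcome
# de Willkommen
# fr Bienvenue
```
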

Using these standards correctly in a web site addresses several of the
concerns mentioned above, such as the correct display of characters.
Received on Tuesday, 17 June 2003 00:39:28 GMT
