Q&A: Initial considerations for international web sites

Below is the revised Q&A.

I have taken out references to codes and markup languages in the main
sections of this Q&A. Because this is meant to serve as an introduction to
our area, I felt it was important to mention them somewhere so that the
reader has suggested next steps (namely, learning a little more about each
of the standards mentioned). I discuss them briefly under "Further
information."

Thanks,
Phil


-----------------------------------------------------------------------------
Questions & Answers: Initial considerations for international web sites


Question

What are some topics to consider when creating web sites for an
international audience?


Background

People from around the world can view the content of your web site. Because
much of what is published on the web is written with a specific audience in
mind, readers outside that audience often misunderstand what was intended.
The formatting and presentation of text follow specific regional and
cultural conventions, and these need to be addressed if the content is to be
properly understood.


Answer

A typical challenge is to ensure that characters display correctly for the
end user. Web pages easily accommodate English and the other Germanic and
Romance languages, but what happens when an occasional word or name from
another language is used? In the past, a quick solution was to use an inline
graphic to display the character. Another was to copy and paste the desired
character from another program into the web page. While the result might
look correct for one user, there is no guarantee that every user will see
the same text, because the outcome depends on variables such as the
available fonts, the operating system, and the browser software. These
concerns are becoming increasingly important as users move toward mobile
phones and other non-traditional browsing devices.
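As a minimal sketch of the standards-based alternative to images or pasted
characters, the Python snippet below lists each character of a name with its
Unicode code point and shows how non-ASCII characters can be written as
HTML/XML numeric character references (for example "&#xE1;" for "á"). The
function names are illustrative, not part of any library. Escaping characters
this way, or serving the page in a Unicode encoding such as UTF-8, identifies
each character unambiguously, although the user's system still needs a font
that contains it.

```python
def describe_chars(text: str) -> list[str]:
    """Return one line per character: the character, its code point, and its reference."""
    rows = []
    for ch in text:
        cp = ord(ch)  # Unicode code point as an integer
        rows.append(f"{ch}  U+{cp:04X}  &#x{cp:X};")
    return rows


def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


if __name__ == "__main__":
    name = "Dvo\u0159\u00e1k"  # "Dvořák", written with escapes to avoid file-encoding issues
    for row in describe_chars(name):
        print(row)
    print(to_char_refs(name))  # Dvo&#x159;&#xE1;k
```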

Because some languages, such as Arabic and Hebrew, are written from right to
left, including such content is an even greater challenge. In addition to
identifying the proper characters, there needs to be a way to handle the
direction of the text, particularly when right-to-left and left-to-right
text appear in the same passage.
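One building block for handling right-to-left text is detecting which
characters are strongly right-to-left. The sketch below uses Python's
standard unicodedata.bidirectional(), which returns a character's Unicode
bidirectional class ("R" for Hebrew letters, "AL" for Arabic letters, "L" for
Latin letters). The first_strong_direction() helper is a simplified version
of the first-strong-character rule that the Unicode Bidirectional Algorithm
uses to choose a paragraph's base direction, and mark_direction() is a
hypothetical helper that wraps text in an HTML span with a dir attribute.
Neither is a complete bidirectional implementation; the browser still
performs the actual reordering.

```python
import html
import unicodedata

# Strong right-to-left bidirectional classes: "R" (e.g. Hebrew) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def contains_rtl(text: str) -> bool:
    """True if any character in text is a strong right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character.

    Simplified: digits, spaces and punctuation (weak or neutral classes) are skipped.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi == "L":
            return "ltr"
    return default  # no strong character found


def mark_direction(text: str) -> str:
    """Hypothetical helper: wrap HTML-escaped text in a span carrying a dir attribute."""
    return f'<span dir="{first_strong_direction(text)}">{html.escape(text)}</span>'


if __name__ == "__main__":
    hebrew = "\u05e9\u05dc\u05d5\u05dd"  # the Hebrew word "shalom"
    print(contains_rtl(hebrew))           # True
    print(mark_direction("2003 " + hebrew))
    print(mark_direction("Hello"))        # <span dir="ltr">Hello</span>
```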

Some cultures use a comma as the thousands separator and a period as the
decimal point, while others use a period and a comma, respectively. For
example, 1,547 in Germany and 1.547 in the United States are the same number
(one and 547 thousandths), whereas an American reader would take 1,547 to
mean one thousand five hundred forty-seven. Although the only difference is
a single character, the difference in meaning is significant.
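The sketch below reproduces the separator problem with the same figures. It
hard-codes the separators for two locales as illustrative assumptions
(Python's standard locale module depends on which locales are installed on
the host, so the example avoids it); a production site would normally take
this data from a maintained locale database. The SEPARATORS table and
format_number() are hypothetical names.

```python
# Separators per locale: (thousands separator, decimal separator).
# Illustrative values only; a real site would load these from a locale database.
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
}


def format_number(value: float, decimals: int, locale_tag: str) -> str:
    """Format value with the given number of decimals using the locale's separators."""
    thousands_sep, decimal_sep = SEPARATORS[locale_tag]
    # Python formats with "," grouping and "." as the decimal point;
    neutral = f"{value:,.{decimals}f}"
    # both separators are then swapped in one pass so they cannot clobber each other.
    return neutral.translate(str.maketrans({",": thousands_sep, ".": decimal_sep}))


if __name__ == "__main__":
    for tag in SEPARATORS:
        print(tag, format_number(1.547, 3, tag), format_number(1547, 0, tag))
    # en-US 1.547 1,547
    # de-DE 1,547 1.547
```

The same string "1,547" appears in both rows of the output, once meaning
roughly one and a half and once meaning over a thousand.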

The presentation of dates and times is a very common source of confusion.
When two digits each are used to represent the year, month, and day, the
actual date might not be obvious. Formats used in different cultures include
DD/MM/YY, MM/DD/YY, and YY/MM/DD, so a single date written as "xx/xx/xx"
could be interpreted as three different dates.
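To make the ambiguity concrete, the sketch below parses one string under each
of the three field orders listed above using Python's datetime.strptime
(where %y is a two-digit year, which Python maps 00 through 68 to 2000
through 2068) and reports each valid reading in the unambiguous ISO 8601 form
YYYY-MM-DD. The ORDERS table and interpretations() are illustrative names.

```python
from datetime import datetime

# The three field orders mentioned above, as strptime patterns (%y = two-digit year).
ORDERS = {
    "DD/MM/YY": "%d/%m/%y",
    "MM/DD/YY": "%m/%d/%y",
    "YY/MM/DD": "%y/%m/%d",
}


def interpretations(text: str) -> dict[str, str]:
    """Parse text under each field order; keep the valid readings as ISO 8601 dates."""
    result = {}
    for label, pattern in ORDERS.items():
        try:
            result[label] = datetime.strptime(text, pattern).date().isoformat()
        except ValueError:
            pass  # not a valid date under this ordering
    return result


if __name__ == "__main__":
    for label, iso in interpretations("03/04/05").items():
        print(f"{label}: {iso}")
    # DD/MM/YY: 2005-04-03
    # MM/DD/YY: 2005-03-04
    # YY/MM/DD: 2003-04-05
```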

Many other concerns should also be addressed when creating an
international-friendly site; these are only a sampling.


By the way...

In its simplest definition, "internationalization" refers to creating a site
framework that allows content to be presented in a way that is consistent
with regional styles and cultural customs. "Localization" refers to adapting
the content for each specific region within that international framework.
Internationalization is commonly abbreviated as "i18n" because there are 18
letters between the initial "i" and the final "n." Similarly, localization
is commonly abbreviated as "l10n."
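The abbreviations can be derived mechanically: keep the first and last
letters and replace the letters between them with their count. The function
below is a small illustrative sketch; the name numeronym is hypothetical.

```python
def numeronym(word: str) -> str:
    """Keep the first and last letters; replace the letters between them with their count."""
    if len(word) < 4:
        return word  # too short to be worth abbreviating
    return f"{word[0]}{len(word) - 2}{word[-1]}"


if __name__ == "__main__":
    print(numeronym("internationalization"))  # i18n
    print(numeronym("localization"))          # l10n
```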

When starting to create an internationalized site, first determine which
locales need to be supported. This will help to define the requirements for
the international framework. It is highly recommended to work with native
speakers who are very familiar with the regions and cultures that make up
your audience.

Most importantly, the end user must understand that a page has been
localized. It is good practice to indicate, explicitly or implicitly, that
the content has been formatted according to the user's local conventions.
This avoids questions and possible misinterpretations.
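A minimal sketch of how these ideas can fit together, assuming a site that
keeps per-locale data apart from its formatting code: the LOCALES table plays
the role of localization, format_date_labeled() plays the role of the
internationalization framework, and the output names the format used so the
reader can tell the date has been localized. The class, table, and function
names and the two sample entries are hypothetical; real entries should be
checked with people who know each region.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LocaleConventions:
    """Per-locale formatting data (the localized part)."""
    date_pattern: str  # strftime pattern for displaying dates
    date_label: str    # description of that pattern shown to the reader


# Hypothetical sample entries; confirm real conventions with people from each region.
LOCALES = {
    "en-US": LocaleConventions("%m/%d/%Y", "MM/DD/YYYY"),
    "de-DE": LocaleConventions("%d.%m.%Y", "DD.MM.YYYY"),
}


def format_date_labeled(d: date, locale_tag: str) -> str:
    """Framework code: format d for the locale and tell the reader which format was used."""
    conv = LOCALES[locale_tag]
    return f"{d.strftime(conv.date_pattern)} ({conv.date_label})"


if __name__ == "__main__":
    d = date(2003, 6, 11)
    print(format_date_labeled(d, "en-US"))  # 06/11/2003 (MM/DD/YYYY)
    print(format_date_labeled(d, "de-DE"))  # 11.06.2003 (DD.MM.YYYY)
```

With this split, supporting another region means adding a LOCALES entry
rather than changing the formatting code.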


Further information

This Q&A provides only a few introductory points on this topic; many books
are devoted to internationalization and localization. Becoming familiar
with the styles and customs of other regions, and properly implementing them
in a web site, helps ensure that content is available to, and truly
understandable by, a larger audience.

Some of the standards typically used to create internationalized web sites
include the following:

- XML [ www.w3.org/XML ] is the preferred markup language for defining
content. In addition to identifying the content itself, it can carry
attributes that further describe it, such as its language (through the
standard xml:lang attribute) or the format in which it is expressed. Other
web languages, such as XHTML, can use the same language attributes to
identify the language of the content delivered to each user.

- XHTML [ www.w3.org/MarkUp ] is the successor to HTML: a reformulation of
HTML as an XML application, used to define web pages and to format and
display their content.

- Unicode [ www.unicode.org ] is a standard that assigns a unique number to
each character of the world's writing systems. Using it ensures that each
character is identified unambiguously regardless of the browser or system,
although displaying it still requires a font that contains that character.
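As an illustration of language attributes in XML, the sketch below reads
entries marked with the reserved xml:lang attribute, which the XML
specification defines for identifying the language of an element's content,
and picks the one matching a requested language tag. It falls back first to
the primary language subtag (for example "de" for "de-DE") and then to a
default language. The element names, sample text, pick_localized() function,
and fallback policy are hypothetical; Python's xml.etree.ElementTree reports
xml:lang under the XML namespace URI shown in the code.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical content file: one <greeting> per language.
SAMPLE = """<greetings>
  <greeting xml:lang="en">Welcome</greeting>
  <greeting xml:lang="de">Willkommen</greeting>
  <greeting xml:lang="fr">Bienvenue</greeting>
</greetings>"""


def pick_localized(xml_text: str, lang: str, default: str = "en") -> str:
    """Return the greeting for lang, else its primary subtag, else the default language."""
    root = ET.fromstring(xml_text)
    # Language tags are case-insensitive, so compare them in lower case.
    by_lang = {el.get(XML_LANG, "").lower(): el.text for el in root.iter("greeting")}
    for candidate in (lang.lower(), lang.split("-")[0].lower(), default):
        if candidate in by_lang:
            return by_lang[candidate]
    raise KeyError(f"no greeting for {lang!r} or default {default!r}")


if __name__ == "__main__":
    print(pick_localized(SAMPLE, "de-DE"))  # Willkommen
    print(pick_localized(SAMPLE, "ja"))     # Welcome
```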

Using these standards properly in a web site helps address many of the
concerns mentioned above.

Received on Wednesday, 11 June 2003 19:47:34 UTC