- From: Tex Texin <tex@i18nguy.com>
- Date: Tue, 17 Jun 2003 00:38:35 -0400
- To: Phil Arko <phil.arko@scr.siemens.com>, GEO <public-i18n-geo@w3.org>
Phil, looks good!

1) I wonder if it is really the case that beginners write for a demographic, or whether they write and make primarily unconscious choices and assumptions about what their audience knows and understands, usually presuming their audience is like the author, which is of course wrong internationally.

2) Web pages can easily accommodate languages besides English, Germanic, and Romance. We should watch for assumptions that the audience is Western European. ;-) Perhaps eliminate that reference altogether and just talk about introducing foreign words.

3) It might be good if each paragraph had a topic heading to make it easy for people to identify the subtopic:
- character encoding and display
- reading direction
- number formats
- date and time formats
Then we can also add topics to this question easily, if each paragraph has its own heading and text.

4) Perhaps the "By the way" would be better as a separate Q&A. We certainly need to cover the terms and abbreviations. Consider creating a "What is internationalization, localization, i18n, l10n?" Q&A with essentially the same text.

5) Perhaps add a reference to the already published date-time question from the date-time paragraph.

6) On reading direction: I don't know if I would say "many" languages. It is certainly more than a few, but it is not a substantial percentage of languages. (I think.) Sorry if this seems picky. I am just looking to avoid misleading people about the relative importance of RTL. Also, reading direction is more than text processing, since it affects the overall layout and organization of the pages (graphics, etc. change too). This bears commenting.

7) We might offer a list at the end, without further explanation, of the other items to consider for a global web site:
- terminology
- calendars, work hours, holidays
- currency, taxes and financial rules
- graphics and images significance
- color significance
- sound significance
- measurement systems
- page and form sizes
- titles and address formats
- laws and legal references
- others?
8) By leaving off HTML in the standards list, are we implying it is non-international? I think this list of standards is not helpful as it is. It leaves off RFC 3066 and perhaps others that are relevant to web i18n. I think it might be a good idea once we have a more complete list. However, I like the idea of providing some ideas for the reader as to where to go to get more information. If you want, I have a list of websites with checklists and relevant info from which you can pick some pages to cite. I would suggest using an entry for each of the major software vendors (Microsoft, Sun, IBM, Oracle, Netscape, etc.) as well as some of the tutorial/educational sites (e.g. Multilingual Webmaster). Look at http://www.i18nguy.com/guidelines.html and pick some you like. Also reference the web i18n tutorial and some of the other links elsewhere on the W3C/International site.

I think the question is a good intro to the topic and sets the stage well for the other questions. Hope the comments are helpful.

------------------------------------------------------------------------

Questions & Answers: Initial considerations for international web sites

Question

What are some topics to consider when creating websites for an international audience?

Background

People from around the world can view your content on websites. Because much of what we find on the web is written with a specific demographic in mind, it is often the case that people outside of that demographic misunderstand what has actually been intended. The formatting and presentation of text has very specific regional and cultural requirements that need to be addressed if the content is to be properly understood.

Answer

A typical challenge is to ensure that characters display correctly for the end user. Web pages can easily accommodate English, Germanic, and Romance languages, but what happens when an occasional foreign word or name is used?
In the past, a quick solution was to use an inline graphic to display the character. Another method was to copy and paste the desired character from another program into the web page. While the result might look correct for one user, there is no guarantee that every user will see the same text. There are many variables that might need to be considered, such as the font, operating system, browser software, etc. These concerns are becoming increasingly important as users move toward mobile and other non-standard browsing devices.

As many languages read from right to left, the ability to include such content becomes an even greater challenge. In addition to identifying the proper characters, there also needs to be a method of properly handling this text.

Some cultures use a comma as a thousands separator and a period as a decimal point, while other cultures use the period and comma, respectively. For example, 1,547 in Germany and 1.547 in the United States are actually the same number. While the only difference in this example is a single character, the difference in meaning is significant.

The presentation of dates and times is a very typical example of something that causes confusion for the user. When using two digits each to represent year, month, and day, the actual date might not be obvious. A few examples from different cultures include DD/MM/YY, MM/DD/YY, and YY/MM/DD. A single date in the format "xx/xx/xx" could be interpreted as three different dates.

There are many other concerns that should be addressed as well when creating an international-friendly site. This is only a sampling of some of these.

By the way...

In its simplest definition, "internationalization" refers to creating a site framework that allows for content to be presented in a way that is consistent with regional styles and cultural customs. "Localization" refers to the actual implementation of each specific region's content into the international framework.
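The number and date ambiguities described in the draft can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the SEPARATORS table and the helper names parse_number and possible_dates are hypothetical and not part of the draft or of any standard, and a real site would normally rely on locale data rather than a hand-written table.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical separator table covering only the two conventions
# mentioned in the draft (assumption: not real locale data).
SEPARATORS = {
    "de": {"group": ".", "decimal": ","},
    "us": {"group": ",", "decimal": "."},
}


def parse_number(text: str, convention: str) -> Decimal:
    """Interpret a formatted number under the given convention."""
    seps = SEPARATORS[convention]
    # Drop the grouping separator first, then normalize the decimal mark.
    normalized = text.replace(seps["group"], "").replace(seps["decimal"], ".")
    return Decimal(normalized)


def possible_dates(text: str) -> dict:
    """Return every valid reading of an "xx/xx/xx" date string."""
    layouts = {
        "DD/MM/YY": "%d/%m/%y",
        "MM/DD/YY": "%m/%d/%y",
        "YY/MM/DD": "%y/%m/%d",
    }
    readings = {}
    for name, fmt in layouts.items():
        try:
            readings[name] = datetime.strptime(text, fmt).date()
        except ValueError:
            # This layout cannot describe the string (e.g. month 13).
            pass
    return readings


if __name__ == "__main__":
    print(parse_number("1,547", "de") == parse_number("1.547", "us"))
    print(parse_number("1.547", "de"))
    for layout, value in possible_dates("03/04/05").items():
        print(layout, value.isoformat())
```

Reading "1.547" under the German convention yields 1547 rather than 1.547, and "03/04/05" has three valid readings, which is why stating the format in use (or using an unambiguous one) helps the reader.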
Internationalization is commonly referred to as "i18n" because there are 18 characters between the beginning "i" and concluding "n." Similarly, localization is commonly referred to as "l10n."

When starting to create an internationalized site, one must first give consideration to the various locales that need to be supported. This will help to define the requirements for the international framework. It is highly recommended to work with native speakers who are very familiar with the regions and cultures that are part of your user demographic.

Most importantly, the end user must understand that a page has been localized. It is a good practice to indicate or imply that the content has been formatted for their local formats. This avoids questions and possible misinterpretations.

Further information

This Q&A provides only a few introductory points on this topic. There are many books devoted to the topics of internationalization and localization. Becoming familiar with the styles and customs of other regions and properly implementing these elements into a web site will ensure that content is available to -- and truly understandable by -- a larger audience.

Some of the standards typically used to create internationalized web sites include the following:

- XML [ www.w3.org/XML ] is the preferred markup language for defining content. In addition to identifying the actual content, it can also include attributes that further define aspects of the content (such as language, grammar style, and current format of the content). Other web languages (such as XHTML) use these attributes to deliver the localized page appropriate for the current user.

- XHTML [ www.w3.org/MarkUp ] is the successor to HTML, and is a markup language used to define web pages and to properly format and display XML content within them.

- Unicode [ www.unicode.org ] is a numbered collection of the characters of all of the languages in the world.
Using this standard ensures that the correct character will be displayed, regardless of the browser or system.

Properly utilizing these standards in a web site can ensure that the concerns mentioned above are properly handled.
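To make the "numbered collection" description of Unicode concrete, here is a minimal Python sketch using only the standard unicodedata module. The helper name describe and the two sample characters are assumptions chosen for illustration. Note that it shows how a character is identified and encoded, not whether a given browser has a font to draw it.

```python
import unicodedata


def describe(text: str) -> list:
    """List code point, name, direction class, and UTF-8 bytes per character."""
    rows = []
    for ch in text:
        rows.append({
            "code_point": f"U+{ord(ch):04X}",
            "name": unicodedata.name(ch, "<unnamed>"),
            # Bidirectional class: "L" is left-to-right, "R" is right-to-left.
            "direction": unicodedata.bidirectional(ch),
            "utf8": ch.encode("utf-8"),
        })
    return rows


if __name__ == "__main__":
    # Sample characters (assumption): Latin e with acute, Hebrew alef.
    for row in describe("\u00e9\u05d0"):
        print(row)
```

Each character has one number regardless of platform, and the direction class is one of the properties software can use when laying out right-to-left text.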
Received on Tuesday, 17 June 2003 00:39:28 UTC