[ESW Wiki] Update of "geoUnicodeConsiderationsWhenUpgrading" by Deborah Cawkwell

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by Deborah Cawkwell:
http://esw.w3.org/topic/geoUnicodeConsiderationsWhenUpgrading


The comment on the change is:
New version with WIKI formatting

------------------------------------------------------------------------------
  These changes should be sent to the GEO public list, which gives us a consistent notification method and an archive of changes.
  
  ----
- FAQ: Upgrading from language-specific legacy encoding to Unicode encoding 
+ = FAQ: Upgrading from language-specific legacy encoding to Unicode encoding =
- Question: What should I consider when upgrading my web pages from legacy encoding to Unicode encoding?
+ == Question: What should I consider when upgrading my web pages from legacy encoding to Unicode encoding? ==
- Background 
+ === Background ===
  You have heard that using Unicode is a good idea and that it brings benefits such as standards compatibility, multilingual display on a single page and pan-organisation applications.
+ 
  Numerous large organizations are beginning to switch to Unicode: [http://www.w3.org/International/questions/qa-who-uses-unicode FAQ: Who uses Unicode?]
  However, you are not sure what's involved and whether it will work for your site.
+ 
  This FAQ lists some of the considerations to take into account when changing the encoding of your web pages.
+ 
  Note that if you are using a content management system to generate web pages, you may also need to consider your storage encoding, the migration of legacy data and software support.
+ 
  [MD 22 mar] Maybe mention here that some mobile phones don't yet support UTF-8 (but some do, although with a limited range of characters).
- Answer 
+ 
+ == Answer == 
+ 
- Which Unicode encoding for web pages? 
+ === Which Unicode encoding for web pages? ===
+  
  Unicode has three main encodings: UTF-8, UTF-16, UTF-32.
+ 
  UTF-8 is the Unicode encoding generally used for web pages, for these reasons:
+ 
- • Better compatibility with legacy data, where that legacy data uses ASCII as the 128 codepoints in ASCII match the first 128 codepoints in UTF-8.
+    * Better compatibility with legacy data that uses ASCII: the 128 ASCII codepoints are encoded in UTF-8 as the same single bytes, so ASCII-only text is already valid UTF-8.
- • No byte order problems as UTF-8 is 8-bit.
+    * No byte-order problems, because UTF-8 uses 8-bit code units; UTF-16 and UTF-32 use 16-bit and 32-bit units whose byte order can vary.
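Both points can be checked with Python's built-in codecs. This is a small sketch, not part of the original FAQ; the sample strings are arbitrary.

```python
# Sketch: compare the bytes produced for the same text by different encodings.
# The sample strings are arbitrary examples.


def encodings_of(text: str) -> dict:
    """Return the bytes of `text` in UTF-8 and in both UTF-16 byte orders."""
    return {
        "utf-8": text.encode("utf-8"),
        "utf-16-be": text.encode("utf-16-be"),  # big-endian: most significant byte first
        "utf-16-le": text.encode("utf-16-le"),  # little-endian: least significant byte first
    }


# ASCII text has identical bytes in ASCII and UTF-8, so ASCII-only pages are already valid UTF-8.
assert "Hello".encode("ascii") == encodings_of("Hello")["utf-8"]

# UTF-8 produces one byte sequence per character on every platform...
print(encodings_of("é")["utf-8"].hex())      # c3a9
# ...whereas UTF-16 can serialise the same character in two byte orders.
print(encodings_of("é")["utf-16-be"].hex())  # 00e9
print(encodings_of("é")["utf-16-le"].hex())  # e900
```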
+ 
- How well is Unicode supported for my end users?
+ === How well is Unicode supported for my end users? ===
+ 
  This depends on:
- • browser support
+    * browser support
- • suitable fonts
+    * suitable fonts
- • rendering software
+    * rendering software
+ 
- Browser support
+ ==== Browser support ====
+ 
  Modern browsers support Unicode:
+ 
- • Internet Explorer 6 (Windows)
+    * Internet Explorer 6 (Windows)
- • Firefox 1.0 
+    * Firefox 1.0 
- • Mozilla 1.4
+    * Mozilla 1.4
- • Opera 7.0
+    * Opera 7.0
- • Netscape Navigator 7.0
+    * Netscape Navigator 7.0
- • Safari 1.03
+    * Safari 1.03
- • Internet Explorer 5.2 (Mac)
+    * Internet Explorer 5.2 (Mac)
- Suitable fonts
+ 
+ ==== Suitable fonts ====
+ 
  Displaying a script correctly requires Unicode support in the operating system and suitable Unicode fonts installed on the machine.
+ 
  CSS font-family fallbacks help when the user does not have a specific font but another installed font can display the text readably. Always end the list with a CSS generic font family, eg, serif or sans-serif.
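As an illustration, the hypothetical helper below builds a CSS `font-family` declaration that lists preferred fonts and always ends with a generic family. It is a sketch, not a required tool; the font names in the example (Arial Unicode MS, Tahoma) are assumptions to be replaced with fonts suited to your scripts.

```python
# Sketch: build a CSS font-family declaration that ends with a generic family.
# font_family_declaration is a hypothetical helper; the example font names are assumptions.
from typing import List

# The generic font families defined in CSS 2.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def font_family_declaration(preferred: List[str], generic: str = "sans-serif") -> str:
    """Return a font-family declaration listing `preferred` fonts, then `generic`."""
    if generic not in GENERIC_FAMILIES:
        raise ValueError(f"{generic!r} is not a CSS generic font family")
    # Specific family names are quoted (always valid CSS);
    # the generic keyword must stay unquoted or it would name a font called "serif" etc.
    names = [f'"{name}"' for name in preferred]
    names.append(generic)
    return "font-family: " + ", ".join(names) + ";"


print(font_family_declaration(["Arial Unicode MS", "Tahoma"]))
# font-family: "Arial Unicode MS", "Tahoma", sans-serif;
```

If none of the listed fonts is installed, the browser falls back to its default font for the generic family.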
+ 
  Modern operating systems support Unicode:
+ 
- • Windows NT and its descendants Windows 2000 and Windows XP
+    * Windows NT and its descendants Windows 2000 and Windows XP
- • UNIX-like operating systems such as GNU/Linux
+    * UNIX-like operating systems such as GNU/Linux
- •	BSD
- •	Mac OS X
+    * BSD
+    * Mac OS X
+ 
  A standard installation of an operating system includes suitable fonts for the language selected by the user. Fonts not included in a standard installation can usually be added through the operating system's settings, or downloaded. Some languages currently require a font download, for example Pashto, Hindi, Urdu and Bengali.
+ 
  Commonly available Unicode fonts, both commercial and open source, use the [http://en.wikipedia.org/wiki/Truetype TrueType] format or the more recent [http://en.wikipedia.org/wiki/Opentype OpenType] format.
+ 
  Unicode fonts or ‘font families’ map Unicode codepoints to the graphical representations of characters, ie, glyphs. A single Unicode font usually covers only [http://www.babelstone.co.uk/Fonts/Fonts.html specific scripts], so applications such as browsers cover Unicode by combining several fonts for different scripts and ranges.
+ 
  Font display problems:
+ 
- • ISO-8859-1/windows-XXXX: an operating system or browser either has a font installed for that encoding or it doesn't, therefore either the page displays correctly or no characters display (question marks).
+    * Legacy encodings (ISO-8859-1, windows-XXXX): the operating system or browser either has a font installed for that encoding or it doesn't, so either the page displays correctly or its characters do not display (they appear as question marks).
- • Unicode: the operating system or browser has fonts for some, but not all, of the codepoints, so when displaying a Unicode page, some of the characters may display correctly whilst others don't because the browser has access to fonts for some of the codepoints but not all (empty rectangles).
+    * Unicode: the operating system or browser may have fonts for some, but not all, of the codepoints on a page, so some characters display correctly whilst others do not (they typically appear as empty rectangles).
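The Unicode case can be pictured with a simplified sketch of per-character font fallback: for each character, use the first font that covers it, and show an empty box if none does. The font names and coverage ranges below are hypothetical; real browsers read coverage from each font's character map, and their fallback logic is more involved.

```python
# Sketch of per-character font fallback, loosely modelled on what a browser does.
# The fonts and the codepoint ranges they cover are hypothetical.
from typing import Optional

# Hypothetical fonts, in order of preference, with the inclusive codepoint ranges they cover.
FONTS = [
    ("LatinFont", [(0x0020, 0x007E), (0x00A0, 0x00FF)]),  # Basic Latin and Latin-1 Supplement
    ("CyrillicFont", [(0x0400, 0x04FF)]),                  # Cyrillic
]

MISSING_GLYPH = "\u25a1"  # WHITE SQUARE, standing in for the "empty rectangle"


def font_for(char: str) -> Optional[str]:
    """Return the first font whose ranges include `char`, or None if no font covers it."""
    cp = ord(char)
    for name, ranges in FONTS:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None


def render(text: str) -> str:
    """Keep covered characters and replace uncovered ones with a placeholder box."""
    return "".join(c if font_for(c) else MISSING_GLYPH for c in text)


print(render("café Привет 中文"))  # café Привет □□
```

Here the Latin and Cyrillic text displays, but the Chinese characters have no covering font and become boxes, which is the partial display described above.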
+ 
- Rendering software
+ ==== Rendering software ====
+ 
  Multilingual text rendering engines are built into operating systems and browsers. Examples include:
+ 
- • Windows: Uniscribe
+    * Windows: Uniscribe
- • Macintosh: Apple Type Services for Unicode Imaging, which replaced the WorldScript engine for legacy encodings.
+    * Macintosh: Apple Type Services for Unicode Imaging, which replaced the WorldScript engine for legacy encodings.
- • Pango - open source
+    * Pango (open source)
- • Graphite - (open source renderer from SIL)
+    * Graphite (open source renderer from SIL)
+ 
- What I don’t need to worry about
+ === What I don’t need to worry about ===
- Page weight
+ 
+ ==== Page weight ====
+ 
  The page weight is the same as with legacy encodings for:
+ 
- • HTML markup
+    * HTML markup
- • English
+    * English
+ 
  Slightly heavier pages for:
+ 
- • Latin languages: characters, eg, e acute, outside the ASCII range (128 codepoints), are represented by one byte in ISO-8859-1, but typically two bytes in UTF-8, so a small, but acceptable, increase in page size should be expected.
+    * Languages written in Latin script: characters outside the ASCII range (the first 128 codepoints), eg, e acute (é), are represented by one byte in ISO-8859-1 but typically two bytes in UTF-8, so expect a small but acceptable increase in page size.
+ 
- • Characters that do not fall into the ASCII range, such as Chinese, Arabic, Russian, may use 2 or even 3 bytes. Chinese encodings already use more than 1 byte per character with legacy encodings, where they use double bytes.
+    * Other scripts: Arabic and Russian (Cyrillic) characters take two bytes each in UTF-8, compared with one byte in their single-byte legacy encodings, and Chinese characters typically take three. Legacy Chinese encodings already use two bytes per character, so the relative increase for Chinese is smaller.
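The size differences can be measured directly with Python's codecs. This sketch is illustrative only: the sample words, and the legacy encoding paired with each language, are choices made for the example.

```python
# Sketch: bytes needed for the same word in a legacy encoding and in UTF-8.
# The words, and the legacy encoding paired with each, are illustrative choices.

SAMPLES = [
    ("English", "page", "iso-8859-1"),
    ("French", "café", "iso-8859-1"),
    ("Russian", "Привет", "windows-1251"),
    ("Arabic", "سلام", "windows-1256"),
    ("Chinese", "中文", "gb2312"),
]


def byte_counts(text: str, legacy: str) -> tuple:
    """Return (bytes in the legacy encoding, bytes in UTF-8) for `text`."""
    return len(text.encode(legacy)), len(text.encode("utf-8"))


for language, text, legacy in SAMPLES:
    legacy_size, utf8_size = byte_counts(text, legacy)
    print(f"{language:<8} {legacy:<13} {legacy_size:>2} bytes  utf-8 {utf8_size:>2} bytes")
```

With these samples the English word is unchanged, the French word grows by one byte, the Russian and Arabic words double in size, and the Chinese word grows from four to six bytes. Markup, which is ASCII, does not grow at all, so the increase for a whole page is usually smaller than these per-word figures.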
+ 
- Don't forget
+ === Don't forget ===
+ 
- Character encoding declaration 
+ ==== Character encoding declaration ====
+ 
  Make sure that you change the character encoding declaration from the legacy encoding to UTF-8 (see [http://www.w3.org/International/tutorials/tutorial-char-enc Tutorial: character encoding declaration]):
- • HTTP header content-type, eg, Content-Type: text/html; charset=utf-8
- • HTML head, eg, <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
- Further reading 
-   * [http://www.w3.org/International/questions/qa-who-uses-unicode FAQ: Who uses Unicode?]
-   * [http://www.unicode.org/help/display_problems.html Settings to change to resolve display problems in Unicode]
-   * [http://en.wikipedia.org/wiki/Truetype Information about TrueType font]
-   * [http://en.wikipedia.org/wiki/Opentype Information about OpenType font]
-   * [http://www.babelstone.co.uk/Fonts/Fonts.html Unicode fonts and specific scripts].
-   * [http://www.unicode.org Unicode Consortium]
-   * [http://www.w3.org/International/tutorials/tutorial-char-enc Tutorial: Character sets & encodings in XHTML, HTML and CSS]
-   * [http://www.alanwood.net/unicode/browsers.html Unicode & multilingual web browsers]
-   * [http://en.wikipedia.org/wiki/Unicode_and_HTML Unicode & HTML]
  
+    * HTTP Content-Type header, eg, Content-Type: text/html; charset=utf-8
+    * HTML head, eg, <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
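For static pages, the conversion and the updated declaration can be scripted. The sketch below is one possible approach, not a prescribed method: it assumes the page's legacy encoding is known and that the declaration uses the `http-equiv` form shown above. `upgrade_page` and `meta_charset` are hypothetical helpers, and the regular-expression rewrite is deliberately simple; pages with unusual markup may need a more careful HTML-aware approach.

```python
# Sketch: convert a legacy-encoded page to UTF-8 and check its meta declaration.
# Assumptions: the legacy encoding is known, and the page declares it with
# <meta http-equiv="Content-Type" content="...; charset=...">.
import re
from email.message import Message
from html.parser import HTMLParser
from typing import Optional

UTF8_CONTENT_TYPE = "text/html; charset=utf-8"  # value for the HTTP Content-Type header

# Matches charset=<name>; only the first occurrence (normally in <head>) is rewritten.
CHARSET_RE = re.compile(r"charset=[\w-]+", re.IGNORECASE)


def upgrade_page(legacy_bytes: bytes, legacy_encoding: str) -> bytes:
    """Decode a legacy page, point its charset declaration at UTF-8, re-encode as UTF-8."""
    text = legacy_bytes.decode(legacy_encoding)
    text = CHARSET_RE.sub("charset=utf-8", text, count=1)
    return text.encode("utf-8")


class MetaCharsetFinder(HTMLParser):
    """Record the charset declared by <meta http-equiv="Content-Type" content="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_content_type = (attributes.get("http-equiv") or "").lower() == "content-type"
        if tag == "meta" and is_content_type and attributes.get("content"):
            # Reuse the standard library's MIME header parsing to extract the charset.
            header = Message()
            header["Content-Type"] = attributes["content"]
            self.charset = header.get_content_charset()


def meta_charset(html: str) -> Optional[str]:
    """Return the lower-cased charset declared in the page's meta element, if any."""
    finder = MetaCharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.charset


legacy_page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=iso-8859-1"/></head><body>café</body></html>'
).encode("iso-8859-1")

new_page = upgrade_page(legacy_page, "iso-8859-1")
print(meta_charset(new_page.decode("utf-8")))  # utf-8
```

The HTTP header must be changed on the server as well; a server written with Python's `http.server`, for instance, could send it with `self.send_header("Content-Type", UTF8_CONTENT_TYPE)`. If the header and the meta element disagree, browsers give the HTTP header precedence.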
+ 
+ === Further reading === 
+    * [http://www.w3.org/International/questions/qa-who-uses-unicode FAQ: Who uses Unicode?]
+    * [http://www.unicode.org/help/display_problems.html Settings to change to resolve display problems in Unicode]
+    * [http://en.wikipedia.org/wiki/Truetype Information about TrueType font]
+    * [http://en.wikipedia.org/wiki/Opentype Information about OpenType font]
+    * [http://www.babelstone.co.uk/Fonts/Fonts.html Unicode fonts and specific scripts]
+    * [http://www.unicode.org Unicode Consortium]
+    * [http://www.w3.org/International/tutorials/tutorial-char-enc Tutorial: Character sets & encodings in XHTML, HTML and CSS]
+    * [http://www.alanwood.net/unicode/browsers.html Unicode & multilingual web browsers]
+    * [http://en.wikipedia.org/wiki/Unicode_and_HTML Unicode & HTML]
+ 

Received on Tuesday, 3 May 2005 21:46:58 UTC