[ESW Wiki] Update of "geoUnicodeConsiderationsWhenUpgrading" by Deborah Cawkwell

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by Deborah Cawkwell:

The comment on the change is:
New version with WIKI formatting

  These changes should be sent to the GEO public list, providing us with a consistent notification method and an archive of changes.
- FAQ: Upgrading from language-specific legacy encoding to Unicode encoding 
+ = FAQ: Upgrading from language-specific legacy encoding to Unicode encoding =
- Question: What should I consider when upgrading my web pages from legacy encoding to Unicode encoding?
+ == Question: What should I consider when upgrading my web pages from legacy encoding to Unicode encoding? ==
- Background 
+ === Background ===
  You have heard that using Unicode is a good idea and that it brings benefits such as standards compatibility, multilingual display on a single page and organisation-wide applications.
  Numerous large organisations are beginning to switch to Unicode: [http://www.w3.org/International/questions/qa-who-uses-unicode FAQ: Who uses Unicode?]
  However, you are not sure what's involved and whether it will work for your site.
  This FAQ lists some of the considerations to take into account when changing the encoding of your web pages.
  Note that if you use a content management system to generate web pages, you may also need to consider the storage encoding, migration of legacy data and software support.
  [MD 22 mar] Maybe mention here that some mobile phones don't yet support UTF-8 (but some do, although with a limited range of characters).
- Answer 
+ == Answer == 
- Which Unicode encoding for web pages? 
+ === Which Unicode encoding for web pages? ===
  Unicode has three main encoding forms: UTF-8, UTF-16 and UTF-32.
  UTF-8 is the Unicode encoding generally used for web pages because it offers:
- • Better compatibility with legacy data, where that legacy data uses ASCII as the 128 codepoints in ASCII match the first 128 codepoints in UTF-8.
+    * Better compatibility with legacy data that uses ASCII: the 128 ASCII characters have the same codepoints in Unicode and are encoded in UTF-8 as the same single bytes.
- • No byte order problems as UTF-8 is 8-bit.
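+    * No byte order problems: UTF-8 uses 8-bit code units, so, unlike UTF-16 and UTF-32, its byte sequences do not depend on byte order (endianness).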
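Both points above can be checked with Python's built-in codecs. This is a minimal sketch; the sample strings are arbitrary and nothing beyond the standard library is assumed.

```python
text = "Hello, world"  # pure ASCII sample text (arbitrary)

# 1. ASCII compatibility: ASCII-only text gives byte-for-byte identical output
#    whether it is encoded as ASCII or as UTF-8.
same_bytes = text.encode("ascii") == text.encode("utf-8")
print(same_bytes)

# 2. Byte order: UTF-16 code units are 16 bits wide, so the same character
#    has two possible byte sequences (big-endian and little-endian).
be = "A".encode("utf-16-be")
le = "A".encode("utf-16-le")
print(be, le)

# The plain "utf-16" codec prepends a byte order mark (BOM) so that a reader
# can tell which byte order was used.
with_bom = "A".encode("utf-16")
print(with_bom[:2] in (b"\xff\xfe", b"\xfe\xff"))

# 3. UTF-8 code units are single bytes, so there is only one possible sequence.
print("A".encode("utf-8"))
```

The UTF-16 output differs between the two byte orders, while the UTF-8 output is the same on every platform, which is why UTF-8 needs no BOM.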
- How well is Unicode supported for my end users?
+ === How well is Unicode supported for my end users? ===
  This depends on:
- • browser support
+    * browser support
- • suitable fonts
+    * suitable fonts
- • rendering software
+    * rendering software
- Browser support
+ ==== Browser support ====
  Modern browsers support Unicode, for example:
- • Internet Explorer 6 (Windows)
+    * Internet Explorer 6 (Windows)
- • Firefox 1.0 
+    * Firefox 1.0 
- • Mozilla 1.4
+    * Mozilla 1.4
- • Opera 7.0
+    * Opera 7.0
- • Netscape Navigator 7.0
+    * Netscape Navigator 7.0
- • Safari 1.03
+    * Safari 1.03
- • Internet Explorer 5.2 (Mac)
+    * Internet Explorer 5.2 (Mac)
- Suitable fonts
+ ==== Suitable fonts ====
  Correct script display requires Unicode support in the operating system and suitable Unicode fonts installed on the machine.
  CSS font-family fallbacks help where the user does not have a specific font but another font would display the text readably. Do end CSS font-family lists with a generic font family, eg, serif or sans-serif.
  Modern operating systems support Unicode:
- • Windows NT and its descendants Windows 2000 and Windows XP
+    * Windows NT and its descendants, Windows 2000 and Windows XP
- • UNIX-like operating systems such as GNU/Linux
+    * UNIX-like operating systems such as GNU/Linux
- •	BSD
- •	Mac OS X
+    * BSD
+    * Mac OS X
  A standard operating system installation includes suitable fonts for the language selected by the user. Fonts not included in a standard installation can usually be added via menu options or downloaded. Some languages currently require a font download, including Pashto, Hindi, Urdu and Bengali.
  Commonly available Unicode fonts, both commercial and open source, use the [http://en.wikipedia.org/wiki/Truetype TrueType] format or the more recent [http://en.wikipedia.org/wiki/Opentype OpenType] format.
  Unicode fonts or ‘font families’ provide a mapping from Unicode codepoints to the graphical representation of characters, ie, glyphs. Unicode fonts usually cover [http://www.babelstone.co.uk/Fonts/Fonts.html specific scripts]. Applications such as browsers usually cover Unicode by using several fonts for different scripts and ranges.
  Font display problems:
- • ISO-8859-1/windows-XXXX: an operating system or browser either has a font installed for that encoding or it doesn't, therefore either the page displays correctly or no characters display (question marks).
+    * Legacy encodings (ISO-8859-1, windows-XXXX): the operating system or browser either has a font installed for that encoding or it does not, so either the page displays correctly or its characters do not display at all (they appear as question marks).
- • Unicode: the operating system or browser has fonts for some, but not all, of the codepoints, so when displaying a Unicode page, some of the characters may display correctly whilst others don't because the browser has access to fonts for some of the codepoints but not all (empty rectangles).
+    * Unicode: the operating system or browser may have fonts for some, but not all, of the codepoints on a page, so some characters display correctly whilst others do not (they appear as empty rectangles).
- Rendering software
+ ==== Rendering software ====
  Multilingual text rendering engines are built into operating systems and browsers, for example:
- • Windows: Uniscribe
+    * Windows: Uniscribe
- • Macintosh: Apple Type Services for Unicode Imaging, which replaced the WorldScript engine for legacy encodings.
+    * Macintosh: Apple Type Services for Unicode Imaging, which replaced the WorldScript engine for legacy encodings.
- • Pango - open source
+    * Pango (open source)
- • Graphite - (open source renderer from SIL)
+    * Graphite (open source renderer from SIL)
- What I don’t need to worry about
+ === What I don’t need to worry about ===
- Page weight
+ ==== Page weight ====
  Same page weight as for legacy encodings:
- • HTML markup
+    * HTML markup
- • English
+    * English
  Slightly heavier:
- • Latin languages: characters, eg, e acute, outside the ASCII range (128 codepoints), are represented by one byte in ISO-8859-1, but typically two bytes in UTF-8, so a small, but acceptable, increase in page size should be expected.
+    * Latin-script languages: characters outside the ASCII range (the first 128 codepoints), eg, e acute (é), take one byte in ISO-8859-1 but two bytes in UTF-8, so expect a small, but acceptable, increase in page size.
- • Characters that do not fall into the ASCII range, such as Chinese, Arabic, Russian, may use 2 or even 3 bytes. Chinese encodings already use more than 1 byte per character with legacy encodings, where they use double bytes.
+    * Characters in other scripts, such as Arabic, Russian and Chinese, take 2 or even 3 bytes each in UTF-8: Arabic and Cyrillic letters take 2 bytes and most Chinese characters take 3. Legacy Chinese encodings already use 2 bytes per character (double-byte encodings), so for Chinese the increase is smaller.
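The page-weight differences described above can be measured directly. This is a minimal Python sketch; the sample strings, and the legacy encoding paired with each one, are assumptions chosen for illustration rather than a recommendation for any particular site.

```python
# Byte cost of the same sample text in a typical legacy encoding and in UTF-8.
# The samples and the legacy encoding paired with each are illustrative assumptions.
samples = [
    ("English", "Hello", "iso-8859-1"),
    ("French", "café", "iso-8859-1"),
    ("Russian", "Привет", "windows-1251"),
    ("Chinese", "你好", "gb2312"),
]

sizes = {}
for label, text, legacy in samples:
    legacy_size = len(text.encode(legacy))  # bytes in the legacy encoding
    utf8_size = len(text.encode("utf-8"))   # bytes in UTF-8
    sizes[label] = (legacy_size, utf8_size)
    print(f"{label:<8} {legacy:<12} {legacy_size:>2} bytes   utf-8 {utf8_size:>2} bytes")
```

With these samples, the English text stays the same size, the French text grows by one byte for é, the Russian text doubles and the Chinese text grows by half. HTML markup is ASCII, so the increase for a whole page is smaller than for its text alone.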
- Don't forget
+ === Don't forget ===
- Character encoding declaration 
+ ==== Character encoding declaration ====
  Make sure you change the character encoding declaration from the legacy encoding to UTF-8 (see [http://www.w3.org/International/tutorials/tutorial-char-enc Tutorial: character encoding declaration]) wherever it appears:
- • HTTP header content-type, eg, Content-Type: text/html; charset=utf-8
- • HTML head, eg, <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
- Further reading 
-   * [http://www.w3.org/International/questions/qa-who-uses-unicode FAQ: Who uses Unicode?]
-   * [http://www.unicode.org/help/display_problems.html Settings to change to resolve display problems in Unicode]
-   * [http://en.wikipedia.org/wiki/Truetype Information about TrueType font]
-   * [http://en.wikipedia.org/wiki/Opentype Information about OpenType font]
-   * [http://www.babelstone.co.uk/Fonts/Fonts.html Unicode fonts and specific scripts].
-   * [http://www.unicode.org Unicode Consortium]
-   * [http://www.w3.org/International/tutorials/tutorial-char-enc Tutorial: Character sets & encodings in XHTML, HTML and CSS]
-   * [http://www.alanwood.net/unicode/browsers.html Unicode & multilingual web browsers]
-   * [http://en.wikipedia.org/wiki/Unicode_and_HTML Unicode & HTML]
+    * HTTP Content-Type header, eg, Content-Type: text/html; charset=utf-8
+    * HTML head, eg, <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
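Converting an existing page involves two steps: re-encoding its bytes as UTF-8 and updating the in-page declaration so it matches. The Python sketch below shows one way to do both for a single page. The names `convert_page`, `META_CHARSET` and `CONTENT_TYPE_HEADER` are hypothetical, and the regular expression assumes the exact `<meta http-equiv="Content-Type" ...>` form shown above with double-quoted attributes; real pages may vary.

```python
import re

# Hypothetical pattern for the charset value in a
# <meta http-equiv="Content-Type" content="text/html; charset=..."> element.
# Assumes double-quoted attributes in this order; real pages may differ.
META_CHARSET = re.compile(
    r'(<meta\s+http-equiv="Content-Type"\s+content="text/html;\s*charset=)[^"]*(")',
    re.IGNORECASE,
)

# Value the web server should send in the HTTP Content-Type header
# once the page is stored as UTF-8.
CONTENT_TYPE_HEADER = "text/html; charset=utf-8"


def convert_page(legacy_bytes: bytes, legacy_encoding: str) -> bytes:
    """Hypothetical helper: re-encode a legacy page as UTF-8 and fix its meta declaration."""
    # Decode strictly, so bytes that are invalid in the legacy encoding raise
    # an error instead of being silently replaced.
    text = legacy_bytes.decode(legacy_encoding)
    # Point the in-page declaration at UTF-8 so it agrees with the new bytes.
    text = META_CHARSET.sub(r"\1utf-8\2", text)
    return text.encode("utf-8")


legacy_page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=iso-8859-1"/></head>'
    "<body>café</body></html>"
).encode("iso-8859-1")

converted = convert_page(legacy_page, "iso-8859-1")
print(converted.decode("utf-8"))
```

The HTTP header itself is normally set in the web server configuration; `CONTENT_TYPE_HEADER` only shows the value it should carry. Pass `legacy_encoding` to match the bytes actually stored, which is not always the charset the old page declared.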
+ === Further reading === 
+    * [http://www.w3.org/International/questions/qa-who-uses-unicode FAQ: Who uses Unicode?]
+    * [http://www.unicode.org/help/display_problems.html Settings to change to resolve display problems in Unicode]
+    * [http://en.wikipedia.org/wiki/Truetype Information about TrueType font]
+    * [http://en.wikipedia.org/wiki/Opentype Information about OpenType font]
+    * [http://www.babelstone.co.uk/Fonts/Fonts.html Unicode fonts and specific scripts]
+    * [http://www.unicode.org Unicode Consortium]
+    * [http://www.w3.org/International/tutorials/tutorial-char-enc Tutorial: Character sets & encodings in XHTML, HTML and CSS]
+    * [http://www.alanwood.net/unicode/browsers.html Unicode & multilingual web browsers]
+    * [http://en.wikipedia.org/wiki/Unicode_and_HTML Unicode & HTML]

Received on Tuesday, 3 May 2005 21:46:58 UTC