- From: <w3t-archive+esw-wiki@w3.org>
- Date: Tue, 03 May 2005 21:46:45 -0000
- To: w3t-archive+esw-wiki@w3.org
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by Deborah Cawkwell:
http://esw.w3.org/topic/geoUnicodeConsiderationsWhenUpgrading

The comment on the change is:
New version with WIKI formatting

------------------------------------------------------------------------------
  These changes should be sent to the GEO public list - providing us with a consistent notification method and an archive of changes.
  ----
- FAQ: Upgrading from language-specific legacy encoding to Unicode encoding
+ = FAQ: Upgrading from language-specific legacy encoding to Unicode encoding =
- Question: What should I consider when upgrading my web pages from legacy encoding to Unicode encoding?
+ == Question: What should I consider when upgrading my web pages from legacy encoding to Unicode encoding? ==
- Background
+ === Background ===
  You have heard that using Unicode is a good idea and that there are benefits such as standards compatibility, multilingual display on a single page, pan-organisation applications.
+
  Numerous large organizations are beginning to switch to Unicode: [http://www.w3.org/International/questions/qa-who-uses-unicode FAQ: Who uses Unicode?] However, you are not sure what's involved and whether it will work for your site.
+
  This FAQ will attempt to list some of the considerations you would need to take into account for the encoding of web pages.
+
  Note that if you are using a content management system to generate web pages, you may need to consider your storage encoding, migration of legacy data, software support.
+
  [MD 22 mar] Maybe mention here that some mobile phones don't yet support UTF-8 (but some do, although with a limited range of characters).
- Answer
+
+ == Answer ==
+
- Which Unicode encoding for web pages?
+ === Which Unicode encoding for web pages? ===
+
  Unicode has three main encodings: UTF-8, UTF-16, UTF-32.
+
  UTF-8 is the Unicode encoding consistently used for web pages:
+
- • Better compatibility with legacy data, where that legacy data uses ASCII as the 128 codepoints in ASCII match the first 128 codepoints in UTF-8.
+ * Better compatibility with legacy data, where that legacy data uses ASCII as the 128 codepoints in ASCII match the first 128 codepoints in UTF-8.
- • No byte order problems as UTF-8 is 8-bit.
+ * No byte order problems as UTF-8 is 8-bit.
+
- How well is Unicode supported for my end users?
+ === How well is Unicode supported for my end users? ===
+
  This depends on:
- • browser support
+ * browser support
- • suitable fonts
+ * suitable fonts
- • rendering software
+ * rendering software
+
- Browser support
+ ==== Browser support ====
+
  Modern browsers support Unicode:
+
- • Internet Explorer 6 (Windows)
+ * Internet Explorer 6 (Windows)
- • Firefox 1.0
+ * Firefox 1.0
- • Mozilla 1.4
+ * Mozilla 1.4
- • Opera 7.0
+ * Opera 7.0
- • Netscape Navigator 7.0
+ * Netscape Navigator 7.0
- • Safari 1.03
+ * Safari 1.03
- • Internet Explorer 5.2 (Mac)
+ * Internet Explorer 5.2 (Mac)
- Suitable fonts
+
+ ==== Suitable fonts ====
+
  Correct script display requires Unicode support by the operating system and availability on the machine of Unicode fonts.
+
  CSS can help with font family fallbacks in the case where the user does not have a specific font, but another font will display the text readably. Do use CSS generic font family fallbacks, eg, serif, sans-serif.
+
  Modern operating systems support Unicode:
+
- • Windows NT and its descendants Windows 2000 and Windows XP
+ * Windows NT and its descendants Windows 2000 and Windows XP
- • UNIX-like operating systems such as GNU/Linux
+ * UNIX-like operating systems such as GNU/Linux
- • BSD
- • Mac OS X
+ * BSD
+ * Mac OS X
+
  Standard installation of an operating system includes suitable fonts for the language selected by the user.
  Fonts not included in a standard installation can usually be added via menu options; they can also be downloaded. Some languages currently require a font download; these languages include Pashto, Hindi, Urdu, Bengali.
+
  Commonly available Unicode fonts (commercial and open source) are [http://en.wikipedia.org/wiki/Truetype TrueType] and the more recent [http://en.wikipedia.org/wiki/Opentype OpenType].
+
  Unicode fonts or ‘font families’ provide a mapping from Unicode codepoints to the graphical representation of characters, ie, glyphs. Unicode fonts usually cover [http://www.babelstone.co.uk/Fonts/Fonts.html specific scripts]. Applications such as browsers usually cover Unicode by using several fonts for different scripts and ranges.
+
  Font display problems:
+
- • ISO-8859-1/windows-XXXX: an operating system or browser either has a font installed for that encoding or it doesn't, therefore either the page displays correctly or no characters display (question marks).
+ * ISO-8859-1/windows-XXXX: an operating system or browser either has a font installed for that encoding or it doesn't, therefore either the page displays correctly or no characters display (question marks).
- • Unicode: the operating system or browser has fonts for some, but not all, of the codepoints, so when displaying a Unicode page, some of the characters may display correctly whilst others don't because the browser has access to fonts for some of the codepoints but not all (empty rectangles).
+ * Unicode: the operating system or browser has fonts for some, but not all, of the codepoints, so when displaying a Unicode page, some of the characters may display correctly whilst others don't because the browser has access to fonts for some of the codepoints but not all (empty rectangles).
+
- Rendering software
+ ==== Rendering software ====
+
  Multilingual text rendering engines are built into operating system and browser installation.
+
- • Windows: Uniscribe
+ * Windows: Uniscribe
- • Macintosh: Apple Type Services for Unicode Imaging, which replaced the WorldScript engine for legacy encodings.
+ * Macintosh: Apple Type Services for Unicode Imaging, which replaced the WorldScript engine for legacy encodings.
- • Pango - open source
+ * Pango - open source
- • Graphite - (open source renderer from SIL)
+ * Graphite - (open source renderer from SIL)
+
- What I don’t need to worry about
+ === What I don’t need to worry about ===
- Page weight
+
+ ==== Page weight ====
+
  Same page weight as for legacy encodings:
+
- • HTML markup
+ * HTML markup
- • English
+ * English
+
  Slightly heavier
+
- • Latin languages: characters, eg, e acute, outside the ASCII range (128 codepoints), are represented by one byte in ISO-8859-1, but typically two bytes in UTF-8, so a small, but acceptable, increase in page size should be expected.
+ * Latin languages: characters, eg, e acute, outside the ASCII range (128 codepoints), are represented by one byte in ISO-8859-1, but typically two bytes in UTF-8, so a small, but acceptable, increase in page size should be expected.
+
- • Characters that do not fall into the ASCII range, such as Chinese, Arabic, Russian, may use 2 or even 3 bytes. Chinese encodings already use more than 1 byte per character with legacy encodings, where they use double bytes.
+ * Characters that do not fall into the ASCII range, such as Chinese, Arabic, Russian, may use 2 or even 3 bytes. Chinese encodings already use more than 1 byte per character with legacy encodings, where they use double bytes.
+
- Don't forget
+ === Don't forget ===
+
- Character encoding declaration
+ ==== Character encoding declaration ====
+
  You should ensure that you change the [http://www.w3.org/International/tutorials/tutorial-char-enc Tutorial: character encoding declaration] from legacy to Unicode.
- • HTTP header content-type, eg, Content-Type: text/html; charset=utf-8
- • HTML head, eg, <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
- Further reading
- * [http://www.w3.org/International/questions/qa-who-uses-unicode FAQ: Who uses Unicode?]
- * [http://www.unicode.org/help/display_problems.html Settings to change to resolve display problems in Unicode]
- * [http://en.wikipedia.org/wiki/Truetype Information about TrueType font]
- * [http://en.wikipedia.org/wiki/Opentype Information about OpenType font]
- * [http://www.babelstone.co.uk/Fonts/Fonts.html Unicode fonts and specific scripts].
- * [http://www.unicode.org Unicode Consortium]
- * [http://www.w3.org/International/tutorials/tutorial-char-enc Tutorial: Character sets & encodings in XHTML, HTML and CSS]
- * [http://www.alanwood.net/unicode/browsers.html Unicode & multilingual web browsers]
- * [http://en.wikipedia.org/wiki/Unicode_and_HTML Unicode & HTML]
+ * HTTP header content-type, eg, Content-Type: text/html; charset=utf-8
+ * HTML head, eg, <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
+
+ === Further reading ===
+ * [http://www.w3.org/International/questions/qa-who-uses-unicode FAQ: Who uses Unicode?]
+ * [http://www.unicode.org/help/display_problems.html Settings to change to resolve display problems in Unicode]
+ * [http://en.wikipedia.org/wiki/Truetype Information about TrueType font]
+ * [http://en.wikipedia.org/wiki/Opentype Information about OpenType font]
+ * [http://www.babelstone.co.uk/Fonts/Fonts.html Unicode fonts and specific scripts].
+ * [http://www.unicode.org Unicode Consortium]
+ * [http://www.w3.org/International/tutorials/tutorial-char-enc Tutorial: Character sets & encodings in XHTML, HTML and CSS]
+ * [http://www.alanwood.net/unicode/browsers.html Unicode & multilingual web browsers]
+ * [http://en.wikipedia.org/wiki/Unicode_and_HTML Unicode & HTML]
+
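As a rough check on the page weight and re-encoding points in the FAQ text above, here is a minimal Python sketch. It is not part of the wiki page: the function names `encoded_sizes` and `transcode_to_utf8` are made up for illustration, only the standard library is used, and the legacy encoding of the input is assumed to be known in advance.

```python
from typing import Dict, Optional

# Encodings to compare. The "-le" variants are used so that no byte order
# mark is added and the counts reflect the characters alone.
ENCODINGS = ("ascii", "iso-8859-1", "utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> Dict[str, Optional[int]]:
    """Return the byte count of `text` in each encoding, or None when the
    encoding cannot represent the text at all."""
    sizes: Dict[str, Optional[int]] = {}
    for name in ENCODINGS:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


def transcode_to_utf8(raw: bytes, legacy_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode as UTF-8."""
    return raw.decode(legacy_encoding).encode("utf-8")


if __name__ == "__main__":
    samples = (
        ("ASCII letter e", "e"),
        ("e acute", "\u00e9"),
        ("Cyrillic ya", "\u044f"),
        ("CJK U+4E2D", "\u4e2d"),
    )
    for label, sample in samples:
        print(label, encoded_sizes(sample))

    # Hypothetical legacy content: "caf" plus e acute in ISO-8859-1.
    legacy_page = b"caf\xe9"
    print(transcode_to_utf8(legacy_page, "iso-8859-1"))
```

For e acute the sketch reports 1 byte in ISO-8859-1 and 2 bytes in UTF-8, and for the Chinese character U+4E2D it reports 3 bytes in UTF-8, which matches the page weight notes. Transcoding the bytes is only half of the upgrade: the character encoding declaration in the HTTP header and the HTML head still has to be changed to utf-8 to match.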
Received on Tuesday, 3 May 2005 21:46:58 UTC