
RE: About UTF-8, XHTML and Character Encoding

From: Addison Phillips [wM] <aphillips@webmethods.com>
Date: Tue, 28 Oct 2003 10:39:14 -0800
To: "AmirBehzad Eslami" <behzad@delphiarea.com>, <www-international@w3.org>
Message-ID: <PNEHIBAMBMLHDMJDDFLHIEEPHCAA.aphillips@webmethods.com>
Hi Behzad,

You can use UTF-8 literal characters (UTF-8 byte sequences) in your web
pages as long as:

1. the page is declared to be UTF-8 (which you've done)
2. the page actually is encoded using UTF-8 (generally you must save the
file as UTF-8, because the default for many text editors is some legacy,
non-Unicode encoding: declaring a file to be UTF-8 doesn't make it so, and
many people struggle with their pages as a result.)
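
A minimal Python sketch of how you might check condition 2: it compares the encoding a file declares with whether its bytes actually decode as UTF-8. The helper name check_utf8_page is hypothetical, it looks only at the XML declaration (not HTTP headers or <meta> elements), and the Windows Arabic code page (cp1256) stands in for a legacy editor default.

```python
import re
from typing import Optional, Tuple

# Matches the encoding pseudo-attribute of an XML declaration at the start of
# the file, e.g. <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def check_utf8_page(data: bytes) -> Tuple[Optional[str], bool]:
    """Return (declared encoding or None, whether the bytes are valid UTF-8).

    Hypothetical helper: it inspects only the XML declaration, not HTTP
    headers or <meta> elements.
    """
    # Skip an optional UTF-8 byte order mark before the declaration.
    body = data[3:] if data.startswith(b"\xef\xbb\xbf") else data
    match = XML_DECL.match(body)
    declared = match.group(1).decode("ascii") if match else None
    try:
        data.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False
    return declared, is_utf8


page = '<?xml version="1.0" encoding="UTF-8"?>\n<p xml:lang="fa-IR">سلام</p>\n'
good = page.encode("utf-8")   # saved as UTF-8, as declared
bad = page.encode("cp1256")   # same text saved in the Windows Arabic code page
print(check_utf8_page(good))  # ('UTF-8', True)
print(check_utf8_page(bad))   # ('UTF-8', False)
```

Both files carry the same UTF-8 declaration; only the first one is actually UTF-8.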

In fact, character references exist mainly as a way to get Unicode
characters that the page's encoding cannot represent into a non-Unicode
encoded page. One of the nice things about using a Unicode encoding is that
you can enter and work with the text in the page in a normal manner, using
real characters, without worrying about escaping them.
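
The equivalence between literal characters and character references can be checked with Python's standard library. In this sketch, the sample word and the choice of US-ASCII as the "non-Unicode" encoding are assumptions for illustration: the xmlcharrefreplace error handler turns characters US-ASCII cannot represent into numeric character references, and an XML parser reads both versions back as the same text.

```python
import xml.etree.ElementTree as ET

farsi = "سلام"  # "hello" in Farsi

# Literal characters in a UTF-8 document.
literal_doc = f'<?xml version="1.0" encoding="UTF-8"?><p>{farsi}</p>'.encode("utf-8")

# The same text as numeric character references, as it would have to be
# written in a page whose encoding (here US-ASCII) lacks these characters.
ncr_text = farsi.encode("ascii", "xmlcharrefreplace").decode("ascii")
ncr_doc = f'<?xml version="1.0" encoding="US-ASCII"?><p>{ncr_text}</p>'.encode("ascii")

print(ncr_text)  # &#1587;&#1604;&#1575;&#1605;
# After parsing, both documents contain exactly the same characters.
print(ET.fromstring(literal_doc).text == ET.fromstring(ncr_doc).text)  # True
```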

Hope that helps.

Best Regards,

Addison
Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.

  -----Original Message-----
  From: www-international-request@w3.org
[mailto:www-international-request@w3.org]On Behalf Of AmirBehzad Eslami
  Sent: Tuesday, 28 October 2003 10:08
  To: www-international@w3.org
  Subject: About UTF-8, XHTML and Character Encoding


  E-Greetings everyone,

  I'm developing a web site using XHTML in Farsi (Persian, 'fa'). The page
is encoded in UTF-8, declared with the following syntax in XHTML:

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fa-IR">

  The web page contains non-US-ASCII characters such as Farsi and Arabic
characters.
  My question is:

  Should I use character references when writing the content of an XHTML
(UTF-8) web page?
  Or is it valid to use literal UTF-8 characters? (I mean, is it
unnecessary to write each character as a numeric character reference?)


  Thanks in advance,
  Behzad
Received on Tuesday, 28 October 2003 13:45:55 GMT
