Re: greek char in UTF-8 (part 2) from Albert Lunde on 2000-05-09 (www-international@w3.org from April to June 2000)

From: Albert Lunde <duerst@w3.org>
Date: Tue, 09 May 2000 14:28:51 +0900
To: www-international@w3.org
Message-Id: <4.2.0.58.J.20000509142830.02fe18b0@sh.w3.mag.keio.ac.jp>

 > Chris, you were right: my doctype was wrong. Only the first file
 > ( http://www.bibl.ulaval.ca/doelec/theses/memoires/1999/ChRiviere/riv.htm )
 > is a frameset; I corrected the others. Secondly, *SOME* of my Greek
 > characters are not ASCII, otherwise they would all display well in any
 > font, which is not the case. Concerning NCR (numeric character reference, I
 > presume), if you look at the source of my file, you will see some NCRs that
 > are below 256 (e.g. &#233;), thus ISO-Latin; but many are over 256, thus
 > Unicode. So I must send this in UTF-8, mustn't I?

Numeric character references are NOT interpreted (by a correct
implementation) according to the character encoding (a.k.a. charset)
used to store or transmit the document, but according to the SGML
"Document Character Set", which for HTML 4 (and likewise for XML) is
always Unicode, or, more precisely, ISO/IEC 10646.
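To illustrate, here is a minimal Python sketch. It assumes the standard library's html.unescape is a fair stand-in for a standards-following user agent (it follows HTML5 rather than the HTML 4 SGML rules, but resolves simple decimal references like these the same way); the helper name resolve is hypothetical. The reference &#945; comes out as U+03B1 whichever ASCII-compatible charset the bytes are decoded with, and &#233; stays U+00E9 even in a Greek ISO-8859-7 document, whereas a raw byte 0xE1 means one thing in ISO-8859-1 and another in ISO-8859-7. (Values below 256 only look like ISO-Latin because the first 256 Unicode code points coincide with ISO-8859-1.)

```python
import html
import unicodedata

def resolve(data: bytes, encoding: str) -> str:
    """Hypothetical helper standing in for a user agent: decode the bytes
    with the given charset, then expand character references."""
    return html.unescape(data.decode(encoding))

ENCODINGS = ("iso-8859-1", "iso-8859-7", "cp1252", "utf-8")

# b"&#945;" is pure ASCII, so every encoding above decodes it to the same
# six characters; the number 945 is then read as a code point (U+03B1).
ncr_results = {enc: resolve(b"&#945;", enc) for enc in ENCODINGS}

# &#233; is U+00E9 even in a Greek ISO-8859-7 document.
latin_ncr_in_greek = resolve(b"&#233;", "iso-8859-7")

# A raw byte, by contrast, means whatever the transport encoding says.
# (Single-byte encodings only: b"\xe1" on its own is invalid UTF-8.)
raw_results = {enc: resolve(b"\xe1", enc) for enc in ("iso-8859-1", "iso-8859-7")}

def describe(ch: str) -> str:
    """ASCII-only description such as 'U+03B1 GREEK SMALL LETTER ALPHA'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

if __name__ == "__main__":
    for enc, ch in ncr_results.items():
        print(f"&#945; as {enc:<10} -> {describe(ch)}")
    print(f"&#233; as iso-8859-7 -> {describe(latin_ncr_in_greek)}")
    for enc, ch in raw_results.items():
        print(f"0xE1   as {enc:<10} -> {describe(ch)}")
```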

So, with user agents that follow the standards, you can use Unicode
numeric character references in a document represented in ANY
character encoding; you do not have to send it as UTF-8. (This does
not, however, solve the problem of getting usable fonts.)
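Going the other way, a minimal sketch of producing such a document in Python. The 'xmlcharrefreplace' error handler is part of the standard codecs machinery: it writes every character the target encoding cannot hold as a decimal NCR, so French and Greek text can share one ISO-8859-1 file. The name to_latin1_with_ncrs is hypothetical, and the html.escape step assumes the input is plain text rather than existing markup. The file would still have to be labeled as ISO-8859-1 (for example in the HTTP Content-Type header).

```python
import html

def to_latin1_with_ncrs(text: str) -> bytes:
    """Hypothetical helper: escape markup-significant characters, then encode
    as ISO-8859-1, turning every character Latin-1 cannot represent into a
    decimal numeric character reference (e.g. U+03B1 -> &#945;)."""
    escaped = html.escape(text, quote=False)  # & < > -> &amp; &lt; &gt;
    return escaped.encode("iso-8859-1", errors="xmlcharrefreplace")

if __name__ == "__main__":
    source = "Thèse : αβγ"
    encoded = to_latin1_with_ncrs(source)
    print(encoded)  # b'Th\xe8se : &#945;&#946;&#947;'
    # Decoding as ISO-8859-1 and expanding the references restores the text.
    print(html.unescape(encoded.decode("iso-8859-1")) == source)  # True
```

The accented Latin letters stay as single raw bytes and only the Greek letters become references, which keeps the file readable in a Latin-1 editor.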

I forget what version of HTML/XML this started with, but see for
example:

http://www.w3.org/TR/html4/charset.html

--
     Albert Lunde          Albert-Lunde@northwestern.edu (new address)
                           Albert-Lunde@nwu.edu (old address)

Received on Tuesday, 9 May 2000 07:41:45 UTC