- From: Karlsson Kent - keka <keka@im.se>
- Date: Fri, 3 Dec 1999 23:11:46 +0100
- To: "'Murray Altheim'" <altheim@eng.sun.com>, John Delacour <JD@EREMITA.demon.co.uk>
- Cc: www-html@w3.org
- Message-ID: <C110A2268F8DD111AA1A00805F85E58DA68486@ntgbg1>
> -----Original Message-----
> From: Murray Altheim [mailto:altheim@eng.sun.com]
> Sent: Friday, December 03, 1999 12:23 PM
> To: John Delacour
> Cc: www-html@w3.org
> Subject: Re: accented characters, etc.
>
>
> John Delacour wrote:
> >
> > After all Unicode itself is an ISO standard.
>
> No, it's a product of the Unicode Consortium. There are attempts at
> keeping ISO 10646 in line (so it is similar to Unicode but generally
> not identical), but the Unicode standard is not an ISO standard.

To nitpick: Unicode 3.0 and ISO/IEC 10646-1:2000 have EXACTLY the same
characters at the same code points. (Unicode 2.1 is a bit harder to
pinpoint relative to 10646-1:1993: Amd. 1-7 plus two more characters
from a later amendment.)

In addition to characters at code points, Unicode also defines
character properties and the BiDi algorithm. These are not part of
10646 yet. Furthermore, Unicode defines canonical and compatibility
mappings, as character properties, and a normalisation algorithm.

So, just looking at characters at code points, Unicode 3.0 and 10646
in its year 2000 incarnation are identical. Beyond code point
allocations there are differences, mainly that Unicode normatively
specifies things that 10646 does not (yet) speak of. There are some
other points as well, which I will not bore you with.

I would predict that Unicode and 10646, at main synchronisation
points, will remain identical regarding character allocations. But it
is correct that 10646 and Unicode are not the same otherwise, and
might not become the same for quite a while, if ever.

/Kent Karlsson

PS
Regarding HTML character entities: Please use the proper characters
directly whenever possible. NCRs and named character entities should
be used ONLY when you cannot express the proper character directly in
the encoding used for the document. So if the document is in UTF-8,
you never really need any NCRs or named character entities.
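
To illustrate the PS, here is a minimal Python sketch of "NCRs only
when the encoding cannot express the character". The helper name
encode_with_ncrs and the sample string are assumptions made for this
illustration; the fallback itself is Python's standard
"xmlcharrefreplace" error handler, which emits decimal NCRs.

```python
def encode_with_ncrs(text: str, encoding: str) -> bytes:
    """Encode text in the given encoding, using a numeric character
    reference (&#N;) only for characters that encoding cannot represent.

    Hypothetical helper for illustration; it does not escape markup
    characters such as '<' or '&'.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


# Sample text: "naïve α" (i with diaeresis, Greek small alpha).
sample = "na\u00efve \u03b1"

utf8 = encode_with_ncrs(sample, "utf-8")        # both characters written directly
latin1 = encode_with_ncrs(sample, "iso-8859-1") # only alpha needs an NCR
ascii_ = encode_with_ncrs(sample, "ascii")      # both characters need NCRs

for name, data in (("utf-8", utf8), ("iso-8859-1", latin1), ("ascii", ascii_)):
    print(name, data)
```

With UTF-8 as the document encoding, no references appear in the
output at all; with ISO 8859-1 only the alpha is replaced, and with
US-ASCII both non-ASCII characters are.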
Received on Friday, 3 December 1999 17:14:05 UTC