- From: Jon Hanna <jon@spin.ie>
- Date: Mon, 15 Jul 2002 13:24:31 +0100
- To: <w3c-wai-ig@w3.org>
> I am no expert, and hopefully someone much more knowledgible will answer
> to you as well, but from what I understand, HTML documents are made up
> of 8-bit characters from the ISO 8859 Latin-1 character set.

HTML documents can be encoded in any character set. Latin-1 has the advantage of being code-point compatible with both ASCII and Unicode (its 256 values are the first 256 Unicode code-points), and hence was once used as the default. UTF-8 has the advantage of being code-point compatible with ASCII and capable of directly encoding all Unicode code-points, and hence was chosen as one of the defaults for XML, and so for HTML when it became XHTML in 1999. (UTF-16 is the other default; XML can safely have two defaults because they are easy to tell apart from the first couple of bytes.)

One obvious disadvantage of Latin-1 here would be that it has no TM glyph :)

One way to think of this is to treat the Unicode code-point as a sort of Platonic form, with its encodings in various character sets as a more "physical" (as physical as you can get with a bunch of bits) reality of that form.

> ISO SGML entity definitions are used to include characters which are
> missing from the character set or which would otherwise be confused with
> markup elements and the formal symbol for a TM sign is,
>
> ™ (™)
>
> The one for registered trademark (R) is,
>
> ® (®)
>
> And so on...
>

http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent, http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent and http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent are the entity sets referenced by the DTDs. They define &trade; as:

<!ENTITY trade "&#8482;"> <!-- trade mark sign, U+2122 ISOnum -->

They agree with you on &reg; though (I'm guessing you are going by the Windows charset, which agrees with Unicode and Latin-1 on this one):

<!ENTITY reg "&#174;"> <!-- registered sign = registered trade mark sign, U+00AE ISOnum -->
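
To make the "one code-point, several encodings" idea concrete, here is a rough Python sketch. It is only an illustration: the helper names encode_or_none and sniff_xml_default are made up for this example, "cp1252" stands in for "the Windows charset" (my assumption about which code page was meant), and the sniffing function looks only at a byte-order mark, which is a simplification of what a real XML parser does.

```python
# Sketch: one Unicode code-point, several byte-level encodings.
TRADE = "\u2122"  # TRADE MARK SIGN
REG = "\u00ae"    # REGISTERED SIGN


def encode_or_none(ch, encoding):
    """Return the encoded bytes, or None if the charset lacks the character."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


def sniff_xml_default(data):
    """Guess UTF-16 vs UTF-8 from the first two bytes (BOM check only)."""
    if data[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return "utf-16"
    return "utf-8"


if __name__ == "__main__":
    for name, ch in (("trade", TRADE), ("reg", REG)):
        for enc in ("latin-1", "cp1252", "utf-8", "utf-16-be"):
            data = encode_or_none(ch, enc)
            shown = data.hex() if data is not None else "not encodable"
            print(f"{name} U+{ord(ch):04X} in {enc}: {shown}")
```

Run as a script, this should report that U+2122 has no Latin-1 encoding at all, is the single byte 99 in cp1252 and the three bytes e2 84 a2 in UTF-8, while U+00AE is the byte ae in both Latin-1 and cp1252. That is why a numeric reference for the registered sign comes out the same either way, but one for the trade mark sign does not.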
Received on Monday, 15 July 2002 08:24:31 UTC