- From: Chris Maden <crism@oreilly.com>
- Date: Tue, 23 Mar 1999 10:49:07 -0500 (EST)
- To: www-html@w3.org
[Alan G. Isaac]
> I'm looking at
> http://www.w3.org/TR/REC-html40/sgml/entities.html
>
> Why is it &emsp; and &ensp; but &mdash; and &ndash;?

Good question.  Unfortunately, it's not really subject to debate;
those entity names are from the ISO standard entity sets.  Some of the
names are twisted because they were subject to a six-character name
restriction (not for any really good reason), but I've never really
been clear on why mdash and ndash are named that way.

> Why are &mdash; and &ndash; equivalent to &#8212; and &#8211;
> instead of &#151; and &#152;?

Because they should work on operating systems that don't come from
Redmond.

> Why are no character entity references below &#160; listed on this
> page?

Same reason.  The 8-bit ISO character sets (ISO 8859-*; 8859-1 is
western European) reserve 128-159 as control characters.  Windows uses
different character sets (CP 125*; CP 1252 is western European).
Since it doesn't need the upper control characters, it uses that range
for characters missing from the corresponding ISO sets, like oe
ligatures, en and em dashes, ellipses, and s and z caron.

A numeric character reference (&#...;) is a reference to the document
character set, not the encoding; in HTML 4, the document character set
is Unicode (regardless of what bytes are actually used to store and
transmit the characters).  Unicode has a control character at code
point 151, but an em dash at code point 8212.

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
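
The distinction between the encoding and the document character set can be checked with a minimal Python sketch. It assumes the standard library's "cp1252" codec as a stand-in for the Windows western European code page; numeric_reference is a hypothetical helper written for this illustration, not part of any library.

```python
import unicodedata


def numeric_reference(char: str) -> str:
    """Hypothetical helper: decimal numeric character reference for one character."""
    return f"&#{ord(char)};"


# CP 1252 stores the em dash as the single byte 151 (0x97).
em_dash = bytes([151]).decode("cp1252")

# The reference is built from the Unicode code point, not from the stored byte.
em_dash_name = unicodedata.name(em_dash)
em_dash_ref = numeric_reference(em_dash)

# Code point 151 itself falls in Unicode's control-character category.
category_151 = unicodedata.category(chr(151))

# A different encoding changes the bytes but not the numeric reference.
utf8_bytes = em_dash.encode("utf-8")

print(em_dash_name, em_dash_ref, category_151, utf8_bytes)
```

The byte value 151 is only meaningful relative to one encoding, while the reference number stays the same however the document is stored or transmitted.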
Received on Friday, 26 March 1999 15:10:43 UTC