- From: Jukka Korpela <jkorpela@cc.hut.fi>
- Date: Tue, 23 Mar 1999 18:59:23 +0200 (EET)
- To: www-html@w3.org
On Tue, 23 Mar 1999, Alan G. Isaac wrote:

> Why is it &emsp; and &ensp; but &mdash; and &ndash; ?

No special reason. I suppose they come from entity names listed in the SGML Handbook. I don't see any reason why future HTML specs couldn't add &emdash; and &endash; as synonyms for &mdash; and &ndash;. They are just named constants, so to speak.

> Why are &mdash; and &ndash; equivalent to &#8212; and &#8211;
> instead of &#151; and &#152; ?

Because the characters EM DASH and EN DASH occupy positions 8212 and 8211 in Unicode and ISO 10646, whereas code positions 151 and 152 are reserved for control characters in those standards, and therefore &#151; and &#152; are undefined in HTML.

> Why are no character entity references below &#160; listed on
> this page?

There are some, but you probably mean &#n; for n from 130 to 159. All those references are undefined in HTML. In HTML, &#n; means the character which occupies code position n in Unicode; if no such character exists, &#n; is undefined.

Unfortunately some browsers get this wrong, interpreting n in &#n; as relative to the encoding specified for the document. This is fundamentally wrong by current specs. I don't think changing the specs would improve the situation; rather, it would confuse things more and impose serious restrictions. (Currently, by the specs, you can include _any_ Unicode character in your document even if its encoding is plain ASCII, for example.)

For background explanations, see
http://ppewww.ph.gla.ac.uk/%7eflavell/charset/internat.html

--
Yucca, http://www.hut.fi/u/jkorpela/ or http://yucca.hut.fi/yucca.html
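
As an illustration of the two readings of &#n; discussed in the message above, here is a minimal Python sketch. It is not taken from the message or from any browser's code. The function names are hypothetical, the C1 control range is taken as 128 to 159, and windows-1252 (Python codec "cp1252") is only an assumed example of a document encoding that a browser might wrongly apply.

```python
from typing import Optional

# Code positions 128-159 are C1 control characters in Unicode / ISO 10646,
# so a numeric reference to them has no defined meaning in HTML.
C1_CONTROLS = range(128, 160)


def resolve_by_spec(n: int) -> Optional[str]:
    """Read &#n; as a Unicode code position (the reading the specs require).

    Returns None where the reference is treated as undefined in this sketch.
    """
    if n < 0 or n > 0x10FFFF or n in C1_CONTROLS:
        return None
    return chr(n)


def resolve_as_windows_1252(n: int) -> Optional[str]:
    """Read &#n; as a byte value in windows-1252 (the erroneous reading).

    Hypothetical model of the browser behaviour described above.
    """
    if not 0 <= n <= 255:
        return None
    try:
        return bytes([n]).decode("cp1252")
    except UnicodeDecodeError:
        # A few byte values are unassigned in windows-1252.
        return None


if __name__ == "__main__":
    for n in (151, 152, 8211, 8212):
        print(n, repr(resolve_by_spec(n)), repr(resolve_as_windows_1252(n)))
```

Under these assumptions, 151 is undefined by the spec reading but comes out as EM DASH under the encoding-relative reading, which is why pages written with &#151; appear to work in some browsers while remaining incorrect.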
Received on Tuesday, 23 March 1999 11:59:36 UTC