- From: David Woolley <david@djwhome.demon.co.uk>
- Date: Wed, 24 Sep 2003 23:05:35 +0100 (BST)
- To: www-html@w3.org
> > Hmmm, nice I did not think about that. So the use of "&#...;" is actually
> should be used for a very specific list of symbols.

&#....; always represents the ISO 10646 (loosely, Unicode) code point. In very old versions of HTML it was limited to the 256-character initial subset, which is identical to ISO 8859/1.

Most of the control characters and some other control-like characters are not allowed. In particular, although generated by certain common authoring tools, &#146; and &#147; are control characters and not permitted.

The conceptual process is:

- if the character set is in the real HTTP content-type header, note that;
- otherwise, if the document appears to be in 16-bit Unicode or an ASCII superset, scan it for a meta for content type, and extract the character set;
- if neither succeeds in extracting a character set, the document is in error, and here the spec contradicts itself by saying that the browser must not use a default but suggesting that it may use heuristics (to me a default is a heuristic);
- translate the whole document from the character set identified above into ISO 10646;
- parse it, including expanding any numeric entities;
- render it;
- map the result onto platform fonts that include the appropriate characters, using CSS font hinting, but not so as to force a false encoding.

Specifying 5<span style="font-family: Symbol">m</span>V should produce five millivolts, not the five microvolts that is likely to appear on many browsers; browsers that handle other fonts correctly are likely to deliberately misinterpret Symbol.
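
The charset-detection, decoding and numeric-reference steps of the conceptual process above could be sketched roughly as follows. This is only an illustration in Python, not how any particular browser does it. The function names, the regular expressions and the 4096-byte scan window are all hypothetical, the 16-bit Unicode case is left out, and the set of disallowed code points is an assumption modelled on the HTML 4 SGML declaration.

```python
import re

# All names below are hypothetical; this is a sketch, not a browser algorithm.
CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)
META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)',
                     re.IGNORECASE)
NUMERIC_REF_RE = re.compile(r'&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));')


def find_charset(content_type, body):
    """Return the charset from the HTTP header, else from a meta, else None."""
    if content_type:
        m = CHARSET_RE.search(content_type)
        if m:
            return m.group(1)
    # Scanning raw bytes only works if the encoding is an ASCII superset.
    # The 4096-byte window is an arbitrary assumption for this sketch.
    m = META_RE.search(body[:4096])
    if m:
        return m.group(1).decode('ascii')
    return None


def is_forbidden(cp):
    """Assumed disallowed set: most C0 controls, DEL, C1 controls, surrogates."""
    if cp in (9, 10, 13):          # tab, line feed, carriage return are fine
        return False
    return cp < 32 or 127 <= cp <= 159 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF


def expand_numeric_refs(text):
    """Replace &#NNN; and &#xHHH; with the ISO 10646 character they name."""
    def repl(m):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if is_forbidden(cp):
            raise ValueError('code point %d is not permitted' % cp)
        return chr(cp)
    return NUMERIC_REF_RE.sub(repl, text)


def decode_document(content_type, body):
    """Decode bytes to Unicode text, then expand numeric references."""
    charset = find_charset(content_type, body)
    if charset is None:
        # No default and no heuristics here: treat the document as in error.
        raise ValueError('no character set declared')
    text = body.decode(charset)    # translate the whole document first
    return expand_numeric_refs(text)
```

Note that the reference is expanded only after the bytes have been translated, so &#8364; means the same character whatever encoding the document was sent in, and &#146; is rejected rather than being quietly treated as a Windows quotation mark.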
Received on Wednesday, 24 September 2003 18:12:28 UTC