Re: Problem with LANG keyword

> 
> Hmmm, nice, I did not think about that. So "&#...;" should actually
> be used only for a very specific list of symbols.

&#....; always represents the ISO 10646 (loosely Unicode) code point.
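As a minimal sketch in Python (the function name `expand_ncrs` is hypothetical), the number inside `&#...;` is taken directly as a code point and never as a byte in the document's own character set. Both the decimal and the hexadecimal forms are handled:

```python
import re

# Matches decimal (&#956;) and hexadecimal (&#x3BC;) numeric character references.
NCR_RE = re.compile(r"&#(?:([0-9]+)|[xX]([0-9A-Fa-f]+));")


def expand_ncrs(text: str) -> str:
    """Replace each numeric reference with the ISO 10646 code point it names."""
    def repl(m: re.Match) -> str:
        cp = int(m.group(1)) if m.group(1) is not None else int(m.group(2), 16)
        # Numbers beyond the last code point (U+10FFFF) are left untouched.
        return chr(cp) if cp <= 0x10FFFF else m.group(0)
    return NCR_RE.sub(repl, text)


print(expand_ncrs("5&#956;V and 5&#x3BC;V"))  # prints: 5μV and 5μV
```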

In very old versions of HTML, it was limited to the initial 256-character
subset, which is identical to ISO 8859-1.  Most of the control characters,
and some other control-like characters, are not allowed.  In particular,
although certain common authoring tools generate them, &#146; and &#147;
are control characters and are not permitted.
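To see why references such as &#146; and &#147; are controls, a small Python check can compare the code point the reference actually names with what the same number means as a windows-1252 byte. The helper `diagnose_ncr` is hypothetical, and the premise that the authoring tools are writing out windows-1252 byte values for curly quotes is an assumption:

```python
import unicodedata


def diagnose_ncr(n: int) -> tuple[str, str]:
    """Return (Unicode category of the code point &#n; names,
    name of the character byte n stands for in windows-1252).

    n must be 0-255; a few bytes (e.g. 0x81) are undefined in cp1252 and raise.
    """
    category = unicodedata.category(chr(n))   # 'Cc' for C0/C1 control characters
    intended = bytes([n]).decode("cp1252")    # what the authoring tool presumably meant
    return category, unicodedata.name(intended)


for n in (146, 147):
    print(n, diagnose_ncr(n))
# 146 ('Cc', 'RIGHT SINGLE QUOTATION MARK')
# 147 ('Cc', 'LEFT DOUBLE QUOTATION MARK')
```

The category `Cc` shows that U+0092 and U+0093 are C1 controls; the curly quotes the author wanted live at U+2019 and U+201C.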

The conceptual process is:

- if the character set is given in the real HTTP Content-Type header, use that;
- otherwise, if the document appears to be in 16-bit Unicode or an ASCII
  superset, scan it for a META element giving the content type, and extract
  the character set;
- if neither succeeds in extracting a character set, the document is in
  error, and here the spec contradicts itself by saying that the browser
  must not use a default but suggesting that it may use heuristics
  (to me a default is a heuristic);
- translate the whole document from the character set identified above into
  ISO 10646;
- parse it, including expanding any numeric entities;
- render it;
- map the result onto platform fonts that include the appropriate
  characters, using CSS font hinting, but not so as to force a false
  encoding.  For example, 5<span style="font-family: Symbol">m</span>V should
  produce five millivolts, not the five microvolts that many browsers are
  likely to show: browsers that handle other fonts correctly are likely to
  misinterpret Symbol deliberately.
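The first five steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a browser implementation: every function name is hypothetical, regexes stand in for a real HTML parser, UTF-16 is recognised only by its byte order mark, and a missing charset raises an error instead of falling back on a heuristic. Rendering and font selection are left out.

```python
import re
from typing import Optional

# Hypothetical helpers; the regexes are a simplification, not a full HTML parser.
CHARSET_RE = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.I)
META_RE = re.compile(r"<meta\b[^>]*>", re.I)
HTTP_EQUIV_RE = re.compile(r"""http-equiv\s*=\s*["']?content-type""", re.I)
NCR_RE = re.compile(r"&#(?:([0-9]+)|[xX]([0-9A-Fa-f]+));")


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Step 1: charset parameter of the real HTTP Content-Type header, if any."""
    m = CHARSET_RE.search(content_type or "")
    return m.group(1) if m else None


def sniff_text(body: bytes) -> str:
    """Decode just far enough to look for a META element.

    A UTF-16 byte order mark selects UTF-16; otherwise the bytes are assumed to
    be an ASCII superset, and Latin-1 keeps every ASCII byte intact without failing.
    """
    if body[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return body.decode("utf-16")
    return body.decode("latin-1")


def charset_from_meta(text: str) -> Optional[str]:
    """Step 2: charset from <meta http-equiv="Content-Type" content="...">."""
    for tag in META_RE.findall(text):
        if HTTP_EQUIV_RE.search(tag):
            m = CHARSET_RE.search(tag)
            if m:
                return m.group(1)
    return None


def expand_ncrs(text: str) -> str:
    """&#...; always names an ISO 10646 code point, whatever the charset was."""
    def repl(m: re.Match) -> str:
        cp = int(m.group(1)) if m.group(1) is not None else int(m.group(2), 16)
        return chr(cp) if cp <= 0x10FFFF else m.group(0)
    return NCR_RE.sub(repl, text)


def decode_document(body: bytes, content_type: Optional[str]) -> str:
    charset = charset_from_header(content_type) or charset_from_meta(sniff_text(body))
    if charset is None:
        # Step 3: the document is in error; this sketch refuses rather than guess.
        raise ValueError("no character set declared")
    text = body.decode(charset)  # Step 4: translate into ISO 10646 (a Python str)
    # Step 5, partly: a real parser expands references while tokenising and
    # also handles named entities such as &amp;, which this shortcut ignores.
    return expand_ncrs(text)
```

Note the order: the byte 0x92 is decoded through the declared charset in step 4, while `&#146;` survives as text until step 5 and then names U+0092, which is why the two are not interchangeable.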

Received on Wednesday, 24 September 2003 18:12:28 UTC