RE: non-sgml characters

> > markup elements and the formal symbol for a TM sign is,
> >
> >     &#153;    (™)
>
> No, 153 is an unused code point in HTML.  What is defined in HTML 4 is
> as follows:

I think the confusion probably stems from the practice, on many systems, of
using the unused code points in the lower ranges of UCS for various
characters.

This is a valid way of constructing a character set, as long as one doesn't
claim to be using Latin-1 etc.

It is also reasonable when encountering an invalid character to output one
based on such a practice, as part of the general principle that browsers
should attempt to act as the author most likely intended when encountering
an obvious error. IE on Windows will interpret a character claiming to be
UCS point 153 as a trademark sign.
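
To make that concrete, here is a rough sketch of such recovery behaviour. It is
not any browser's actual code: the helper name resolve_numeric_reference is
made up for illustration, and treating the range 128 to 159 as Windows-1252
bytes is my assumption about which practice is being followed.

```python
def resolve_numeric_reference(code_point: int) -> str:
    """Hypothetical lenient resolver for a numeric character reference."""
    if 128 <= code_point <= 159:
        try:
            # Assumption: reinterpret the number as a Windows-1252 byte.
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # A few bytes in this range (e.g. 0x81) have no assignment
            # in Python's cp1252 codec; fall back to U+FFFD.
            return "\ufffd"
    # Everything else is taken at face value as a UCS code point.
    return chr(code_point)


# Both calls yield the trademark sign, U+2122.
print(hex(ord(resolve_numeric_reference(153))))
print(hex(ord(resolve_numeric_reference(8482))))
```

A strict processor would instead reject 153 outright; the lenient path only
guesses at what the author meant.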

Of course misinterpreting character encodings can lead to subtle security
problems, as well as preventing authors from realising they are in error
(quite a few bugs in IIS of late were due to differing interpretations of
UTF-8).
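
As an illustration of how two decoders can disagree, consider an overlong
UTF-8 sequence. Whether this is the exact mismatch behind any particular IIS
bug is an assumption on my part, and naive_decode_two_byte is a hypothetical
lax decoder written only for this sketch.

```python
def naive_decode_two_byte(data: bytes) -> str:
    """Hypothetical lax decoder: combines a two-byte sequence without
    checking that the result actually needs two bytes."""
    lead, cont = data
    return chr(((lead & 0x1F) << 6) | (cont & 0x3F))


# 0xC0 0xAF is an overlong (invalid) two-byte form of "/".
overlong_slash = b"\xc0\xaf"

# The lax decoder quietly produces a slash.
print(naive_decode_two_byte(overlong_slash))

# Python's strict UTF-8 codec refuses the same bytes.
try:
    overlong_slash.decode("utf-8")
    strict_result = "accepted"
except UnicodeDecodeError:
    strict_result = "rejected"
print(strict_result)
```

If a check for "/" runs on the raw bytes while a later stage decodes laxly,
the two stages see different text, which is where the security problem arises.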

Received on Monday, 15 July 2002 09:21:11 UTC