RE: non-sgml characters

From: Jon Hanna <jon@spin.ie>
Date: Mon, 15 Jul 2002 14:21:22 +0100
To: <w3c-wai-ig@w3.org>
Message-ID: <NDBBLCBLIMDOPKMOPHLHAEICEDAA.jon@spin.ie>

> > markup elements and the formal symbol for a TM sign is,
> >
> >     &trade;    (&#153;)
>
> No, 153 is an unused code point in HTML.  What is defined in HTML 4 is
> as follows:

I think the source of the confusion is probably the practice, common to many
systems, of using the unused code points in the lower byte ranges of UCS for
various characters.

This is a valid way of constructing a character set, as long as one doesn't
claim to be using Latin-1 etc.

It is also reasonable, when encountering an invalid character, to output one
based on such a practice, as part of the general principle that browsers
should attempt to act as the author most likely intended when encountering
an obvious error. IE on Windows will interpret a character claiming to be
UCS point 153 as a trademark sign.
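
To make the 153 case concrete, here is a small sketch in Python (my choice of language for illustration, nothing to do with how IE does it). It assumes the "practice" in question is the Windows-1252 code page, where byte 153 is the trade mark sign. Python's html.unescape is documented as following the HTML 5 rules for invalid references, which happen to apply the same remapping.

```python
import html
import unicodedata

# In UCS, code point 153 (U+0099) is a C1 control character,
# not a printable sign.
c1 = chr(153)
category = unicodedata.category(c1)  # "Cc" means control character

# In Windows-1252 (assumed here to be the source of the habit),
# byte 153 (0x99) is the trade mark sign, U+2122.
from_cp1252 = bytes([153]).decode("cp1252")

# A lenient parser that guesses the author's intent maps the invalid
# numeric reference to the same character as the named entity.
lenient = html.unescape("&#153;")
named = html.unescape("&trade;")

print(category, hex(ord(from_cp1252)), lenient == named)
```

So the numeric reference only "works" because the parser is being forgiving; the named entity is the one that is actually defined.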

Of course, misinterpreting character encodings can lead to subtle security
problems, as well as preventing authors from realising they are in error
(quite a few bugs in IIS of late were due to differing interpretations of
UTF-8).
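
As an illustration of how two readings of UTF-8 can disagree, here is a sketch of one well-known case, the overlong encoding. I am not claiming this is exactly what any particular IIS bug looked like; naive_decode is a hypothetical lenient decoder written only for this example, and the payload is made up.

```python
def naive_decode(data: bytes) -> str:
    """Hypothetical lenient decoder: handles 1- and 2-byte sequences
    and does NOT reject overlong forms, unlike a conforming decoder."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data):
            # 2-byte sequence: 110xxxxx 10yyyyyy, no minimum-value check
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("unsupported byte sequence")
    return "".join(out)


# 0xC0 0xAF is an overlong (invalid) 2-byte form of "/" (0x2F).
payload = b"..\xc0\xaf.."

# A check on the raw bytes sees no slash at all...
slash_in_raw_bytes = b"/" in payload

# ...but the lenient decoder produces one afterwards.
lenient = naive_decode(payload)

# A strict decoder refuses the sequence outright.
try:
    payload.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

print(slash_in_raw_bytes, lenient, strict_rejects)
```

If a filter checks the bytes under one interpretation and a later stage decodes them under the other, the filter has checked the wrong thing.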
Received on Monday, 15 July 2002 09:21:11 GMT
