RE: non-sgml characters

On Mon, 15 Jul 2002, Jon Hanna wrote:

> > >       (&#153;)
>
> This is a valid way of constructing a character set, as long as one doesn't
> claim to be using Latin-1 etc.

Not quite.

A byte having the value 153 is OK[1], if you have declared your document
as having a proprietary charset that assigns meaning to it.

The sequence &#153; is incorrect, regardless.
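
To illustrate why: a numeric character reference names a code point in
the document character set (ISO 10646/Unicode for HTML 4), not a byte in
whatever charset the file happens to be saved in. A minimal Python
sketch, assuming only the standard unicodedata module (the variable
names are mine, purely for illustration):

```python
import unicodedata

# A numeric reference with the number 153 means code point 153 (U+0099).
ref_char = chr(153)
ref_category = unicodedata.category(ref_char)  # "Cc": a C1 control, not a glyph

# The trade mark sign lives at a different code point altogether.
trade_mark = "\u2122"
trade_mark_name = unicodedata.name(trade_mark)   # "TRADE MARK SIGN"
correct_reference = f"&#{ord(trade_mark)};"      # "&#8482;"

print(ref_category, trade_mark_name, correct_reference)
```

So if the trade mark sign is what was wanted, &#8482; (or &trade;) is
the portable way to write it.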

As a datapoint, at least two operating systems have "native" charsets
that assign characters to bytes in the range 128-159, and browsers
running on those OSs may display them if they fall back to a native
charset for error recovery - or if they don't deal with i18n at all.
But the assignments of these characters are totally different on
RiscOS and Windows.
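
A rough way to see the divergence for yourself, sketched in Python.
Assumptions: as far as I know Python ships no codec for the RiscOS
charset, so "mac_roman" stands in here as a second platform charset
that also populates 128-159; it is not one of the two systems I
mentioned above. "cp1252" is the usual Windows Western charset.

```python
# One and the same byte, decoded under three different charsets.
raw = bytes([153])

as_windows = raw.decode("cp1252")     # Windows: TRADE MARK SIGN (U+2122)
as_latin1 = raw.decode("latin-1")     # ISO 8859-1: C1 control U+0099
as_mac = raw.decode("mac_roman")      # stand-in platform charset: a different letter

for label, text in [("cp1252", as_windows),
                    ("latin-1", as_latin1),
                    ("mac_roman", as_mac)]:
    # Print the resulting code point rather than the glyph itself.
    print(f"{label:10} U+{ord(text):04X}")
```

The byte only means "trade mark" if the declared charset says so, which
is why falling back to a native charset gives different results on
different systems.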

-- 
Nick Kew

Available for contract work - Programming, Unix, Networking, Markup, etc.

Received on Monday, 15 July 2002 13:19:19 UTC