Re: non-sgml characters from Masayasu Ishikawa on 2002-07-15 (w3c-wai-ig@w3.org from July to September 2002)

From: Masayasu Ishikawa <mimasa@w3.org>
Date: Mon, 15 Jul 2002 22:11:18 +0900 (JST)
To: sub@shanx.com
Cc: donnah1@mac.com, w3c-wai-ig@w3.org
Message-Id: <20020715.221118.41626954.mimasa@w3.org>

"Shashank Tripathi" <sub@shanx.com> wrote:

> I am no expert, and hopefully someone much more knowledgeable will answer
> to you as well, but from what I understand, HTML documents are made up
> of 8-bit characters from the ISO 8859 Latin-1 character set.

No, the document character set for HTML is the Universal Character Set (UCS).

> ISO SGML entity definitions are used to include characters which are
> missing from the character set or which would otherwise be confused with
> markup elements and the formal symbol for a TM sign is, 
> 
>     &trade;    (&#153;)

No, 153 is an unused code point in HTML.  What is defined in HTML 4 is
as follows:

  <!ENTITY trade    CDATA "&#8482;" -- trade mark sign, U+2122 ISOnum -->
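
For illustration only, here is a minimal Python sketch of the difference
between the two code points. It assumes the standard-library modules
html.entities and unicodedata; the helper name describe() is hypothetical
and not part of any specification.

```python
import unicodedata
from html.entities import name2codepoint


def describe(code_point):
    """Return (U+XXXX label, general category, character name) for a code point."""
    ch = chr(code_point)
    # Control characters have no name, so supply a placeholder default.
    name = unicodedata.name(ch, "<no name>")
    return (f"U+{code_point:04X}", unicodedata.category(ch), name)


# The HTML 4 entity table maps "trade" to decimal 8482, i.e. U+2122.
trade_code_point = name2codepoint["trade"]

print(trade_code_point)            # code point behind &trade;
print(describe(trade_code_point))  # the trade mark sign
print(describe(153))               # a C1 control character, not a symbol
```

Category "Cc" for 153 marks it as a control character, which is why
&#153; cannot stand for the trade mark sign.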

> However, not all browsers support these definitions...emacs for instance
> would show the "&trade;". So as far as HTML is concerned, you may be
> alright leaving it as "&#153;".

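Please don't.  That conflates the document character set with the
character encoding: a numeric character reference always refers to a
code position in the document character set (UCS), regardless of the
encoding in which the document is saved.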
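
A minimal Python sketch of that distinction follows. It assumes that the
value 153 was taken from the Windows-1252 encoding, where byte 0x99 happens
to encode the trade mark sign; that origin is a guess, and the helper name
byte_as_code_point() is hypothetical.

```python
TRADE_MARK = "\u2122"  # TRADE MARK SIGN in the document character set


def byte_as_code_point(byte_value, encoding):
    """Decode a single byte in the given encoding and return its UCS code point."""
    return ord(bytes([byte_value]).decode(encoding))


# The same byte maps to different characters depending on the encoding.
in_cp1252 = byte_as_code_point(0x99, "cp1252")   # Windows-1252
in_latin1 = byte_as_code_point(0x99, "latin-1")  # ISO 8859-1

# The same character is serialized as different bytes in different encodings,
# but its code point (and so its numeric character reference) never changes.
as_utf8 = TRADE_MARK.encode("utf-8")
as_cp1252 = TRADE_MARK.encode("cp1252")

print(hex(in_cp1252), hex(in_latin1))
print(as_utf8, as_cp1252)
print(f"&#{ord(TRADE_MARK)};")  # the reference that is valid in any encoding
```

The byte value 0x99 is a property of one particular encoding, whereas
&#8482; names the character itself and stays correct whichever encoding
the document uses.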

Regards,
-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium

Received on Monday, 15 July 2002 09:11:33 UTC