Re: non-sgml characters

From: Masayasu Ishikawa <mimasa@w3.org>
Date: Mon, 15 Jul 2002 22:11:18 +0900 (JST)
Message-Id: <20020715.221118.41626954.mimasa@w3.org>
To: sub@shanx.com
Cc: donnah1@mac.com, w3c-wai-ig@w3.org

"Shashank Tripathi" <sub@shanx.com> wrote:

> I am no expert, and hopefully someone much more knowledgeable will answer
> to you as well, but from what I understand, HTML documents are made up
> of 8-bit characters from the ISO 8859 Latin-1 character set.

No, the document character set for HTML is the Universal Character Set (UCS).

> ISO SGML entity definitions are used to include characters which are
> missing from the character set or which would otherwise be confused with
> markup elements and the formal symbol for a TM sign is, 
>     &trade;    (&#153;)

No, 153 is an unused code point in HTML.  What is defined in HTML 4 is
as follows:

  <!ENTITY trade    CDATA "&#8482;" -- trade mark sign, U+2122 ISOnum -->

> However, not all browsers support these definitions...emacs for instance
> would show the "&trade;". So as far as HTML is concerned, you may be
> alright leaving it as "&#153;".

Please don't.  That completely confuses the document character set with
the character encoding.
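
To make the distinction concrete, here is a minimal Python sketch. It
assumes the "153" came from the windows-1252 encoding, where byte 0x99
is the trade mark sign; that origin is my guess, not something stated
in the original message. A numeric character reference always indexes
the document character set (UCS), so 153 names code point U+0099, a
control character, whatever encoding the file is saved in.

```python
import unicodedata
from html.entities import name2codepoint

# HTML 4 declares &trade; as code point 8482 (U+2122).
trade_cp = name2codepoint["trade"]
trade_char = chr(trade_cp)

# A numeric character reference indexes the document character set (UCS),
# so &#153; would mean U+0099, which is a C1 control character.
cp153 = chr(153)

# Assumption: the number 153 was taken from windows-1252, an *encoding*
# in which the single byte 0x99 decodes to the trade mark sign.
from_cp1252 = bytes([0x99]).decode("cp1252")

print(trade_cp, unicodedata.name(trade_char))   # code point and its UCS name
print(unicodedata.category(cp153))              # general category of U+0099
print(from_cp1252 == trade_char)                # byte 0x99 vs. U+2122
```

The byte value and the code point coincide only by accident of one
encoding, which is why &#8482; (or &trade;) is the portable way to write it.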

Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium
Received on Monday, 15 July 2002 09:11:33 UTC
