
RE: non-sgml characters

From: Shashank Tripathi <sub@shanx.com>
Date: Mon, 15 Jul 2002 20:59:46 +0900
To: "'donnah'" <donnah1@mac.com>, <w3c-wai-ig@w3.org>
Message-ID: <001f01c22bf7$1e845260$0200a8c0@SHASHANK>

Hi Donna, 

I am no expert, and hopefully someone much more knowledgeable will answer
you as well, but from what I understand, early HTML (2.0) documents are
made up of 8-bit characters from the ISO 8859-1 (Latin-1) character set,
while HTML 4 uses ISO 10646 (Unicode) as its document character set. The
network protocol used to retrieve documents may translate the character
set into a locally acceptable form, e.g. EBCDIC. The HTTP protocol uses
MIME conventions (RFC 1341, since superseded by RFC 2045) to specify the
document type and character set in the Content-Type header.
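
To make the Content-Type point concrete, here is a minimal sketch in
Python of pulling the character set out of such a header. The function
name charset_of and the fallback value are my own assumptions for
illustration (HTTP/1.1 treats ISO-8859-1 as the default for text types
when no charset is given); only the standard library is used.

```python
from email.message import Message


def charset_of(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type header value.

    The name and the default are illustrative assumptions, not part of
    any specification quoted in this message.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value, or returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset() or default


print(charset_of("text/html; charset=UTF-8"))
print(charset_of("text/html"))
```

The first call prints utf-8 and the second falls back to iso-8859-1.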

ISO SGML entity definitions are used to include characters which are
missing from the character set or which would otherwise be confused with
markup. The entity for the trademark (TM) sign is,

    &trade;    (&#8482;)

The one for the registered trademark (R) sign is,

    &reg;     (&#174;)

And so on...
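
On the numeric forms: a small Python sketch, using the HTML 4 entity
table that ships in the standard library, shows which code point each
entity name stands for. The helper name numeric_reference is
hypothetical. It also shows why 153 is a doubtful number for the
trademark sign: in Unicode, code point 153 is a control character, and
it only means "TM" when read as a Windows-1252 byte.

```python
import unicodedata
from html.entities import name2codepoint


def numeric_reference(name: str) -> str:
    """Return the decimal character reference for an HTML 4 entity name."""
    return f"&#{name2codepoint[name]};"


print(numeric_reference("trade"))   # &#8482;
print(numeric_reference("reg"))     # &#174;

# Code point 153 (0x99) is a C1 control character in Unicode ...
print(unicodedata.category(chr(153)))   # Cc
# ... but byte 0x99 in Windows-1252 is the trademark sign, U+2122.
print(bytes([153]).decode("cp1252") == chr(name2codepoint["trade"]))   # True
```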

However, not all user agents support these entity names; emacs, for
instance, would show the literal "&trade;". So as far as HTML is
concerned, a numeric reference is the safer choice, but note that
"&#153;" only works where the browser treats 153 as a Windows-1252
code; the code point HTML 4 defines for the trademark sign is "&#8482;".
If you want it to be recognized by an SGML parser for some reason, you
could try explicitly declaring the entity name. You may like to see
http://www.bbsinc.com/iso8879b.html.
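
As a rough illustration of declaring the entity explicitly, the sketch
below puts an ENTITY declaration in the internal subset of a DOCTYPE
and lets a parser expand it. I am assuming Python's XML parser as a
stand-in for an SGML parser here, and the sample document and the
parse_text helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the internal subset declares "trade"
# so the parser can expand &trade; without any external DTD.
DOCUMENT = """<!DOCTYPE p [
  <!ENTITY trade "&#8482;">
]>
<p>Acme&trade; widgets</p>"""


def parse_text(document: str) -> str:
    """Parse the document and return the text of its root element."""
    return ET.fromstring(document).text


# ascii() keeps the output printable on any terminal encoding.
print(ascii(parse_text(DOCUMENT)))
```

Without the declaration, the same parser rejects "&trade;" as an
undefined entity, which is the situation the explicit declaration is
meant to avoid.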

Wonder if I helped,
Shashank


Shashank Tripathi
www.shanx.com
Received on Monday, 15 July 2002 08:00:38 GMT
