Re: International chars in HTML files

Nice summary -- I think you covered it quite well. Here are a couple of details to include 
in your next summary :-)

>1) HTML uses ISO-8859-1, an 8-bit character set, codes 0-255, by default.
>8859-1 is the current default for HTTP - HTML documents may fully use the
>8859-1 set in the context of HTTP. There is no need to use codes or entity
>names (7-bit expressions) for 8859-1 characters, within the limits of your
>text editor and keyboard.

Newer browsers such as Netscape Navigator 2.0 allow the use of an HTML META tag to specify a 
character set other than ISO 8859-1. ISO 8859-1 remains the default, but if another character 
set is specified, 8-bit characters (codes 128-255) may display as something entirely different 
in the browser. In that case the character entities can still be used to produce the desired 
8859-1 characters.
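To make this concrete, here is a small Python sketch (my own illustration, not taken from any of the browsers mentioned). It shows that the same raw byte means different characters under different declared charsets, while an entity always names the same 8859-1 character. It assumes ISO 8859-2 as the alternative charset, as a page would declare with <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-2">. The helper `show_byte` is hypothetical.

```python
import html

# Byte 0xE8 (decimal 232) as it might appear raw in an 8-bit HTML file.
RAW = bytes([0xE8])

def show_byte(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode raw bytes the way a browser would
    under the given declared charset."""
    return raw.decode(charset)

# Under the default ISO 8859-1 the byte is e-grave ...
print(show_byte(RAW, "iso8859_1"))   # è
# ... but if the page declares ISO 8859-2, the same byte is c-caron.
print(show_byte(RAW, "iso8859_2"))   # č

# Entity and numeric references name the 8859-1 character directly,
# so they do not depend on how raw bytes are decoded.
print(html.unescape("&egrave;"))     # è
print(html.unescape("&#232;"))       # è
```

This is why an author who switches charsets can still rely on entities for the 8859-1 characters.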

>2) Codes or names -must- be used to replace characters which would otherwise
>be interpreted as mark-up. There are four [<>&"], and they conform to ISO
>standards for their codes and names. Other codes or names from 8859-1 may
>be used to avoid similar confusion, e.g., [/\-_].

Your phrase "otherwise be interpreted as mark-up" is the key, but it's also ambiguous. As 
far as I understand (and you may have meant this), only < always needs to be replaced by 
its entity (&lt;). The others [>&"] only need to be replaced by their entities (&gt;, &amp; 
and &quot;) if they're inside a tag. A quick check of five browsers (Navigator 2.0, 
Explorer 2.0b, MacWeb, AOL 2.6, Mosaic 2.0.1) confirms this. One caveat: an & in ordinary 
text that is immediately followed by an entity name (for example, the text &copy) will still 
be read as an entity reference. I don't know whether the HTML DTD defines this behavior, 
but thousands of documents out there rely on it.
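If you generate pages from a program, these rules can be sketched in Python as two hypothetical helpers: `escape_text` for text between tags and `escape_attr` for a value inside a double-quoted attribute. This is an illustration under my own assumptions, and it is slightly more conservative than the minimal rule above: it also escapes an & that could start an entity or character reference. For everyday use, the standard library's `html.escape` simply escapes &, < and > (and double and single quotes when `quote=True`, the default).

```python
import html
import re

# An '&' followed by a letter or '#' could be read as the start of an
# entity (&copy) or character (&#169) reference, so it gets escaped.
_AMP_REF = re.compile(r"&(?=[A-Za-z#])")

def escape_text(s: str) -> str:
    """Escape text that appears between tags (hypothetical helper).
    '<' is always escaped; '&' only where it could start a reference;
    '>' and '"' are left alone, as the browser check suggests."""
    s = _AMP_REF.sub("&amp;", s)   # do '&' first so the '&lt;' below isn't re-escaped
    return s.replace("<", "&lt;")

def escape_attr(s: str) -> str:
    """Escape a value placed inside double quotes in a tag (hypothetical
    helper): all four special characters are replaced, '&' first."""
    return (s.replace("&", "&amp;")
             .replace("<", "&lt;")
             .replace(">", "&gt;")
             .replace('"', "&quot;"))

print(escape_text("5 > 3 & 2 < 4"))          # 5 > 3 & 2 &lt; 4
print(escape_text("show &copy; literally"))  # show &amp;copy; literally
print(escape_attr('say "a<b" & more'))       # say &quot;a&lt;b&quot; &amp; more
print(html.escape('say "a<b" & more'))       # say &quot;a&lt;b&quot; &amp; more
```

The split into two functions mirrors the point above: text content needs far less escaping than attribute values inside a tag.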

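One other note: inside <pre></pre> tags, character entities are still converted (the HTML 2.0 
specification, RFC 1866, treats PRE content as ordinary text), so < and & need escaping there 
too. Only the obsolete <xmp> and <listing> elements show their contents literally.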
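One way to check how a given parser treats <pre> content is to feed it a sample and look at the text it reports. The sketch below does this with Python's standard `html.parser` (a modern parser, not one of the 1996 browsers above); the names `PreText` and `pre_text` are hypothetical. With `convert_charrefs=True` (the default), it reports entity references inside PRE already converted.

```python
from html.parser import HTMLParser

class PreText(HTMLParser):
    """Collect the text the parser reports inside <pre> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # the default, made explicit
        self.in_pre = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_pre = False

    def handle_data(self, data):
        # Data arrives with entity references already converted.
        if self.in_pre:
            self.chunks.append(data)

def pre_text(markup: str) -> str:
    parser = PreText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)

print(pre_text("<pre>a &lt; b &amp;&amp; c &gt; d</pre>"))  # a < b && c > d
```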

Jim Taylor
Director of Information Technology
Videodiscovery, Inc. - Multimedia Education for Science and Math
Seattle, WA, 206-285-5400

Received on Monday, 22 January 1996 23:11:15 UTC