- From: <UUCPAdmin@vtech.com.hk>
- Date: Tue, 23 Jan 96 12:41:59
- To: www-html@w3.org
User vtech!Mailing_List@vtech.com.hk is not defined

Original text follows
-----------------------------------------
Received: by ccmail
Received: from hksuper by com.hk (UUPC/extended 1.11) with UUCP; Tue, 23 Jan 1996 12:41:14 PST
Received: from www19.w3.org (www19.w3.org [18.52.0.17]) by hk.super.net (8.7.1/8.7.1) with SMTP id MAA23791 for <Mailing_List@vtech.com.hk>; Tue, 23 Jan 1996 12:13:47 +0800 (HKT)
Received: by www19.w3.org (8.6.12/8.6.12) id XAA22310; Mon, 22 Jan 1996 23:11:19 -0500
Resent-Date: Mon, 22 Jan 1996 23:11:19 -0500
Resent-Message-Id: <199601230411.XAA22310@www19.w3.org>
Message-Id: <31045FEB.FBA@videodiscovery.com>
Date: Mon, 22 Jan 1996 20:11:23 -0800
From: Jim Taylor <jhtaylor@videodiscovery.com>
X-ccAdmin: UUCPAdmin@hksuper
Organization: Videodiscovery
X-Mailer: Mozilla 2.0b5 (Win16; I)
Mime-Version: 1.0
To: darsal@tezcat.com, www-html@w3.org
Subject: Re: International chars in HTML files
X-Url: http://www.eit.com:80/goodies/lists/www.lists/www-html.1996q1/0168.html
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Resent-From: www-html@w3.org
X-ccAdmin: UUCPAdmin@hksuper
X-Mailing-List: <www-html@w3.org> archive/latest/2344
X-Loop: www-html@w3.org
Sender: www-html-request@w3.org
Resent-Sender: www-html-request@w3.org
Precedence: list

Nice summary -- I think you covered it quite well. Here are a couple of
details to include in your next summary :-)

>1) HTML uses ISO-8859-1, an 8-bit character set, codes 0-255, by default.
>8859-1 is the current default for HTTP - HTML documents may fully use the
>8859-1 set in the context of HTTP. There is no need to use codes or entity
>names (7-bit expressions) for 8859-1 characters, within the limits of your
>text editor and keyboard.

Newer browsers such as Netscape Navigator 2.0 allow the use of HTML META
tags to specify a character set other than ISO 8859-1.
ISO 8859-1 is the default character set, but if another character set is
specified, 8-bit characters may produce something entirely different in
the browser. In this case the character entities can still be used to
produce the desired 8859-1 characters.

>2) Codes or names -must- be used to replace characters which would otherwise
>be interpreted as mark-up. There are four [<>&"], and they conform to ISO
>standards for their codes and names. Other codes or names from 8859-1 may
>be used to avoid similar confusion, e.g, [/\-_].

Your phrase "otherwise be interpreted as mark-up" is the key, but it's
also ambiguous. As far as I understand (and you may have meant this),
only < needs to always be replaced by its entity (&lt;). The others [>&"]
only need to be replaced by their entities (&gt; &amp; and &quot;) if
they're inside a tag. A quick check of 5 browsers (Navigator 2.0,
Explorer 2.0b, MacWeb, AOL 2.6, Mosaic 2.0.1) confirms this. I don't know
if the HTML DTD defines this behavior or not, but there are thousands of
documents out there relying on it.

One other note. Inside <pre></pre> tags, character entities are not
converted.

__________________________________________________________________
Jim Taylor <mailto:jhtaylor@videodiscovery.com>
Director of Information Technology
Videodiscovery, Inc. - Multimedia Education for Science and Math
Seattle, WA, 206-285-5400, <http://www.videodiscovery.com/vdyweb>
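
As a rough illustration of the escaping rules discussed in the message
above, here is a minimal Python sketch. The function names (escape_text,
escape_attr, latin1_to_refs) are hypothetical and chosen only for this
example; html.escape comes from the Python standard library. The sketch
escapes & and > in ordinary text as well as <, which is more conservative
than the browser behavior reported above but should be harmless. It also
assumes that numeric character references can stand in for named entities
when a document declares a character set other than ISO 8859-1.

```python
import html


def escape_text(s: str) -> str:
    """Escape &, < and > for use in ordinary document text."""
    return html.escape(s, quote=False)


def escape_attr(s: str) -> str:
    """Escape &, <, > and quote characters for use inside a tag attribute."""
    return html.escape(s, quote=True)


def latin1_to_refs(s: str) -> str:
    """Replace ISO 8859-1 characters 160-255 with numeric references.

    Assumption for this sketch: 7-bit references are the fallback when
    8-bit characters cannot be trusted to display as ISO 8859-1.
    """
    return "".join(f"&#{ord(c)};" if 160 <= ord(c) <= 255 else c for c in s)


if __name__ == "__main__":
    print(escape_text('if a < b & c > d, say "ok"'))
    print(escape_attr('say "ok"'))
    print(latin1_to_refs("caf\u00e9"))
```

Keeping separate helpers for text and attribute values mirrors the
distinction drawn above between characters in running text and characters
inside a tag.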
Received on Tuesday, 23 January 1996 00:08:31 UTC