- From: Albert Lunde <Albert-Lunde@nwu.edu>
- Date: Wed, 1 Feb 1995 18:29:14 +0100
- To: Multiple recipients of list <www-html@www0.cern.ch>
At 2:54 PM 2/1/95, Donald=Greer@tsl.texas.gov wrote:
> I believe that Latin 1 is the only specified extended character set. For a
> definitive answer though, check the DTD or the HTML 2.0 drafts.

A recent version of the HTML 2.0 draft says in the section on MIME and HTML:

> Character sets
>   The charset parameter is reserved for future use. See Section 2.16 for a
>   discussion of character sets and encodings in HTML.
>
>   The actual character set used in the representation of an HTML document
>   may be ISO 8859/1, or its 7-bit subset which is ISO 646. There is no
>   obligation for an HTML document to contain any characters above decimal
>   127. It is possible that a transport medium such as electronic mail imposes
>   constraints on the number of bits in a representation of a document, though
>   the HTTP access protocol used by WWW always allows 8 bit transfer.

I think the context of this is that HTML 2.0 is intended mostly to specify
current practice as of mid-94, and the intention is that HTML 2.1 would
introduce "minor" ;) ;) extensions like international character sets.

Discussion of these issues has broken out, on and off, for the last two
months (at least) on the HTTP and HTML working group lists.

It's my personal opinion that there would be relatively little controversy
about extending the HTML/HTTP specs to allow use of the MIME charset
parameter for ISO-8859-X where X=1 to 9 (the character sets already
mentioned in the MIME RFCs).

*However*, this has not yet actually been done, and there is an
implementation problem in that not all WWW software parses MIME charset
parameters. (De facto, I think people are using other character sets anyway
and hacking their clients to convert them, based on out-of-band knowledge
of the correct charset.)

What seems more controversial is the treatment of mixed character sets
and/or languages in a single document. This brings in Unicode and other
things like ISO 2022 or ideas from the Text Encoding Initiative.
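
To make the charset-parameter idea above concrete, here is a rough sketch of what a client might do if the parameter were allowed for ISO-8859-1 through ISO-8859-9. This is only an illustration, not anything taken from the drafts: the function names and the policy of falling back to Latin 1 for a missing or unrecognized charset are assumptions of mine. It relies only on Python's standard email.message.Message to parse the header value.

```python
from email.message import Message

# Charsets discussed above: ISO-8859-X for X = 1 to 9.
ALLOWED_CHARSETS = {f"iso-8859-{n}" for n in range(1, 10)}

# Assumed fallback: Latin 1, the character set the HTML 2.0 draft names.
DEFAULT_CHARSET = "iso-8859-1"


def charset_from_content_type(header_value):
    """Return the charset named in a Content-Type value, or the default.

    Hypothetical policy: a missing or unrecognized charset falls back
    to ISO-8859-1 rather than raising an error.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and strips any quotes.
    charset = msg.get_content_charset(failobj=DEFAULT_CHARSET)
    return charset if charset in ALLOWED_CHARSETS else DEFAULT_CHARSET


def decode_body(raw_bytes, header_value):
    """Decode a document body using the charset taken from its header."""
    return raw_bytes.decode(charset_from_content_type(header_value))
```

For example, decode_body(b"\xb1", "text/html; charset=ISO-8859-2") gives the Latin 2 letter "ą", while the same byte with a plain "text/html" header is read as Latin 1 and gives "±". That difference is why a client that ignores the parameter has to rely on out-of-band knowledge.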
I don't know what solutions will get standardized. (The options are
constrained somewhat by keeping HTML SGML compliant.)

For more information see:

HTML WG archive
<URL:http://www.acl.lanl.gov/HTML_WG/archives.html>

HTTP WG archive
<URL:http://www.ics.uci.edu/pub/ietf/http/hypermail/>

and my personal collection of bookmarks:
<URL:http://www.mcs.com/%7Elunde/web/aboutwww.html>

---
Albert Lunde                      Albert-Lunde@nwu.edu
Received on Wednesday, 1 February 1995 09:36:44 UTC