Date: Tue, 26 Nov 1996 19:15:49 -0500 (EST)
From: Misha Wolf <MISHA.WOLF@reuters.com>
Subject: Re: HTML - i18n / NCR & charsets
In-Reply-To: <9611261813.AA08437@jrc.it>
To: www-html <email@example.com>
Message-Id: <9849151926111996/A24040/RE6/11ABD4CF3100*@MHS>

The following extract from RFC 1866, "Hypertext Markup Language - 2.0" shows that legal numeric character references have been based on Unicode for quite some time and certainly prior to the I18N draft.

Misha

---

1.2.1. Documents

A document is a conforming HTML document if:

    * It is a conforming SGML document, and it conforms to the HTML DTD (see 9.1, "HTML DTD").

      NOTE - There are a number of syntactic idioms that are not supported or are supported inconsistently in some historical user agent implementations. These idioms are identified in notes like this throughout this specification.

    * It conforms to the application conventions in this specification. For example, the value of the HREF attribute of the <A> element must conform to the URI syntax.

    * Its document character set includes [ISO-8859-1] and agrees with [ISO-10646]; that is, each code position listed in 13, "The HTML Coded Character Set" is included, and each code position in the document character set is mapped to the same character as [ISO-10646] designates for that code position.

      NOTE - The document character set is somewhat independent of the character encoding scheme used to represent a document. For example, the `ISO-2022-JP' character encoding scheme can be used for HTML documents, since its repertoire is a subset of the [ISO-10646] repertoire. The critical distinction is that numeric character references agree with [ISO-10646] regardless of how the document is encoded.
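
The closing NOTE of the extract can be illustrated with a small sketch. It is not part of RFC 1866. It assumes Python's standard `html.unescape` (which follows later HTML rules, not HTML 2.0 specifically) and the standard codec names `iso2022_jp`, `euc_jp`, and `utf-8`. The helper name `resolve` and the sample text are hypothetical. The point it tries to show is that `&#233;` names code position 233 in ISO 10646 (U+00E9), whatever encoding carries the bytes of the document.

```python
import html

def resolve(raw: bytes, encoding: str) -> str:
    """Decode the transport bytes first, then expand character references.

    The encoding only governs how bytes map to characters; the number in a
    numeric character reference is read as an ISO 10646 code position.
    """
    text = raw.decode(encoding)
    return html.unescape(text)

# Hypothetical sample: an NCR for U+00E9 next to three kanji.
# U+00E9 is not in the ISO-2022-JP repertoire, so the NCR is the only way
# to include it in a document that uses that encoding.
source = "caf&#233; \u65e5\u672c\u8a9e"

for enc in ("iso2022_jp", "euc_jp", "utf-8"):
    raw = source.encode(enc)           # the bytes differ per encoding
    resolved = resolve(raw, enc)       # the resolved text does not
    print(enc, len(raw), ascii(resolved))
```

The byte sequences differ from one encoding to the next, but in each case the reference resolves to the same character, since 233 is interpreted as a code position and never as a byte value in the document's encoding.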