Re: HTML - i18n / NCR & charsets

Misha Wolf
Tue, 26 Nov 1996 19:15:49 -0500 (EST)

Date: Tue, 26 Nov 1996 19:15:49 -0500 (EST)
From: Misha Wolf <>
Subject: Re: HTML - i18n / NCR & charsets
To: www-html <>
The following extract from RFC 1866, "Hypertext Markup Language - 2.0", shows
that legal numeric character references have been based on ISO 10646 (Unicode)
for quite some time, and certainly since before the I18N draft.
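
Before the extract itself, here is a minimal sketch of what that means in
practice. It is not taken from RFC 1866: the function name `resolve_ncr` and
the sample strings are my own assumptions, and it handles only decimal
references of the form &#N;. The point it illustrates is that a numeric
character reference names an ISO 10646 code position, whereas a raw byte means
whatever the transfer encoding says it means.

```python
import re


def resolve_ncr(text: str) -> str:
    """Replace each decimal reference &#N; with the character at
    ISO 10646 / Unicode code position N (hypothetical helper)."""
    return re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)


# The reference is plain ASCII, so it survives all three encodings unchanged
# and resolves to the same character in each case.
source = "caf&#233;"
for encoding in ("iso-8859-1", "utf-8", "iso2022_jp"):
    raw = source.encode(encoding)
    print(encoding, resolve_ncr(raw.decode(encoding)))

# A raw byte, by contrast, depends on the encoding used to read it.
raw_byte = bytes([0xE9])
print(raw_byte.decode("iso-8859-1"), raw_byte.decode("koi8_r"))
```

The loop resolves the reference identically for every encoding, while the
last line shows the same byte value decoding to two different characters.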



1.2.1. Documents

   A document is a conforming HTML document if:

        * It is a conforming SGML document, and it conforms to the
        HTML DTD (see 9.1, "HTML DTD").

            NOTE - There are a number of syntactic idioms that
            are not supported or are supported inconsistently in
            some historical user agent implementations. These
            idioms are identified in notes like this throughout
            this specification.

        * It conforms to the application conventions in this
        specification. For example, the value of the HREF attribute
        of the <A> element must conform to the URI syntax.

        * Its document character set includes [ISO-8859-1] and
        agrees with [ISO-10646]; that is, each code position listed
        in 13, "The HTML Coded Character Set" is included, and each
        code position in the document character set is mapped to the
        same character as [ISO-10646] designates for that code
        position.

            NOTE - The document character set is somewhat
            independent of the character encoding scheme used to
            represent a document. For example, the `ISO-2022-JP'
            character encoding scheme can be used for HTML
            documents, since its repertoire is a subset of the
            [ISO-10646] repertoire. The critical distinction is
            that numeric character references agree with
            [ISO-10646] regardless of how the document is