- From: Martin J. Dürst <mduerst@ifi.unizh.ch>
- Date: Wed, 19 Nov 1997 11:02:00 +0100 (MET)
- To: "Sam X. Sun" <ssun@CNRI.Reston.Va.US>
- cc: Misha Wolf <misha.wolf@reuters.com>, www international <www-international@w3.org>
On Wed, 19 Nov 1997, Sam X. Sun wrote:

> The RFC1766 (http://ds.internic.net/rfc/rfc1766.txt) defines the usage of
> character set encoding using format like:
>
> Content-type: text/plain; charset=iso-8859-10
> Content-Language: i-sami-no (North Sami)
> { context using ISO-8859-10 character set encoding. }
>
> There seems to be some difference to the way used in HTML4.0. Is there any
> effort to unify the practice?

RFC 1766 defines language tags and their use to tag emails. RFC 2070 and thus HTML 4.0 follow RFC 1766 closely. The exceptions I know are:

- In addition to identifying the (main) language of the whole document with the Content-Language header in HTTP (or a corresponding META construct, which is discouraged), HTML allows language tagging of any element. I.e. you can say not only <HTML lang="i-sami-no">, but also <SPAN lang="i-sami-no">. Because email is not structured text, this would be very difficult to do for email (unless you send it as HTML, of course!).

- In RFC 1766, language tags are indivisible. RFC 2070 and HTML 4.0 allow language tags to be viewed as a hierarchy, so that e.g. text tagged en-us can be matched with hyphenation rules tagged en.

Regards, Martin.
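
The hierarchical view of language tags mentioned in the second exception (text tagged en-us matching rules tagged en) can be illustrated with a small prefix-matching sketch. This is only an illustration under assumptions: the function name `lang_matches` is hypothetical, and the rule shown (case-insensitive comparison, with a match when the tags are equal or the text's tag begins with the resource's tag followed by a hyphen) is one plausible reading of that hierarchy, not text taken from RFC 2070 or HTML 4.0.

```python
def lang_matches(text_tag: str, resource_tag: str) -> bool:
    """Return True if a resource tagged `resource_tag` applies to text
    tagged `text_tag` under a simple prefix (hierarchy) rule.

    Hypothetical helper for illustration; comparison is case-insensitive.
    """
    text = text_tag.lower()
    resource = resource_tag.lower()
    # Match on equality, or when the resource tag is a whole-subtag prefix
    # of the text tag (the trailing "-" prevents "en" from matching "eng").
    return text == resource or text.startswith(resource + "-")


if __name__ == "__main__":
    # Text tagged en-us can use hyphenation rules tagged en.
    print(lang_matches("en-us", "en"))
    # The reverse does not hold: rules specific to en-us are not
    # assumed to apply to text tagged only en.
    print(lang_matches("en", "en-us"))
```

Requiring the hyphen after the prefix keeps the comparison on subtag boundaries, so a tag such as i-sami-no is matched by i-sami-no or i-sami but not by an unrelated tag that merely shares leading characters.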
Received on Wednesday, 19 November 1997 05:02:34 UTC