- From: Martin J. Dürst <mduerst@ifi.unizh.ch>
- Date: Wed, 19 Nov 1997 11:02:00 +0100 (MET)
- To: "Sam X. Sun" <ssun@CNRI.Reston.Va.US>
- cc: Misha Wolf <misha.wolf@reuters.com>, www international <www-international@w3.org>
On Wed, 19 Nov 1997, Sam X. Sun wrote:
> The RFC1766 (http://ds.internic.net/rfc/rfc1766.txt) defines the usage of
> character set encoding using format like:
>
> Content-type: text/plain; charset=iso-8859-10
> Content-Language: i-sami-no (North Sami)
> { context using ISO-8859-10 character set encoding. }
>
> There seems to be some difference to the way used in HTML4.0. Is there any
> effort to unify the practice?
RFC 1766 defines language tags and their use to tag emails.
RFC 2070 and thus HTML 4.0 follow RFC 1766 closely. The exceptions
I know of are:
- In addition to identifying the (main) language of the whole
document with the Content-Language header in HTTP
(or a corresponding META construct, which is discouraged),
HTML allows language tagging of any element.
For example, you can say not only <HTML lang="i-sami-no">,
but also <SPAN lang="i-sami-no">.
Because email is not structured text, this would be
very difficult to do for email (unless you send it
as HTML, of course!).
- In RFC 1766, language tags are indivisible. RFC 2070 and
HTML 4.0 allow language tags to be viewed as a hierarchy,
so that, for example, text tagged en-us can be matched with
hyphenation rules tagged en.
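A minimal Python sketch of the two points above, under simplifying assumptions. Element, effective_lang, range_matches and pick_rules are hypothetical names, not part of any HTML or browser API. The inheritance order is simplified to: the element's own lang, then the nearest ancestor's lang, then a document-level fallback such as the Content-Language header. The hierarchical match uses the prefix rule that HTTP/1.1 (RFC 2068) applies to Accept-Language: a range matches a tag if it equals the tag, or equals a prefix of it followed by "-", compared case-insensitively.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Element:
    """Hypothetical minimal stand-in for an HTML element node."""
    tag: str
    lang: Optional[str] = None          # value of the lang attribute, if any
    parent: Optional["Element"] = None


def effective_lang(elem: Optional[Element], document_lang: Optional[str]) -> Optional[str]:
    """Return the element's own lang or its nearest ancestor's lang,
    falling back to a document-level language (e.g. Content-Language)."""
    while elem is not None:
        if elem.lang is not None:
            return elem.lang
        elem = elem.parent
    return document_lang


def range_matches(lang_range: str, tag: str) -> bool:
    """True if lang_range equals tag, or is a prefix of tag that is
    immediately followed by '-' (case-insensitive): 'en' matches
    'en-us' but not 'eng'."""
    r, t = lang_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def pick_rules(tag: str, rules: Dict[str, str]) -> Optional[str]:
    """Pick the most specific rule set whose key matches the tag.
    All matching keys are prefixes of the same tag, so the longest
    one is the most specific."""
    candidates = [key for key in rules if range_matches(key, tag)]
    if not candidates:
        return None
    return rules[max(candidates, key=len)]


# Usage: a tiny hypothetical document tree.
html = Element("HTML", lang="i-sami-no")
body = Element("BODY", parent=html)
span = Element("SPAN", lang="en-us", parent=body)
para = Element("P", parent=body)

hyphenation = {"en": "English patterns", "i-sami-no": "North Sami patterns"}

print(effective_lang(para, None))                           # i-sami-no
print(effective_lang(span, None))                           # en-us
print(pick_rules(effective_lang(span, None), hyphenation))  # English patterns
print(range_matches("en", "eng"))                           # False
```

Choosing the longest matching key means that if rules tagged en-us were also present, they would win over the generic en rules for text tagged en-us.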
Regards, Martin.
Received on Wednesday, 19 November 1997 05:02:34 UTC