Re: Unnecessary redraws with Content-Language from Martin J. Dürst on 1997-11-19 (www-international@w3.org from October to December 1997)

From: Martin J. Dürst <mduerst@ifi.unizh.ch>
Date: Wed, 19 Nov 1997 11:02:00 +0100 (MET)
To: "Sam X. Sun" <ssun@CNRI.Reston.Va.US>
cc: Misha Wolf <misha.wolf@reuters.com>, www international <www-international@w3.org>
Message-ID: <Pine.SUN.3.96.971119102030.282W-100000@enoshima.ifi.unizh.ch>

On Wed, 19 Nov 1997, Sam X. Sun wrote:

> The RFC1766 (http://ds.internic.net/rfc/rfc1766.txt) defines the usage of
> character set encoding using format like:
> 
> Content-type: text/plain; charset=iso-8859-10
> Content-Language: i-sami-no (North Sami) 
> 	{ context using ISO-8859-10 character set encoding. }
> 
> There seems to be some difference to the way used in HTML4.0. Is there any
> effort to unify the practice?

RFC 1766 defines language tags and their use to tag emails.
RFC 2070 and thus HTML 4.0 follow RFC 1766 closely. The exceptions
I know of are:

- In addition to identifying the (main) language of the whole
	document, with the Content-Language header in HTTP
	(or a corresponding META construct (discouraged)),
	HTML allows language tagging of any element.
	I.e. you can not only say <HTML lang="i-sami-no">,
	but also <SPAN lang="i-sami-no">.
	Because Email is not structured text, this would be
	very difficult to do for email (unless you send it
	as HTML, of course!).

- In RFC 1766, language tags are indivisible. RFC 2070 and
	HTML 4.0 allow language tags to be viewed as a hierarchy,
	so that e.g. text tagged en-us can be matched with
	hyphenation rules tagged en.
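
To make the second point concrete, here is a minimal sketch in Python of
how such hierarchical matching could work. The function names
(lang_matches, pick_rules) and the sample tags are only illustrative
assumptions; none of them comes from RFC 1766, RFC 2070 or HTML 4.0.
The sketch assumes tags are compared case-insensitively and that a
shorter tag matches a longer one only at a hyphen boundary.

```python
def lang_matches(rule_tag: str, text_tag: str) -> bool:
    """Return True if text tagged `text_tag` is covered by `rule_tag`.

    A rule tag covers a text tag when the two are equal, or when the
    rule tag is a prefix of the text tag that ends at a hyphen
    (so "en" covers "en-us", but not "eng").
    """
    rule = rule_tag.lower()
    text = text_tag.lower()
    return text == rule or text.startswith(rule + "-")


def pick_rules(text_tag: str, available: list[str]):
    """Pick the most specific available rule tag covering `text_tag`.

    Returns None when nothing matches. "Most specific" is taken here
    to mean the longest matching tag, which is an assumption of this
    sketch.
    """
    candidates = [tag for tag in available if lang_matches(tag, text_tag)]
    return max(candidates, key=len, default=None)


if __name__ == "__main__":
    # Text tagged en-us falls back to hyphenation rules tagged en.
    print(pick_rules("en-us", ["en", "fr"]))
```

The match only goes one way: rules tagged en-us are not applied to
text tagged just en.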

Regards,	Martin.

Received on Wednesday, 19 November 1997 05:02:34 UTC