- From: Chris Lilley <Chris.Lilley@sophia.inria.fr>
- Date: Thu, 25 Jul 1996 14:02:14 +0200 (DST)
- To: "M.T. Carrasco Benitez" <carrasco@innet.lu>, www-international@w3.org
On Jul 25, 1:18pm, M.T. Carrasco Benitez wrote:
> This is with the intention of putting together the last few messages:
>
> - Only one charset is allowed per document.

Yes.

> - What SHOULD be the default "document character set" for HTML?
> Latin1, Unicode ... ?

ISO 10646 is the only (not the default) document character set for HTML 2.0 and subsequent versions.

> - How should this be viewed:
> + Many "document character sets" are allowed; e.g., ISO-8859-1, ISO-8859-7.

Many character encodings, identified by the charset parameter of the text/* media types, are allowed. Some are more widely used than others; for interoperability it may be desirable to pick a single one when several options cover the same character repertoire. There is only ever one document character set for HTML.

> + Only (full 32 bits) 10646 is allowed. The others are subsets.

As the document character set, yes. How much of the 00 01 xx xx codespace has been used in 10646?

> - The charset for transmission SHOULD be whatever is appropriate for the data.

Yes. But using a less appropriate charset is not wrong, just undesirable. A French document could, for example, be transmitted in KOI-8 with all accented characters expressed as HTML 2.0 entities.

> - What is appropriate for the data?
> The client does not express any desire/restriction and the document is on
> the server in ISO-8859-7. Should the server send it in ISO-8859-7 or
> in Unicode?

ISO-8859-7, because it is shorter (8-bit characters).

> - The server: "SHOULD or MUST?" inform the client of the character set.

Must. If the charset is unspecified, ISO-8859-1 must be assumed. Otherwise, clients would need all sorts of tricky code to guess what the charset might have been.

> - LANG is for higher functions, such as short quotations, etc.
> - There is no association between LANG and charset.

Yes. LANG is independent of charset, but it may, for example, be used to select a font with the appropriate glyph repertoire.

> - The server SHOULD inform the client with Content-Language.

Yes, although arguably LANG on HTML or BODY provides equivalent information. How should multilingual documents be labelled in the Internet media type? For example, a document with parallel French and Urdu text has no "major" language.

> - LANGs in the document override the Content-Language.

Yes, since unless they are on the HTML element they apply to a more specific part of the document.

--
Chris Lilley, W3C [ http://www.w3.org/ ]
http://www.w3.org/people/chris/
INRIA/W3C chris@w3.org
2004 Rt des Lucioles / BP 93 +33 93 65 79 87
06902 Sophia Antipolis Cedex, France
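
The distinction drawn in the reply between the single document character set (ISO 10646) and the many possible transmission encodings can be illustrated with a minimal Python sketch. It assumes that numeric character references are always interpreted as ISO 10646 (Unicode) code points, whatever charset the bytes were sent in, which is how Python's `html.unescape` behaves; the helper name `decode_html_text` is hypothetical.

```python
import html


def decode_html_text(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode bytes with the transmission charset,
    then resolve character references as ISO 10646 code points."""
    text = raw.decode(charset)   # step 1: undo the character encoding
    return html.unescape(text)   # step 2: &#945; always means U+03B1


# The same text sent in two different encodings. Each charset lacks one
# of the characters, so that character travels as a numeric reference.
latin1_bytes = "caf\u00e9 &#945;".encode("iso-8859-1")  # no Greek alpha in Latin-1
greek_bytes = "caf&#233; \u03b1".encode("iso-8859-7")   # no e-acute in ISO-8859-7

print(decode_html_text(latin1_bytes, "iso-8859-1"))
print(decode_html_text(greek_bytes, "iso-8859-7"))
```

Both calls yield the same characters, because the references are resolved against one document character set even though the encodings differ.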
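
The remark that a French document could be sent in KOI-8 with accented characters written as HTML entities can be sketched in Python as follows. This assumes KOI8-R as the concrete KOI-8 variant and uses the standard library's `html.entities.codepoint2name` table, which covers the HTML 4 entity set (a superset of the Latin-1 entities HTML 2.0 defined); `encode_with_entities` is a hypothetical name, and the sketch does not escape literal `&` or `<` in the input.

```python
from html.entities import codepoint2name


def encode_with_entities(text: str, charset: str = "koi8_r") -> bytes:
    """Hypothetical helper: encode text in `charset`, writing any
    character the charset lacks as an entity reference instead.

    Assumes the input contains no literal '&' or '<' needing escaping."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # Prefer a named entity (e.g. &eacute;); fall back to a
            # numeric reference when no name exists.
            name = codepoint2name.get(ord(ch))
            ref = f"&{name};" if name else f"&#{ord(ch)};"
            out += ref.encode("ascii")  # KOI8-R is ASCII-compatible
    return bytes(out)


print(encode_with_entities("D\u00e9j\u00e0 vu"))  # b'D&eacute;j&agrave; vu'
```

The output is valid KOI8-R and loses no information, which matches the point in the message: a poorly suited charset is undesirable (longer, harder to edit) but not wrong.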
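
The answer that ISO-8859-7 is preferable to Unicode for a Greek document because it is shorter can be checked with a small byte count. This Python sketch assumes UTF-16 (big-endian, no byte-order mark) as a stand-in for the 16-bit "Unicode" encoding of the time, and adds UTF-8 for comparison.

```python
greek = "\u0391\u0398\u0397\u039d\u0391"  # "ΑΘΗΝΑ" (Athens), 5 characters

sizes = {
    "iso-8859-7": len(greek.encode("iso-8859-7")),  # 1 byte per character
    "utf-16-be": len(greek.encode("utf-16-be")),    # 2 bytes per character
    "utf-8": len(greek.encode("utf-8")),            # 2 bytes per Greek letter
}
print(sizes)  # {'iso-8859-7': 5, 'utf-16-be': 10, 'utf-8': 10}
```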
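
The rule stated in the reply, that the server must send the charset and that ISO-8859-1 is assumed when it does not, can be sketched on the client side as below. The sketch parses the Content-Type value with the standard library's `email.message.Message.get_content_charset`, which returns the lowercased charset parameter or the supplied fallback; the names `DEFAULT_CHARSET` and `decode_body` are hypothetical.

```python
from email.message import Message

DEFAULT_CHARSET = "iso-8859-1"  # assumed when the server sends no charset


def decode_body(content_type: str, body: bytes) -> tuple[str, str]:
    """Hypothetical helper: return (charset, text) for a response body."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter in lower case,
    # or the fallback argument when no charset parameter is present.
    charset = msg.get_content_charset(DEFAULT_CHARSET)
    return charset, body.decode(charset)


print(decode_body("text/html; charset=ISO-8859-7", b"\xe1"))  # ('iso-8859-7', 'α')
print(decode_body("text/html", b"caf\xe9"))                    # ('iso-8859-1', 'café')
```

With a fixed default, the client never has to guess: an unlabelled body is simply decoded as Latin-1.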
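
The point that LANG attributes in the document override Content-Language can be sketched as an inheritance stack: the header value sits at the bottom, and each element inherits its parent's language unless it carries its own LANG. This Python sketch uses the standard `html.parser.HTMLParser`; the class name `LangTracker` is hypothetical, and it assumes well-nested markup with explicit end tags and no void elements such as `<br>` or `<img>`.

```python
from html.parser import HTMLParser


class LangTracker(HTMLParser):
    """Hypothetical tracker recording the effective language of each
    run of text. Assumes well-nested markup with explicit end tags and
    no void elements."""

    def __init__(self, content_language: str):
        super().__init__()
        self.langs = [content_language]  # bottom entry: HTTP header value
        self.tags = []
        self.runs = []  # (language, text) pairs

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        # A missing or empty LANG inherits the enclosing language.
        self.langs.append(lang or self.langs[-1])
        self.tags.append(tag)

    def handle_endtag(self, tag):
        if self.tags and self.tags[-1] == tag:
            self.tags.pop()
            self.langs.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.langs[-1], data.strip()))


tracker = LangTracker("fr")  # Content-Language: fr
tracker.feed('<html><body><p>Bonjour</p>'
             '<p lang="ur">\u0633\u0644\u0627\u0645</p></body></html>')
tracker.close()
print(tracker.runs)  # [('fr', 'Bonjour'), ('ur', 'سلام')]
```

The Urdu paragraph's own LANG wins over the header, while the unmarked paragraph falls back to Content-Language, mirroring the "more specific part of the document" argument.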
Received on Thursday, 25 July 1996 08:06:13 UTC