Re: LANG + chars

On Jul 25,  1:18pm, M.T. Carrasco Benitez wrote:

> This is with the intention of putting together the last few messages:
>
> - Only one charset is allowed per document.

Yes.

> - What SHOULD be the default "document character set" for HTML ?
>   Latin1, Unicode ... ?

ISO 10646 is the only document character set (not merely the default) for
HTML 2.0 and subsequent versions.

> - How should this be viewed:
>   + Many "document character sets" are allowed; e.g., ISO-8859-1, ISO-8859-7.
Many character encodings, identified by the text/* charset parameter, are
allowed. Some may be more widely used than others; for interoperability it may
be desirable to settle on a single choice when several options have the same
character repertoire. There is only ever one document character set for HTML.
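
To make the distinction concrete, here is a small Python sketch (my own
illustration, not taken from any spec text). It assumes the usual reading of
"document character set": numeric character references name 10646 code points
whatever encoding the bytes arrived in. The codec names are Python's.

```python
import html

# The same byte means different things in different character encodings...
greek = b"\xe1".decode("iso8859_7")   # GREEK SMALL LETTER ALPHA, U+03B1
latin = b"\xe1".decode("iso8859_1")   # LATIN SMALL LETTER A WITH ACUTE, U+00E1

# ...but a numeric character reference always names a 10646 code point,
# independent of the encoding used on the wire.
ref_alpha = html.unescape("&#945;")    # 945 == 0x3B1
ref_aacute = html.unescape("&#225;")   # 225 == 0xE1

print(greek == ref_alpha, latin == ref_aacute)
```

So `&#945;` is alpha in an ISO-8859-1 document just as much as in an
ISO-8859-7 one.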

>   + Only (full 32 bits) 10646 is allowed.  The others are subsets.
As the document character set, yes. How much of the 00 01 xx xx codespace has
been used in 10646 ?

> - The charset for transmission SHOULD be whatever is appropriate for the
>   data.
Yes, but using a less appropriate charset is not wrong, just undesirable. A
French document could be transmitted in KOI-8 with all accented characters
expressed as HTML 2.0 entities, for example.
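
A rough sketch of that French-in-KOI-8 case, in Python. The function name
`encode_with_entities` is hypothetical, and the code assumes the target charset
is an ASCII superset (true of Python's `koi8_r` codec) so that the entity text
can be emitted as plain ASCII bytes. Named entities are only used for Latin-1
code points; anything else falls back to a numeric reference.

```python
from html.entities import codepoint2name


def encode_with_entities(text: str, charset: str) -> bytes:
    """Encode text in charset, writing unencodable characters as entities.

    Assumption: charset is ASCII-compatible, so "&name;" can be written
    as ASCII bytes. Literal "&" in the input is not escaped here.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            name = codepoint2name.get(ord(ch))
            if name is not None and ord(ch) < 256:
                ref = f"&{name};"      # Latin-1 named entity, e.g. &eacute;
            else:
                ref = f"&#{ord(ch)};"  # numeric character reference
            out += ref.encode("ascii")
    return bytes(out)


print(encode_with_entities("\u00e9t\u00e9", "koi8_r"))
```

The result is legal but bulky, which is the "undesirable" part.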

> - What is appropriate for the data ?
>   The client does not express any desire/restriction and the document is in
>   the server in ISO-8859-7.  Should the server send it in ISO-8859-7 or
>   in Unicode ?

ISO-8859-7, because it is shorter (8-bit characters).
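
For a feel of the size difference, a quick Python comparison. Python's
`utf-16-be` and `utf-32-be` codecs stand in here for the two-octet and
four-octet forms of 10646; that substitution is my assumption, and the sample
word is arbitrary.

```python
# "Ellinika" (the Greek word for "Greek"), written with escapes
text = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"

sizes = {
    "iso8859_7": len(text.encode("iso8859_7")),  # one octet per character
    "utf-16-be": len(text.encode("utf-16-be")),  # two octets per character
    "utf-32-be": len(text.encode("utf-32-be")),  # four octets per character
}

for name, size in sizes.items():
    print(f"{name}: {size} octets")
```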

> - The server: "SHOULD or MUST ?" inform the client of the character set.
Must. If the charset is unspecified, ISO-8859-1 must be assumed. Without that
rule, clients would need all sorts of tricky code to guess what the charset was.
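
On the client side the rule is cheap to implement. A minimal Python sketch,
using the standard library's `email.message.Message` to parse the header; the
helper name `charset_of` is hypothetical.

```python
from email.message import Message


def charset_of(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value (lowercased),
    or the default when the server did not supply one."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)


print(charset_of("text/html; charset=ISO-8859-7"))
print(charset_of("text/html"))
```

No sniffing of the document body is involved; the label alone decides.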

> - LANG is for higher functions, such as short quotations, etc.
> - There is no association between LANG and charset.
Yes, it is independent of charset but may, for example, be used to select a
font with the appropriate glyph repertoire.

> - The server SHOULD inform the client with Content-Language.
Yes, although arguably LANG on HTML or BODY provides equivalent information.
How should multilingual documents be labelled in the Internet media type? For
example a document with parallel French and Urdu text - there is no "major"
language.

> - LANGs in the document overrides the Content-Language.
Yes, since, unless they are on the HTML element, they apply to a more specific
part of the document.
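
One way to picture the precedence, as a Python sketch. The function
`effective_lang` and its list-of-ancestors input are hypothetical; a real
client would walk its own parse tree.

```python
from typing import Optional, Sequence


def effective_lang(lang_chain: Sequence[Optional[str]],
                   content_language: Optional[str] = None) -> Optional[str]:
    """Pick the language that applies to an element.

    lang_chain holds the LANG values from the HTML element down to the
    element in question, with None where no LANG is given. The innermost
    LANG wins; Content-Language is only the fallback.
    """
    for lang in reversed(lang_chain):
        if lang:
            return lang
    return content_language


# HTML > BODY > BLOCKQUOTE LANG="ur", served with Content-Language: fr
print(effective_lang([None, None, "ur"], "fr"))
# HTML > BODY > P with no LANG anywhere: fall back to the header
print(effective_lang([None, None, None], "fr"))
```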




-- 
Chris Lilley, W3C                          [ http://www.w3.org/ ]
http://www.w3.org/people/chris/                       INRIA/W3C
chris@w3.org                       2004 Rt des Lucioles / BP 93
+33 93 65 79 87            06902 Sophia Antipolis Cedex, France

Received on Thursday, 25 July 1996 08:06:13 UTC