LANG + chars
This is an attempt to summarize the last few messages:
- Only one charset is allowed per document.
- What SHOULD be the default "document character set" for HTML?
  Latin1, Unicode, ...?
- How should we view it:
  + Many "document character sets" are allowed, e.g., ISO-8859-1 and ISO-8859-7.
  + Only (full 32-bit) ISO 10646 is allowed; the others are subsets of it.
- The charset for transmission SHOULD be whatever is appropriate for the data.
- What is appropriate for the data?
  Suppose the client expresses no preference or restriction, and the document
  is stored on the server in ISO-8859-7. Should the server send it in
  ISO-8859-7 or in Unicode?
- The server SHOULD (or MUST?) inform the client of the character set.
- Transmission transformations are for compressing or encrypting
  (content-encoding) or for "safe transport" (transfer-coding). This is a
  lower layer in the transmission. As far as the higher functions are
  concerned, they are talking Unicode, Latin1, etc.
- LANG is for higher functions, such as short quotations, etc.
- The server SHOULD inform the client with Content-Language.
- LANG attributes in the document override the Content-Language.
- There is no association between LANG and charset.
- I will make another posting regarding the more advanced language