- From: Mikko Rantalainen <mira@cc.jyu.fi>
- Date: Sat, 21 Feb 2004 08:11:02 -0500 (EST)
- To: WWW Style <www-style@w3.org>
Bert Bos / 2004-02-21 00:26:
> This problem of finding the encoding of a file is complicated, not
> just because it is so hard to imagine for spec writers and programmers
> what a program actually sees when the encoding is wrong, but also for
> other reasons:
>
> - Most HTTP servers don't send the charset param, we're not going to
> change that overnight.

> So, if we assume that we can change the browsers in time, what do we
> want in CSS3? I'd say this:

I would suggest the following:

1) If the HTTP header defines a character set, use it.
2) If the HTTP header doesn't define a character set, use UTF-8.

(no more rules)

However, for historical documents (that is, the majority of the documents already on the web), I think the recommended behaviour of the user agent would be to ask the user what to do when the "character set is UTF-8 unless explicitly told otherwise" assumption results in invalid byte sequences. Perhaps recommend displaying a dialog that offers some kind of interface for setting the character set of *all documents* (HTML, CSS, JavaScript) that lack an explicit character set in the HTTP headers. Make the user agent explain that the problem exists because the page author doesn't follow standards, and that the user agent needs advice to be able to represent the content correctly. This should be the default behaviour; some user agents may allow opting in to an automagic guessing mechanism, which may or may not work.

UTF-8 can represent every character anybody needs, so this doesn't cause problems for you as a document author in case you cannot fix the HTTP header. Just transcode from your current character set to UTF-8. That shouldn't be a problem when authoring NEW documents.

As for the historical documents, I think the spec could include an informal section explaining some common problems found in old documents.

Supporting automagic charset selection that overrides rules 1) and 2) above should be optional. Recommend reporting the problem to the user and asking for more advice instead.
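
For illustration, rules 1) and 2) plus the ask-the-user fallback could be sketched roughly as below. This is only a sketch in Python, not a proposal for any particular user agent: the function names are hypothetical, `ask_user` stands in for whatever dialog the user agent would show, and decoding a *declared* charset leniently is an assumption of the sketch rather than something the rules above specify.

```python
from typing import Callable, Optional

DEFAULT_CHARSET = "utf-8"  # rule 2: assumed when HTTP says nothing


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    # Skip the media type itself; look only at the ";"-separated parameters.
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        value = value.strip().strip('"')
        if name.strip().lower() == "charset" and value:
            return value
    return None


def decode_document(
    body: bytes,
    content_type: Optional[str],
    ask_user: Callable[[bytes], str],  # hypothetical dialog hook
) -> str:
    """Decode a document using rule 1, then rule 2, then the user's advice."""
    declared = charset_from_content_type(content_type)
    if declared is not None:
        # Rule 1: the HTTP header wins. Replacing undecodable bytes here is
        # an assumption of this sketch; unknown charset names are not handled.
        return body.decode(declared, errors="replace")
    try:
        # Rule 2: no charset in the header, so assume UTF-8 (strictly).
        return body.decode(DEFAULT_CHARSET)
    except UnicodeDecodeError:
        # Invalid byte sequences: report the problem and ask for advice
        # instead of guessing automagically.
        return body.decode(ask_user(body), errors="replace")
```

A legacy Latin-1 stylesheet served without a charset would go through the last branch: `decode_document("ä".encode("latin-1"), "text/css", lambda body: "latin-1")` fails the strict UTF-8 decode and then uses the charset the user picked.
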

The more we can make the document author feel that he gets all the blame, the faster he'll fix the document. If he doesn't care, it might be that the document isn't worth reading anyway. Blame the author of the broken document, not the user.

Changing everything to UTF-8 is going to be a painful process, no matter how you do it. I'd rather take more pain for a short time than the other way around.

-- 
Mikko
Received on Saturday, 21 February 2004 09:16:10 UTC