- From: Dmitry Turin <html60@narod.ru>
- Date: Wed, 18 Jul 2007 08:20:08 +0300
- To: public-html@w3.org
ST> Is there any particular reason why you're relying on UAs to guess what
ST> character repertoire the document is in?

ST> But I see no reason for authors to rely on
ST> UAs to just magically guess the correct character repertoire.

RB> Servers rarely include a charset
RB> header and that might be a good thing, because those would likely be
RB> often wrong too.

AF> It is an author's error to publish document without
AF> providing information of what encoding is used in it.

Guessing is not the point. The purpose is to let the user change the encoding manually in the browser menu and then keep following links.

Let me introduce two terms: 'falling of encoding', which means that the browser shows a document as if it were written in an encoding other than its real one; and 'anchor falling', which means that 'falling of encoding' occurs in a new document after the user has followed an <a href> in the previous document.

I have met three cases of anchor falling:
(1) when surfing documents on a server
(1.1) the new document does not contain frames, i.e. it is a single document
(1.2) anchor falling occurs in a frame
(2) when surfing documents on the local file system after downloading a site: anchor falling occurs because <meta content="text/html; charset="> and the real encoding differ from each other.

In case #1.1 the user is forced to use the browser menu in each subsequent document; in case #1.2 he can't change the encoding in the frame (except by saving the frame's document to the local file system and opening the saved file); in case #2 he is forced to convert the files in the directory and its subdirectories recursively with an additional program.

This shows that the problem should exist for every alphabet whose letters have codes 128-255.

As for my site, I don't use frames (#1.2), and #2 is prevented because an archive of the site is available. Thus only #1.1 (a conflict between the actual encoding, 'Content-Type' and <meta content="text/html; charset=">) can threaten me. I decided that free hosting would be enough to show documents for discussion.
This means that the papers have an increased probability of anchor falling.

---

As for a guessing algorithm to improve today's browsers, maybe there is reason to borrow it from Russian text editors, which auto-detect the encoding. Statistically the task is made easier by the fact that only two of the five encodings are used in practice ('windows' and 'koi-8' are used; 'dos', 'iso', 'unicode' are not).
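
To illustrate how simple such a two-way guess can be, here is a minimal sketch in Python. It is not the algorithm of any particular text editor, and the function name guess_cyrillic_encoding is hypothetical. It assumes the input is Russian text in either windows-1251 or KOI8-R, and relies on one known fact: bytes 0xE0-0xFF are lowercase letters in windows-1251 but uppercase in KOI8-R, while 0xC0-0xDF is the other way round. Since ordinary text is mostly lowercase, the fuller block hints at the encoding.

```python
def guess_cyrillic_encoding(data: bytes) -> str:
    """Guess between windows-1251 and KOI8-R for Russian text (heuristic)."""
    # 0xC0-0xDF: uppercase in windows-1251, lowercase in KOI8-R.
    low_block = sum(1 for b in data if 0xC0 <= b <= 0xDF)
    # 0xE0-0xFF: lowercase in windows-1251, uppercase in KOI8-R.
    high_block = sum(1 for b in data if 0xE0 <= b <= 0xFF)

    if low_block == 0 and high_block == 0:
        # No letters in either block: nothing to guess from.
        return "ascii"

    # Ordinary text is mostly lowercase, so the fuller block wins.
    return "windows-1251" if high_block > low_block else "koi8-r"


# Usage: encode the same sample both ways and let the heuristic decide.
sample = "Привет, мир. Это проверка кодировки."
for codec in ("cp1251", "koi8_r"):
    raw = sample.encode(codec)
    guess = guess_cyrillic_encoding(raw)
    print(codec, "->", guess, "->", raw.decode(guess))
```

Such a guess can fail on very short text or text in capitals, so it could only supplement the manual choice in the browser menu, not replace it.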
Received on Wednesday, 18 July 2007 12:46:17 UTC