- From: Robert Burns <rob@robburns.com>
- Date: Fri, 13 Jul 2007 15:22:56 -0500
- To: Andrew Fedoniouk <news@terrainformatica.com>
- Cc: <public-html@w3.org>, "Sander Tekelenburg" <st@isoc.nl>
On Jul 13, 2007, at 2:00 PM, Andrew Fedoniouk wrote:

> ----- Original Message ----- From: "Sander Tekelenburg" <st@isoc.nl>
>
>> At 08:19 +0300 UTC, on 2007-07-13, Dmitry Turin wrote:
>>
>>> Good day, Robert.
>>>
>>> RB> I was wondering what character encoding you use to serve up this page:
>>> RB> <http://html60.chat.ru/site/html60/ru/index_ru.htm>
>>> RB> We're trying to conduct some tests on current UAs and this page might
>>> RB> be helpful. Do you know what charset it uses?
>>>
>>> All pages in russian language are coded in WIN-1251.
>>> These documents are displayed truely both in IE and Opera.
>>
>> Only because they happen to guess what you intend. They're not presented as
>> you intend in iCab3.0.3, Firefox2.0.0.4, Safari2.0.4 (because neither the
>> server nor the document itself say what character repertoire the document is
>> in).
>
> Sander, that is just a bug.

I couldn't tell what you were referring to as "a bug" in the above text. Could you elaborate?

> HTML documents in Russian must indicate encoding.
> This particular page will work in IE and only on Russian version
> of Windows OS as in case of unknown encoding IE uses current
> system encoding settings (So called "current ANSI code page").

This is just one anecdotal example of why I think HTML5 should provide much more author guidance (and probably UA guidance) on character encodings. It's something many authors do not understand. I also think it's the kind of detail that most authors probably shouldn't need to understand if it were handled properly in existing tools and existing standards. Since so much authoring goes on by simply copying code, authors end up copying meta tags that express completely incorrect encodings. Servers rarely include a charset header, and that might be a good thing, because those would likely often be wrong too. Given that it's not really handled well, I think we should do something.
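As a rough illustration of the failure mode described above, here is a minimal Python sketch of what happens when the same unlabeled bytes are decoded under different fallback code pages. The sample string and the `render` helper are hypothetical, and a real UA's fallback logic is considerably more involved than a single decode call.

```python
# Hypothetical sample: a Russian word saved as windows-1251 with no
# charset declared anywhere (no HTTP header, no meta tag, no BOM).
SAMPLE_TEXT = "Привет"
RAW_BYTES = SAMPLE_TEXT.encode("cp1251")


def render(raw: bytes, assumed_charset: str) -> str:
    """Decode unlabeled bytes the way a UA might after guessing a charset.

    `assumed_charset` stands in for whatever fallback the UA picks,
    e.g. the system's current ANSI code page.
    """
    return raw.decode(assumed_charset)


# On a Russian system the fallback happens to match the author's intent.
russian_fallback = render(RAW_BYTES, "cp1251")

# On a Western European system the same bytes become mojibake.
western_fallback = render(RAW_BYTES, "cp1252")

print(russian_fallback)
print(western_fallback)
```

Nothing in the byte stream itself distinguishes the two readings, which is why the page only looks right where the guess happens to match.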
I think BOMs are the best way to go, but obviously they don't work with everything (and not all tools support them either). Even better would be a byte sequence registry or something like that, but that's way outside the scope of our WG. Anyway, it's worth further testing and it's worth considering ways HTML5 might address the problem. Perhaps all we can do is push authors to use the Unicode encodings more (and that means authoring tools need to have proper support too). We don't really help encoding-related security issues by ignoring the problem.

Take care,
Rob
Received on Friday, 13 July 2007 20:23:04 UTC