- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Mon, 05 Dec 2011 22:18:10 -0500
On 12/5/11 9:55 PM, Leif Halvard Silli wrote:
> If that is all they tested, then I'd say they did not test enough.

That's normal for the web.

>> (For the record, reading a particular page in a language is a much
>> simpler task than reading the language; I can't "read German", but I can
>> certainly read a German subway map.)
>
> Or a Polish subway map - which doesn't default to said encoding.

Indeed. I don't think anyone thinks the existing situation is all fine or anything.

> I said I agreed with him that Faruk's solution was not good. However, I
> would not be against treating <!DOCTYPE html> as a 'default to UTF-8'
> declaration

This might work, if there hasn't been too much cargo-culting yet. Data urgently needed!

>> Not unless we change the authoring tools. Half the time these things
>> are just directly exported from a word processor.
>
> Please educate me. I'm perhaps 'handicapped' in that regard: I haven't
> used MS Word on a regular basis since MS Word 5.1 for Mac. Also, if
> "export" means "copy and paste"

It can mean that, or "save as HTML" followed by copy and paste.

> then on the Mac, everything gets
> converted via the clipboard

On Mac, the default OS encoding is UTF-8 last I checked. That's decidedly not the case on Windows.

>>> OK: Quotation marks. However, in 'old web pages', then you also find
>>> much more use of HTML entities (such as?) than you find today.
>>> We should take advantage of that, no?
>>
>> I have no idea what you're trying to say,
>
> Sorry. What I meant was that character entities are encoding
> independent.

Yes.

> And that lots of people - and authoring tools - have
> inserted non-ASCII letters and characters as character entities,

Sure. And lots have inserted them "directly".

> At any rate: A page which uses
> character entities for non-ASCII would render the same regardless of
> encoding, hence a switch to UTF-8 would not matter for those.

Sure. We're not worried about such pages here.

-Boris
Received on Monday, 5 December 2011 19:18:10 UTC
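Both sides of the thread agree that character entities are encoding-independent: a page that writes its non-ASCII characters as references is pure ASCII on the wire, so it decodes identically whether the browser guesses windows-1252 or UTF-8, while a page with characters inserted "directly" does not. The minimal Python sketch below illustrates that point. It assumes a page saved as windows-1252 stands in for the "direct" case. The helpers `raw_page`, `entity_page`, and `render` are hypothetical, and `render` is only a toy model (decode the bytes, then expand references), not how a real HTML parser is structured.

```python
import html

# Sample text containing non-ASCII letters ("café naïve").
TEXT = "caf\u00e9 na\u00efve"

def raw_page(text: str) -> bytes:
    """Hypothetical 'direct' page: non-ASCII characters saved as windows-1252 bytes."""
    return text.encode("windows-1252")

def entity_page(text: str) -> bytes:
    """Hypothetical entity page: non-ASCII written as numeric character
    references (e.g. &#233;), so every byte on the wire is ASCII."""
    return text.encode("ascii", errors="xmlcharrefreplace")

def render(page: bytes, encoding: str) -> str:
    """Toy browser: decode with the guessed encoding (bad bytes become U+FFFD),
    then expand any character references."""
    return html.unescape(page.decode(encoding, errors="replace"))

if __name__ == "__main__":
    for guess in ("windows-1252", "utf-8"):
        print(guess,
              repr(render(raw_page(TEXT), guess)),
              repr(render(entity_page(TEXT), guess)))
```

Running it shows the entity page coming out the same under both guesses, while the directly encoded page turns into replacement characters once the guess switches to UTF-8. That is why a change of default encoding only matters for pages of the second kind.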