- From: Elliotte Harold <elharo@metalab.unc.edu>
- Date: Sun, 05 Nov 2006 07:50:04 -0500
Øistein E. Andersen wrote:

> I perfectly agree. (Actually, i think that U+7F (delete) and the C1 control characters
> should be excluded [transformed into U+FFFD] as well, but this could perhaps be
> problematic due to spurious CP1252 characters.)

Spurious Cp1252 is a real problem. In fact, incorrectly labeled encoding is a real problem, and a thorny one. Draconian error handling in XML solves this, but I'm not sure what HTML 5 should do here. It's worth thinking about though.

It's also worth reviewing the work the W3C TAG and I18N working groups did on this issue since a lot of smart people did a lot of thinking about this quite recently:

http://www.w3.org/2001/tag/doc/mime-respect-20060412
http://www.w3.org/TR/charmod/

I don't remember the exact outcome myself, except that it's a really ugly problem that truly requires some changes in what options webmasters give to web content creators.

-- 
Elliotte Rusty Harold  elharo at metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
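
To make the trade-off in the quoted proposal concrete, here is a minimal Python sketch. It is only an illustration, not anything proposed in this thread or specified for HTML 5: the function names `replace_controls` and `reinterpret_as_cp1252` are hypothetical, and the second function assumes that a C1 code point in decoded text came from a Cp1252 byte that was mislabeled as ISO-8859-1.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_del_or_c1(ch: str) -> bool:
    """True for U+007F (delete) and the C1 controls U+0080..U+009F."""
    cp = ord(ch)
    return cp == 0x7F or 0x80 <= cp <= 0x9F


def replace_controls(text: str) -> str:
    """The quoted proposal: turn U+7F and C1 controls into U+FFFD."""
    return "".join(REPLACEMENT if is_del_or_c1(ch) else ch for ch in text)


def reinterpret_as_cp1252(text: str) -> str:
    """Alternative (an assumption, not from the thread): treat each C1 code
    point as a mislabeled Cp1252 byte and map it to the character Cp1252
    assigns to that byte. Bytes that Cp1252 leaves undefined become U+FFFD.
    U+7F is left untouched by this function."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                out.append(bytes([cp]).decode("cp1252"))
            except UnicodeDecodeError:
                out.append(REPLACEMENT)
        else:
            out.append(ch)
    return "".join(out)


# Cp1252 curly quotes (bytes 0x93 and 0x94) served with an ISO-8859-1 label
# decode to C1 control characters rather than to quotation marks.
mislabeled = b"\x93quoted\x94".decode("iso-8859-1")
strict = replace_controls(mislabeled)
lenient = reinterpret_as_cp1252(mislabeled)
```

With the strict approach the curly quotes are lost to U+FFFD; with the lenient one they are recovered, at the cost of guessing the author's intent. That is the "spurious CP1252" difficulty in a nutshell.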
Received on Sunday, 5 November 2006 04:50:04 UTC