- From: Terje Bless <link@tss.no>
- Date: Sat, 28 Apr 2001 04:11:23 +0200
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- cc: www-validator@w3.org
On 28.04.01 at 03:42, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

>Only HTML 4.0 and later make this restriction.

I very much would like to avoid special casing on HTML version.

>We have a major conflict between HTTP/1.1 and HTML 4.0 here;

Where was www-qa when we needed them... :-)

>[SNIP "META" kludge] I think this is just horrible and finding a correct
>_and_ usable solution is impossible.

Agreed.

>I think the best thing we can (and should) do is
>
> * report a warning if there is no charset parameter in the HTTP
>   response

Someone should write good docs on charsets, problems with them, and help
in selecting and specifying a proper one. HOWTO links for Apache and IIS.

Making this be a warningable -- :-) -- state is problematic insofar as
ciwah et al ram "Latin1 is the default" down people's throats and people
get confused when it produces a warning (been there, done that). A link to
good docs might alleviate this, but this is not something I'm willing to
take any action on until I've checked with Gerald.

> * report a warning if there is (in addition) no charset parameter in
>   "the" [1] <meta http-equiv='Content-Type' content='...'> content
>   type declaration
> * report a warning if those two are given and don't match

This is Status Quo.

> * use ISO-8859-1 if none of them is given

Ditto, but this follows from the assumption on the semantics of the
HTTP/1.1 Content-Type field. If we change those we'll have to change this
code too.

> * report an error if the content doesn't match the declared encoding
>
>   sub is_valid_us_ascii {[...]}
>   sub is_valid_utf8 {[...]}
>   sub is_valid_latin1 {[...]}
>   sub is_valid_windows_1252 {[...]}
>
>I don't know how SP handles invalid input, maybe we can use it to
>perform some of these tasks.

While those regexes impressed the hell out of me -- :-) -- I don't like
this solution. It makes us become an authoritative reference on charset
issues, responsible for maintaining provably correct implementations of
these checks.
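For illustration only, the warning rules quoted above could be sketched roughly like this. It is Python rather than the validator's Perl, the name pick_charset and the warning texts are made up for the example, and letting the HTTP charset win when both are given is an assumption that the list itself does not state.

```python
from typing import List, Optional, Tuple

# Fallback used when neither source declares a charset (from the quoted list).
DEFAULT_CHARSET = "ISO-8859-1"


def pick_charset(
    http_charset: Optional[str], meta_charset: Optional[str]
) -> Tuple[str, List[str]]:
    """Return (charset to use, warnings) for one document.

    Hypothetical helper: both arguments are the already-extracted charset
    parameters, or None (or "") when the parameter is absent.
    """
    warnings: List[str] = []

    if not http_charset:
        warnings.append("no charset parameter in the HTTP response")
        if not meta_charset:
            # Only reported "in addition" to the missing HTTP charset.
            warnings.append("no charset parameter in the META declaration")
    elif meta_charset and http_charset.lower() != meta_charset.lower():
        # Charset names are compared case-insensitively.
        warnings.append("HTTP and META charset parameters do not match")

    # Assumption: HTTP takes precedence over META when both are present.
    charset = http_charset or meta_charset or DEFAULT_CHARSET
    return charset, warnings
```

With neither charset present this yields the ISO-8859-1 default plus both warnings; a mismatch yields a single warning and keeps the HTTP value.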
If I can get SP to do it (e.g. barf on "illegal" bytes in "this"
encoding), I'd much prefer that. Next alternative is to get Text::Iconv or
another CPAN module to do it (Map8?). Final fallback would be to stuff
your code into a module and nag you until you released it to CPAN. :-)

I'm going to experiment a bit with SP and see what it can do for us. With
any kind of luck it'll do the trick.

The big problem is that we're converting everything to UTF-8 internally,
so by the time it gets to SP it's too late. The exceptions are US-ASCII
and ISO-Latin-1, which get special treatment.
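To make the "let an existing library barf on illegal bytes" idea concrete, here is a minimal sketch. It uses Python's built-in codecs as a stand-in, not SP, Text::Iconv or Map8, and the function name first_invalid_offset is hypothetical. The point is only that the check would have to run on the raw bytes, before any conversion to UTF-8.

```python
from typing import Optional


def first_invalid_offset(data: bytes, charset: str) -> Optional[int]:
    """Return the offset of the first byte that is illegal in `charset`,
    or None if the whole input decodes cleanly.

    Relies on strict decoding in Python's codec machinery instead of
    hand-written regexes. An unknown charset name raises LookupError,
    which is deliberately left for the caller to handle.
    """
    try:
        data.decode(charset, errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte.
        return exc.start
    return None


if __name__ == "__main__":
    sample = b"caf\xe9"  # "café" encoded as ISO-8859-1
    for name in ("ascii", "utf-8", "iso-8859-1", "cp1252"):
        print(name, first_invalid_offset(sample, name))
```

One caveat with this stand-in: Python's ISO-8859-1 codec maps every byte value, so the check can never reject Latin-1 input, whereas its windows-1252 codec does reject the handful of bytes that encoding leaves undefined.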
Received on Friday, 27 April 2001 23:51:20 UTC