- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Tue, 22 Mar 2005 22:40:03 +0200 (EET)
- To: Michel CARRARE <mc@michelcarrare.com>
- Cc: www-validator@w3.org
On Tue, 22 Mar 2005, Michel CARRARE wrote:

> I have a little problem with character encoding. One of my web pages:
>
> http://www.michelcarrare.com/multimedia/table-car.php
>
> contains a table of all 8-bit characters.

It contains incorrect information. There is a rich supply of tables of "8-bit characters", some of them correct, some not. I wouldn't mention this (after all, we all try to reinvent the wheel at times), but it is directly connected with the validation problems.

> When validating this page, I have
> warnings corresponding to reserved characters, which is absolutely normal.

No, the warnings are about character references like &#128;, which are technically _undefined_ (not reserved). And the warnings are indeed useful. Here they imply that the page contains bogus information. Whatever gets rendered when you use &#128; is just error processing by a browser.

> Here is my problem. I thought only characters from 128 to 159 were
> reserved.

They are not reserved. And character encoding is not the issue here. The reference &#128; is undefined, no matter what the encoding is.

> But, apparently, the validator sends me warnings for characters
> from 127 to 159. Could anyone tell me if character 127 is reserved or not.
> I could not find this information. I mean, I found both answers!

The authoritative answer is in the SGML declaration for HTML 4.01:

    DESCSET  0       9       UNUSED
             9       2       9
             11      2       UNUSED
             13      1       13
             14      18      UNUSED
             32      95      32
             127     1       UNUSED
             128     32      UNUSED
             160     55136   160
             55296   2048    UNUSED  -- SURROGATES --
             57344   1056768 57344

http://www.w3.org/TR/html4/sgml/sgmldecl.html

Thus, code position 127 is UNUSED in the document character set (which does _not_ depend on the character encoding you use), and hence &#127; is undefined too.
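To make the reading of the DESCSET table above concrete, here is a minimal sketch in Python. It is an illustration only, not anything the validator itself runs. The names `DESCSET`, `is_defined` and `undefined_references` are made up for this example. Each row is taken as (first code position, number of positions, mapping), with `None` standing in for UNUSED.

```python
# Rows copied from the DESCSET quoted above:
# (first code position, number of positions, mapping).
# None stands for UNUSED. All names here are hypothetical, for illustration.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 55136, 160),
    (55296, 2048, None),      # surrogates
    (57344, 1056768, 57344),
]


def is_defined(code_position):
    """Return True if the code position is mapped (not UNUSED) in DESCSET."""
    for start, count, mapping in DESCSET:
        if start <= code_position < start + count:
            return mapping is not None
    return False  # outside every range the declaration describes


def undefined_references(first, last):
    """List the numeric character references in first..last that are undefined."""
    return [f"&#{n};" for n in range(first, last + 1) if not is_defined(n)]


if __name__ == "__main__":
    for n in (126, 127, 128, 159, 160):
        print(n, "defined" if is_defined(n) else "UNUSED")
```

Under this reading, every position from 127 through 159 comes out as UNUSED, which matches the range of warnings reported by the validator, and the result does not depend on the encoding of the page.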
What puzzles me is this: When I tried to validate your page using the extended interface (to get the source listed with line numbers),

http://validator.w3.org/check?uri=http%3A%2F%2Fwww.michelcarrare.com%2Fmultimedia%2Ftable-car.php&charset=%28detect+automatically%29&doctype=%28detect+automatically%29&ss=1&verbose=1

I get just "This Page Is Valid HTML 4.01 Transitional!" with no warnings! Apparently this interface switches off the warnings. But there isn't even any obvious way to switch them on there.

(Clarification: The page is valid, i.e. does not contain any reportable markup error, but it is seriously wrong still. Using an undefined character reference is all wrong especially on the Web. It's like using 0/0 in mathematics: it is a syntactically correct expression but lacks defined meaning, and anything may happen.)

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Tuesday, 22 March 2005 20:40:37 UTC