- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 03 Jul 2001 23:44:42 +0900
- To: Frank Ellermann <Frank.Ellermann@t-online.de>, Terje Bless <link@tss.no>, Tim Bagot <tsb-w3-validator-0004@earth.li>, Hugo Haas <hugo@w3.org>
- Cc: www-validator@w3.org
Hello Frank,

&#128; and &#131;, with or without an explicit charset, is wild abuse.
These are not what you mean, independent of the charset.
&#128; is an undefined control character. &#131; is NBH (no break here)
(see http://www.unicode.org/charts/PDF/U0080.pdf).

All numeric character references refer to Unicode, since HTML 2.0,
even if some older browsers don't do that correctly.

Actually, although strictly speaking, the character numbers
in the &#128; - &#159; range are legal in XML
(see http://www.w3.org/TR/REC-xml#NT-Char), I'm thinking about
checking them in the validator because using them (as something
that they are not) is a very frequent mistake.

Regards,   Martin.

At 11:51 01/07/03 +0200, Frank Ellermann wrote:
>Hi Terje, Tim, and Hugo...
>
>thanks for your answers, now I'll know how to interpret
>this kind of check result (and I even managed to create a
>form doing this without further typing :-)
>
>I hope you do like these problems, because here's my next
>observation, now it's the XHTML-transitional-validator:
>
>Trying to find a workaround for € and ƒ with
>my (very) old browser I now abuse &#128; and &#131; and
>an explicit charset (instead of documenting the abuse).
>
>The XHTML-check doesn't comment this practice. Later I
>needed the same hack in another document, but a bug in
>my script generated a DOS-EOF character at the end (hex.
>1A, remember ? :-) Of course the validator does not
>accept this... but it also complains about the 2nd of 2
>&#128; and the 1st of 2 &#131; !?!
>
>After removing the EOF-nonsense: *No errors found. So
>a single character can have strange side effects in
>other parts of the checked document... intentionally ?
>
>                         Bye, Frank
Received on Tuesday, 3 July 2001 23:29:20 UTC
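
The check Martin describes, flagging numeric character references in the 128-159 range, might look roughly like the sketch below. It is only an illustration, not the validator's actual code: the names `NCR_PATTERN` and `find_c1_references` are made up for this example, and guessing the intended character by reading the number as a windows-1252 byte is an assumption about what authors usually mean, not something stated in the message.

```python
import re

# Matches decimal (&#128;) and hexadecimal (&#x80;) numeric character references.
# Hypothetical helper name, not part of any validator.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def find_c1_references(markup):
    """Return (offset, reference, code_point, likely_intended) tuples for
    numeric character references that resolve to U+0080..U+009F."""
    findings = []
    for match in NCR_PATTERN.finditer(markup):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 0x80 <= code_point <= 0x9F:
            # Assumption: the author treated the number as a windows-1252 byte.
            try:
                intended = bytes([code_point]).decode("cp1252")
            except UnicodeDecodeError:
                intended = None  # byte is unassigned in windows-1252
            findings.append((match.start(), match.group(0), code_point, intended))
    return findings


if __name__ == "__main__":
    sample = "<p>Price: &#128;10, function: &#131;(x), e-acute: &#233;</p>"
    for offset, ref, code_point, intended in find_c1_references(sample):
        hint = f"did you mean U+{ord(intended):04X}?" if intended else "no obvious intent"
        print(f"offset {offset}: {ref} is control character U+{code_point:04X}; {hint}")
```

Since such references are legal in XML, a check like this would more naturally produce a warning than an error.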