- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Tue, 08 Nov 2005 01:50:26 +1100
- To: Sverker Fridqvist <sverker@fridqvist.se>
- CC: www-validator@w3.org
Sverker Fridqvist wrote:
> Compare the error reports for these two URLs:
>
> http://sverker.fridqvist.se/test/withutf8.php

This one sends the HTTP header:

Content-Type: text/html; charset=utf-8

> http://sverker.fridqvist.se/test/withiso8859.php

This one sends:

Content-Type: text/html; charset=iso-8859-1

> Both files contain Byte-Order Marks (BOMs) designating UTF-8 encoding.

No. The first contains the BOM (U+FEFF) encoded in UTF-8. The second contains the characters "ï»¿" encoded as ISO-8859-1, which just happen to use the same octets (EF BB BF) as the UTF-8 BOM. Chances are the author intended this to be the UTF-8 BOM, but the authoritative HTTP headers state otherwise.

> The BOM is recognized for the first file, but not for the second one.

Correct.

> It would be helpful if the validator recognized the BOM also in the
> second case, and reported that the not-allowed characters in the prolog
> are a BOM.

The problem is that determining that it is the UTF-8 BOM would require ignoring the fact that the document must be parsed as ISO-8859-1, or whatever other encoding is declared.

> If this is not possible, or easily done, the error message could make a
> helpful hint towards a BOM:
>
> "Character ... not allowed in prolog. The character may be part of a
> Unicode Byte-Order Mark (BOM). Try changing the character encoding
> setting of your editor to not include BOMs."

Better yet, tell them to configure their server to send the correct character encoding information.

-- 
Lachlan Hunt
http://lachy.id.au/
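The octet-level point in the thread can be illustrated with a short sketch: the same three octets EF BB BF decode to the single character U+FEFF under UTF-8, but to three ordinary Latin-1 characters under ISO-8859-1. The helper names `UTF8_BOM`, `decode_prefix` and `bom_hint` are hypothetical and are not taken from the W3C validator; `bom_hint` is only an assumed heuristic for the kind of message suggested above, keyed on the raw octets rather than on the declared encoding.

```python
from typing import Optional

# U+FEFF (the byte-order mark) encoded in UTF-8 is the three octets EF BB BF.
UTF8_BOM = "\ufeff".encode("utf-8")


def decode_prefix(body: bytes, charset: str, n: int = 3) -> str:
    """Decode the first n octets of a response body using the declared charset."""
    return body[:n].decode(charset)


def bom_hint(body: bytes, charset: str) -> Optional[str]:
    """Hypothetical heuristic: return a hint when the body starts with the
    UTF-8 BOM octets but the declared charset is something other than UTF-8."""
    declared = charset.strip().lower()
    if not body.startswith(UTF8_BOM) or declared in ("utf-8", "utf8"):
        return None
    chars = body[:3].decode(charset, errors="replace")
    return (f"The first characters {chars!r} (decoded as {charset}) use the same "
            "octets as a UTF-8 byte-order mark; check the charset sent in the "
            "Content-Type header.")


if __name__ == "__main__":
    page = b"\xef\xbb\xbf<!DOCTYPE html>"
    print(repr(decode_prefix(page, "utf-8")))       # '\ufeff'
    print(repr(decode_prefix(page, "iso-8859-1")))  # 'ï»¿'
    print(bom_hint(page, "iso-8859-1"))
```

Note that the heuristic only inspects octets; once the body has been decoded as ISO-8859-1, as the HTTP header requires, the parser sees "ï»¿" and nothing that is a BOM.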
Received on Monday, 7 November 2005 14:50:44 UTC