- From: Jukka K. Korpela <jukkakk@gmail.com>
- Date: Thu, 22 Aug 2019 16:17:59 +0300
- To: W3C WWW Validator <www-validator@w3.org>
- Message-ID: <CAGHxYa6vewGy9SK+PQY4PkX+aqgmZ1Gkq=Da=HNSatnAvWuOgg@mail.gmail.com>
When a page that is declared as US-ASCII encoded but actually contains bytes outside the US-ASCII range is submitted to the validator, it reports:

“Sorry, I am unable to validate this document because on line *1* it contained one or more bytes that I cannot interpret as us-ascii (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication. The error was: Modification of a read-only value attempted”

The report is correct in the sense that it properly indicates the type of error, but it is incorrect and unhelpful in that it always refers to line 1, regardless of the location of the (first) erroneous byte. For other encodings, the validator generally reports this class of errors with a correct line number and with information about the offending byte (in hex), which helps a lot. The last line of the message probably reflects some internal error in the validator.

A simple example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<title>Ascii</title>
Intentionally non-Ascii: ü.

Also available at http://jkorpela.fi/test/ascii.htma (served with the HTTP header Content-Type: text/html; charset=us-ascii)

Just in case the problem looks irrelevant: there can be reasons to use US-ASCII and declare it for an HTML document, for example because the document also needs to be processed with software that can only handle US-ASCII. Since HTML provides ways to represent all character data using just US-ASCII at the character encoding level, this use should be supported. And the validator would be a valuable tool for checking, among other things, that the data is indeed just US-ASCII, with useful information about the first occurrence when it is not.

Jukka “Yucca” Korpela
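
For illustration only, the kind of report requested above (line number of the first offending byte, plus its value in hex) can be sketched in a few lines of Python. This is not the validator's code; the function name first_non_ascii and the SAMPLE bytes are hypothetical, and the sketch assumes that the ü in the test page is stored as the single ISO-8859-1 byte 0xFC and that lines end with LF.

```python
from typing import Optional, Tuple


def first_non_ascii(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (line, column, byte) for the first byte above 0x7F, or None.

    Lines and columns are 1-based; columns count bytes, not characters.
    """
    line, col = 1, 0
    for b in data:
        col += 1
        if b > 0x7F:          # outside the US-ASCII range
            return line, col, b
        if b == 0x0A:         # LF ends the current line
            line += 1
            col = 0
    return None


# Hypothetical reconstruction of the test page (assumes Latin-1 for the umlaut).
SAMPLE = (
    b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    b'"http://www.w3.org/TR/html4/strict.dtd">\n'
    b"<title>Ascii</title>\n"
    b"Intentionally non-Ascii: \xfc.\n"
)

if __name__ == "__main__":
    hit = first_non_ascii(SAMPLE)
    if hit is None:
        print("All bytes are US-ASCII.")
    else:
        line, col, byte = hit
        print(f"Line {line}, column {col}: byte 0x{byte:02X} is not US-ASCII")
```

With the sample bytes as assumed here, the first offending byte is on line 3, not line 1, which is the information the current error message lacks.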
Received on Thursday, 22 August 2019 13:18:35 UTC