- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Sun, 30 Oct 2005 15:02:43 +0200 (EET)
- To: www-validator@w3.org
- Cc: Naturally Naomi <naturallynaomi@yahoo.com>
- Message-ID: <Pine.GSO.4.63.0510301441550.10081@korppi.cs.tut.fi>
On Sun, 30 Oct 2005, Jukka K. Korpela wrote:

> I created a trivial test document
> http://www.cs.tut.fi/~jkorpela/test/nbsp.html
> that has a <ul> element with one <li> element inside it but
> with a no-break space before the <li> tag. Here's what the
> W3C validator says:
>
> 1. Error Line 5 column 0: start tag for "LI" omitted, but its declaration
> does not permit this.
> ¼/strong>?<li></li>
>
> There's something very strange in the report's source.

I was able to reduce the problem to an even more trivial case:

- write a document in ISO-8859-1 encoding
- declare the HTML 4.01 Strict DOCTYPE
- use a body part of <body>é</body> (or any other non-ASCII character inside the body)

The validator reports "character data is not allowed here", which is correct, but shows the element oddly:

<body>ü/strong>?</body>

If I manually change the encoding of the report page to ISO-8859-1, I get:

<body>Ã©</body>

This is still wrong, but I guess we can now see what goes wrong. Here's the source of the error message page (viewed as if it were Latin-1):

<li class="msg_err">
<span class="err_type">Error</span> <em>Line 4 column 6</em>:
<span class="msg">character data is not allowed here</span>.<pre><code class="input"><body><strong title="Position where error was detected.">Ã</strong>©</body></code></pre>

Thus, the validator has added <strong> markup in a way that breaks up a sequence of two octets meant to be the UTF-8 representation of a single character ("é", octets C3 A9, in this case). The closing tag ends up between the two octets, producing the pair C3 3C (which looks like "Ã<" if interpreted as ISO-8859-1). Since C3 3C is not valid UTF-8, the rest of the line is a mess.

Deactivating the generation of <strong> markup to highlight the point of error would be a quick fix.

--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
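The octet arithmetic described above can be reproduced with a small sketch. It assumes, as the message suggests, that the highlight markup is inserted at a byte offset into the UTF-8 encoded source line; `highlight_bytes` and `highlight_chars` are hypothetical names, not the validator's actual code. The byte-based version reproduces the C3 3C pair and the "Ã</strong>©" seen when the page is viewed as Latin-1, while wrapping a decoded character instead keeps the UTF-8 sequence intact.

```python
# Hypothetical sketch: mimics the symptom, NOT the validator's real code.
OPEN = '<strong title="Position where error was detected.">'
CLOSE = "</strong>"


def highlight_bytes(line: bytes, pos: int) -> bytes:
    """Buggy variant: wraps the single *byte* at offset `pos` in <strong>."""
    return (line[:pos] + OPEN.encode("ascii") + line[pos:pos + 1]
            + CLOSE.encode("ascii") + line[pos + 1:])


def highlight_chars(line: str, pos: int) -> str:
    """Safer variant: wraps the single *character* at offset `pos`."""
    return line[:pos] + OPEN + line[pos:pos + 1] + CLOSE + line[pos + 1:]


if __name__ == "__main__":
    source = "<body>\u00e9</body>"      # "é" sits at character offset 6
    utf8 = source.encode("utf-8")       # é is the two octets C3 A9

    broken = highlight_bytes(utf8, 6)
    # The lead octet C3 is now followed by "<" (0x3C) from "</strong>".
    print(b"\xc3\x3c" in broken)
    # Viewed as Latin-1, this matches the page source quoted in the message.
    print(broken.decode("latin-1"))
    try:
        broken.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("not valid UTF-8:", exc.reason)

    # Inserting markup between characters leaves "é" whole.
    print(highlight_chars(source, 6))
```

Working on decoded characters (or moving a byte offset back to a character boundary before inserting markup) would keep the error highlighting while avoiding the split octets; dropping the <strong> markup, the quick fix proposed in the message, avoids the problem as well.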
Received on Sunday, 30 October 2005 13:02:52 UTC