- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Sun, 30 Oct 2005 15:02:43 +0200 (EET)
- To: www-validator@w3.org
- Cc: Naturally Naomi <naturallynaomi@yahoo.com>
- Message-ID: <Pine.GSO.4.63.0510301441550.10081@korppi.cs.tut.fi>
On Sun, 30 Oct 2005, Jukka K. Korpela wrote:
> I created a trivial test document
> http://www.cs.tut.fi/~jkorpela/test/nbsp.html
> that has a <ul> element with one <li> element inside it but
> with a no-break space before the <li> tag. Here's what the
> W3C validator says:
>
> 1. Error Line 5 column 0: start tag for "LI" omitted, but its declaration
> does not permit this.
> ¼/strong>?<li></li>
>
> There's something very strange in the report's source.
I was able to reduce the problem to an even more trivial case:
- write a document in ISO-8859-1 encoding
- declare HTML 4.01 Strict DOCTYPE
- use a body part of <body>é</body> (or with any non-ASCII
character inside the body)
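
The reduced test case above can be generated with a short script. This is only a sketch: the function name, the file name and the <title> line are assumptions added for illustration, and how the encoding is announced to the validator (HTTP header, meta element, or manual override) is left open.

```python
from pathlib import Path

# Document text following the steps listed above. The <title> element is an
# assumption, added only because HTML 4.01 Strict requires one.
DOC = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
    '  "http://www.w3.org/TR/html4/strict.dtd">\n'
    '<title>Test</title>\n'
    '<body>\u00e9</body>\n'
)


def build_test_document() -> bytes:
    """Return the test document as ISO-8859-1 octets (e acute becomes E9)."""
    return DOC.encode("iso-8859-1")


if __name__ == "__main__":
    # Hypothetical output file name.
    Path("reduced-test.html").write_bytes(build_test_document())
```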
The validator reports "character data is not allowed here",
which is correct, but shows the element oddly:
<body>ü/strong>?</body>
If I manually change the encoding of the report page to ISO-8859-1, I get:
<body>Ã©</body>
This is still wrong, but I guess it now shows where things go wrong.
Here's the source of the error message page (viewed as if it were
Latin 1):
<li class="msg_err">
<span class="err_type">Error</span>
<em>Line 4 column 6</em>:
<span class="msg">character data is not allowed
here</span>.<pre><code class="input"><body><strong title="Position
where error was detected.">Ã</strong>©</body></code></pre>
Thus, the validator has added <strong> markup in a manner that breaks
a sequence of two octets that is meant to be the UTF-8 representation
of a single character ("é" in this case). This produces the octet pair
C3 3C (looks like Ã< if interpreted as ISO-8859-1), and the rest is
a mess.
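
The effect can be reproduced outside the validator. The sketch below is not the validator's code; it assumes that the highlighting is inserted at an octet offset into the UTF-8 data, which is consistent with the output shown above. The function name is made up for illustration.

```python
SOURCE_LINE = "<body>\u00e9</body>"
COLUMN = 6  # column from the error message, counted from 0


def highlight_at_octet_offset(line: str, offset: int) -> bytes:
    """Wrap the single octet at `offset` in <strong> (the assumed faulty behaviour)."""
    raw = line.encode("utf-8")  # e acute becomes the two octets C3 A9
    return (
        raw[:offset]
        + b"<strong>"
        + raw[offset:offset + 1]  # only C3, the first half of the character
        + b"</strong>"
        + raw[offset + 1:]        # A9 is left stranded after the end tag
    )


broken = highlight_at_octet_offset(SOURCE_LINE, COLUMN)
print(broken)
# Viewed as ISO-8859-1, as in the report source quoted above:
print(broken.decode("iso-8859-1"))
```

The result contains the octet pair C3 3C and is no longer valid UTF-8, so a browser that reads the report page as UTF-8 cannot display the character.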
Deactivating the generation of <strong> markup to highlight the point
of error would be a quick fix.
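
A less drastic alternative might be to split the line at a character position instead of an octet position, and to encode the result only after the markup has been added. The following is a minimal sketch under that assumption; the function name is hypothetical and nothing here is taken from the validator's source.

```python
from html import escape


def highlight_char(line: str, column: int) -> bytes:
    """Wrap the whole character at `column` in <strong>, then encode as UTF-8."""
    before = line[:column]
    target = line[column:column + 1]  # a complete character, never a lone octet
    after = line[column + 1:]
    marked = (
        escape(before)
        + '<strong title="Position where error was detected.">'
        + escape(target)
        + "</strong>"
        + escape(after)
    )
    return marked.encode("utf-8")


print(highlight_char("<body>\u00e9</body>", 6).decode("utf-8"))
```

Because the split happens on the decoded string, the two octets of the character stay together inside the <strong> element.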
--
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Sunday, 30 October 2005 13:02:52 UTC