Re: escaped & not validated correctly

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Mon, 04 Mar 2013 00:55:03 +0200
Message-ID: <5133D4C7.2040206@cs.tut.fi>
To: Markus Schicketanz <markus@schicketanz.com>
CC: www-validator@w3.org
2013-02-28 17:33, Markus Schicketanz wrote:

>> Error Line 64, Column 31: & did not start a character reference. (&
>> probably should have been escaped as &amp;.)
>>
>> http://www.kath-zeitz.de/?mm=1&amp;me=8</div>
>>
> obviously there is no unescaped &.

After Markus sent me the complete document, I was able to confirm 
that there is a bug in the validator. It is apparently in the parsing of 
textarea element content, and it can be demonstrated with the following:

<!doctype html><title></title>
<textarea cols='18' rows='4'>&me</textarea>
&you

The erroneous reference &you is correctly reported with a line number, 
but the reference &me is reported without a line number and without 
echoing the source code fragment:

QUOTE
Error & did not start a character reference. (& probably should have 
been escaped as &amp;.)

Error Line 3, Column 1: & did not start a character reference. (& 
probably should have been escaped as &amp;.)

&you
UNQUOTE

In Markus' case, for some reason, the error is reported late and is 
attributed to a line that does not contain an error. I have not been 
able to reduce this aspect of the bug to a simple test case.

The bug is also present at http://validator.nu.

Regarding the authoring side of the matter, fixing the error inside the 
textarea removes the symptoms. The real error in the tested HTML 
document is an unescaped "&" inside a textarea element. The content of 
that element is parsed as plain text in the sense that no HTML tags 
except the end tag of the element are recognized, but character 
references (to use the HTML5 term) *are* recognized, and an ampersand 
used as a data character must be escaped when it is followed by a name 
character.
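
For authors who generate such pages from a script, the fix might be 
sketched as follows. This is only an illustration, not what Markus' 
site actually runs: the helper name textarea_markup is hypothetical, 
and the cols/rows defaults are simply copied from the test case above. 
It relies on html.escape from the Python standard library.

```python
import html


def textarea_markup(value: str, cols: int = 18, rows: int = 4) -> str:
    """Return a textarea element with its text content escaped.

    Hypothetical helper for illustration only.
    """
    # quote=False replaces only &, < and >; quotation marks need no
    # escaping in element content (they would in attribute values).
    content = html.escape(value, quote=False)
    return f"<textarea cols='{cols}' rows='{rows}'>{content}</textarea>"


print(textarea_markup("&me"))
# prints: <textarea cols='18' rows='4'>&amp;me</textarea>
```

A browser turns &amp; back into a plain "&" when it parses the element, 
so the user still sees and submits the original text.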

The validator correctly parses textarea element content by the rules (as 
RCDATA, in HTML5 terms), but it fails to report undefined references 
properly.
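
To make concrete what a proper report would contain, here is a rough 
sketch that lists every "&" followed by a name character that does not 
begin a known named reference, together with its line and column. It is 
not the validator's algorithm: the function name stray_ampersands is 
made up, numeric references (which start with "#") are skipped, 
references without a terminating semicolon are always flagged, and no 
attempt is made to skip comments, script or style content. The table of 
names comes from html.entities.html5 in the Python standard library.

```python
import re
from html.entities import html5

# "&" followed by an ASCII name, optionally terminated by ";".
_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def stray_ampersands(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, text) for each suspicious ampersand.

    Lines and columns are 1-based. Simplified check for illustration.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _REF.finditer(line):
            name, semicolon = match.groups()
            # Accept only semicolon-terminated names found in the table.
            if not (semicolon and name + ";" in html5):
                hits.append((lineno, match.start() + 1, match.group(0)))
    return hits


test_case = (
    "<!doctype html><title></title>\n"
    "<textarea cols='18' rows='4'>&me</textarea>\n"
    "&you\n"
)
for hit in stray_ampersands(test_case):
    print(hit)
# prints: (2, 30, '&me')
# prints: (3, 1, '&you')
```

Both references come out with a position here, which is what one would 
expect from the validator as well.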

Yucca
Received on Sunday, 3 March 2013 22:55:35 GMT