[Bug 4904] validator bug and inconsistent behavior with: <div>:1:&nbsp;</div>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4904

           Summary: validator bug and inconsistent behavior with:
                    <div>:1:&nbsp;</div>
           Product: Validator
           Version: HEAD
          Platform: PC
               URL: http://www.manuelmoser.de/stuff/validator/notvalid1.html
        OS/Version: Windows 2000
            Status: NEW
          Severity: normal
          Priority: P2
         Component: check
        AssignedTo: dave.null@w3.org
        ReportedBy: w3bugzilla@manuelmoser.de
         QAContact: www-validator-cvs@w3.org


I think we found a serious bug in the validator. As far as I
understand, the following document is valid:

-------------------------------------------
http://www.manuelmoser.de/stuff/validator/notvalid1.html
-------------------------------------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>blah</title>
</head>
<body>
<div>:1:&nbsp;</div>
</body>
</html>
-------------------------------------------

But the validator complains about the document with the following
message:

-------------------------------------------
Validation Output: 1 Error
 Line 1, Column 14: &nbsp;</div>.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/x
-------------------------------------------


Some research has shown that the line number reported for this error
appears to be arbitrary and changes with the surrounding code. I even
got an error reported on line 34 of a document with only 18 lines.
The third line of the output mostly shows an unrelated line, or some
tags after the line where I see the problem. This gave me some hints
and I was able to reduce the problem to this line:

<div>:1:&nbsp;</div>

What seems to matter for the error is that there are two colons with
a number in between, followed by &nbsp;.

Some examples:

Removing the number makes the document valid:

-------------------------------------------
http://www.manuelmoser.de/stuff/validator/valid1.html
-------------------------------------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>blah</title>
</head>
<body>
<div>::&nbsp;</div>
</body>
</html>
-------------------------------------------

Changing the number to a letter makes the document valid:

-------------------------------------------
http://www.manuelmoser.de/stuff/validator/valid2.html
-------------------------------------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>blah</title>
</head>
<body>
<div>:a:&nbsp;</div>
</body>
</html>
-------------------------------------------

Removing the &nbsp; makes the document valid:

-------------------------------------------
http://www.manuelmoser.de/stuff/validator/valid3.html
-------------------------------------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>blah</title>
</head>
<body>
<div>:1:</div>
</body>
</html>
-------------------------------------------
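
To summarize the four test cases, here is a minimal Python sketch. The
regular expression is only an assumption about what the trigger looks
like, derived from the documents above; it does not reflect the
validator's actual code. The names TRIGGER, CASES and triggers_error
are made up for illustration.

```python
import re

# Assumption: this pattern only describes the input that triggered the
# error in the test documents. It is not taken from the validator source.
# Only the single digit "1" was actually tested; "\d+" is a guess that
# longer numbers behave the same way.
TRIGGER = re.compile(r":\d+:&nbsp;")

# Body lines of the four test documents from this report.
CASES = {
    "notvalid1.html": "<div>:1:&nbsp;</div>",
    "valid1.html": "<div>::&nbsp;</div>",
    "valid2.html": "<div>:a:&nbsp;</div>",
    "valid3.html": "<div>:1:</div>",
}


def triggers_error(line: str) -> bool:
    """Return True if the line contains the suspected trigger pattern."""
    return TRIGGER.search(line) is not None


if __name__ == "__main__":
    for name, line in CASES.items():
        print(name, "suspect" if triggers_error(line) else "ok")
```

Only notvalid1.html matches the pattern, which agrees with the
validator results described above.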

Manuel Moser

Received on Wednesday, 1 August 2007 18:41:17 UTC