[Bug 4547] Correct line for non utf-8 characters not flagged in 0.8.0 beta 1

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4547

           Summary: Correct line for non utf-8 characters not flagged in
                    0.8.0 beta 1
           Product: Validator
           Version: 0.8.0b1
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Parser
        AssignedTo: link@pobox.com
        ReportedBy: d4d8n7m02@sneakemail.com
         QAContact: www-validator-cvs@w3.org


The current live version gives me this error on a document I uploaded:

Sorry, I am unable to validate this document because on line 222 it contained
one or more bytes that I cannot interpret as utf-8 (in other words, the bytes
found are not valid values in the specified Character Encoding). Please check
both the content of the file and the character encoding indication.

But the new version at http://validator-test.w3.org/ says:

Sorry, I am unable to validate this document because on line 0 it contained one
or more bytes that I cannot interpret as utf-8 (in other words, the bytes found
are not valid values in the specified Character Encoding). Please check both
the content of the file and the character encoding indication.

The difference in line numbers indicates a problem in itself. In addition, I
can't see which character is invalid: I looked at line 222 in Notepad++ in
"show all characters" mode and didn't see any characters that shouldn't be
there.
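
One way to pin down the offending byte independently of either validator
version is to decode the file line by line and record where decoding fails.
The following is a minimal Python sketch of that idea, not the validator's own
code; the function name find_invalid_utf8_lines and the sample bytes are
hypothetical. It reports only the first bad byte on each line.

```python
from typing import List, Tuple


def find_invalid_utf8_lines(data: bytes) -> List[Tuple[int, int, bytes]]:
    """Return (line number, byte offset in line, offending byte) per bad line.

    Line numbers are 1-based. Splitting on b"\n" is safe because the byte
    0x0A never occurs inside a multi-byte UTF-8 sequence.
    """
    problems = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte.
            problems.append(
                (line_number, err.start, line[err.start:err.start + 1])
            )
    return problems


# Hypothetical sample: line 2 holds a Latin-1 non-breaking space (0xA0),
# which is not valid UTF-8 on its own; line 3 holds the UTF-8 form (0xC2 0xA0).
sample = b"ok\nEdit\xa0Something\nfine \xc2\xa0\n"
print(find_invalid_utf8_lines(sample))  # [(2, 4, b'\xa0')]
```

To check a real document, read it in binary mode, for example
open(path, "rb").read(), and pass the bytes to the function. A byte such as
0xA0 is displayed as an ordinary-looking space by many editors, which may be
why nothing stands out on screen.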

Is &nbsp; not allowed in the content of an anchor element?

Line 222 is:

                            <a id="link1" class="Edit" title="Edit
Edit Something" href="#">Edit&nbsp;Something</a>

Doctype and namespace:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >

Received on Tuesday, 8 May 2007 17:32:41 UTC