Validator bugs/Improvements

From: Martin J. Duerst (duerst@w3.org)
Date: Wed, Sep 22 1999


Message-Id: <199909220801.RAA22596@sh.w3.mag.keio.ac.jp>
Date: Wed, 22 Sep 1999 17:12:02 +0900
To: gerald@w3.org, www-validator@w3.org
From: "Martin J. Duerst" <duerst@w3.org>
Cc: mark.davis@us.ibm.com (Mark Davis), mimasa@w3.org, asada@w3.org
Subject: Validator bugs/Improvements

Hello Gerald,

Here are some proposals for improving our validator.

The UTF-8 problem is noted on http://validator.w3.org/todo.html
and has been reported earlier. It is rather urgent, as we plan
to publish a WD in UTF-8 soon, and also because accepting XHTML
(as XML) requires that UTF-8 and UTF-16 be accepted.

I think the easiest way to fix this is to upgrade to SP 1.3,
which does most of this stuff internally; Takuya's code would
then just be used to set the right parameter.

> Date: Mon, 20 Sep 1999 06:54:53 -0700
> From: Mark Davis <mark@macchiato.com>
> X-Mailer: Mozilla 4.6 [en] (Win98; U)
> X-Accept-Language: en,de-CH,fr-CH,it
> To: Martin Duerst <duerst@w3.org>, Mark Davis <mark.davis@us.ibm.com>
> CC: Misha Wolf <misha.wolf@reuters.com>
> Subject: Validator bug
> 
> Martin,
> 
> I was using the validator recently, and again found that it did not accept
> UTF-8 characters. Can you ask the person responsible to fix that?
> 
> Also, there are two other items it would be good to add:
> 
> 1. As a practical matter, the validator should put out a warning if the
> page does not have a charset tag. While the explicit charset is not
> required, essentially almost every browser is set to a default charset
> which is not Latin-1. While for Western Europe that is 1252 - which is
> pretty harmless - other settings will often misinterpret the contents of
> the page.

This is indeed very true. In the warning, you could for example point to
http://www.w3.org/International/O-HTTP-charset.html,
which I just updated a bit.
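
To make the proposed warning concrete, here is a minimal sketch in Python.
It is only an illustration, not the validator's actual code: the function
names and the warning wording are made up, and it assumes the validator
has the HTTP Content-Type header value and the raw page bytes at hand.
It looks for a charset parameter in the header first, then in a
<meta http-equiv> tag near the top of the page, and warns if neither
is present.

```python
import re

# Hypothetical helper names; nothing here is taken from the validator itself.

# charset parameter inside a <meta ... content="text/html; charset=..."> tag
META_CHARSET = re.compile(
    rb"<meta[^>]+content\s*=\s*[\"'][^\"'>]*charset\s*=\s*([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)
# charset parameter inside an HTTP Content-Type header value
HEADER_CHARSET = re.compile(r"charset\s*=\s*\"?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def declared_charset(content_type, body):
    """Return the declared charset in lowercase, or None if none is found.

    content_type: value of the HTTP Content-Type header (str, may be empty)
    body:         raw bytes of the page
    """
    if content_type:
        m = HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1).lower()
    # Only look near the top of the page, where a <meta> tag would sit.
    m = META_CHARSET.search(body[:4096])
    if m:
        return m.group(1).decode("ascii").lower()
    return None


def charset_warning(content_type, body):
    """Return a warning string if no charset is declared, else None."""
    if declared_charset(content_type, body) is None:
        return (
            "Warning: no charset declared in the HTTP header or a <meta> tag; "
            "see http://www.w3.org/International/O-HTTP-charset.html"
        )
    return None
```

The header is checked before the <meta> tag because the HTTP charset
parameter takes precedence when both are present.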


> 2. The HTML that is actually checked is different than the source;
> various stuff is inserted or changed (like the DOCTYPE). While that is
> ok, it does cause the line numbering to be wrong, and can cause
> confusion when the line it says needs to be fixed does not resemble the
> source. It would be good to:
> a. Not include inserted lines in the line count.
> b. Color the insertions so people know a change is made.
> 
> Mark
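
Points a. and b. could be handled by remembering, for every line of the
checked document, which source line it came from. The following is a
rough sketch in Python under stated assumptions: the function names are
hypothetical, the only insertion modelled is a DOCTYPE line, and the
"+" prefix stands in for whatever coloring the output would use.

```python
def add_doctype(source_lines, doctype):
    """Prepend a DOCTYPE if the source has none.

    Returns (checked_lines, origin), where origin[i] is the 1-based source
    line number of checked_lines[i], or None for an inserted line.
    """
    checked, origin = [], []
    has_doctype = any(
        line.lstrip().upper().startswith("<!DOCTYPE") for line in source_lines
    )
    if not has_doctype:
        checked.append(doctype)
        origin.append(None)  # inserted line: not counted in source numbering
    for number, line in enumerate(source_lines, start=1):
        checked.append(line)
        origin.append(number)
    return checked, origin


def source_line(checked_line_number, origin):
    """Map a 1-based line number in the checked document back to the source.

    Returns None when the line was inserted rather than taken from the source.
    """
    return origin[checked_line_number - 1]


def listing(checked, origin):
    """Render the checked document with source numbering.

    Inserted lines carry a '+' marker instead of a number.
    """
    rows = []
    for text, number in zip(checked, origin):
        prefix = "   + " if number is None else f"{number:4d} "
        rows.append(prefix + text)
    return rows
```

An error reported at line 3 of the checked document would then be shown
as line 2 of the source whenever one DOCTYPE line had been inserted.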


Regards,   Martin.


#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org