Re: Producing an XML report from the validator from Terje Bless on 2001-06-14 (www-validator@w3.org from June 2001)

From: Terje Bless <link@tss.no>
Date: Thu, 14 Jun 2001 03:17:53 +0200
To: Christian Smith <csmith@barebones.com>
cc: Nick Kew <nick@webthing.com>, Esmond Walshe <esmond.walshe@eeng.dcu.ie>, www-validator@w3.org
Message-ID: <20010614034123-b01010701-1081e74d@192.168.1.6>

On 13.06.01 at 21:02, Christian Smith <csmith@barebones.com> wrote:

>On 6/14/01 at 1:14 AM, link@tss.no (Terje Bless) wrote:
>
>>On 13.06.01 at 13:58, Christian Smith <csmith@barebones.com> wrote:
>>
>>>I don't particularly care what the exact format is, but the pieces of
>>>info that I would need in the report would be:
>>>
>>>entry type         (error, warning, note)
>>
>>All three would be tagged as "error" to begin with I think. We don't
>>distinguish between them at the moment and I'm ambivalent about how, and
>>whether to, do so in the future.
>
>What about the CSS validator? Doesn't this return warnings? I certainly
>don't want you to dilute the validator but it wouldn't hurt to make room
>for this from the start.

I'm not being clear, sorry. I meant that while it's useful for the report
format (or DTD or whatever) to distinguish between errors, warnings, and
informational messages, you would get all of them labelled as "error" from
the W3C HTML Validator to begin with. We /could/ differentiate between them
-- because SP does -- but I'm not sure what would be the best way to do it
or even if there is any point in doing so.

The format should support all three types in any case.
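
To make that concrete, here is a minimal sketch in Python of a report
that leaves room for all three entry types while the validator itself
still labels everything "error". The element and attribute names
("report", "entry", "type", "line", "column") and the build_report()
helper are assumptions made up for illustration; nothing in this thread
has settled on a format or DTD.

```python
import xml.etree.ElementTree as ET

# The three entry types discussed in this thread.
ENTRY_TYPES = ("error", "warning", "note")


def build_report(messages):
    """Serialize a list of message dicts as a small XML report.

    Each message needs "line", "column" and "text". "type" is optional
    and defaults to "error", mirroring a validator that does not yet
    distinguish between errors, warnings and notes.
    """
    root = ET.Element("report")
    for msg in messages:
        entry_type = msg.get("type", "error")
        if entry_type not in ENTRY_TYPES:
            raise ValueError("unknown entry type: %r" % (entry_type,))
        entry = ET.SubElement(
            root,
            "entry",
            type=entry_type,
            line=str(msg["line"]),
            column=str(msg["column"]),
        )
        entry.text = msg["text"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_report([
        {"line": 2, "column": 11, "text": "there is no attribute FRED"},
        {"type": "warning", "line": 5, "column": 0, "text": "example warning"},
    ]))
```

A consumer can switch on the type attribute from day one, and nothing
in the format has to change if the validator later starts passing
through the distinction that SP already makes.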


>>We don't have a meaningful "end offset", and I'm not quite sure what
>>that would be in any case?
>
>Well, let's take as an example the error report for this line
>
><body fred="jones">
>
>The validator currently reports the offset here
>
><body fred="jones">
>           ^
>I think it would be much better if it could report a start offset here
>
><body fred="jones">
>      ^      
>and an end offset here
>
><body fred="jones">
>                 ^
>
>because I can do something much more useful with this data than with
>what is currently reported.                 

Yes, this would probably be better, but 1) we don't have that data (in the
current parser) and 2) this would have to be decided on a case-by-case
basis; what is the start and end offset of an unclosed tag in HTML <= 4?


>>>In any event the value should be the character offset from the
>>>beginning of the file.
>>
>>Hmmm, it's possible to calculate, but it's a PITA to do. We operate on
>>characters from the beginning of the line (because that's what SP does).
>
>Hmm, given the line number and the offset from the beginning of the line I
>can calculate the offset from the beginning IFF the file is open. It would
>be much easier if the validator could provide this info and this info
>really is important to me. I think the validator would be improved if this
>were added.

Hmmm, ok. I'll look into how much overhead this would incur and add it if
feasible. With the byte vs. char disparity and a few other issues, I'm
afraid it's unlikely in the short term. If/when something SAX-ish enters
the picture this may change; SAX operates on char offsets from the
beginning of the file IIRC.
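
For reference, a rough sketch in Python of the calculation being
discussed: turning a (line, column) pair into an offset from the
beginning of the file, and showing where the byte vs. char disparity
comes from. The function names are hypothetical, and the conventions
(1-based line numbers, 0-based columns counted in characters, lines
ending in "\n") are assumptions for the example, not a statement of
what SP actually reports.

```python
def absolute_char_offset(text, line, column):
    """Return the character offset from the start of `text`.

    Assumes `line` is 1-based, `column` is a 0-based character count
    from the start of that line, and lines are terminated by "\n".
    """
    lines = text.split("\n")
    if not 1 <= line <= len(lines):
        raise ValueError("line %d is out of range" % line)
    if not 0 <= column <= len(lines[line - 1]):
        raise ValueError("column %d is out of range" % column)
    # Each preceding line contributes its characters plus one newline.
    preceding = sum(len(prev) + 1 for prev in lines[:line - 1])
    return preceding + column


def char_to_byte_offset(text, char_offset, encoding="utf-8"):
    """Return the byte offset matching a character offset.

    The two only agree when every preceding character encodes to a
    single byte, which is the disparity mentioned above.
    """
    return len(text[:char_offset].encode(encoding))


if __name__ == "__main__":
    doc = '<html>\n<body fred="jones">\n'
    offset = absolute_char_offset(doc, 2, 11)
    print(offset, repr(doc[offset]))
    print(char_to_byte_offset("\u00e9<p>", 1))
```

Note that this needs the whole document in hand, which is the "IFF the
file is open" condition from the client's side; on the validator side
the cost is keeping a running total of line lengths while parsing.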

Received on Wednesday, 13 June 2001 21:41:33 UTC