Re: A simpler Web service response format from Henri Sivonen on 2007-01-30 (www-validator@w3.org from January 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 30 Jan 2007 17:25:44 +0200
To: olivier Thereaux <ot@w3.org>
Cc: Karl Dubost <karl@w3.org>, www-validator Community <www-validator@w3.org>
Message-Id: <D1C80E33-6812-42C8-9113-DB43D6B8A9D0@iki.fi>

Hi,

Thank you for your reply and sorry about the late follow-up.

On Jan 3, 2007, at 21:36, olivier Thereaux wrote:

> On Dec 18, 2006, at 05:13 , Henri Sivonen wrote:

>>  * Messages are grouped by type (error, warning, misc) instead of  
>> just lumping them together in the order the messages were  
>> generated in the validation process. (The grouping is redundant  
>> and requires buffering.)
>
> This reminds me of a lot of discussions I've read and had on  
> the subject. I am afraid there are two schools here that will never  
> be reconciled... One (and I would place myself there) prefers to  
> treat problems sequentially, regardless of their importance. The  
> other would rather fix errors first, then warnings. The former  
> will prefer the output of the markup validator, the latter, that of  
> the CSS validator.

I see.
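To make the buffering point concrete, here is a minimal Python sketch. The `Message` record and both function names are hypothetical, not taken from either validator. Sequential output can hand each message on as soon as it is produced, whereas grouping by type cannot emit anything until the whole input has been consumed, since an error found on the last line must still precede a warning from line 1.

```python
from typing import Iterable, Iterator, NamedTuple


class Message(NamedTuple):
    # Hypothetical message record; field names are illustrative only.
    type: str   # "error", "warning" or "info"
    line: int
    text: str


def sequential(messages: Iterable[Message]) -> Iterator[Message]:
    # Each message is passed on as soon as the validator produces it.
    for m in messages:
        yield m


def grouped(messages: Iterable[Message],
            order=("error", "warning", "info")) -> Iterator[Message]:
    # Grouping by type forces the producer to buffer everything first:
    # nothing can be emitted until the last message is known.
    buffered = list(messages)
    for t in order:
        for m in buffered:
            if m.type == t:
                yield m
```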

>>  * For each message type, the generator of the messages has to  
>> count the messages and indicate the count before the messages.  
>> (The message counts are redundant data and generating them  
>> requires buffering.)
>
> Having seen how convenient it is for the user to know the number of  
> errors without having to count them, I disagree with you here.

One problem with counting errors is that in many cases the number  
doesn't have a well-defined meaning. For example, the first error may  
be serious enough to cause spurious later errors to be reported,  
because it has left the validator in a bad state.
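If a UI does want per-type totals, a client could derive them itself from an uncounted message stream. A rough sketch, assuming a hypothetical flat response whose child elements are named after the message type (`error`, `warning`), parsed incrementally with the standard library's `xml.etree.ElementTree.iterparse`:

```python
import io
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical flat response: messages in generation order, no counts.
RESPONSE = b"""<messages>
  <error line="3" column="7">end tag for "p" omitted</error>
  <warning line="4" column="1">character "&amp;" is the first character of a delimiter</warning>
  <error line="9" column="2">document type does not allow element "div" here</error>
</messages>"""


def count_by_type(source) -> Counter:
    # Tally messages by element name as each one finishes parsing.
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "messages":
            counts[elem.tag] += 1
            elem.clear()  # drop processed content to keep memory flat
    return counts
```

Such totals would of course inherit the problem described above: a cascade of spurious errors inflates them just the same.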

> When it comes to counting or sorting or other such processing,  
> there will always be a tension because neither producer nor user of  
> a format wants to use the cpu cycles. In such a context, isn't it a  
> good idea that the producer, not the user, pays this price if the  
> producer really wants the tool to be used?

It isn't really a matter of CPU cycles but of whether the errors for  
a large document can start streaming over the network before the  
entire document has been checked. Granted, in many cases it doesn't  
matter in practice.
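As an illustration of the streaming case, a producer can write each message to the network as soon as the validator reports it. This sketch uses the standard library's `xml.sax.saxutils.XMLGenerator`; the element names and the `(type, line, column, text)` tuple shape are assumptions for illustration only:

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def stream_messages(messages, out):
    """Write each message to `out` as soon as it is available.

    `messages` is any iterable of (type, line, column, text) tuples, e.g. a
    generator driven by the validator; the element names are hypothetical.
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("messages", AttributesImpl({}))
    for kind, line, column, text in messages:
        attrs = AttributesImpl({"line": str(line), "column": str(column)})
        gen.startElement(kind, attrs)
        gen.characters(text)  # escapes <, > and & for us
        gen.endElement(kind)
        out.flush()  # push this message out before checking continues
    gen.endElement("messages")
    gen.endDocument()
```

With a flat format like this, the only thing that has to wait for the end of checking is the closing `</messages>` tag; leading counts or type grouping would force the whole message list to be held back instead.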

>>  * The formats echo information that the client already knows such  
>> as the URI of the validator, the URI of the input or in the case  
>> of the Unicorn format, the date.
>>
>>  * The formats have unnecessary telescoping envelope elements. (A  
>> SOAP 1.2 format message ends with  
>> </m:markupvalidationresponse></env:Body></env:Envelope>, where  
>> </env:Body></env:Envelope> is just cruft.)
>>
>>  * The SOAP format has SOAP namespace cruft. The Unicorn format  
>> has XSD cruft.
>
> I think these are reasonably cheap, especially if the benefit is  
> being processable by more engines (soap-enabled ones, schema-based  
> parsers, etc.).

To me, it seems like a design bug for a schema-based parser to  
require schema artifacts in the document instance. What level of  
interoperability has actually been achieved between the W3C Validator  
SOAP interface and off-the-shelf SOAP stacks?
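For comparison, this is roughly what a client has to do to get at the payload with and without the SOAP wrapper, using `xml.etree.ElementTree`. The SOAP 1.2 envelope namespace is the real one; the payload namespace `urn:example:markup-validator` and the trimmed-down payload are made up for the example:

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
M = "urn:example:markup-validator"  # hypothetical payload namespace

SOAP_RESPONSE = f"""<env:Envelope xmlns:env="{SOAP_ENV}">
  <env:Body>
    <m:markupvalidationresponse xmlns:m="{M}">
      <m:validity>false</m:validity>
    </m:markupvalidationresponse>
  </env:Body>
</env:Envelope>"""

FLAT_RESPONSE = f"""<markupvalidationresponse xmlns="{M}">
  <validity>false</validity>
</markupvalidationresponse>"""


def payload_from_soap(text):
    # Two wrapper levels must be unwrapped before reaching the payload.
    root = ET.fromstring(text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    return body.find(f"{{{M}}}markupvalidationresponse")


def payload_from_flat(text):
    # The document element already is the payload.
    return ET.fromstring(text)
```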

>>  * The formats represent line and column numbers as text content  
>> of elements as opposed to attributes.
>
> I can't parse this. Please explain.

I meant that, both as a matter of markup aesthetics (attributes vs.  
child element content) and of machine readability vs. human  
readability, the line and column numbers could well go in attributes.
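Concretely, the two options look like this; the element and attribute names below are hypothetical, and the parsing uses `xml.etree.ElementTree`. With attributes, the machine-readable numbers sit in the start tag and the element content is left for the human-readable message:

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same message.
AS_ELEMENTS = """<error>
  <line>12</line>
  <column>5</column>
  <message>element "blink" undefined</message>
</error>"""

AS_ATTRIBUTES = """<error line="12" column="5">element "blink" undefined</error>"""


def position_from_elements(xml_text):
    # Numbers live in child elements alongside the message text.
    e = ET.fromstring(xml_text)
    return int(e.findtext("line")), int(e.findtext("column"))


def position_from_attributes(xml_text):
    # Numbers live in attributes; the content is just the message.
    e = ET.fromstring(xml_text)
    return int(e.get("line")), int(e.get("column"))
```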

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 30 January 2007 15:26:03 UTC