Re: A simpler Web service response format from Henri Sivonen on 2006-12-26 (www-archive@w3.org from December 2006)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 26 Dec 2006 22:15:44 +0200
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: www-archive@w3.org
Message-Id: <52C7974B-2371-4C20-91C6-97636A971BA3@iki.fi>

On Dec 19, 2006, at 11:55, Bjoern Hoehrmann wrote:

> You might want to read my notes on the subject. They do not propose any
> particular format, but list requirements and problems to be solved by a
> format of this kind, see <http://esw.w3.org/topic/MarkupValidator/M12N>.

"The current validator supports multiple input sources, file upload,  
textarea, and retrieval of remote resources. An observation is  
naturally bound to the input retrieved through these sources (or  
their metadata) and should thus be identified in the observation  
instance."

I don't see why the source needs to be identified. Surely the client  
invoking the checker knows what it sent as the input.

(Associating the URI of the entity with a source line and column is  
subtly different from merely echoing things about the input that the  
provider of the input already knows.)

"The descriptor should be extensible to allow for different location  
addressing schemes"

Then consumers of the format would need to support different  
addressing schemes.

"A related question is how the results would be presented in the  
XHTML interface, it could be a hierarchy like"
"Well-formedness errors:"
"DTD-Validitiy errors:"
"Link Check"

Since off-the-shelf libraries don't usually categorize errors like  
that, introducing such categorization as an afterthought could well  
go into the territory of diminishing returns, because the cost of  
introducing categorization would be great compared to the benefit.

For example, the SAX2 ErrorHandler interface doesn't guarantee that a
report of an "error" carries any data beyond stating that an error
occurred. In practice, an English-language message is available. Most
often, an approximate source location is available as well. Extracting
any more data than that generally requires hacking into the
off-the-shelf libraries and subverting the usual reporting mechanism.
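
To make the point concrete, here is a minimal sketch of what a SAX2
ErrorHandler typically gets to work with: a severity (implied by which
method was called), a message, and a line/column pair that may be -1
when the parser doesn't know it. The class name CollectingErrorHandler
and the check() helper are made up for illustration; the parser is
whatever the JDK's SAXParserFactory returns by default.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical helper: records everything the ErrorHandler is told.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    private void record(String severity, SAXParseException e) {
        // Line and column are -1 when the parser cannot supply them.
        messages.add(severity + " at " + e.getLineNumber() + ":"
                + e.getColumnNumber() + ": " + e.getMessage());
    }

    @Override public void warning(SAXParseException e) { record("warning", e); }

    @Override public void error(SAXParseException e) { record("error", e); }

    @Override public void fatalError(SAXParseException e) throws SAXException {
        record("fatal", e);
        throw e; // stop parsing after a fatal error
    }

    // Parses the given XML text and returns the recorded messages.
    static List<String> check(String xml) throws Exception {
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError(); nothing more to do.
        }
        return handler.messages;
    }
}
```

Note that nothing here says whether a message is a well-formedness
error, a validity error or something else; that is exactly the
information the interface doesn't carry.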

"bla bla ... branding ... outreach ... community ... positive  
statements ... terminology ..."

:-)

> In the context of that document, the need for a common format came from
> the desire to enable multiple independent tools to combine the results
> at low and high levels, for example, to combine multiple "microformat"
> checkers with a general-purpose XHTML Validator.

If I were to integrate a microformat checker with my validation  
service, I'd prefer to integrate them in-process. That is, the
checkers would need to consume SAX2 ContentHandler events and report  
to a SAX2 ErrorHandler. Of course, such an arrangement would require  
the checkers to be written in Java.
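
A rough sketch of what such an in-process checker could look like: it
consumes ContentHandler events and reports problems to whatever
ErrorHandler it was given, using the Locator for the position. The
rule itself (an abbr element must have a title attribute) is a made-up
stand-in, not an actual microformat requirement, and the names
AbbrTitleChecker and run() are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical checker: consumes SAX2 events, reports to an ErrorHandler.
class AbbrTitleChecker extends DefaultHandler {
    private final ErrorHandler errorHandler;
    private Locator locator; // may stay null if the parser supplies none

    AbbrTitleChecker(ErrorHandler errorHandler) {
        this.errorHandler = errorHandler;
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        // Illustrative rule only: abbr must carry a title attribute.
        if ("abbr".equals(qName) && atts.getValue("title") == null) {
            errorHandler.error(new SAXParseException(
                    "abbr element lacks a title attribute", locator));
        }
    }

    // Runs the checker over the given XML text, collecting the reports.
    static List<String> run(String xml) throws Exception {
        List<String> messages = new ArrayList<>();
        ErrorHandler sink = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                messages.add(e.getLineNumber() + ": " + e.getMessage());
            }
        };
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new AbbrTitleChecker(sink));
        reader.setErrorHandler(sink);
        reader.parse(new InputSource(new StringReader(xml)));
        return messages;
    }
}
```

Because the checker and the parser share one ErrorHandler, the parser's
own errors and the checker's reports end up in the same place without
any intermediate serialization format.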

The primary use case for the Web service format that I am considering  
is allowing e.g. a blogging system to send a document off to a Web  
service for checking so that the blogging system doesn't need to  
contain an in-process conformance checker.
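
From the blogging system's side, that use case might look roughly like
the sketch below: POST the document to the checker and read back
whatever the service responds with. Everything specific here is an
assumption for illustration: the endpoint URL is a placeholder, the
class and method names are made up, and the response is treated as an
opaque string because its format is precisely what is still to be
designed. It uses the java.net.http client, so it needs Java 11 or
later.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client for a remote conformance-checking service.
class RemoteCheckClient {
    // Placeholder endpoint; not a real service.
    static final String CHECKER_URL = "https://checker.example.org/check";

    // Builds a POST request carrying the document as the request body.
    static HttpRequest buildRequest(String endpoint, String document) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "text/html; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(
                        document, StandardCharsets.UTF_8))
                .build();
    }

    // Sends the document and returns the raw response body.
    static String check(String endpoint, String document) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                buildRequest(endpoint, document),
                HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The client already has the document in hand when it calls check(), so
the response only needs to carry the messages and their locations.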

> Also note that the ISO 19757-3 (Schematron) specification defines a
> reporting format.

The way I have seen Schematron used (which is also how I use it  
myself) makes whether an error check is implemented as a failing  
assertion or as a succeeding report an implementation detail which  
shouldn't be exposed to end users or even software observers outside  
the Schematron engine. I think I'll patch my copy of Jing/oNVDL at  
some point to hide whether a message was generated by a failed  
assertion or a report.

Moreover, of late, I have started to consider the Schematron of the
HTML5 conformance checker a mere rapid prototype for a hand-crafted
exclusion and referential integrity checker that would be more
CPU-efficient and more memory-efficient.

Thank you for the pointers.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 26 December 2006 20:40:39 UTC