- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Tue, 26 Dec 2006 22:15:44 +0200
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: www-archive@w3.org
On Dec 19, 2006, at 11:55, Bjoern Hoehrmann wrote:

> You might want to read my notes on the subject. They do not propose any
> particular format, but list requirements and problems to be solved by a
> format of this kind, see <http://esw.w3.org/topic/MarkupValidator/M12N>.

"The current validator supports multiple input sources, file upload, textarea, and retrieval of remote resources. An observation is naturally bound to the input retrieved through these sources (or their metadata) and should thus be identified in the observation instance."

I don't see why the source needs to be identified. Surely the client invoking the checker knows what it sent as the input. (Associating the URI of the entity with a source line and column is subtly different from merely echoing things about the input that the provider of the input already knows.)

"The descriptor should be extensible to allow for different location addressing schemes"

Then consumers of the format would need to support different addressing schemes.

"A related question is how the results would be presented in the XHTML interface, it could be a hierarchy like"
"Well-formedness errors:"
"DTD-Validitiy errors:"
"Link Check"

Since off-the-shelf libraries don't usually categorize errors like that, introducing such categorization as an afterthought could well go into the territory of diminishing returns, because the cost of introducing categorization would be great compared to the benefit.

For example, the SAX2 ErrorHandler interface doesn't guarantee that a report of an "error" carries any data beyond stating that an error occurred. In practice, an English-language message is available. Most often also an approximate source location is available. Extracting any more data than that generally requires hacking into the off-the-shelf libraries and subverting the usual reporting mechanism.

"bla bla ... branding ... outreach ... community ... positive statements ... terminology ..."
:-)

> In the context of that document, the need for a common format came from
> the desire to enable multiple independent tools to combine the results
> at low and high levels, for example, to combine multiple "microformat"
> checkers with a general-purpose XHTML Validator.

If I were to integrate a microformat checker with my validation service, I'd prefer to integrate them in-process. That is, the checkers would need to consume SAX2 ContentHandler events and report to a SAX2 ErrorHandler. Of course, such an arrangement would require the checkers to be written in Java.

The primary use case for the Web service format that I am considering is allowing e.g. a blogging system to send a document off to a Web service for checking so that the blogging system doesn't need to contain an in-process conformance checker.

> Also note that the ISO 19757-3 (Schematron) specification defines a
> reporting format.

The way I have seen Schematron used (which is also how I use it myself) makes whether an error check is implemented as a failing assertion or as a succeeding report an implementation detail which shouldn't be exposed to end users or even software observers outside the Schematron engine. I think I'll patch my copy of Jing/oNVDL at some point to hide whether a message was generated by a failed assertion or a report.

Moreover, of late, I have started to consider the Schematron of the HTML5 conformance checker a mere rapid prototype of a hand-crafted more CPU-efficient and more memory-efficient exclusion and referential integrity checker.

Thank you for the pointers.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
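
To illustrate the SAX2 ErrorHandler point made in the message, here is a minimal sketch of what a checker typically gets from an off-the-shelf parser: a severity (from which callback was invoked), a message, and an approximate line and column. The class name `CollectingErrorHandler`, the `check` helper, and the report string layout are assumptions made for this illustration; they are not part of SAX2 or of any validator discussed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records whatever the parser reports, nothing more.
public class CollectingErrorHandler implements ErrorHandler {
    public final List<String> reports = new ArrayList<>();

    // All three callbacks receive the same kind of object. There is no
    // error category or machine-readable code, only message and location.
    private void record(String severity, SAXParseException e) {
        // getLineNumber()/getColumnNumber() return -1 when unavailable.
        reports.add(severity + " " + e.getLineNumber() + ":"
                + e.getColumnNumber() + " " + e.getMessage());
    }

    @Override
    public void warning(SAXParseException e) { record("warning", e); }

    @Override
    public void error(SAXParseException e) { record("error", e); }

    @Override
    public void fatalError(SAXParseException e) { record("fatal", e); }

    // Parses the given XML text and returns the collected reports.
    public static List<String> check(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setContentHandler(new DefaultHandler());
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Parsers usually stop with an exception after a fatal error;
            // the report has already been recorded by the handler.
        }
        return handler.reports;
    }

    public static void main(String[] args) throws Exception {
        for (String report : check("<a><b></a>")) {
            System.out.println(report);
        }
    }
}
```

Grouping such reports into categories like "well-formedness" versus "validity" would have to be inferred from the message text or from which component produced the report, since the callback itself carries no such field.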
Received on Tuesday, 26 December 2006 20:40:39 UTC