- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 18 Dec 2006 12:13:17 +0200
- To: Karl Dubost <karl@w3.org>
- Cc: www-validator <www-validator@w3.org>
On Dec 18, 2006, at 08:42, Karl Dubost wrote:

> On 15 Dec 2006, at 01:01, Henri Sivonen wrote:
>> I had a look at the SOAP and Unicorn response formats for the W3C
>> Validator in case I could reuse one of them. They both seemed
>> unnecessarily complex. Also, generating the formats requires
>> buffering.
>
> Could you explain what part is complex?
> Or what makes their complexity?

* Messages are grouped by type (error, warning, misc) instead of just
  lumping them together in the order the messages were generated in the
  validation process. (The grouping is redundant and requires buffering.)
* The message groups have double containers (errors and errorList).
* For each message type, the generator of the messages has to count the
  messages and indicate the count before the messages. (The message
  counts are redundant data and generating them requires buffering.)
* The formats echo information that the client already knows, such as
  the URI of the validator, the URI of the input or, in the case of the
  Unicorn format, the date.
* The formats have unnecessary telescoping envelope elements. (A SOAP 1.2
  format message ends with
  </m:markupvalidationresponse></env:Body></env:Envelope>, where
  </env:Body></env:Envelope> is just cruft.)
* The formats represent line and column numbers as text content of
  elements as opposed to attributes.
* The SOAP format has SOAP namespace cruft. The Unicorn format has XSD
  cruft.
* The formats require a boolean pass/fail proclamation near the start of
  the format. (This is redundant and requires buffering.)

EARL, which I initially missed, also has problems:

* It requires an RDF processing layer on the consumption side. (Unless
  the consumer cheats, but if the consumer cheated and did not use an RDF
  processing layer, pretending that RDF is being used would be pointless.)
* The graph model and the RDF/XML syntax are mostly overhead when used
  with validators/checkers that in practice just produce a list of
  messages and don't care about graphs (let alone merging them).
* The concept of "assertor" is unnecessary in the case where the client
  knows what Web service it is accessing and it is obvious that the
  assertor is the service.
* The concept of a "test criterion" presupposes an implementation
  strategy that makes it possible for the checker/validator to cite a
  particular criterion by a well-known URI that is used by different
  tools for the same criterion. This looks good in theory, but it doesn't
  work well in practice unless the checker implementation is based on
  hand-crafted per-criterion checks *and* the implementation has a
  mechanism for citing well-known URIs. It turns out that when multiple
  criteria are embodied in a grammar-based schema, a validation engine
  cannot cite per-criterion URIs when a particular document tree doesn't
  have a derivation in the grammar. The EARL output from the W3C
  Validator illustrates this point rather nicely: It uses
  "http://www.w3.org/MarkUp/" as an all-encompassing testCase, which
  defeats the whole point of EARL's granular and inter-tool comparable
  test case URIs. Also, even when an implementation is assertion-based
  but uses an off-the-shelf Schematron engine, the tooling likely won't
  have a mechanism for citing a criterion URI.
* EARL has a lot of stuff that is for expressing things that aren't
  applicable to the Web service use case where the client knows what
  service it is invoking and with what input.
* The producers of the reports have too much freedom in expressing things
  (e.g. different pointer alternatives), so implementing general-purpose
  EARL consumers becomes hard. On the other hand, implementing a consumer
  for the EARL subset emitted by a particular service defeats the point
  of having a spec like EARL in the first place.
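To make the buffering objection concrete, here is a minimal sketch of an emitter that writes each message the moment the validation process produces it, with no grouping, no counts and no up-front pass/fail flag. The class name StreamingReport and its methods are hypothetical; the element names and namespace follow the draft linked below and may change.

```python
import io
from xml.sax.saxutils import XMLGenerator

# Assumption: namespace URI taken from the unimplemented draft; may change.
NS = "http://hsivonen.iki.fi/validator/messages/"


class StreamingReport:
    """Hypothetical emitter: each message is written as soon as it exists."""

    def __init__(self, out):
        self._gen = XMLGenerator(out, encoding="utf-8")
        self._gen.startDocument()
        # The root start tag can be written before any message is known,
        # because it carries no counts and no pass/fail flag.
        self._gen.startElement("messages", {"xmlns": NS})

    def message(self, kind, text, **attrs):
        # kind is "info", "warning" or "error"; attrs hold line, column, uri...
        self._gen.startElement(kind, {k: str(v) for k, v in attrs.items()})
        self._gen.characters(text)
        self._gen.endElement(kind)

    def close(self):
        self._gen.endElement("messages")
        self._gen.endDocument()
```

A format that puts per-type counts or a validity flag ahead of the messages cannot be produced this way: the emitter would have to hold every message until validation finishes.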
>> I wrote up a quick format draft, which I may implement in the future:
>> http://hsivonen.iki.fi/validator-ws-ideas/#xml
>
> Interesting.
> Could you give an output example?

When the input document passes successfully, the output would be:

<messages xmlns="http://hsivonen.iki.fi/validator/messages/"></messages>

For
http://hsivonen.iki.fi/validator/?doc=http%3A%2F%2Fhsivonen.iki.fi%2Ftest%2Fno-space-between-attributes.xhtml
the output would be:

<messages xmlns="http://hsivonen.iki.fi/validator/messages/"><info>The Content-Type was “application/xhtml+xml”. Using the XML parser (not resolving external entities).</info><warning line='1' column='109' uri='http://hsivonen.iki.fi/test/no-space-between-attributes.xhtml'>skipping entity: [dtd]</warning><info>Using the preset for XHTML 1.0 Strict based on the root namespace.</info><error type='fatal' line='7' column='13' uri='http://hsivonen.iki.fi/test/no-space-between-attributes.xhtml'>need whitespace between attributes</error></messages>

> one comment:
> I see
>     The elements in this XML vocabulary are in the namespace
>     “http://hsivonen.iki.fi/validator/messages/”.
>
> Will this namespace survive in the future? I'm just wondering
> because there have been cases with troubles related to namespaces
> change (for example Atom WG from 0.3 to 1.0)?

It is an unimplemented draft, so anything can happen. If I implement
the draft, I need to pick a namespace URI and then stick to it. At the
moment, the URI quoted above looks like the most likely candidate.

(BTW, since software and formats outlive organizations and move between
organizations, I think it is a bad idea that namespace URIs and Java
package names are supposed to have a domain name in them. This opens up
a bikeshedding problem when people are uncomfortable with using a
namespace URI or a Java package that has a domain name that is
considered non-neutral in it.)
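As an illustration of how little a client needs in order to consume the example output above, here is a minimal sketch using Python's standard ElementTree. The function names read_messages and passed are hypothetical, the sample is abridged from the output quoted above, and treating "no <error> element" as a pass is an assumption of the sketch, not something the draft states.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI taken from the unimplemented draft; may change.
NS = "http://hsivonen.iki.fi/validator/messages/"


def read_messages(xml_text):
    """Return (kind, text, attributes) tuples in document order."""
    root = ET.fromstring(xml_text)
    prefix = "{" + NS + "}"
    return [
        (el.tag[len(prefix):], el.text or "", dict(el.attrib))
        for el in root
        if el.tag.startswith(prefix)
    ]


def passed(messages):
    # Assumption: a document passes when no <error> message was emitted.
    return not any(kind == "error" for kind, _, _ in messages)


# Abridged from the example output quoted in this message.
SAMPLE = (
    '<messages xmlns="http://hsivonen.iki.fi/validator/messages/">'
    "<info>Using the preset for XHTML 1.0 Strict based on the root "
    "namespace.</info>"
    "<error type='fatal' line='7' column='13' "
    "uri='http://hsivonen.iki.fi/test/no-space-between-attributes.xhtml'>"
    "need whitespace between attributes</error>"
    "</messages>"
)

for kind, text, attrs in read_messages(SAMPLE):
    print(kind, attrs.get("line"), attrs.get("column"), text)
```

The client derives pass/fail and any per-type counts itself, so the producer does not have to state them up front.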
> As a side note, I often wonder if sending the line number is always
> the best strategy for validation. Line number is very useful
> for fixing one file one time.
> But as soon as we modify the file, it might change. The CSS
> Validator gives two bits of information when possible the line
> number and the context.

In the case of the CSS validator, the context is the selector, right?

Would a markup validator have to extract a piece of source text by
having a back door in the parser at the point where the bytes have been
decoded into characters but the text is still unparsed?

> I think we maybe do mistakes when we characterizing validation by
> their results: information, error or warning. A Thing can be
> alternatively in error, warning or have an information attached to
> it but stays the same thing. Though far to be easy.

I think I don't understand what you are saying.

However, it did occur to me that I/O errors, schema loading errors and
internal errors, which aren't the fault of the input document, should
probably have a separate element (e.g. <incidental>) for them. The
presence of one or more such errors could be considered an
indeterminate result. (That is, the document did not have a chance to
pass or fail in its own right.) Having a separate element would make
the format forward-compatible with new types of errors pertaining to
the document and new types of incidental errors.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 18 December 2006 10:13:26 UTC