- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Wed, 31 Oct 2007 18:29:42 +0200
- To: olivier Thereaux <ot@w3.org>
- Cc: W3C Validator Community <www-validator@w3.org>, "Chris. Parrish" <chris.forummail@swankinnovations.com>, Brett Bieber <brett.bieber@gmail.com>, Struan Donald <struandonald@gmail.com>
Hi,

On Oct 30, 2007, at 21:19, olivier Thereaux wrote:

> On Oct 29, 2007, at 12:37 , Henri Sivonen wrote:
>> Does the sequential output require a rewrite of client code? If it
>> does anyway, it might make sense to drop the SOAPness and make it
>> plain old XML. Or are clients actually benefiting from the SOAP
>> envelope in terms of tool support in a way that would break with POX?
>
> As far as I can tell most implementations just parse the XML of the
> SOAP output. I think one of them does build upon a SOAP library and
> thus expects the format to be in a SOAP envelope.
>
> One option I am pondering about is to leave the SOAP output as it
> is (that is, with its oddly grouped messages) and revive an XML
> output.

Makes sense.

> I looked at:
> http://wiki.whatwg.org/wiki/Validator.nu_XML_Output
> and it does look usable. The more I look at it, the more I think
> the W3C validator could adopt this as XML output

Cool.

> (we used to have one but never really documented and since then
> deprecated, we could revive it) provided we can make a few
> (backward compatible) changes.
>
> * adding a warning element to info and error - would be nicer IMHO
> than having warnings a type of info

Making warnings a type of info was a careful forward-compatibility design decision. There are three main classes of messages that clients need to know about in order to compute the validation outcome from message classes in a forward-compatible way. These main classes map to elements. The repertoire of message elements is not extensible without breaking the outcome computation semantics. The type attribute values are extensible in a forward-compatible way without breaking outcome computation in clients that do not know about a particular type attribute value. All kinds of messages that do not imply invalidity and do not imply a non-document failure have the same element (<info>).
Since warnings are a special case of this general class of messages, the warningness is in the type attribute, since the distinction between warning and other informative messages does not participate in the outcome computation.

> * checkedby

I think this is information that the client should already know, but adding a URI that points to the checker would be harmless except for the response size increase. (It should probably be called checked-by for consistency with the other hyphenated names.)

Instinctively, I'd make checked-by an optional attribute on the root element (taking a URI as the value). This assumes that the producer of the result format is writing out its own identity that it always knows in advance.

If this format were to be used by Unicorn as an output format, would it be necessary to mark checked-by on a per-message basis? If yes, then message elements should have a checked-by attribute as well and in the absence of the attribute, the checked-by attribute on the root element would be taken as the indication of source of the message.

> validity,

An earlier draft of the format had an explicit tri-state (success/failure/indeterminate) outcome indicator element, but I removed it before I started implementing, because the format is otherwise designed to support forward-compatible computation of the outcome from the message data. Therefore, a validity indicator would always be either redundant or in disagreement with the messages due to a bug. For the latter case, the processing model would have to define what clients are to do if they get inconsistent data, which would complicate the spec.

> doctype,

I forgot this when I said earlier that there were only two things that the W3C Validator HTML output had but the Validator.nu XML format couldn't capture in its current form.

How about adding an optional element <doctype> that has two optional attributes: public and system?
The content of the element could optionally contain a human-readable characterization of the doctype (e.g. "HTML 4.01 Strict"). The <doctype> element would be allowed as a child of the <messages> element (in any position relative to its siblings; that is, the validator could emit it as early or late as implementation-wise practical).

> charset,

Validator.nu currently reports the HTTP-level charset when source code is included in the response, since the HTTP-level charset is considered a metadatum of the source code. However, the actual character encoding used for decoding the document is not reported anywhere when there's no HTTP-level declaration and the encoding is determined from the content.

An encoding name would naturally be the kind of data that goes in an attribute if you consider how the format otherwise puts things that aren't human-readable messages or source code in attributes. This raises the question of what element should host the attribute.

Suppose there was an element called <metadata> for hanging various metadata attributes onto. An encoding attribute (to avoid "charset" per charmod) could go onto that element. The element could also have an attribute stating the root namespace. But then that raises a question why doctype would be an element on its own instead of its attributes being part of this new element.

The easy way out is to ask: Does the charset really need to be stated? :-)

> errorcount, warningcount etc

These are redundant data, but if they are added, they should probably be error-count and warning-count for consistency.

When redundant data like error-count or warning-count is optional (I agree they should not be required), it isn't particularly useful. A consumer cannot trust optional data to be there. Therefore, a robust consumer that needs the error or warning count needs to be able to count the errors or warnings on its own. Once a consumer is able to count them anyway, it doesn't need the counts to be explicitly stated.
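
To make the point about counting concrete, here is a minimal sketch of a consumer that tallies the message classes itself and derives the outcome from them, so it needs neither explicit counts nor a validity indicator. Several things in it are assumptions for illustration rather than part of any spec text quoted here: the function names, the sample document and its namespace URI, the literal type value "warning", and the precedence rule (any non-document-error gives indeterminate, otherwise any error gives failure, otherwise success).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the namespace URI and message texts are assumptions.
SAMPLE = """<messages xmlns="http://n.validator.nu/messages/">
  <info type="warning"><message>Hypothetical warning text.</message></info>
  <info><message>Hypothetical informative text.</message></info>
  <error><message>Hypothetical error text.</message></error>
  <error><message>Hypothetical error text.</message></error>
</messages>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


def summarize(xml_text):
    """Tally message classes and derive the outcome from them."""
    counts = {"error": 0, "warning": 0, "info": 0, "non-document-error": 0}
    for child in ET.fromstring(xml_text):
        name = local_name(child.tag)
        if name == "info":
            # Warnings are <info> with a type attribute. An unknown type value
            # still counts as plain info, so new types do not break old clients.
            key = "warning" if child.get("type") == "warning" else "info"
            counts[key] += 1
        elif name in ("error", "non-document-error"):
            counts[name] += 1
        # Any other child element is ignored by this sketch.
    # Assumed precedence: indeterminate beats failure, failure beats success.
    if counts["non-document-error"]:
        outcome = "indeterminate"
    elif counts["error"]:
        outcome = "failure"
    else:
        outcome = "success"
    return outcome, counts


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Because the outcome depends only on which of the three elements occur, a type value this client has never seen changes neither the outcome nor the error count.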

> let's make them optional, but I think they are useful.

I agree that the features you suggested are best left optional.

> They aren't a problem for a streaming response, if sent at the very
> end, anyway.

Agreed. This could be relaxed a bit by saying that this new stuff can occur as late as the generator chooses.

> * some kind of identifier for the errors. I realize this may bring
> some headaches if the format is shared by various tools, but for
> localization and/or customization, it'd be extremely useful.

I guess every message element (<info>, <error>, <non-document-error>) could be given an attribute called e.g. message-id that gives an implementation-specific message identifier. (It should not be called id, since implying IDness would be bad as there can be multiple instances of a given message.)

It could further be stipulated that since message-id is implementation-specific, checked-by SHOULD be used (on the root or on the message) when message-id is used. If a client does not recognize the checked-by value and, thus, is unable to use implementation-specific semantics, it could still compare the message-id values for strict string equality to discover which messages are instances of the "same" message. Hence, equivalence classes could be established without knowing the semantics of the equivalence classes.

Aside: I realize that the W3C Validator wants to communicate message ids, and I'm not trying to fight that. However, I'm probably not going to emit message ids from Validator.nu in the foreseeable future. Validator.nu emits errors from many different places including an HTTP client wrapper, parsers, RELAX NG validator(s), Schematron validator(s) and custom Java code. There is no error identification scheme even inside Validator.nu itself let alone between different online validators.
Moreover, I have doubts about the usefulness of message ids for localization: you don't get parameters that went into a message formatter in their unformatted form, so you might as well run pattern matching against the error message itself directly.

> The output format you created is sequential, which is a good basis
> for what we need. We'd also need a way to group errors by type, but
> that can be an alternative format with a similar base. The main
> issue is that your locator elements give their location as
> attributes, which makes it hard to represent that a tool found
> several instances of a given message.
>
> What do you think?

I think different grouping options are a UI feature. A software-to-software Web service API format should merely communicate sufficient data for the consumer to be able to group messages for its UI. The data format does not need to change its data ordering when a consumer wants to show a grouped UI.

Assuming a message-id attribute, consumers could group by that if they so choose. In the absence of the attribute, consumers could group by comparing the text content of the <message> element.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 31 October 2007 16:30:24 UTC