- From: Michael(tm) Smith <mike@w3.org>
- Date: Fri, 19 Mar 2010 19:23:12 +0900
- To: Dan Connolly <connolly@w3.org>
- Cc: noah_mendelsohn@us.ibm.com, Paul Cotton <paul.cotton@microsoft.com>, Philippe Le Hegaret <plh@w3.org>, Sam Ruby <rubys@intertwingly.net>, "www-tag@w3.org WG" <www-tag@w3.org>, Maciej Stachowiak <mjs@apple.com>
Dan Connolly <connolly@w3.org>, 2010-03-18 09:35 -0500:

> On Thu, 2010-03-18 at 19:11 +0900, Michael(tm) Smith wrote:
> > And that grammar on its own doesn't even completely express all
> > the constraint checks that need to be done by an actual validation
> > tool that uses it as an input [...]
> I'm not asking for a formalism
> adequate to build a validator. Just a (machine-readable) sketch
> of what-goes-where.
>
> Even though a schema is an incomplete description of a language,
> it's often useful; e.g. it can show which elements are allowed
> where so that authoring tools can use it to build
> context-sensitive auto-complete lists.

Yes, agreed. But there is nothing today that prevents anybody from
copying or checking out the validator.nu schema source files and
integrating them into other applications however they see fit. It is
not necessary for the HTML WG to (re)publish the validator.nu schema
files in W3C TR space in order to facilitate those kinds of reuses of
that schema.

I think it might be more appropriate for the HTML WG to publish a
Group Note, titled "Editing and Validating HTML5" or something, that
would provide information about existing tools that have
HTML5-specific editing features and validating features, as well as
tutorial information on how to integrate HTML5-specific editing
features and validation features into existing applications.

That document could provide readers with URLs for online and offline
HTML5 validation services, as well as information about available
schemas, and URLs for the locations where they are maintained and from
where they can be downloaded; e.g., it could have a section saying,
"The following HTML5-related schema implementations are available",
and then list the location of the validator.nu HTML5 schema along with
any other schema implementations that might become available.

It would be the responsibility of the editor and the WG to make a
qualitative judgment about which resources are listed in that document
(with part of the criteria of course being whether they actually
conform to the document-conformance criteria in the HTML5 spec).

I would be willing to volunteer to be the editor for such a document,
if others think it would be worth having.

A possible objection that I can anticipate to a document like the one
I just described is that some people will likely say it's not a good
idea to have multiple schema implementations available -- that there
should just be one good one that is commonly used by everybody.

I would have to say that I'd disagree with that. I think we instead
want to encourage people to develop alternative schema
implementations, and have those be evaluated equally on the basis of
their relative merits. I guess it could well be that that might never
happen, and the validator.nu schema might eventually become the de
facto common schema that other applications use. But I don't think we
should risk prematurely ensuring that outcome without giving time and
opportunity for others to develop alternatives.

> > IMHO, that it would be
> > potentially very misleading to publish it and encourage its use
> > outside the context of its function as part of a complete HTML
> > validation tool (like validator.nu).
>
> Misleading in what way? Do any examples come to mind?

Misleading in that if it is published in W3C TR space, even if it is
so published by the WG with an understanding that it's just one of
many possible ways of expressing a particular subset of the document
conformance criteria in the language, and with agreement that it's
intended for a particular limited class of use cases -- for example,
for use in editing applications that provide context-sensitive
auto-complete lists, but not for use on its own in building a
validation service -- it nevertheless risks ending up being seen by
the community as *the* validation schema endorsed by the HTML WG and
the W3C, in the same way the HTML4 and XHTML1 DTDs that the W3C
published are effectively the only normative validation schemas for
those versions of the language, and the W3C Markup Validation Service
built around those is seen as *the* single, only officially endorsed
validation service for HTML.

I think we should instead do what we can to encourage multiple
implementors to compete to provide the best validation services.

> > [I mentioned the validator.nu schema being optimized for use
> > within the validator.nu service]
>
> Maybe I can find them myself, but if you could be more specific
> about those cases, I'd appreciate it. I'm interested to look at
> how much these optimizations make the schema less useful.

One specific case is the handling of required attributes. An example
is the link element, for which the HTML5 spec defines href and rel as
required attributes. So I make an RNC schema that defines the content
model and attribute model for the link element like this:

  link.elem =
    element link { link.inner & link.attrs }
  link.attrs =
    ( common.attrs
    & shared-hyperlink.attrs.href
    & shared-hyperlink.attrs.rel
    & shared-hyperlink.attrs.hreflang?
    & shared-hyperlink.attrs.media?
    & shared-hyperlink.attrs.type?
    & link.attrs.sizes?
    # link.attrs.title included in common.attrs
    )
  link.inner =
    ( empty )

If I then take that schema along with a document that contains an
instance of a link element that lacks an href attribute and I feed it
to Jing, then the complete text of the error message that Jing emits
is this:

  Required attributes missing on element link

Note that it doesn't tell you which attributes are actually missing.

So to make up for that deficiency in Jing, what the validator.nu
backend currently actually uses is a schema that defines the href and
rel attributes on the link element as being optional (not required).
That causes the above not-particularly-useful error message to be
suppressed.

So what validator.nu is currently using instead to check for the
href-and-rel-required constraint is Java code in this file:

  https://whattf.svn.cvsdude.com/syntax/trunk/non-schema/java/src/org/whattf/checker/schematronequiv/Assertions.java

(Search for "link required attrs" in that file.)

That causes the following much-more-useful error message to be emitted
instead:

  A link element must have an href attribute

However, the RNC schema files that are in the validator.nu source
repository actually still define rel and href as required on link. The
change to make them optional (to suppress the Jing error) is done
through a hacky part of the validator.nu build. The reason it was done
that way was in the interest of helping to make the schema more
"portable" (for lack of a better word) outside of the context of its
specific use within Jing. Somebody less concerned about trying to keep
it portable might instead just optimize it by forgoing the build
hackery and just making the change directly in the source repository
and being done with it.

(Anyway, fwiw, the real solution to the particular issue that
motivated all that is to fix Jing. George Bina made a Jing patch that
does that -- that causes Jing itself to emit useful error messages for
required-but-missing attributes and other cases. I have done a rough
initial integration of that into a development instance of the
validator.nu backend, but I still need to make some further changes to
actually get it done properly. In the meantime, I posted the following
note to myself about it:

  http://lists.w3.org/Archives/Public/www-archive/2010Jan/0077.html
)

> > The information in that schema was developed from the almost
> > completely prose-based document-conformances constraint expressions
> > in the HTML5 spec itself. So it can reasonably be viewed as just
> > one possible implementation of just some particular document-
> > conformance constraints that are expressed in the HTML5 spec.
>
> Very well, but it's likely to be good enough for lots of other
> uses, and I'd like to save others the work of re-doing
> the development of a schema from the HTML 5 spec prose.
> [...]
> Sure, things may change in the future. But until you do
> sever the tie, it seems pretty useful to include the schema
> in an appendix, as it saves others the work of
> "putting the toothpaste back in the tube", i.e. reversing
> the prose-generation process.

The availability of that schema in the validator.nu source repository
already does that. Anybody else can grab it from there just as I did.
If we want to publicize its availability further, I think an HTML WG
Note like the "Editing and Validating HTML5" one I proposed above
would be a very good way to do that.

> > On top of that, if the HTML WG were to publish a formalism or set
> > of formalisms for the HTML language, I think there is a good
> > argument against making that particular schema the basis for it
> > (for the reasons I mentioned above -- that it's already optimized
> > to some degree for particular use with validator.nu, and may in
> > the future end up being optimized for that even more so).
>
> I'd like to see the specifics of that argument; I'm not persuaded
> by the general claims that the context in which it was developed
> makes it worse than nothing.

I hope some of what I wrote above helps to clarify that. I hope it's
also clear by now that I don't think it's necessary for the HTML WG to
publish or endorse any single specific formalism or set of formalisms
for HTML5 to the exclusion of any others. I think we should instead
just provide the HTML5 spec itself as the standard expression of the
document-conformance constraints of the language, and encourage
multiple implementors to develop schemas and other formalisms for
specific use cases (e.g., for use in providing context-sensitive
auto-complete lists in editing apps).

  --Mike

-- 
Michael(tm) Smith
http://people.w3.org/mike
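
For illustration only, the non-schema check described above for the
link element (href and rel left optional in the schema, then checked
separately so that a specific message can be reported) might be
sketched roughly as follows. This is not the code in Assertions.java.
The function name check_link_required_attrs, the use of Python's
xml.etree module, and the wording of the rel message are assumptions
made for this sketch; only the href message wording is taken from the
message above.

```python
import xml.etree.ElementTree as ET

# Messages keyed by required attribute name. The href wording comes from
# the message quoted above; the rel wording is an assumption by analogy.
LINK_REQUIRED_ATTR_MESSAGES = {
    "href": "A link element must have an href attribute",
    "rel": "A link element must have a rel attribute",
}


def check_link_required_attrs(markup):
    """Return one message per required attribute missing from a link element.

    Hypothetical helper: assumes well-formed XML input with no namespace.
    """
    root = ET.fromstring(markup)
    messages = []
    for elem in root.iter("link"):
        for name, message in LINK_REQUIRED_ATTR_MESSAGES.items():
            if name not in elem.attrib:
                messages.append(message)
    return messages


if __name__ == "__main__":
    # A document with a link element that lacks href.
    doc = '<html><head><link rel="stylesheet"/></head></html>'
    for line in check_link_required_attrs(doc):
        print(line)
```

The point of the sketch is the division of labour: the grammar only
says what may appear where, while a separate pass names the attribute
that is actually missing.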
Received on Friday, 19 March 2010 10:23:22 UTC