- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Wed, 10 Aug 2005 03:45:20 +0200
- To: gez.lemon@gmail.com
- Cc: w3c-wai-gl@w3.org
* Gez Lemon wrote:
>Please accept my apologies, and allow me to rephrase my response:
>
>"So we agree there's a problem. The only difference is that I would
>like to see the problem addressed by the W3C validator team, and you
>would prefer it to be done through education alone", [...]

The W3C Validator Team is a very small group of mostly volunteers who
contribute a bit of their spare time to the Validator and related
projects. It should be obvious that it's difficult for us to make even
minor progress; it took more than a year to release the current
version, which doesn't contain any major enhancements.

That said, the focus of the group is to change the Validator so that
more people can add value to the service, making it easier to re-use
and easier to extend. This is an ambitious project and will likely
require rewriting most of the code from scratch. With the few
resources we have, it is unlikely that a new version with major new
features will be released in the next 12 months.

It is even less likely that the current W3C Validator Team will work
on considerably more ambitious projects such as the one you propose.
We simply don't have any resources for that, and if the Markup
Validator ever supports such features, I would expect that to happen
through the extension mechanisms we are working on, with code
maintained outside the project.

To be clear, the current Validator is tightly bound to the OpenSP SGML
system, which offers very limited HTML conformance checking and even
more limited XHTML conformance checking. If you use XHTML, you have
many better tools and services at your disposal; it would take
considerable effort for the W3C Validator Team to catch up.

With respect to the Validator, the way forward is, in my opinion,
loose coupling of components through web services and (Perl) modules,
with standard interfaces to exchange user input and observations about
documents, currently code-named Acorn.
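To give a rough feel for the kind of loose coupling I mean: each
checker is an independent component that takes some markup and returns
a small report of observations, and the Validator merely collects and
presents those reports. The sketch below is in Python purely for
illustration; every name in it is invented, and none of it is actual
Acorn code.

```python
# Illustration only: a "checker" component and a collector that
# merges checker reports, standing in for the loosely coupled
# services/modules described above. All names are made up.
import xml.etree.ElementTree as ET

def length_checker(markup):
    """A trivial example checker: one observation about the input."""
    report = ET.Element("report", checker="length-checker")
    obs = ET.SubElement(report, "observation")
    obs.text = "document is %d bytes long" % len(markup)
    return report

def collect_reports(markup, checkers):
    """What the collecting "Validator" would do: run every checker
    and merge the reports into one document for presentation."""
    root = ET.Element("reports")
    for checker in checkers:
        root.append(checker(markup))
    return ET.tostring(root, encoding="unicode")

print(collect_reports("<p>Hello</p>", [length_checker]))
```

The point is only the shape of the interface: markup in, a
self-describing report out, with the presentation layer knowing
nothing about any individual checker.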
You can read some more random notes at
<http://esw.w3.org/topic/MarkupValidator/M12N>. One of the main ideas
is that you can write a service that takes some markup and generates a
report about it; the "Validator" would collect such reports and
present them in a standard way. You can, and are very much welcome to,
do this today, regardless of what you want the module to report.

>I strongly disagree. If you already have a document tree, what can be
>difficult about inserting nodes and attributes? Getting the document
>tree in the first place would be a much more difficult task. Inserting
>elements and attributes after the event is a trivial exercise.

If I understand correctly, you would like the Validator to report
errors introduced into documents by scripts. This is not a trivial
exercise. As already noted in this thread, if you just care about
scripts that execute on load, it's easy to code around that. Let's
consider it though; you'd basically need the following:

  * A document tree. What you can get from OpenSP is a SAX-like event
    stream; you would have to build the tree yourself. This is a
    trivial task, as my SGML::Parser::OpenSP wrapper provides events
    sufficiently similar to Perl SAX events that you can use existing
    code for that.

  * An ECMAScript engine. This should not be difficult; there is a
    wrapper for SpiderMonkey (the Mozilla script engine) you could
    use, though it might require some updating and some work on
    security issues.

  * A way to link the document tree to the script engine. This might
    be less trivial; I've not used the SpiderMonkey wrapper so far.
    You might have to implement this from scratch, but you might also
    be able to re-use existing code from Mozilla or other projects.

  * A variety of proprietary features for use with the script engine.
    These might again be available if you can re-use Mozilla code,
    though that's not really likely.
    (And yes, you really need to support proprietary features like
    the 'document' object.)

  * A decision on when exactly to validate. A script that executes on
    load, adds an illegal attribute, and then removes it immediately,
    for example: should that trigger an error or not?

  * A decision on when "after onload" is; e.g., how many times do you
    run functions that fire every 10ms?

  * Depending on these decisions, you will then have one or more
    points where the document is dirty and needs to be validated
    again. The code-wise simplest approach here would be to serialize
    the document tree back to a string and pass that to OpenSP every
    time you want to re-validate.

  * You might generate duplicate errors in this process; let's again
    take the simplest approach and stop the process once an error is
    found, reporting just that single error.

  * At some point you've completed the process and either have found
    an error or not. You could report this now.

Of course, as noted above, the current Validator architecture would
make this difficult. I think it's reasonable to expect that this would
require more than 3000 lines of code (including comments, test suite
code, etc., but of course excluding modules you'd re-use). The
Validator is about 2600 lines of code (including comments, etc., and
the test suite, which we don't have...).

The results would of course not be very good. You could extend the
system without too much trouble to consider arbitrary events, but
that would be of limited use too; generating user interface events
would be difficult, for example. Anything beyond that would require
analyzing the script code to determine whether it's possible for the
code to make the document invalid. I think that is software research.

A much simpler approach would be to use a different environment for
such an experiment.
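Before moving on, the serialize-and-re-validate loop described above
can be sketched in a few lines. This is a toy Python illustration
only: minidom stands in for a tree built from OpenSP events, a plain
callback stands in for a real script engine, and the "validator" is a
made-up one-rule stub rather than OpenSP.

```python
# Toy sketch of "run the onload script, serialize the dirty tree,
# re-validate, stop at the first error". Nothing here is real
# Validator code; validate() is a deliberately fake stand-in.
from xml.dom.minidom import parseString

def validate(markup):
    """Stand-in for piping the serialized string back through a real
    validator such as OpenSP; here we only flag one made-up rule."""
    errors = []
    if "badattr" in markup:
        errors.append("attribute 'badattr' is not allowed")
    return errors

def check_after_onload(markup, onload):
    """Parse, run the 'onload script', serialize the dirty tree,
    and re-validate, keeping only the first error found."""
    doc = parseString(markup)
    onload(doc)                      # the script mutates the tree
    dirty = doc.documentElement.toxml()
    return validate(dirty)[:1]       # simplest approach: first error only

# An "onload script" that adds an illegal attribute:
def script(doc):
    doc.documentElement.setAttribute("badattr", "x")

print(check_after_onload("<p>Hello</p>", script))
```

Even this toy shows where the hard decisions live: everything
interesting is hidden inside when onload() is considered finished and
how often the serialize/validate step runs.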
I would recommend building it on top of the Apache Batik SVG toolkit.
You could use a wide range of XML tools, such as relatively complete
DOM implementations and validators; the toolkit already ships with a
JavaScript implementation and an implementation of the SVG DOM you'd
need for any real-world scripts, and you could worry much less about
some of the issues mentioned above.

And lucky you, I already wrote this some time ago:
<http://esw.w3.org/topic/SvgTidy/SOMDump>. The Java program injects,
at user option, a script into the DOM tree, executes all onload
scripts, and prints the resulting DOM tree to standard output. That's
pretty much exactly what you need. You might even be able to leverage
Java-based browser code to do something similar for HTML, though that
will probably be more difficult.

This does not compare to doing it with the current Validator code,
though; you would have to write much of the code yourself, or at least
wrappers around other code so it can be used reasonably here. That's
indeed non-trivial, and in my opinion it would be more worthwhile if
anyone who wanted to work on this contributed to other aspects of the
Validator first.

So, as I wrote above, as long as you design the system such that you
can generate a reasonable XML-based report of the findings, you can
start coding now, and we'll be able to sort out the integration into
the Validator next year.

If writing Java code isn't quite what you'd like to contribute, please
feel free to join #validator on irc.freenode.org or one of the
Validator/QA-dev mailing lists, or write me a mail. I've been
co-developing HTML Tidy, the W3C Markup Validator, OpenSP, the
SGML::Parser::OpenSP wrapper, SVGTidy, and related tools for some time
now; I'm sure we'd find something you could work on.

Thanks,
Received on Wednesday, 10 August 2005 01:45:32 UTC