Re: Balancing the myth-busting.

* Gez Lemon wrote:
>Please accept my apologies, and allow me to rephrase my response:
>
>"So we agree there's a problem. The only difference is that I would
>like to see the problem addressed by the W3C validator team, and you
>would prefer it to be done through education alone", [...]

The W3C Validator Team is a very small group of mostly volunteers who
contribute a bit of their spare time to the Validator and related
projects. It should be obvious that it is difficult for us to make
even minor progress; it took more than a year to release the current
version, which doesn't contain any major enhancements.

That said, the focus of the group is to change the Validator so that
more people can add value to the service, making it easier to re-use
and to extend. This is an ambitious project and will likely require
re-writing most of the code from scratch. With the limited resources
we have, it is unlikely that a new version with major new features
will be released in the next 12 months.

It is much less likely that the current W3C Validator Team will work
on considerably more ambitious projects such as the one you propose.
We simply don't have any resources for that, and if the Markup
Validator ever supports such features, I would expect that to happen
through the extension mechanisms we are working on, with code
maintained outside the project.

To be clear, the current Validator is tightly bound to the OpenSP
SGML system, which offers very limited HTML conformance checking and
even more limited XHTML conformance checking. If you use XHTML, you
have many better tools and services at your disposal; it would take
considerable effort for the W3C Validator Team to catch up.

With respect to the Validator, the way forward is, in my opinion,
loose coupling of components through web services and (Perl) modules,
with standard interfaces to exchange user input and observations
about documents, currently code-named Acorn. You can read some more
random notes at <http://esw.w3.org/topic/MarkupValidator/M12N>.

One of the main ideas is that you can write a service that takes
some markup and generates a report about it; the "Validator" would
collect such reports and present them in a standard way. You can,
and are very much welcome to, do this today, regardless of what you
want the module to report.
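
To make that a bit more concrete, here is a small sketch of what such
a checker could look like. Note that neither the report format nor
the module interface exist yet, so the element names and the
interface below are entirely made up for the example:

    #!/usr/bin/perl -w
    # Sketch only: the report vocabulary below is invented for this
    # example, it is not an actual Acorn format.
    use strict;

    # take a document as a string, return a list of findings
    sub check_markup {
      my $markup = shift;
      my @findings;
      push @findings, { severity => 'warning',
                        message  => 'no document type declaration' }
        unless $markup =~ /<!DOCTYPE/i;
      return @findings;
    }

    # serialize the findings as a simple XML report
    sub report_as_xml {
      my $xml = "<report>\n";
      $xml .= qq{  <finding severity="$_->{severity}">}
            . qq{$_->{message}</finding>\n} for @_;
      return $xml . "</report>\n";
    }

    local $/;  # slurp the whole document from standard input
    print report_as_xml(check_markup(scalar <STDIN>));

The point is simply that the checker and the Validator would exchange
such a report and nothing more; whether the check itself runs
scripts, counts elements, or does something else entirely is up to
the module.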

>I strongly disagree. If you already have a document tree, what can be
>difficult about inserting nodes and attributes? Getting the document
>tree in the first place would be a much more difficult task. Inserting
>elements and attribute after the event is a trivial exercise.

If I understand correctly, you would like the Validator to report
errors that scripts introduce into documents. This is not a trivial
exercise. As already noted in this thread, if you only care about
scripts that execute on load, it's easy to code around that. Let's
consider it anyway; you'd basically need the following:

  * A document tree. What you can get from OpenSP is a SAX-like
    event stream; you would have to build the tree yourself. This
    is a trivial task: my SGML::Parser::OpenSP wrapper provides
    events that are sufficiently similar to Perl SAX events that
    you can use existing code for it (there is a rough sketch of
    this after the list).

  * An ECMAScript engine. This should not be difficult; there is
    a wrapper for SpiderMonkey (the Mozilla script engine) you
    could use, though it might require some updating and some
    work on security issues.

  * You need to link the document tree to the script engine. This
    might be less trivial; I've not used the SpiderMonkey wrapper
    so far. You might have to implement this from scratch, but
    you might also be able to re-use existing code from Mozilla
    or other projects.

  * You need to add a variety of proprietary features for use
    with the script engine. These might again be available if
    you can re-use Mozilla code, though that's not very likely.
    (And yes, you really do need to support proprietary features
    like the 'document' object.)

  * You would need to decide exactly when to validate. If a
    script that executes on load adds an illegal attribute and
    then removes it again immediately, for example, should that
    trigger an error or not?

  * You need to decide when "after onload" ends, e.g., how many
    times do you run a function that fires every 10ms?

  * Depending on these decisions you will then have one or more
    points where the document is dirty and needs to be validated
    again; the simplest approach, code-wise, would be to serialize
    the document tree back to a string and pass that to OpenSP
    every time you want to re-validate.

  * You might generate duplicate errors in this process; let's
    again assume the simplest approach and stop the process once
    an error is found, reporting just that single error.
  * At some point you've completed the process and have either
    found an error or not. You could report this now. Of course,
    as noted above, the current Validator architecture would make
    this difficult.
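
Just to make the first two bullets concrete, here is a rough sketch
in Perl. It assumes the CPAN modules SGML::Parser::OpenSP and
JavaScript::SpiderMonkey are installed; the handler merely records
element names instead of building a proper tree, the 'document'
object is a stub, and the glue between the two halves -- the actual
DOM binding -- is exactly the part that is left out. Check the
modules' documentation for the precise event and binding interfaces:

    # Rough sketch; the interesting (and hard) part, binding a real
    # document tree to the script engine, is not shown here.
    use strict;
    use SGML::Parser::OpenSP;
    use JavaScript::SpiderMonkey;

    # 1) Collect SAX-like events from OpenSP; a real implementation
    #    would build a document tree instead of a flat list.
    package ElementCollector;
    sub new           { bless { elements => [] }, shift }
    sub start_element { push @{$_[0]{elements}}, $_[1]->{Name} }
    sub error         { warn "OpenSP reported a problem\n" }

    package main;
    my $handler = ElementCollector->new;
    my $parser  = SGML::Parser::OpenSP->new;
    # a real run would also need $parser->catalogs(...) pointing at
    # the DTDs for the document type being checked
    $parser->handler($handler);
    $parser->parse("example.html");  # hypothetical input document

    # 2) Run a script against a stubbed 'document' object; a real
    #    implementation would expose the tree built above instead.
    my $js = JavaScript::SpiderMonkey->new;
    $js->init;
    my $document = $js->object_by_path("document");
    $js->function_set("write", sub { print "script wrote: @_\n" },
                      $document);
    $js->eval(q{ document.write("added at runtime"); });
    $js->destroy;

Everything else on the list -- deciding when to re-validate,
serializing the tree back to a string for OpenSP, weeding out
duplicate errors -- sits on top of the glue between these two
halves, which is why I don't consider the exercise trivial.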

I think it's reasonable to expect that this will require more than
3000 lines of code (including comments, test suite code, etc., but
of course excluding modules you'd re-use). The Validator is about
2600 lines of code (including comments, etc., and the test suite,
which we don't have...).

The results would of course not be very good. You could extend the
system without too much trouble to consider arbitrary events, but
that would be of limited use too; generating user interface events
would be difficult, for example. Anything beyond that would require
analyzing the script code to determine whether it could possibly
make the document invalid, and I think that is a matter of software
research.

A much simpler approach would be to use a different environment for
such an experiment. I would recommend building it on top of the
Apache Batik SVG toolkit: you could use a wide range of XML tools,
such as relatively complete DOM implementations and validators; the
toolkit already ships with a JavaScript implementation and an
implementation of the SVG DOM you'd need for any real-world scripts;
and you could worry much less about some of the issues mentioned
above.

And lucky you, I already wrote this some time ago, see
<http://esw.w3.org/topic/SvgTidy/SOMDump>. The Java program injects,
at the user's option, a script into the DOM tree, executes all
onload scripts, and prints the resulting DOM tree to standard
output. That's pretty much exactly what you need. You might even be
able to leverage Java-based browser code to do something similar for
HTML, though that'll probably be more difficult.

This does not compare to doing it with the current Validator code,
though; you would have to write much of the code yourself, or at
least wrappers around other code so it can be used reasonably here.
That's indeed non-trivial, and it would, in my opinion, be more
worthwhile for anyone who wants to work on this to contribute to
other aspects of the Validator first.

So, as I wrote above, as long as you design the system such that
you can generate a reasonable XML-based report of the findings, you
can start coding now and we'll be able to sort out the integration
into the Validator next year.

If writing Java code isn't quite what you'd like to contribute,
please feel free to join #validator on irc.freenode.org or one of
the Validator/QA-dev mailing lists, or write me a mail. I've been
co-developing HTML Tidy, the W3C Markup Validator, OpenSP, the
SGML::Parser::OpenSP wrapper, SVGTidy, and related tools for some
time now; I'm sure we'd find something you could work on.

Thanks,

Received on Wednesday, 10 August 2005 01:45:32 UTC