Ready to commit more code updates

If there are no objections, I am about to commit some further code
updates to CVS ahead of tomorrow's call. These updates:

- Add a first cut at the unit testing approach described in the doc I
sent out last week. We now have JUnit tests plus an embedded instance
of Tomcat that serves test docs to the implementation; the tests then
compare the actual results to the expected results.

- Add TagSoup tidying as a fallback if the DOM can't be parsed

- Add JHOVE for image processing

- Continue the general reshuffling of everything to match how I think
this has to be structured. For example, there is now a "Preprocessor"
abstraction.
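
For anyone who has not read the testing doc, the idea in the first
bullet can be sketched roughly as below. This is an illustration only,
not the committed code: it swaps in the JDK's built-in
com.sun.net.httpserver.HttpServer for embedded Tomcat and a plain
check for JUnit, so that it runs without extra jars. The class name,
fetch(), serve(), roundTrip() and the /test.html path are all
hypothetical, and fetch() merely stands in for the real implementation
under test.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class EmbeddedServerSketch {

    /** Hypothetical stand-in for the implementation under test: fetches a document body. */
    public static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Starts a throwaway server on a free port that serves one test document at /test.html. */
    public static HttpServer serve(String body) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/test.html", exchange -> {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    /** Serves the test document, runs the fetch against it, and always stops the server. */
    public static String roundTrip(String body) throws IOException {
        HttpServer server = serve(body);
        try {
            int port = server.getAddress().getPort();
            return fetch("http://localhost:" + port + "/test.html");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        String expected = "<html><body><p>hello</p></body></html>";
        String actual = roundTrip(expected);
        // Compare the result to the expected result, as a unit test would.
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
        System.out.println("round trip matched");
    }
}
```

Binding to port 0 lets the operating system pick a free port, so
several such tests can run side by side without colliding.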

The code is very rough and still all in pieces on the workbench, but
it does minimally meet the requirements of the first milestone we
agreed on. You can run CACHING_TEST and run a unit test on it.

But there's still significant work to do. I am finding it very hard to
structure this code sensibly since every detail of processing is
related to a preprocessing object and to test results and vice versa.
Also, we have to handle every failure case and report on it
reasonably.
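
To make the failure-handling point concrete, here is a rough sketch of
"try a strict DOM parse, fall back to tidying, and report what
happened at each step". It is not the committed code. The Result type,
parse() and parseStrict() are hypothetical names, and the tidier is
passed in as a plain function so the sketch runs on the JDK alone; in
the real code that slot is where TagSoup would go.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FallbackParseSketch {

    /** Hypothetical result type: the parsed DOM (null on failure) plus a log of each attempt. */
    public static final class Result {
        public final Document document;
        public final List<String> messages;

        Result(Document document, List<String> messages) {
            this.document = document;
            this.messages = messages;
        }

        public boolean succeeded() {
            return document != null;
        }
    }

    /** Strict parse: throws if the markup is not well-formed XML. */
    static Document parseStrict(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler ignores warnings and rethrows fatal errors without printing them.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(markup)));
    }

    /** Tries a strict parse first, then retries once on the tidied markup, logging both attempts. */
    public static Result parse(String markup, UnaryOperator<String> tidier) {
        List<String> messages = new ArrayList<>();
        try {
            Document doc = parseStrict(markup);
            messages.add("strict parse succeeded");
            return new Result(doc, messages);
        } catch (Exception e) {
            messages.add("strict parse failed: " + e.getMessage());
        }
        try {
            Document doc = parseStrict(tidier.apply(markup));
            messages.add("parse succeeded after tidying");
            return new Result(doc, messages);
        } catch (Exception e) {
            messages.add("parse failed after tidying: " + e.getMessage());
            return new Result(null, messages);
        }
    }

    public static void main(String[] args) {
        // Toy tidier for the demo only: closes the one tag we know is left open.
        Result result = parse("<p>hello", markup -> markup + "</p>");
        System.out.println("succeeded: " + result.succeeded());
        result.messages.forEach(System.out::println);
    }
}
```

Returning a result object with a message log, rather than throwing,
keeps every failure case visible to whatever produces the test report.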

Next step is to get a draft of the preprocessing doc so I can
implement that. That will really drive this to some semblance of a
complete package.

Since I've heard nothing at all about this code so far, I'm going to
continue to iterate on it in parallel so that we have something to
discuss and something to prove out the designs we have in mind.

Sean

Received on Monday, 16 April 2007 01:28:15 UTC