Re: UniCORN book of specifications


On 5/17/06, olivier Thereaux <> wrote:
> Hi,
> On May 15, 2006, at 17:20, Jean-Guilhem Rouel wrote:
> >  Damien and I have written a document available at
> > about the specifications of
> > the micro-observer framework.
> > It contains a description of the framework requirements, but also use
> > cases and questions more or less technical about specific points.
> Here are a few notes from a first pass at the document.
> [[
> The aim of our internship is to create a "universal validator" that
> will be able to validate and check multiple things in a document
> through a single Web interface.
> ]]
> As you said in your "questions", the term validator here is not
> appropriate. I think what you are building is an observation framework
> for Web documents. Universal is, well, nice in the acronym, but perhaps
> a bit too much here ;) And validation will only be a tiny part of the
> tasks the framework will allow its modules to perform.
> also s/check multiple things/perform multiple observations and checks/
> perhaps?

Ok for that, we will correct it.

> looking at I see that
> you are proposing to reuse a template syntax similar to that of the CSS
> validator. It is not a template syntax I am very familiar with. Is it a
> "standard" templating syntax for Java, or something that was invented
> for the CSS validator?

In fact, this is not a template syntax for Java. The CSS validator
uses the Properties class, which loads a list of keyword:value pairs
into a Hashtable, so that they are very simple to use. The CSS
validator also has some specific keywords such as <!-- #result -->
that you can put in properties and that the "output engine" will
replace with the computed values. So, this is a mix between something
made by Sun, and something defined by the CSS validator.
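
To make the mechanism concrete, here is a minimal sketch of the idea,
not the CSS validator's actual code. The key "summary" and the helper
names load and fill are assumptions made up for this illustration;
only java.util.Properties and the <!-- #result --> marker come from
the description above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: class, method and key names are illustrative only.
class PropertiesTemplateSketch {

    // Load "keyword: value" pairs into a Properties object
    // (Properties is a subclass of Hashtable).
    static Properties load(String source) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(source));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Replace one marker such as <!-- #result --> with a computed value.
    static String fill(String template, String marker, String value) {
        return template.replace("<!-- #" + marker + " -->", value);
    }

    public static void main(String[] args) {
        Properties props = load("summary: <p>Result: <!-- #result --></p>\n");
        // Prints the stored fragment with the marker filled in.
        System.out.println(fill(props.getProperty("summary"), "result", "valid"));
    }
}
```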

> If the latter, it may be a good idea (or a waste
> of time, yours to decide) to look at other template syntaxes.

We will look at other template syntaxes. It seems that several template engines
already exist in Java : .
An underlying question is: can we use as many third-party APIs/tools as
we want, or should we restrict ourselves?

> Regardless of the syntax choices, which are not really important anyway
> as long as it is documented somewhere, I think that there will be some
> work in defining how context is/will be passed, how looping will be
> done, etc.
> For instance, I understand that something like <!-- #error_line -->
> will be replaced by the value of the current error's line. But
> "current" has to be well defined, especially if we loop over a number
> of errors. Which leads me to wondering why <!-- #errors --> will not be
> replaced by a value, but by a loop.

In the very simple template engine I presented, <!-- #errors --> would
be replaced by a loop only because we decided so in the framework.
But I agree, this is probably not the best solution.

> In that sense, the templating syntax of e.g HTML::Template, while
> clunky too in some aspects, has the advantage of dissociating looping,
> logic (if, then, else) and variable substitution. Poke around
> for examples.

We will also have a look at this.
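
As a rough sketch of what dissociating looping from variable
substitution could look like, here is a toy renderer. The marker
syntax (<!-- #loop name -->, <!-- #end -->, <!-- #var name -->) and
all class and method names are invented for this example; this is
neither HTML::Template's syntax nor the CSS validator's. Inside a
loop, "current" simply means the item being iterated, layered over
the global values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the marker syntax and all names are illustrative.
class LoopTemplateSketch {
    private static final Pattern LOOP = Pattern.compile(
        "<!-- #loop (\\w+) -->(.*?)<!-- #end -->", Pattern.DOTALL);
    private static final Pattern VAR = Pattern.compile("<!-- #var (\\w+) -->");

    // Plain variable substitution; unknown names become empty strings.
    static String substitute(String text, Map<String, String> context) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = context.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Expand each (non-nested) loop block once per item, then fill globals.
    static String render(String template, Map<String, String> globals,
                         Map<String, List<Map<String, String>>> lists) {
        Matcher m = LOOP.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder rows = new StringBuilder();
            List<Map<String, String>> items = lists.get(m.group(1));
            if (items != null) {
                for (Map<String, String> item : items) {
                    // The "current" scope: globals overridden by this item.
                    Map<String, String> scope = new LinkedHashMap<>(globals);
                    scope.putAll(item);
                    rows.append(substitute(m.group(2), scope));
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(rows.toString()));
        }
        m.appendTail(out);
        return substitute(out.toString(), globals);
    }

    // Small helper to build one error entry.
    static Map<String, String> error(String line, String message) {
        Map<String, String> e = new LinkedHashMap<>();
        e.put("line", line);
        e.put("message", message);
        return e;
    }

    public static void main(String[] args) {
        Map<String, String> globals = new HashMap<>();
        globals.put("uri", "doc.html");
        Map<String, List<Map<String, String>>> lists = new HashMap<>();
        lists.put("errors", Arrays.asList(error("3", "missing alt"), error("9", "unclosed p")));
        String template = "<!-- #var uri -->:<!-- #loop errors -->"
            + " [<!-- #var line -->: <!-- #var message -->]<!-- #end -->";
        System.out.println(render(template, globals, lists));
    }
}
```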

> On output formats, other potential candidates: send mail, or simply
> plain text (for command line usage).

In fact, plain text is an output that should be built using the template engine.
Adding other output formats like mail will probably be more difficult,
since mail is not only an output but also a transport method.
In the "specs" we wrote: "The Web interface will be the only "official"
one, but it will be easy to add new UIs, extending proper classes and
interfaces." This could be a way to add mail or command-line support,
but I think we will have to separate it from the template part,
because these rely on really different technologies (command-line,
servlet, mail, ...).

> I think the way you envision the
> output, with the central module gathering and passing observers' output
> through templates, is the way to go. I'm not sure if I showed you or
> told you about the "log validator", which has a lot of similarities
> with what you are working on: a modular architecture, several
> observers, and several possible output methods. The big difference is
> that the logvalidator works on a list of documents (usually, a log file
> from the web server) and not on one, so the focus is different. The
> reason why I think of the logvalidator now is that after a few
> releases, I found myself limited in what the tool outputs, because the
> output from each "observation" module was not rich and structured
> enough. For your framework, making sure that the output for each
> observation is as structured as possible (i.e avoiding loose text) will
> be a key thing.

If the modules' output is correctly structured, the final result will
be as well.
I think this is mainly a matter of observer design: if observers do
not follow the rules imposed by the framework, or follow them
incorrectly, the result will be quite strange and inconsistent.
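
To illustrate what "structured" could mean in practice, here is a
minimal sketch. The class, field and level names are assumptions for
this example only; nothing here is decided in the specs. The point is
that once the level and the line are fields rather than loose text,
the central module can filter or sort observations without parsing
messages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: class, field and level names are illustrative only.
class ObservationSketch {
    enum Level { INFO, WARNING, ERROR }

    // One structured observation instead of a loose line of text.
    static final class Observation {
        final String observer;
        final Level level;
        final int line;
        final String message;

        Observation(String observer, Level level, int line, String message) {
            this.observer = observer;
            this.level = level;
            this.line = line;
            this.message = message;
        }
    }

    // Keep observations at or above a given level (enum order: INFO < WARNING < ERROR).
    static List<Observation> atLeast(List<Observation> all, Level min) {
        List<Observation> kept = new ArrayList<>();
        for (Observation o : all) {
            if (o.level.compareTo(min) >= 0) {
                kept.add(o);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Observation> all = Arrays.asList(
            new Observation("markup", Level.ERROR, 12, "unclosed element"),
            new Observation("css", Level.WARNING, 40, "unknown property"),
            new Observation("markup", Level.INFO, 1, "doctype found"));
        for (Observation o : atLeast(all, Level.WARNING)) {
            System.out.println(o.observer + ":" + o.line + " " + o.level + " " + o.message);
        }
    }
}
```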

> I was at first puzzled by the usage of the word "actors" in the use cases.
> That's because for me, the actors are really the various observers, the
> central module, etc. To make it less confusing, I would perhaps call
> them "users" - framework maintainers being super users, computers being
> automated users, etc.

We use this word because it's the UML word to describe someone or
something outside the system that can interact with it. In our use
cases, we only speak about users, but an actor could have been a
third-party software doing automated validation for example.
But if you want, we can change it to make it less confusing.

> One note I thought of when reading the last use case about
> "incompatible check": should the framework have a silent recovery
> mechanism when asked to send a document through an irrelevant
> component/observer? I suspect that there will be "default" observers in
> the interface (markup, css), it would be nice if one could send a css
> file without getting an error message about the file not having markup
> to check.

I think we should display at least a warning.

> This is related to one of the first questions, "who will parse the
> document". I think one answer to that is that the central module always
> will know the mime type of the document being checked, and based on
> that, it will be able to either:
> - dispatch to observers it knows will be relevant (e.g type is SVG ->
> send to XML wellform checker, SVG conformance checker, CSS too perhaps,
> etc)
> - try to pre-parse documents for which it has a basic knowledge, and
> then dispatch (e.g HTML)
> - consider the type to be outside of its limits (e.g some gif image -
> no relevant observer exists)

So, you think that the best solution would be to detect automatically
which observers to use, rather than leave the choice to the user?
Indeed, it's probably the best way to avoid human errors, but what
should we do if the user does not want to activate the xhtml observer
when supplying an xhtml document? That's the reason why we chose to
let the user activate the observers he wants.
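
One possible compromise, sketched below under loud assumptions: the
central module picks default observers from the MIME type, and the
user can still switch some of them off. The observer names, the
MIME-to-observer table and the select method are all invented for
this example and are not part of the specs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: observer names and the mapping are illustrative only.
class DispatchSketch {
    private static final Map<String, List<String>> DEFAULTS = new HashMap<>();
    static {
        DEFAULTS.put("image/svg+xml",
            Arrays.asList("xml-wellformedness", "svg-conformance", "css"));
        DEFAULTS.put("application/xhtml+xml",
            Arrays.asList("xml-wellformedness", "markup", "css"));
        DEFAULTS.put("text/css", Arrays.asList("css"));
    }

    // Default observers for the type, minus those the user switched off.
    // An unknown type (e.g. image/gif) yields an empty list.
    static List<String> select(String mimeType, Set<String> disabledByUser) {
        List<String> selected = new ArrayList<>();
        List<String> defaults = DEFAULTS.get(mimeType);
        if (defaults == null) {
            return selected;
        }
        for (String observer : defaults) {
            if (!disabledByUser.contains(observer)) {
                selected.add(observer);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Set<String> disabled = new HashSet<>(Collections.singletonList("markup"));
        System.out.println(select("application/xhtml+xml", disabled));
        System.out.println(select("text/css", Collections.<String>emptySet()));
        System.out.println(select("image/gif", Collections.<String>emptySet()));
    }
}
```

With something like this, a CSS file never reaches the markup
observer, and an empty selection gives the central module a natural
place to display a warning instead of an error.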

> now quickly going through other questions...
> [[ How observers should interact with each others and with UniCORN? ]]
> I think a centralized solution would be much simpler. It doesn't mean
> that there cannot be some complex sequence. For example, we could think
> that observer B will only be called if observer A has returned a
> certain type of observation, on either the whole document or a specific
> fragment. But making observers independently communicate would
> certainly make things too complicated.

We also think a centralized solution would be simpler. Your idea of
cascading observer calls is interesting, but I wonder whether it will
be useful. Maybe you can give an example of such a case?
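
For discussion, here is how I understand the mechanism, as a sketch
only. The Observer interface, the run method and the "well-formed"
trigger string are assumptions invented for this example. The central
module calls the second observer only if the first one returned a
given observation, and the observers never talk to each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the interface and all names are illustrative only.
class CascadeSketch {
    interface Observer {
        List<String> observe(String document);
    }

    // Centralized sequencing: the caller, not the observers, decides
    // whether the second observer runs.
    static List<String> run(String document, Observer first,
                            String trigger, Observer second) {
        List<String> results = new ArrayList<>(first.observe(document));
        if (results.contains(trigger)) {
            results.addAll(second.observe(document));
        }
        return results;
    }

    public static void main(String[] args) {
        // Toy observers: a fake well-formedness check, then a fake SVG check.
        Observer wellFormed = doc -> Arrays.asList(
            doc.startsWith("<") ? "well-formed" : "not-well-formed");
        Observer svg = doc -> Arrays.asList("svg-checked");
        System.out.println(run("<svg/>", wellFormed, "well-formed", svg));
        System.out.println(run("garbage", wellFormed, "well-formed", svg));
    }
}
```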

> Yves may have some insight on this, with Web Services in mind.
> [[ Which implementation for the framework? ]]
> fortran.


> [[ Use of WSDL/WADL ]]
> My (limited) knowledge of WSDL tells me that it should work to define
> the service contract between the central module and observers. Do talk
> to Yves about it, he will certainly know documents to point you to in
> order to understand WSDL better for your needs.

In fact, Yves told us about WADL (
), we will talk about it today.

> All for now... Does that provide you with material to make progress?

Yep :)

> cheers,
> --
> olivier

Received on Wednesday, 17 May 2006 09:09:37 UTC