Re: Requirements for mobileOK reference checker from Sean Owen on 2007-03-08 (public-mobileok-checker@w3.org from March 2007)

From: Sean Owen <srowen@google.com>
Date: Fri, 9 Mar 2007 07:14:28 +0900
To: "Jo Rabin" <jrabin@mtld.mobi>
Cc: public-mobileok-checker@w3.org
Message-ID: <e920a71c0703081414x1c218036s371946cf9fcd64b3@mail.gmail.com>

All sounds good to me. A few comments inline.

On 3/7/07, Jo Rabin <jrabin@mtld.mobi> wrote:

> [4.2] Input to the checker will be specified by URI [should we consider
> a literal string as well? given that the checker needs to check most
> external references, would this in fact be useful? Yes, if the tests
> relating to external references are skipped, or if the base uri can be
> supplied]

I think it would be nice to accept a string (well, really a byte
sequence). Without HTTP headers, though, some tests will definitely
fail, so one would have to accept HTTP headers as well. At that point
I start to question how useful it is.
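
To make the cost concrete, here is a minimal sketch of what a literal
input might have to carry. The class name CheckerInput and its fields
are hypothetical, not an agreed design, and a single-valued header map
is a simplification (real HTTP headers can repeat).

```java
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for a document supplied directly rather than fetched by URI.
final class CheckerInput {
    final URI baseUri;                  // needed to resolve external references
    final byte[] body;                  // the raw byte sequence, not a decoded string
    final Map<String, String> headers;  // simplification: one value per header name

    CheckerInput(URI baseUri, byte[] body, Map<String, String> headers) {
        this.baseUri = baseUri;
        this.body = body.clone();
        this.headers = Collections.unmodifiableMap(
                new LinkedHashMap<String, String>(headers));
    }

    // Header-dependent tests can only run when the caller supplied headers.
    boolean canRunHeaderTests() {
        return !headers.isEmpty();
    }
}
```

So the caller ends up reconstructing most of an HTTP response by hand,
which is the part that makes me doubt the usefulness.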

> [4.3] The checker will be written in Java and provide a programmatic
> interface with bindings initially to Java [and SOAP?]. [Are we going to
> write something other than Javadoc by way of documentation/design?]

Javadoc is good.
I had had in mind that this would be a Java implementation which could
be embedded in, say, Tomcat/Axis to expose it as a SOAP service, but
that is a separate project.

> [4.4] The checker development project will not develop a user interface
> except as necessary for testing it, but the use case of its deployment
> in a human request / response environment should be borne in mind.
> Specifically this should not be seen as a project to create the W3C
> mobileOK checker.

I agree, but with the understanding that the very next project should
be to make it the new backend of validator.w3.org/mobile.

> [4.5] The checker will create an intermediate document that makes
> available for inspection all details of retrievals, validation and other
> pre-processing required in order to carry out the tests. The format of
> this intermediate document will be specified separately, and will use
> existing representations [like RDF/HTTP] where possible.
> [and per resolution of 26 Feb from an API perspective this needs also to
> be available as DOM or SAX-wise or as a Java class?]

I think the results should be available as a DOM (and thus as a
document), and also in a native Java class representation.
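
Roughly what I have in mind, as a sketch only: a native class that can
also hand back the same information as a DOM. The class name
TestResult, the element name "result" and its attributes are all
placeholders of mine; the real format is still to be specified.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical native representation of one test outcome.
final class TestResult {
    final String testName;
    final boolean passed;

    TestResult(String testName, boolean passed) {
        this.testName = testName;
        this.passed = passed;
    }

    // The same information as a DOM, so it can also be serialized as a document.
    Document toDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("result"); // placeholder element name
            root.setAttribute("test", testName);
            root.setAttribute("outcome", passed ? "PASS" : "FAIL");
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser", e);
        }
    }
}
```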

> [4.5.2] To allow processing of mal-formed and invalid primary input
> documents (those that are the subject of the test, rather than resources
> that are referenced) the pre-processing will provide a 'cleaned up'
> version [whose xml header and Doctype declaration at least, will need to
> have some magic performed to allow inclusion in the middle of the
> document] and that the nature of the clean-up needs to be explicit and
> not implementation dependent [i.e. using Tidy is all very well, but it
> is opaque in its operation; from this pov, perhaps we should look more
> closely at Dom's suggestion of http://home.ccil.org/~cowan/XML/tagsoup/
> which (I think) operates on the basis of explicit rules which can be
> captured and repeated]

This is a tough call to me... if you tidy a doc then you are testing
something different than what you really got. You wouldn't want to
pass the cleaned-up doc when the raw one would fail.

The idea is, I imagine, to fail the raw document but additionally say,
oh, if you cleaned it up a bit here's some more results you could get.
Nice idea.

I wonder if it makes sense to consider this external -- you're free to
tidy your doc before passing it through if you want. Or consider it an
option -- run the test in lenient mode? I am torn on whether the
complexity and confusion are worth it for documents that already can't
even get their markup right.

> [4.5.5] HTTP parameters and their values should be recorded in a
> normalised form as well as being recorded in their original form.

Do you mean headers? If not, why do we care about URI parameters?
If headers, what would the normalized form look like? I think we
should report and test on the real header value.
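
For the sake of discussion, here is one guess at what "normalised"
could mean for a header: lower-cased name, trimmed value, runs of
whitespace collapsed. These rules are my assumption, not anything
from the requirements, and the class name RecordedHeader is
hypothetical. The raw value stays available and unchanged.

```java
import java.util.Locale;

// Hypothetical record of one HTTP header, keeping the original alongside
// an assumed normalized view.
final class RecordedHeader {
    final String rawName;
    final String rawValue;

    RecordedHeader(String rawName, String rawValue) {
        this.rawName = rawName;
        this.rawValue = rawValue;
    }

    // Assumed rule: header names compare case-insensitively, so lower-case them.
    String normalizedName() {
        return rawName.trim().toLowerCase(Locale.ROOT);
    }

    // Assumed rule: trim the value and collapse internal whitespace runs.
    String normalizedValue() {
        return rawValue.trim().replaceAll("\\s+", " ");
    }
}
```

Even with something like this, I would still want the tests to run
against rawValue.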

> [4.7] It must be possible to add tests without recompiling the checker.

Yes, though I thought the idea was that the implementation should
externalize enough information that third parties can reuse that
information to write more tests. I don't imagine one would extend the
implementation by actually modifying it.
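
As an illustration of that reading, a third party could express an
extra test as an XPath expression over the intermediate document,
with no change to the checker itself. This is only a sketch: the
class name XPathTest is hypothetical, and so are any element names
used in the expression, since the intermediate format is not yet
defined.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical external test: passes when the XPath expression evaluates
// to true against the checker's intermediate document.
final class XPathTest {
    private final String expression;

    XPathTest(String expression) {
        this.expression = expression;
    }

    boolean passes(Document intermediate) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Ask for a boolean result; XPath converts numbers and node sets as needed.
            return (Boolean) xpath.evaluate(
                    expression, intermediate, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }
}
```

Assuming the intermediate document had one "retrieval" element per
fetched resource (a made-up name), a new test could be as small as
new XPathTest("count(//retrieval) <= 10").passes(doc), and the
expression could live in a config file rather than in compiled code.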

> [4.8] It must be possible to replace sub-components (such as remote
> validation steps) by configuration option.

What does this mean? Just that there needs to be some configurable
behavior? I agree, though I want to be careful that a PASS means
something clear -- not "PASS, but only if you set this option" but
definitely "PASS".
Received on Thursday, 8 March 2007 22:14:43 UTC