RE: Requirements for mobileOK reference checker

Hi 

Further comments in line

Jo

> -----Original Message-----
> From: Sean Owen [mailto:srowen@google.com]
> Sent: 08 March 2007 22:14
> To: Jo Rabin
> Cc: public-mobileok-checker@w3.org
> Subject: Re: Requirements for mobileOK reference checker
> 
> All sounds good to me. A few comments in line.
> 
> On 3/7/07, Jo Rabin <jrabin@mtld.mobi> wrote:
> 
> > [4.2] Input to the checker will be specified by URI [should we consider
> > a literal string as well? given that the checker needs to check most
> > external references, would this in fact be useful? Yes, if the tests
> > relating to external references are skipped, or if the base uri can be
> > supplied]
> 
> I think it would be nice to accept a string (well, really byte
> sequence). Without HTTP headers, some tests will definitely fail
> though. So one would have to accept HTTP headers. So I start to
> question how useful it is.

I think the key point is that it would be nice to offer a 'tools'
interface, i.e. to encourage people to submit stuff _before_
committing it to their server. I know there are other ways of doing this
and I know that this will be far from complete, but I nonetheless think
it could be useful.

> 
> > [4.3] The checker will be written in Java and provide a programmatic
> > interface with bindings initially to Java [and SOAP?]. [Are we going to
> > write something other than Javadoc by way of documentation/design?]
> 
> Javadoc is good.
> I had had in mind that this would be a Java implementation which could
> be embedded in, say, Tomcat/Axis to expose them as a SOAP service, but
> that that is a separate project.

I wonder how much of an obstacle to porting a 'the code is the
documentation' approach would present?

> 
> > [4.4] The checker development project will not develop a user interface
> > except as necessary for testing it, but the use case of its deployment
> > in a human request / response environment should be borne in mind.
> > Specifically this should not be seen as a project to create the W3C
> > mobileOK checker.
> 
> I agree, but with the understanding that the very next project should
> be to make it the new backend of validator.w3.org/mobile
> 


Well, that is up to Dom / W3C isn't it? 

> > [4.5] The checker will create an intermediate document that makes
> > available for inspection all details of retrievals, validation and other
> > pre-processing required in order to carry out the tests. The format of
> > this intermediate document will be specified separately, and will use
> > existing representations [like RDF/HTTP] where possible.
> > [and per resolution of 26 Feb from an API perspective this needs also to
> > be available as DOM or SAX-wise or as a Java class?]
> 
> I think the results should be available as a DOM (and thus as a
> document), and also in a native Java class representation.

We need to decide what the primary representation is, I think. I'd
prefer this to be documented as a Schema and have a mapping from the
schema to Java native class rather than vice versa.

> 
> > [4.5.2] To allow processing of mal-formed and invalid primary input
> > documents (those that are the subject of the test, rather than resources
> > that are referenced) the pre-processing will provide a 'cleaned up'
> > version [whose xml header and Doctype declaration at least, will need to
> > have some magic performed to allow inclusion in the middle of the
> > document] and that the nature of the clean-up needs to be explicit and
> > not implementation dependent [i.e. using Tidy is all very well, but it
> > is opaque in its operation; from this pov, perhaps we should look more
> > closely at Dom's suggestion of
> > http://home.ccil.org/~cowan/XML/tagsoup/
> > which (I think) operates on the basis of explicit rules which can be
> > captured and repeated]
> 
> This is a tough call to me... if you tidy a doc then you are testing
> something different than what you really got. You wouldn't want to
> pass the cleaned-up doc when the raw one would fail.
> 
> The idea is, I imagine, to fail the raw document but additionally say,
> oh, if you cleaned it up a bit here's some more results you could get.
> Nice idea.

Yes, it should definitely FAIL on hard FAILs. 

Yes, this is really to answer the point in mobileOK about giving the
maximum possible information to the developer, i.e. to try to spare them
fixing problem 1, rechecking, fixing problem 2, rechecking and so on. It
will be imperfect whatever we do, of course.

> 
> I wonder if it makes sense to consider this external -- you're free to
> tidy your doc before passing it through if you want. Or consider it an
> option -- run the test in lenient mode? I am torn on whether the
> complexity and confusion is worth it for documents that already can't
> even get their markup right.

Per above

> 
> > [4.5.5] HTTP parameters and their values should be recorded in a
> > normalised form as well as being recorded in their original form.
> 
> Headers? or why do we care about URI parameters?
> if headers what is the normalized form like? I think we should report
> and test on the real header value.

Sorry, I did not mean parameters. I meant headers, as you correctly
inferred. My point is that it makes post-processing easier if you always
report the header name as Content-Type, even if what the server actually
returned was content-type, while still keeping the original form.

Equally, if we are to use HTTP-in-RDF then we'd want to know what had
been transformed in order to arrive at the processed RDF representation.
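
To make the header point concrete, here is a minimal sketch of what I
have in mind. The class name RecordedHeader and the capitalisation rule
are assumptions of mine for illustration only, not an agreed design. Note
that this simple rule would turn ETag into Etag, so the real normal form
would need to be specified more carefully.

```java
// Hypothetical record of one HTTP header: the name exactly as the server
// sent it, plus a normalised name to make post-processing easier.
final class RecordedHeader {
    final String originalName;    // as received, e.g. "content-type"
    final String normalisedName;  // e.g. "Content-Type"
    final String value;           // header value, left untouched

    RecordedHeader(String originalName, String value) {
        this.originalName = originalName;
        this.normalisedName = normalise(originalName);
        this.value = value;
    }

    // Assumed convention: upper-case the first letter of each
    // hyphen-separated token and lower-case everything else.
    static String normalise(String name) {
        StringBuilder out = new StringBuilder();
        boolean startOfToken = true;
        for (char c : name.trim().toCharArray()) {
            out.append(startOfToken ? Character.toUpperCase(c)
                                    : Character.toLowerCase(c));
            startOfToken = (c == '-');
        }
        return out.toString();
    }

    // True when normalisation changed the name, so the report can say
    // what was transformed.
    boolean wasTransformed() {
        return !originalName.equals(normalisedName);
    }
}
```

Keeping both names in the same record means the tests can still be run
against the real value while tools downstream key on the normalised one.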

> 
> > [4.7] It must be possible to add tests without recompiling the checker.
> 
> Yes, well I thought the idea is that the implementation should
> externalize enough information that external entities can reuse that
> information to write more tests. I don't imagine one would extend the
> implementation by actually modifying it.
> 
Well that would be one way of meeting the requirement :-)
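
As a rough illustration of that route, an external test could be nothing
more than an XPath expression evaluated over the intermediate document,
so new tests could be supplied as configuration rather than code. The
class name ExternalTest and the element names in the usage note are
assumptions of mine; the format of the intermediate document has not
been specified yet.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical externally defined test: an identifier plus a boolean
// XPath expression that is true when the test passes.
final class ExternalTest {
    final String id;
    final String xpath;

    ExternalTest(String id, String xpath) {
        this.id = id;
        this.xpath = xpath;
    }

    // Evaluate the expression against the intermediate document.
    boolean passes(Document intermediate) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, intermediate, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot evaluate test " + id, e);
        }
    }

    // Helper: parse an XML string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse document", e);
        }
    }
}
```

For example, given a made-up intermediate document such as
<retrieval><header name='Content-Type'>text/html</header></retrieval>,
the expression count(//header[@name='Content-Type']) = 1 would serve as
a test that exactly one Content-Type header was recorded.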

> > [4.8] It must be possible to replace sub-components (such as remote
> > validation steps) by configuration option.
> 
> What does this mean, just that there needs to be some configurable
> behavior? I agree though want to be careful that a PASS means
> something clear -- not "PASS, but if you set this option" but
> definitely "PASS"

PASS is always conditional on the processing you have done. If you use
'ropey-old-validator-that-barfs-on-the-wrong-stuff' then you have a
different meaning of PASS than if you use
'industry-standard-and-most-up-to-date-validator'. So I think this is
why the validation steps need to be named, reported on and open to
configuration.
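
Something along these lines is what I mean by named, reported and
configurable. All the names here (MarkupValidator, ValidatorRegistry and
so on) are hypothetical, and the DOCTYPE check is only a toy stand-in
for a real validation step.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plug-in point for a validation step.
interface MarkupValidator {
    String name();                     // reported alongside every outcome
    boolean isValid(String document);
}

// Toy stand-in for a real validator: only checks that a DOCTYPE leads.
final class DoctypePresentValidator implements MarkupValidator {
    public String name() { return "doctype-present-toy-validator"; }
    public boolean isValid(String document) {
        return document.trim().startsWith("<!DOCTYPE");
    }
}

// An outcome always carries the name of the validator that produced it.
final class ValidationOutcome {
    final boolean pass;
    final String validatorName;

    ValidationOutcome(boolean pass, String validatorName) {
        this.pass = pass;
        this.validatorName = validatorName;
    }

    @Override
    public String toString() {
        return (pass ? "PASS" : "FAIL") + " (validator: " + validatorName + ")";
    }
}

// Validators are looked up by the name given in configuration.
final class ValidatorRegistry {
    private final Map<String, MarkupValidator> byName =
            new LinkedHashMap<String, MarkupValidator>();

    void register(MarkupValidator v) {
        byName.put(v.name(), v);
    }

    ValidationOutcome check(String configuredName, String document) {
        MarkupValidator v = byName.get(configuredName);
        if (v == null) {
            throw new IllegalArgumentException("No validator named " + configuredName);
        }
        return new ValidationOutcome(v.isValid(document), v.name());
    }
}
```

The point of the sketch is only that a PASS never appears without the
name of the step that produced it, so swapping the validator by
configuration stays visible in the report.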

Jo

Received on Monday, 12 March 2007 11:06:14 UTC