RE: TAW Checker approach

Some thoughts in-line below:

> -----Original Message-----
> From: public-mobileok-checker-request@w3.org [mailto:public-mobileok-
> checker-request@w3.org] On Behalf Of Miguel Garcia
> Sent: 16 February 2007 12:06
> To: public-mobileok-checker@w3.org
> Subject: TAW Checker approach
> 
> We think the declarative approach could give more extensibility and
> portability than a language-specific library. But, as Sean said before,
> the generation of the meta-document can't be done in a language-agnostic
> way, so the portability is not 100% possible.

I'm not sure I understand why it would be any more or less portable than the other approach. Perhaps I misunderstand your point.

> Also, we think that this development (defining the meta-document,
> sharing the work, putting it all together) would be much more expensive
> than Sean's Java library. What would be the desirable date to get the
> reference checker?

I agree that it would be better to have this sooner rather than later, but I don't think that we should chuck away any possible long-term benefits as a result of pursuing just that goal.

> 
> In TAW Checker we opted for a Java library (similar to the prototype
> offered by Sean); we'll explain it in more depth in another mail.
> 
> In addition, we want to add some comments based on CTIC's experience
> developing TAW. Just note that TAW was originally conceived as a web
> accessibility (WCAG 1.0) checker tool, and its design decisions were
> made with that aim in mind.
> 
> Our first problem was that the majority of web pages are not well
> formed, so we first thought about using a source-repair tool. But we
> discarded repair-based solutions because of the traceability of errors
> and warnings: as a user, you want to know where the problem is, that
> is, which element caused it and where it sits in the source code. One
> option is to maintain a map between the original source code and the
> repaired code. Instead of using DOM parsers, we opted for another
> parser, specifically the HTML Parser library: an LGPL library for
> parsing (X)HTML documents that doesn't require a well-formed document.
> 
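
For what it's worth, here is a minimal sketch of how I picture that
working (the class and method names are from the org.htmlparser API as I
understand it, and the choice of the img tag is just for illustration):

    import org.htmlparser.Node;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;

    // Parse a possibly malformed document and locate every <img> tag,
    // keeping each tag's character offset into the original source so
    // that errors stay traceable to a position in the raw markup.
    void listImages(String html) throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");
        NodeList images = parser.parse(new TagNameFilter("img"));
        for (int i = 0; i < images.size(); i++) {
            Node img = images.elementAt(i);
            System.out.println("img at offset " + img.getStartPosition());
        }
    }

The point being that the parser never rejects the document outright, and
each node can still be tied back to the bytes it came from.
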
> Perhaps it would be better to combine both solutions: if the document
> is well formed, create a DOM tree; otherwise enter a 'quirks mode' and
> use an HTML parser.
> 
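
That combination seems workable to me. A rough sketch of the fallback
logic (standard JAXP for the strict path; quirksModeParse is a
hypothetical stand-in for whatever lenient parser gets chosen):

    import java.io.ByteArrayInputStream;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.SAXException;

    // Try a strict XML parse first; if the document is not well
    // formed, fall back to a lenient HTML parse ('quirks mode').
    Document parseStrictOrQuirks(byte[] content) throws Exception {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(content));
        } catch (SAXException notWellFormed) {
            // Hypothetical helper: parse leniently and convert the
            // resulting node tree to a DOM afterwards.
            return quirksModeParse(content);
        }
    }
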
> If a repair tool is used, should all the checkers use the same tool in
> order to generate the same results?

I agree that this is a substantial issue. My working assumption, to date, has been that the tool should be based as far as possible on existing 'authoritative' resources, and hence would ideally be some kind of mash-up of the W3 Validators, Tidy and so on. I have had some issues with using CGI Tidy, though, so that needs more thought.
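
One way around the CGI issues might be to run Tidy in-process via the
JTidy port. A sketch, assuming JTidy's org.w3c.tidy.Tidy API (the
configuration shown is arbitrary):

    import java.io.InputStream;
    import org.w3c.dom.Document;
    import org.w3c.tidy.Tidy;

    // Run Tidy in-process rather than over CGI: parse (and repair)
    // the input and return a DOM of the cleaned-up document.
    Document tidyToDom(InputStream in) {
        Tidy tidy = new Tidy();
        tidy.setXHTML(true); // emit XHTML rather than HTML
        tidy.setQuiet(true); // suppress progress messages
        return tidy.parseDOM(in, null); // null: no serialised output
    }
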

I suppose that at the heart of this is the question of how definitive this should be. The mobileOK spec says that you should provide as much information as possible. But if the information you are providing is based on a broken document, then the definitive information is possibly only that the document is broken. The rest of the information is informative and speculative - so I wonder if this is not, in fact, a legitimate area for differences between implementations?

> 
> Next, we had some difficulties with encodings. Sometimes documents are
> not encoded as declared, and this can lead to fatal errors during
> parsing. So we do a preprocessing pass to detect the real encoding of
> the document before parsing it.
> 
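
I imagine that preprocessing looks something like the sketch below
(plain Java; the byte-order-mark checks are standard, but the fallback
order and the default are my assumptions, and a fuller version would
also scan the meta element and the HTTP Content-Type header):

    import java.nio.charset.Charset;

    // Rough encoding sniff: byte-order mark first, then the declared
    // charset if the platform supports it, then an assumed default.
    String sniffEncoding(byte[] b, String declared) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF)
            return "UTF-8";
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
            return "UTF-16BE";
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE)
            return "UTF-16LE";
        if (declared != null && Charset.isSupported(declared))
            return declared;
        return "ISO-8859-1"; // assumed default for the sketch
    }
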

Again, is this an area that should have a definitive approach? Or is it one that we say is open to differences between implementations? The reference checker could simply take the view that it's got to be UTF-8 and that it's not going to do anything special to work around the fact that you have specified something else and/or got the encoding wrong.
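
In that strict view the check itself is simple; a sketch using the
standard java.nio.charset decoder:

    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.charset.CodingErrorAction;

    // Definitive test: either the bytes decode as UTF-8 or they don't.
    boolean isValidUtf8(byte[] content) {
        try {
            Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException broken) {
            return false;
        }
    }
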

Jo
