TAW Checker approach

Hello everybody and sorry about the delay.

We have been busy with other issues, so we haven't answered earlier. We'll explain the approach of the TAW checker in relation to the ideas discussed on the list and share some thoughts.

We think the declarative approach could give more extensibility and portability than a language-specific library. But as Sean said before, the generation of the meta-document can't be done in a language-agnostic way, so full portability is not possible. We also think that this development (defining the meta-document, sharing out the work, putting it all together) would be much more expensive than Sean's Java library. What would be the desirable date to have the reference checker ready?

In the TAW checker we opted for a Java library (similar to the prototype offered by Sean); we'll explain it in more depth in another mail.

In addition we want to share some comments based on CTIC's experience developing TAW. Just note that TAW was originally conceived as a web accessibility (WCAG 1.0) checker tool, and the solutions were chosen with that aim in mind.

Our first problem was that the majority of web pages are not well formed, so we first thought about using a source repair tool. But we discarded repair-based solutions because of the traceability of errors and warnings: as a user, you want to know where the problem is, that is, which element caused it and where it sits in the source code. One option is to maintain a map between the original source code and the repaired code. Instead, rather than using a DOM parser, we opted for another parser, specifically the HTML Parser library, an LGPL library for parsing (X)HTML documents that doesn't require a well-formed document.
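
For illustration, here is a minimal sketch of how a check can stay traceable with the HTML Parser library (assuming version 1.6, where nodes expose getStartPosition()). It finds img elements without an alt attribute in a document that is not well formed and reports their offset in the original source. The class name and the specific rule are just an example, not the actual TAW implementation:

    import org.htmlparser.Parser;
    import org.htmlparser.Tag;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;

    public class ImgAltCheck {
        public static void main(String[] args) throws ParserException {
            // Not well formed: unclosed <p>, missing </body></html>
            String html = "<html><body><img src=\"logo.gif\"><p>Hello";
            Parser parser = Parser.createParser(html, "ISO-8859-1");
            NodeList images = parser.parse(new TagNameFilter("IMG"));
            for (int i = 0; i < images.size(); i++) {
                Tag img = (Tag) images.elementAt(i);
                if (img.getAttribute("alt") == null) {
                    // The offset points into the original source, so the
                    // warning stays traceable without any repair step
                    System.out.println("img without alt at offset "
                            + img.getStartPosition());
                }
            }
        }
    }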

Perhaps it would be better to combine both solutions: if the document is well formed, build a DOM tree; otherwise enter a 'quirks mode' and use an HTML parser.
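
A rough sketch of that combined strategy, assuming the same HTML Parser library as the lenient fallback (the strict parse only succeeds for well-formed XHTML, and the Object return type is only to keep the example short):

    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.htmlparser.Parser;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;

    public class HybridParser {
        // Returns a W3C Document for well-formed input, otherwise a
        // lenient HTML Parser NodeList ('quirks mode').
        public static Object parse(String source) throws Exception {
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(new InputSource(new StringReader(source)));
                return doc; // well formed: work on the DOM tree
            } catch (Exception notWellFormed) {
                // quirks mode: tolerate unclosed or misnested tags
                Parser parser = Parser.createParser(source, "ISO-8859-1");
                return parser.parse(null); // whole tree, no filter
            }
        }
    }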

If a repair tool is used, should all the checkers use the same tool in order to generate the same results?

Next, we had some difficulties with encodings. Sometimes documents are not encoded as declared, and this can lead to fatal errors during parsing. So we do a preprocessing step to detect the real encoding of the document before parsing it.
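
We won't detail our preprocessing here, but as an illustration, that kind of charset sniffing can be done along these lines with the juniversalchardet library (a Java port of Mozilla's charset detector); this is just one possible approach, not necessarily what TAW actually does:

    import java.io.FileInputStream;
    import java.io.IOException;
    import org.mozilla.universalchardet.UniversalDetector;

    public class EncodingSniffer {
        // Detect the actual encoding of a document by sampling its bytes,
        // regardless of what the headers or meta tags declare.
        public static String detect(String path) throws IOException {
            byte[] buf = new byte[4096];
            UniversalDetector detector = new UniversalDetector(null);
            FileInputStream in = new FileInputStream(path);
            try {
                int nread;
                while ((nread = in.read(buf)) > 0 && !detector.isDone()) {
                    detector.handleData(buf, 0, nread);
                }
            } finally {
                in.close();
            }
            detector.dataEnd();
            String encoding = detector.getDetectedCharset(); // null if unknown
            return (encoding != null) ? encoding : "ISO-8859-1"; // fallback
        }
    }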

Regards

******************************************
Miguel García
Fundación CTIC
-Centro Tecnológico de la Información y la Comunicación-
E-mail: miguel.garcia@fundacionctic.org
Tel: +34 984 29 12 12
Fax: +34 984 39 06 12

Parque Científico Tecnológico Gijón-Asturias-Spain www.fundacionctic.org
******************************************