- From: Francois Daoust <fd@w3.org>
- Date: Mon, 26 Apr 2010 15:36:45 +0200
- To: public-mobileok-checker <public-mobileok-checker@w3.org>
Hi mobileOK Checker task force and other library users,

Tests that apply at the markup level usually return a code extract, but
not the position of that extract in the source code, which would be
extremely useful from a user perspective. The position is missing
because it is lost when the source is parsed to build an XML tree:
tests are run against the XML tree, not against the source.

I updated the code of the library to preserve and return the position
(well, only the line number in the end) whenever possible. I committed
the changes to CVS. I will need to re-validate the whole test suite, but
I wanted to make sure I had not missed something before I do that.
Hence this email. Reactions or comments?

Externally visible changes
-----
- The index of the line (starting at 1) where each node appears in the
  source document now also appears in the HTML tree representation in
  the moki (within the "docContent" element). A "line" attribute in the
  moki namespace is added to each HTML node. The "line" attribute is in
  the moki namespace to prevent collisions with any (existing or not!)
  HTML attribute and to make it easy to remove the attribute when e.g.
  the node is serialized to a string to report a code extract.
- A "tidied" attribute is now added to the "docContent" element in the
  moki representation. When set to "true", it means the mobileOK Checker
  had to tidy the resource under test before it could parse it. It also
  means that the positions may not be accurate since they represent
  positions in the tidied document, not in the original one.
- Tests that return a code extract were updated to also return the
  position where the code may be found using the usual "position"
  element. The "tidied" attribute is set to "true" when the position
  comes from a tidied version of the resource under test.
  The following tests were updated to return the position:
  AUTO_REFRESH, CACHING-3 and CACHING-6, DEFAULT_INPUT_MODE, IMAGE_MAPS,
  IMAGES_SPECIFY_SIZE, LINK_TARGET_FORMAT-3, NO_FRAMES,
  NON_TEXT_ALTERNATIVES, OBJECTS_OR_SCRIPT, PROVIDE_DEFAULTS,
  STYLE_SHEETS_USE
- Code extracts are now limited to about 50 characters in size. The
  Checker could sometimes return a whole section of the document as a
  code extract, which did not really help to identify what was wrong.

Internal changes
-----
The main change is that the DOM implementation of Saxon's TinyTree is
used to parse the document under test with line numbering activated.
The line number is then added to the moki serialization (see methods
XhtmlContent.parse and XhtmlContent.toMokiNode).

The use of Saxon's DOM implementation triggered a couple of bugs related
to the fact that instances of DOM nodes are created on the fly by Saxon
when needed, and cannot be compared with "==". They must be compared
with the DOM "Node.isSameNode" method (see e.g. changes in
ObjectResourceExtractor).

Notes
-----
- Newer versions of Saxon would also make it possible to preserve the
  column, but we cannot switch to newer versions for licensing reasons
  (the mobileOK Checker uses extension functions which are not included
  in Saxon-HE, AFAICT).
- The line number seems to stay accurate when the source is tidied: the
  library the Checker uses to tidy up the source does not seem to add or
  remove lines. This shouldn't be relied upon, though. The column number
  would also not stay accurate.
- In the moki, the introduction of the "line" attribute in HTML elements
  triggers the definition of a "ns0" prefix for the moki namespace
  defined in the HTML root, e.g.:

    <html xmlns="http://www.w3.org/1999/xhtml" lang="en"
          xmlns:ns0="http://www.w3.org/2007/05/moki" ns0:line="2">

  That's technically correct, although visually ugly. I would have
  preferred to control the serialization and generate a "moki" (or "m")
  prefix, but I could not figure out any easy way to do that in Java.
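To make the two DOM-level points above concrete (a "line" attribute kept
in the moki namespace, and node comparison through "Node.isSameNode"
rather than "=="), here is a minimal sketch. It is not the code committed
to CVS: the class name MokiLineSketch and its helper methods are
hypothetical, and it uses the JDK's built-in DOM instead of Saxon's
TinyTree wrapper. It also only shows that a DOM attribute can carry a
"moki" prefix in its qualified name; whether a given serializer keeps
that prefix in the Checker's pipeline is an assumption I have not
verified.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only (not part of the library).
public class MokiLineSketch {
    public static final String MOKI_NS = "http://www.w3.org/2007/05/moki";
    public static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Records the 1-based source line on an element, in the moki namespace. */
    public static void setLine(Element element, int line) {
        // The qualified name carries the wished-for prefix ("moki" is an assumption).
        element.setAttributeNS(MOKI_NS, "moki:line", Integer.toString(line));
    }

    /** Returns the recorded line, or -1 when the element carries none. */
    public static int getLine(Element element) {
        if (!element.hasAttributeNS(MOKI_NS, "line")) {
            return -1;
        }
        return Integer.parseInt(element.getAttributeNS(MOKI_NS, "line"));
    }

    /** Drops the line attribute, e.g. before serializing a code extract. */
    public static void stripLine(Element element) {
        // No collision with HTML attributes is possible: the lookup is by namespace.
        element.removeAttributeNS(MOKI_NS, "line");
    }

    /** Identity test that still works when node objects are created on the fly. */
    public static boolean same(Node a, Node b) {
        return a != null && a.isSameNode(b);
    }

    /** Builds an empty XHTML document and returns its root element. */
    public static Element newHtmlRoot() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element html = doc.createElementNS(XHTML_NS, "html");
            doc.appendChild(html);
            return html;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No namespace-aware DOM parser available", e);
        }
    }

    public static void main(String[] args) {
        Element html = newHtmlRoot();
        setLine(html, 2);
        System.out.println("line = " + getLine(html));
        // Two references obtained through different calls: compare with isSameNode.
        Node again = html.getOwnerDocument().getDocumentElement();
        System.out.println("same node = " + same(html, again));
        stripLine(html);
        System.out.println("line after strip = " + getLine(html));
    }
}
```

With the JDK's DOM, "==" would also happen to work in this example, since
the same object is returned each time; the point of "same" is that it
keeps working with implementations that hand out a fresh wrapper object
per call, which is the situation described above for Saxon.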

Related "bugs"
-----
5006: Does a "tidied" element or attribute exist?
6962: Code extracts: closing tag and tag content are often useless
9538: Improve code references
9583: Return code position consistently across the tests that output
      code extracts

These bugs are visible using:
http://www.w3.org/Bugs/Public/show_bug.cgi

Francois.
Received on Monday, 26 April 2010 13:37:14 UTC