Re: [Library update] Position now returned by all tests that apply at the markup level

Replying to myself...

Francois Daoust wrote:
> Internal changes
> -----
> The main change is that Saxon's TinyTree's DOM implementation is used to 
> parse the document under test with line numbering activated. The line 
> number is then added to the moki serialization (see methods 
> XhtmlContent.parse and XhtmlContent.toMokiNode).
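
For readers who want to see how line numbers can be attached to elements
at parse time, here is a rough sketch. It does NOT use Saxon's TinyTree:
it swaps in the plain JDK SAX parser and its Locator, which is a
different mechanism from the one the library uses. The class and method
names (ElementLines, elementLines) are made up for illustration. Note
that the Locator reports the line on which the start tag ends.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the mobileOK Checker library.
public class ElementLines {

    /** Returns one "name@line" entry per element, in document order. */
    public static List<String> elementLines(String xml) {
        final List<String> result = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser hands over the locator before parsing starts.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Line where the start tag ends (1-based).
                result.add(qName + "@" + locator.getLineNumber());
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<html>\n<body>\n<p/>\n</body>\n</html>";
        System.out.println(elementLines(xml));
    }
}
```
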
> 
> The use of Saxon's DOM implementation triggered a couple of bugs related 
> to the fact that instances of DOM nodes are created on the fly by Saxon 
> when needed, and cannot be compared with "==". They must be compared 
> with the DOM "Node.isSameNode" method (see e.g. changes in 
> ObjectResourceExtractor).
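
To illustrate the comparison issue, here is a minimal sketch of an
identity-based lookup. The names (NodeIdentity, indexOfNode) are
hypothetical and this is not the code in ObjectResourceExtractor. It
runs against the JDK's default DOM implementation, where "==" would
happen to work as well; the point is that isSameNode stays correct
with implementations that create node wrappers on the fly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the mobileOK Checker library.
public class NodeIdentity {

    /** Index of target in nodes, compared by DOM identity, or -1. */
    public static int indexOfNode(List<Node> nodes, Node target) {
        for (int i = 0; i < nodes.size(); i++) {
            // Do not use "==" (or List.indexOf, which relies on equals):
            // two distinct Java objects may stand for the same DOM node.
            if (nodes.get(i).isSameNode(target)) {
                return i;
            }
        }
        return -1;
    }

    /** Parses an XML string with the JDK's default DOM parser. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<p><img/><object/></p>");
        NodeList children = doc.getDocumentElement().getChildNodes();
        List<Node> seen = new ArrayList<Node>();
        for (int i = 0; i < children.getLength(); i++) {
            seen.add(children.item(i));
        }
        // Fetch the same element through a different API call.
        Node object = doc.getElementsByTagName("object").item(0);
        System.out.println(indexOfNode(seen, object));
    }
}
```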

I found and fixed a few other bugs introduced by that change, for 
instance in the counting of extraneous characters.
I also found and reported a bug in Saxon that affects the use of an 
entity resolver, and thus the possibility of using a local catalog of 
DTDs, when a Document is created in a specific way (which happens to be 
exactly the way we need...). Michael Kay suggested a workaround, which I 
implemented today:
https://sourceforge.net/mailarchive/message.php?msg_name=209D7731E68043DC8F6695AF79CD6397@Sealion



> Notes
> -----
> - Newer versions of Saxon would also allow preserving the column, but 
> we cannot switch to newer versions for licensing reasons (the mobileOK 
> Checker uses extension functions which are not included in Saxon-HE, 
> AFAICT).

Actually, we can switch to Saxon-B 9.1, which adds this functionality 
under the same license. Any reason not to?



> - The line number seems to stay accurate when the source is tidied: the 
> library the Checker uses to tidy up the source does not seem to add or 
> remove lines. This shouldn't be relied upon, though. The column number 
> would also not stay accurate.

In practice, the library did add or remove lines from time to time, but 
I believe I have now fixed most of the cases where this happens. The 
returned line position is "as close as possible" to the original line 
position.



> - In the moki, the introduction of the "line" attribute in HTML elements 
> triggers the definition of a "ns0" prefix for the moki namespace defined 
> in the HTML root, e.g.:
>  <html xmlns="http://www.w3.org/1999/xhtml" lang="en" 
> xmlns:ns0="http://www.w3.org/2007/05/moki" ns0:line="2">
> That's technically correct, although visually ugly. I would have 
> preferred to control the serialization and generate a "moki" (or "m") 
> prefix, but I could not figure out any easy way to do that in Java.

Having run into weird namespace issues when the DOM tree was 
serialized, I eventually resorted to an additional XSL stylesheet that 
forces the use of an "m" prefix.



Note that I checked and re-generated the whole test suite.
Let me know if you find anything strange.

Francois.


> 
> 
> Related "bugs"
> -----
> 5006: Does a "tidied" element or attribute exist?
> 6962: Code extracts: closing tag and tag content are often useless
> 9538: Improve code references
> 9583: Return code position consistently across the tests that output 
> code extracts
> These bugs are visible using:
> http://www.w3.org/Bugs/Public/show_bug.cgi
> 
> 
> Francois.
> 
> 

Received on Monday, 3 May 2010 10:27:29 UTC