Hi, I've had a quick look at document hashing comparing Mozilla 0.9.8 and IE's normalisations: http://jibbering.com/2002/3/documenthash.html contains the source of the script I used. The algorithm is basically construct a string of all names of all the nodes in document order that do not have any characters other than a-z (to remove text, comment and non HTML namespace nodes.). From that string I then calculate 2 MD5 hashes one of the whole string, and one of just the string from the body node. http://jibbering.com/2002/3/hashcompare.html has the results. Of the 9 I tested (this was just a quick test, I'll do some more if we can get Amaya to also do the tests.) the only one that failed to match on the "body hash" was http://www.ibm.com/ quite what happened with that one I'm not sure, neither found a body element, but viewing it in a browser you do have one (I proxy it through jibbering.com _without_ changing any of the src's so document.write javascript in an external file isn't generated.) So it would seem IE and Mozilla have a very similar normalisation of the body.Received on Thursday, 21 March 2002 11:41:26 GMT
This archive was generated by hypermail 2.2.0 + w3c-0.30 : Friday, 25 March 2005 11:19:18 GMT