- From: Christophe Strobbe <christophe.strobbe@esat.kuleuven.ac.be>
- Date: Thu, 05 May 2005 15:56:26 +0200
- To: public-wai-ert@w3.org
I sent this mail earlier to Gabriele, but it was meant to go to the list.

--

Hi Gabriele,

At 12:27 5/05/2005, you wrote:
>Hi Nick,
>
>(...)
> Normalisation "somehow" changes the original content of an SGML/XML document. I don't want to state the obvious, but normalisation is a one-way process and going back from a normalised document to its original is very hard (unless all the changes are stored).

I agree.

> This process could therefore affect the localisation of the subject of an assertion. Especially when we assert something like "an element is missing".
>
> I want to clarify this with an example, and I hope we can discuss about it.
>
> Let's suppose my aim is to deploy a statistics regarding the usage of the "tbody" element in a collection of HTML documents on the net. I want to use EARL to write a report with assertions of all the documents that have been fetched and checked and the results (and maybe repeat it every quarter of a year).

In fact, every HTML table has a "tbody" element: tbody is one of those elements that have optional start and end tags (like "html", "head" and "body"; an SGML parser is expected to infer their presence). I guess I'm a nitpicker, but I would rephrase your statement as "statistics regarding the use of the 'tbody' start and end tags".

> If original documents do specify "tbody", I guess the normalisation process produces a structure which would not affect the localisation of the subject of my assertion.
>
> On the other hand, if we consider this document portion:
>
>[...]
><table>
><tr>
><th>Country</th>
><th>Population</th>
></tr>
><tr>
><td>Italy</td>
><td>57 millions (?)</td>
></tr>
>[...]
>
> my question is. Would the normalisation process introduce the following change or not?
>
>[...]
><tbody>
><tr>
><td>Italy</td>
><td>57 millions (?)</td>
></tr>
>[...]
></tbody>

The content models for "tbody", "thead" and "tfoot" are identical, namely "(tr)+", so an SGML parser may actually generate a hierarchy that corresponds to this:

[...]
<table>
<tbody>
<tr>
<th>Country</th>
<th>Population</th>
</tr>
<tr>
<td>Italy</td>
<td>57 millions (?)</td>
</tr>
[...]
</tbody>
</table>

More intelligent software might produce this:

[...]
<table>
<thead>
<tr>
<th>Country</th>
<th>Population</th>
</tr>
</thead>
<tbody>
<tr>
<td>Italy</td>
<td>57 millions (?)</td>
</tr>
[...]
</tbody>
</table>

(I am not aware of software that does this.)

> If it does, I think, there could be problems when trying to locate the missing tbody on a document that's been normalised: indeed, the tbody actually exists, as it has been artificially added.

Not "added" but made explicit.

Regards,

Christophe Strobbe

> My question is: how would you locate this kind of problem using the normalised document? Are you still able to refer to the problem in the original document using a fuzzy pointer or Xpath expression (which are related to the normalised document)?
>
> Thank you for your attention.
>
>Ciao,
>-Gabriele
>
>Nick Kew wrote:
>>Jim has given us very briefly his take on the normalisation problem.
>>FWIW, there's a piece on the subject by Joe English at
>>http://groups-beta.google.com/group/comp.text.sgml/msg/70ec0496587b03bb
>>taken from an SGML viewpoint. He doesn't make any reference to HTML
>>as such, but puts forward general rules. His analysis supports the
>>view that <tbody> elements (along with the usual suspects <html>,
>><head>, <body>) should be inserted into the document tree where there
>>is ambiguity.
>
>--
>Gabriele Bartolini: Web Programmer, IWA/HWG Member
>ht://Check, ht://Miner and Wuhkag maintainer
>Current Location: Prato, Toscana, Italia
>me@gabrielebartolini.it | www.gabrielebartolini.it
>> "Lasciate ogne speranza, voi ch'intrate", Dante Alighieri, Divina Commedia, Inferno

--
Christophe Strobbe
K.U.Leuven - Departement of Electrical Engineering - Research Group on Document Architectures
Kasteelpark Arenberg 10 - 3001 Leuven-Heverlee - BELGIUM
tel: +32 16 32 85 51
http://www.docarch.be/
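
The question of whether a pointer into the normalised document still fits the original can be tried out with a small experiment. What follows is only a rough sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree module on a well-formed stand-in for the table fragment quoted above, and the helper make_tbody_explicit is a hypothetical name that merely imitates what a normaliser might do. It is not an SGML parser and not the behaviour of any particular tool.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the table fragment quoted in the thread.
ORIGINAL = """<table>
<tr><th>Country</th><th>Population</th></tr>
<tr><td>Italy</td><td>57 millions (?)</td></tr>
</table>"""


def make_tbody_explicit(table):
    """Wrap the direct <tr> children of `table` in a single <tbody>.

    Hypothetical helper: it only imitates a normaliser that makes the
    implied tbody explicit. It is not an SGML parser.
    """
    rows = table.findall("tr")  # direct children only
    if not rows:
        return table
    tbody = ET.Element("tbody")
    position = list(table).index(rows[0])  # where the first row used to sit
    for row in rows:
        table.remove(row)
        tbody.append(row)
    table.insert(position, tbody)
    return table


def count(table, path):
    """Number of elements matched by `path`, relative to `table`."""
    return len(table.findall(path))


original = ET.fromstring(ORIGINAL)
normalised = make_tbody_explicit(ET.fromstring(ORIGINAL))

# The same relative paths match different things before and after.
print(count(original, "tr"), count(original, "tbody/tr"))      # 2 0
print(count(normalised, "tr"), count(normalised, "tbody/tr"))  # 0 2
```

In this sketch a path such as table/tbody/tr[2] selects a row in the normalised tree but nothing in the tree built from the markup as written, so a report would have to say which of the two trees its pointers refer to.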
Received on Thursday, 5 May 2005 13:58:44 UTC