Re: Fuzzy Pointers: Normalisation

Hi Nick,

    thanks for your great contribution. I find the normalisation process 
extremely useful in some cases, but IMHO not practical in others. I 
hope you can change my mind, in the very likely case that I have not 
fully understood your arguments.

    I don't know whether you have already discussed this. If so, I 
apologise; however, I could not find any trace of it in the threads 
that were recently posted on the list. I also apologise for the length 
of this e-mail, but I swear to you guys it is very quick to read.

    Normalisation "somehow" changes the original content of an SGML/XML 
document. I don't want to state the obvious, but normalisation is a 
one-way process and going back from a normalised document to its 
original is very hard (unless all the changes are stored).

    This process could therefore affect how we locate the subject of an 
assertion, especially when we assert something like "an element is 
missing".

    I want to clarify this with an example, and I hope we can discuss 
it.

    Let's suppose my aim is to produce statistics on the usage of the 
"tbody" element in a collection of HTML documents on the net. I want to 
use EARL to write a report with assertions about all the documents that 
have been fetched and checked, together with the results (and maybe 
repeat it every quarter).

    If the original documents do specify "tbody", I guess the 
normalisation process produces a structure that does not affect the 
location of the subject of my assertion.

    On the other hand, if we consider this document portion:

[...]
<table>
<tr>
<th>Country</th>
<th>Population</th>
</tr>
<tr>
<td>Italy</td>
<td>57 millions (?)</td>
</tr>
[...]

    my question is: would the normalisation process introduce the 
following change or not?

[...]
<tbody>
<tr>
<td>Italy</td>
<td>57 millions (?)</td>
</tr>
[...]
</tbody>

    If it does, I think there could be problems when trying to locate 
the missing tbody in a document that has been normalised: the tbody now 
actually exists, as it has been artificially added.

    So: how would you locate this kind of problem using the normalised 
document? Are you still able to refer to the problem in the original 
document using a fuzzy pointer or XPath expression (both of which refer 
to the normalised document)?
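
    To make the concern concrete, here is a minimal sketch of one way to 
keep the original fact recoverable. It assumes the check can run on the 
raw source before any normalisation happens. For each table it records 
the line/column in the original document and whether a "tbody" start tag 
was literally present. The names TableAudit and audit are hypothetical, 
invented for this illustration; only Python's standard html.parser 
module is used.

```python
from html.parser import HTMLParser


class TableAudit(HTMLParser):
    """Record each <table> in the raw source and whether it contains an
    explicit <tbody> start tag (hypothetical helper, not part of EARL)."""

    def __init__(self):
        super().__init__()
        # One entry per table: position in the ORIGINAL source plus a flag.
        self.tables = []
        # Indices into self.tables for tables that are currently open.
        self._open = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag == "table":
            # getpos() returns (line, column) of the tag being handled.
            self.tables.append({"pos": self.getpos(), "explicit_tbody": False})
            self._open.append(len(self.tables) - 1)
        elif tag == "tbody" and self._open:
            # Attribute the tbody to the innermost open table.
            self.tables[self._open[-1]]["explicit_tbody"] = True

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()


def audit(source):
    """Return the list of table records for one raw HTML document."""
    parser = TableAudit()
    parser.feed(source)
    parser.close()
    return parser.tables
```

    With records like these, an assertion such as "tbody is missing" 
could point at the table's position in the original source, whatever the 
normaliser later inserts. Whether that fits with fuzzy pointers is 
exactly the open question above.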

    Thank you for your attention.

Ciao,
-Gabriele

Nick Kew wrote:
> Jim has given us very briefly his take on the normalisation problem.
> 
> FWIW, there's a piece on the subject by Joe English at
> http://groups-beta.google.com/group/comp.text.sgml/msg/70ec0496587b03bb
> taken from an SGML viewpoint.  He doesn't make any reference to HTML
> as such, but puts forward general rules.  His analysis supports the
> view that <tbody> elements (along with the usual suspects <html>,
> <head>, <body>) should be inserted into the document tree where there
> is ambiguity.
> 

-- 
Gabriele Bartolini: Web Programmer, IWA/HWG Member
ht://Check, ht://Miner and Wuhkag maintainer
Current Location: Prato, Toscana, Italia
me@gabrielebartolini.it | www.gabrielebartolini.it
 > "Lasciate ogne speranza, voi ch'intrate", Dante Alighieri, Divina 
Commedia, Inferno

Received on Thursday, 5 May 2005 10:27:27 UTC