- From: Gabriele Bartolini <me@gabrielebartolini.it>
- Date: Thu, 05 May 2005 12:27:03 +0200
- To: Nick Kew <nick@webthing.com>
- CC: public-wai-ert@w3.org
Hi Nick,
thanks for your great contribution. I find the normalisation process
extremely useful in some cases, but IMHO not practical in others. I
hope you can change my mind, in the very likely case that I have not
fully understood your arguments.
I don't know whether you have already discussed this. If so, I
apologise; however, I could not find any trace of it in the threads
that were recently posted on the list. I also apologise for the length
of this e-mail, but I promise you it is very quick to read.
Normalisation "somehow" changes the original content of an SGML/XML
document. I don't want to state the obvious, but normalisation is a
one-way process and going back from a normalised document to its
original is very hard (unless all the changes are stored).
This process could therefore affect how we locate the subject of an
assertion, especially when we assert something like "an element is
missing".
I want to clarify this with an example, and I hope we can discuss
it.
Let's suppose my aim is to produce statistics on the usage of the
"tbody" element in a collection of HTML documents on the net. I
want to use EARL to write a report with assertions about all the
documents that have been fetched and checked, together with the results
(and maybe repeat it every quarter).
If the original documents do specify "tbody", I guess the normalisation
process produces a structure which would not affect the location of
the subject of my assertion.
On the other hand, if we consider this document portion:
[...]
<table>
<tr>
<th>Country</th>
<th>Population</th>
</tr>
<tr>
<td>Italy</td>
<td>57 millions (?)</td>
</tr>
[...]
my question is: would the normalisation process introduce the
following change or not?
[...]
<tbody>
<tr>
<td>Italy</td>
<td>57 millions (?)</td>
</tr>
[...]
</tbody>
If it does, I think there could be problems when trying to locate
the missing tbody in a document that has been normalised: indeed, the
tbody now exists, as it has been artificially added.
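To make the concern concrete, here is a minimal sketch in Python. It is
only an illustration under my own assumptions: `normalise` is a
hypothetical stand-in for whatever normaliser would really be used, and I
assume it wraps every direct <tr> child of <table> in a single <tbody>.
The table is a closed, well-formed version of the fragment above.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the example table (assumption: the elided
# parts of the fragment are not needed to show the effect).
ORIGINAL = (
    "<table>"
    "<tr><th>Country</th><th>Population</th></tr>"
    "<tr><td>Italy</td><td>57 millions (?)</td></tr>"
    "</table>"
)

def normalise(table):
    """Hypothetical normaliser: wrap direct <tr> children in one <tbody>."""
    rows = [child for child in table if child.tag == "tr"]
    if rows:
        tbody = ET.Element("tbody")
        for row in rows:
            table.remove(row)
            tbody.append(row)
        table.append(tbody)
    return table

def has_tbody(table):
    """The check behind my statistics: is there a <tbody> child?"""
    return table.find("tbody") is not None

before = has_tbody(ET.fromstring(ORIGINAL))
after = has_tbody(normalise(ET.fromstring(ORIGINAL)))
print(before, after)
```

Run against the original, the check reports that tbody is absent; run
against the normalised tree, the same check reports that it is present,
so the quarterly statistics would be wrong.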
My question is: how would you locate this kind of problem using the
normalised document? Are you still able to refer to the problem in the
original document using a fuzzy pointer or XPath expression (both of
which refer to the normalised document)?
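One possible answer, which I offer only as a sketch and not as anything
the normalisation proposal specifies: the normaliser could record every
element it inserts, so that a pointer into the normalised tree can be
translated back. In the Python below, the marker attribute `x-inserted`
and the functions `normalise` and `path_in_original` are all names I made
up for illustration, and the paths are a simplified XPath-like notation.

```python
import xml.etree.ElementTree as ET

MARK = "x-inserted"  # hypothetical marker attribute, not part of HTML or EARL

def normalise(table):
    """Hypothetical normaliser that marks the <tbody> it inserts."""
    rows = [child for child in table if child.tag == "tr"]
    if rows:
        tbody = ET.Element("tbody", {MARK: "true"})
        for row in rows:
            table.remove(row)
            tbody.append(row)
        table.append(tbody)
    return table

def path_in_original(root, target):
    """Path to target with the steps for inserted elements dropped."""
    def walk(node, steps):
        if node is target:
            return steps
        for child in node:
            # 1-based position among same-tag siblings, as in XPath
            same_tag = [c for c in node if c.tag == child.tag]
            step = "%s[%d]" % (child.tag, same_tag.index(child) + 1)
            kept = steps if child.get(MARK) else steps + [step]
            found = walk(child, kept)
            if found is not None:
                return found
        return None
    steps = walk(root, [root.tag])
    return None if steps is None else "/" + "/".join(steps)

table = normalise(ET.fromstring(
    "<table>"
    "<tr><th>Country</th><th>Population</th></tr>"
    "<tr><td>Italy</td><td>57 millions (?)</td></tr>"
    "</table>"))
tbody = table.find("tbody")
italy_row = tbody.findall("tr")[1]

tbody_was_missing = tbody.get(MARK) == "true"
row_path = path_in_original(table, italy_row)
tbody_path = path_in_original(table, tbody)
print(tbody_was_missing, row_path, tbody_path)
```

With such a record, "tbody is missing" can still be asserted from the
normalised tree (the tbody carries the marker), and the subject of the
assertion resolves to the parent table in the original document. The
index translation here only works because all rows are moved together;
a real normaliser would need to store more than a flag.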
Thank you for your attention.
Ciao,
-Gabriele
Nick Kew wrote:
> Jim has given us very briefly his take on the normalisation problem.
>
> FWIW, there's a piece on the subject by Joe English at
> http://groups-beta.google.com/group/comp.text.sgml/msg/70ec0496587b03bb
> taken from an SGML viewpoint. He doesn't make any reference to HTML
> as such, but puts forward general rules. His analysis supports the
> view that <tbody> elements (along with the usual suspects <html>,
> <head>, <body>) should be inserted into the document tree where there
> is ambiguity.
>
--
Gabriele Bartolini: Web Programmer, IWA/HWG Member
ht://Check, ht://Miner and Wuhkag maintainer
Current Location: Prato, Toscana, Italia
me@gabrielebartolini.it | www.gabrielebartolini.it
> "Lasciate ogne speranza, voi ch'intrate", Dante Alighieri, Divina
Commedia, Inferno
Received on Thursday, 5 May 2005 10:27:27 UTC