Problem with <SPAN> within <A>, and other issues

There is a bug in the current CVS version of Amaya involving the nesting of
<SPAN> within <A>. The document
	http://www.w3.org/TR/1999/REC-html401-19991224/
illustrates the problem. When viewed with Amaya, some entries in the table
of contents (e.g., 7.3, 7.4.1, 7.4.2, 7.4.3, 7.4.4.second_bullet, 7.5.1,
7.5.2, and many more) are incorrectly rendered as if <SPAN> implied a </A>.is a bug in the current CVS version of Amaya involving the nesting of
<SPAN> within <A>. The document
	http://www.w3.org/TR/1999/REC-html401-19991224/
illustrates the problem. When viewed with Amaya, some entries in the table
of contents (e.g., 7.3, 7.4.1, 7.4.2, 7.4.3, 7.4.4.second_bullet, 7.5.1,
7.5.2, and many more) are incorrectly rendered as if <SPAN> implied a </A>.

Interestingly, although the structure view confirms that this is what happens,
the error log doesn't complain about the real </A> when they appear shortly
afterwards, at the intended end of the anchor text. (Generally speaking, I
find that the error log fails to report a number of anomalies in documents.
I would prefer for it to be more pedantic about HTML correctness, even if Amaya
then attempts to transform the document into something sensible.)

I can cut-and-paste things back into their rightful state through the structure
view, so I suspect that the bug is in html2thot rather than deeper within
Thotlib.

---------
On a different note, I've experienced some "Thot: irrecoverable error" crashes
while editing the source view of some ill-formed documents. (I find it useful
to do so whenever html2thot doesn't handle the input in the way I would prefer;
hence my interest in the source view. In an ideal world it wouldn't be needed,
but we don't live in such a world.) Debugging hints will be most welcome.

I'm amazed at the scarcity of assertion checks in Amaya source code. I believe
that systematic use of assert.h's facilities would lead to much more robust
code. There would be documentation benefits as well.

---------
The response to my earlier report of problems with unclosed anchors before
<BODY> has been an incomplete fix. If the anchor is closed properly with </A>,
everything is fine, but if the </A> is missing html2thot appears to construct
an internal tree with two nested BODY elements (and Amaya will then publish
a page with nested bodies on request). That should never be allowed to happen. 
As an example, try playing with

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
                      "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<a name="Top"><-- insert a /A element here and all will be well -->
<head>
  <title>Out-of-body anchor</title>
</head>
<body bgcolor="green">
</body>
</html>

(Also note that the structure view gets garbled; but perhaps that isn't worth
fixing if it only ever happens with ill-formed internal document trees: 
better to make sure that such trees cannot occur.)

All this was already happening in the Amaya V2.4 release, and my original
report had intended to draw attention to this ill-behaviour. Instead, I was
disappointed to see people focus on the request for support of out-of-body
anchors, which in my opinion is far less important than the fact that Amaya
sometimes fails to live up to the rule that the thotlib representation of
the document should always be well-formed.

To dispel possible misunderstandings, I'll say that what would make Amaya most
useful to me isn't lots of bells and whistles or insightful do-what-I-mean
rendering of all the bad HTML in the world, but (1) strict adherence to the
principle of only storing correct HTML internally, (2) faithful display and
robust editing of documents through all relevant views (including the source
view, which is just a plaintext editor), (3) thorough diagnostics of bad HTML
input. (Take the name HTML to include XHTML, MathML etc. as appropriate.)
Naturally I only speak for myself, and W3C and the Amaya developers may well
have different priorities and preferences; my remarks are for information only.

That said, life would be simpler with a more easily user-configurable html2thot,
in which one could fine-tune the handling of non-conforming HTML input. But as
long as editing the source view works properly and html2thot produces useful
diagnostics to guide me, I can live with patching up the input. If the glitches
I see in this area could be removed, this would make Amaya a great tool for
fixing bad legacy HTML pages.

Received on Monday, 3 January 2000 02:16:46 UTC