Re: Problem with <SPAN> within <A>, and other issues

bglbv wrote:
>
> There is a bug in the current CVS version of Amaya involving the nesting of
> <SPAN> within <A>. The document
> 	http://www.w3.org/TR/1999/REC-html401-19991224/
> illustrates the problem. When viewed with Amaya, some entries in the table
> of contents (e.g., 7.3, 7.4.1, 7.4.2, 7.4.3, 7.4.4.second_bullet, 7.5.1,
> 7.5.2, and many more) are incorrectly rendered as if <SPAN> implied a </A>.

Thanks for your report.  It helped us fix the bug.
If you update the module amaya/html2thot.c from the CVS repository,
the table of contents of HTML 4.01 will display correctly.

> Interestingly, although the structure view confirms that this is what happens,
> the error log doesn't complain about the real </A> tags when they appear shortly
> afterwards, at the intended end of the anchor text.

There is no error in the HTML file, so the error log has nothing to report.

> (Generally speaking, I find that the error log fails to report a number
> of anomalies in documents. I would prefer for it to be more pedantic
> about HTML correctness, even if Amaya then attempts to transform the
> document into something sensible.)

Sure, but it's a matter of priority.  Validation is already available
through other tools, such as the W3C validator.  Amaya only tries to
detect and fix errors that may pose problems at editing time.

> I can cut-and-paste things back into their rightful state through the
> structure view, so I suspect that the bug is in html2thot rather than
> deeper within Thotlib.

You are right.

> ---------
> On a different note, I've experienced some "Thot: irrecoverable error" crashes
> while editing the source view of some ill-formed documents. (I find it useful
> to do so whenever html2thot doesn't handle the input in the way I would prefer;
> hence my interest in the source view. In an ideal world it wouldn't be needed,
> but we don't live in such a world.) Debugging hints will be most welcome.
> 
> I'm amazed at the scarcity of assertion checks in Amaya source code. I believe
> that systematic use of assert.h's facilities would lead to much more robust
> code. There would be documentation benefits as well.
> 
> ---------
> The response to my earlier report of problems with unclosed anchors before
> <BODY> has been an incomplete fix. If the anchor is closed properly with </A>,
> everything is fine, but if the </A> is missing html2thot appears to construct
> an internal tree with two nested BODY elements (and Amaya will then publish
> a page with nested bodies on request). That should never be allowed to happen. 
> As an example, try playing with
> 
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
>                       "http://www.w3.org/TR/REC-html40/loose.dtd">
> <html>
> <a name="Top"><-- insert a /A element here and all will be well -->
> <head>
>   <title>Out-of-body anchor</title>
> </head>
> <body bgcolor="green">
> </body>
> </html>
> 
> (Also note that the structure view gets garbled; but perhaps that isn't worth
> fixing if it only ever happens with ill-formed internal document trees: 
> better to make sure that such trees cannot occur.)

I guess "<-- insert..." is supposed to be a comment.  Adding a '!' after
'<' improves the situation, at least in the structure view, but it does
not magically solve the nested bodies problem!  We will look into this.
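
For reference, here is a sketch of the same test case with the comment
delimiter corrected, assuming "<--" was indeed meant to be "<!--".  Nothing
else is changed: the anchor is deliberately left unclosed, so this version
should still be usable to reproduce the nested bodies problem.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
                      "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<a name="Top"><!-- insert a /A element here and all will be well -->
<head>
  <title>Out-of-body anchor</title>
</head>
<body bgcolor="green">
</body>
</html>
```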

> All this was already happening in the Amaya V2.4 release, and my original
> report had intended to draw attention to this ill-behaviour. Instead, I was
> disappointed to see people focus on the request for support of out-of-body
> anchors, which in my opinion is far less important than the fact that Amaya
> sometimes fails to live up to the rule that the thotlib representation of
> the document should always be well-formed.

Amaya claims to *create* well-formed and valid documents.  It is also
supposed to add well-formed and valid *new* stuff.  But it only *tries*
to recover from some common HTML errors.  Other tools are provided by
W3C to validate and fix invalid documents.  This is not a priority for
Amaya.

> To dispel possible misunderstandings, I'll say that what would make Amaya most
> useful to me isn't lots of bells and whistles or insightful do-what-I-mean
> rendering of all the bad HTML in the world, but (1) strict adherence to the
> principle of only storing correct HTML internally,

Only for stuff generated by Amaya, unfortunately.

> (2) faithful display and
> robust editing of documents through all relevant views (including the source
> view, which is just a plaintext editor),

The source view behaves differently on purpose. For some manipulations,
strict structure editing is not very convenient, and editing the source
as plain text, without any constraint, is much more efficient. This feature
was added recently and we still need to improve it.  Checking validity
when the user synchronizes the source view, for instance, would be nice.

> (3) thorough diagnostics of bad HTML
> input. (Take the name HTML to include XHTML, MathML etc. as appropriate.)
> Naturally I only speak for myself, and W3C and the Amaya developers may well
> have different priorities and preferences; my remarks are for information only.
> 
> That said, life would be simpler with a more easily user-configurable html2thot,
> in which one could fine-tune the handling of non-conforming HTML input. But as
> long as editing the source view works properly and html2thot produces useful
> diagnostics to guide me, I can live with patching up the input. If the glitches
> I see in this area could be removed, this would make Amaya a great tool for
> fixing bad legacy HTML pages.

Have you tried HTML Tidy?  That's exactly its purpose.
Please refer to

   http://www.w3.org/People/Raggett/tidy/

Thanks for your comments.

Vincent.

-------------------------------------------------------
Vincent Quint                       INRIA Rhone-Alpes
W3C/INRIA                           ZIRST
e-mail: Vincent.Quint@w3.org        655 avenue de l'Europe
Tel.: +33 4 76 61 53 62             38330 Montbonnot St Martin
Fax:  +33 4 76 61 52 07             France

Received on Tuesday, 4 January 2000 09:36:46 UTC