RE: JTidy: Pb with enclosed tag A

I have to give some lights about the context of my work.

The parser is used in the goal of extracting data from HTML pages in order
to redisplay it in another form (with the permission of the web site author
of course). 

So I'd like the parser to be as universal as possible. I don't want to tell
the webmaster to modified here and here its site.

So the probleme is to finish a start tag when a < comes:

<A att1 att2 <a> kjgkjg</a>

becomes:
<A><a>kugkg</a></A> or
 
<A></A><a>kugkg</a> or 

<a att1 att2>kugkg</a> (ignoring first A, but not its attributes)

I have also the problem with:
<A> <IMG lkjlkj </A>

In this case the IMG must be closed, the error disappear when I use
setMakeClean(false).

Is there a place in the source code I can take a look ?

about:

>How about a test to sees if the tag missing the ">" is the same tag as
>mentioned after the "<" that triggered the error and, if the match, combine
>the tags.  The browser in the example is going to ignore the "<a" and treat
>them as concatenated with an ignored unknown attribute anyway.

I not sure : the browser is Netscape for example ?


Denis Queffeulou

Received on Friday, 19 May 2000 10:35:01 UTC