- From: zze-ethop balr102 <ethop.balr102@rd.francetelecom.fr>
- Date: Fri, 19 May 2000 16:34:36 +0200
- To: html-tidy@w3.org
I have to give some lights about the context of my work. The parser is used in the goal of extracting data from HTML pages in order to redisplay it in another form (with the permission of the web site author of course). So I'd like the parser to be as universal as possible. I don't want to tell the webmaster to modified here and here its site. So the probleme is to finish a start tag when a < comes: <A att1 att2 <a> kjgkjg</a> becomes: <A><a>kugkg</a></A> or <A></A><a>kugkg</a> or <a att1 att2>kugkg</a> (ignoring first A, but not its attributes) I have also the problem with: <A> <IMG lkjlkj </A> In this case the IMG must be closed, the error disappear when I use setMakeClean(false). Is there a place in the source code I can take a look ? about: >How about a test to sees if the tag missing the ">" is the same tag as >mentioned after the "<" that triggered the error and, if the match, combine >the tags. The browser in the example is going to ignore the "<a" and treat >them as concatenated with an ignored unknown attribute anyway. I not sure : the browser is Netscape for example ? Denis Queffeulou
Received on Friday, 19 May 2000 10:35:01 UTC