- From: Karl-Otto Linn <linn@informatik.fh-wiesbaden.de>
- Date: Fri, 20 Aug 1999 22:02:09 +0200
- To: www-lib@w3.org
Hi,

I have found a similar problem with the robot. If an HTML page contains comments of the form "<!--- ....--->", which are used by many authoring tools, the robot stops parsing the page at the point where the comment begins. It does not resynchronize again.

Is this a bug or a feature?

Regards,
Karl-Otto

Raffaele Sena wrote:
> > In this function ... e.g.:
> >
> > PRIVATE void unparsedBeginElement (HText* pDataStruct, const char* cpszBuffer, int iLength)
> > {
> >     HTPrint("\n\nFound a unparsed Element -> [%d]*%s*\n", iLength, cpszBuffer);
> > }
> >
> > ... I only receive the unknown tag in cpszBuffer and its length in iLength.
> > But the rest of this tag, its parameters ... How may I access them?
> >
> > Hope u can help me ;)
>
> Unfortunately there is no way without changing libwww.
> I noticed that some time ago, but there isn't an easy fix.
>
> The way the SGML parser works today is that when it first finds a tag, it
> checks whether it's valid before parsing the attributes. If not, it calls
> unparsed_begin_element with no attributes and then throws them away.
>
> I guess it could be changed to enter a state where it collects everything
> up to the end tag and then calls the appropriate callback (but then you'll
> have to parse the full line).
>
> A better way could be to collect the attributes without checking them and
> pass them to the callback in the attributes array, maybe in the form
> "ATTRNAME=VALUE", which should be easy to parse.
>
> ...but whichever way, it needs to be implemented.
>
> In your specific case you may want to add the <EMBED> tag to HTMLPDTD.c
> and HTMLPDTD.h (just put it in the right place :)
>
> -- Raffaele
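Raffaele's second suggestion, collecting the attributes of an unknown tag without validating them and handing them over as "ATTRNAME=VALUE" strings, might look roughly like the C sketch below. It is only an illustration under stated assumptions, not libwww code: `collect_raw_attributes`, `MAX_ATTRS` and `MAX_ATTR_LEN` are hypothetical names, and wiring the result into the SGML parser and the unparsed-element callback would still mean changing libwww itself.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_ATTRS    16   /* hypothetical limit on attributes kept per tag */
#define MAX_ATTR_LEN 128  /* hypothetical limit on one "NAME=VALUE" string */

/* Hypothetical helper (not part of libwww): split the raw text of a start
 * tag such as <EMBED SRC="a.swf" WIDTH=320> into the element name and a
 * list of "ATTRNAME=VALUE" strings, without checking either against a DTD.
 * name_len must be at least 1. Attributes beyond max_attrs are dropped and
 * over-long names or values are truncated. Returns the attribute count. */
static int collect_raw_attributes(const char *tag, char *name, size_t name_len,
                                  char attrs[][MAX_ATTR_LEN], int max_attrs)
{
    const char *p = tag;
    size_t n = 0;
    int count = 0;

    if (*p == '<') p++;
    /* Element name: everything up to whitespace, '>' or end of string. */
    while (*p && !isspace((unsigned char)*p) && *p != '>') {
        if (n + 1 < name_len) name[n++] = *p;
        p++;
    }
    name[n] = '\0';

    while (count < max_attrs) {
        char *out = attrs[count];
        size_t len = 0;

        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0' || *p == '>') break;

        /* Attribute name, copied as-is (no case folding, no validation). */
        while (*p && !isspace((unsigned char)*p) && *p != '=' && *p != '>') {
            if (len + 1 < MAX_ATTR_LEN) out[len++] = *p;
            p++;
        }
        while (isspace((unsigned char)*p)) p++;  /* tolerate NAME = VALUE */

        if (*p == '=') {
            p++;
            while (isspace((unsigned char)*p)) p++;
            if (len + 1 < MAX_ATTR_LEN) out[len++] = '=';
            if (*p == '"' || *p == '\'') {
                /* Quoted value: copy up to the matching quote. */
                char quote = *p++;
                while (*p && *p != quote) {
                    if (len + 1 < MAX_ATTR_LEN) out[len++] = *p;
                    p++;
                }
                if (*p == quote) p++;
            } else {
                /* Unquoted value: copy up to whitespace or '>'. */
                while (*p && !isspace((unsigned char)*p) && *p != '>') {
                    if (len + 1 < MAX_ATTR_LEN) out[len++] = *p;
                    p++;
                }
            }
        }
        out[len] = '\0';  /* attributes without a value stay as "NAME" */
        count++;
    }
    return count;
}
```

For input `<EMBED SRC="movie.swf" WIDTH=320 autoplay>` this sketch yields the name `EMBED` and the strings `SRC=movie.swf`, `WIDTH=320` and `autoplay`. A modified parser could then pass that array to the callback instead of discarding the attributes, which leaves the callback with the easy job of splitting each string at the first `=`.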
Received on Friday, 20 August 1999 15:59:12 UTC