Indexing the Web (was Re: What's wrong with <FONT>?)

>>>>> "SEP" == Scott E Preece <preece@predator.urbana.mcd.mot.com> writes:

SEP> First, the danger of FONT lies not in what it does, but in how it
SEP> is used. [...]

 Among the uses of FONT is the following (to pick a random example):

<H1><FONT SIZE="+1">F</FONT>ortran</H1>

It's quite legit, and it passes the KGV test.

 How is it supposed to be indexed?  I ask out of genuine ignorance.
I just poked around Lycos and AltaVista (to pick just two search
engines), and both proclaim that they index the Web, but neither
gives any real description of how they do it.

 I suppose it just means adding one more conditional to one's
indexer, something like:

 if next character after </FONT> is whitespace
 then
    ignore <FONT> and </FONT>, index as normal
 else
    append all non-whitespace characters following </FONT> to the
       text between <FONT> and </FONT>
    now index the joined word
 endif
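
 For what it's worth, here is a rough Python sketch of a related
approach, not a description of how Lycos or AltaVista actually work:
treat FONT and similar appearance-only tags as transparent, so the
text on either side is joined unless whitespace intervenes, and treat
every other tag as a word break.  It uses html.parser from the Python
standard library; the names INLINE_TAGS, IndexTextExtractor and
index_terms are made up for illustration, and the tag list is only a
guess at what an indexer would want.

```python
from html.parser import HTMLParser
import re

# Assumption: tags that only change appearance and so should not
# break a word. Illustrative, not exhaustive.
INLINE_TAGS = {"font", "b", "i", "em", "strong", "tt", "u"}


class IndexTextExtractor(HTMLParser):
    """Collects document text, inserting a space at non-inline tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self._boundary(tag)

    def handle_endtag(self, tag):
        self._boundary(tag)

    def _boundary(self, tag):
        # HTMLParser passes tag names in lower case.
        if tag not in INLINE_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def index_terms(html):
    """Return the lower-cased words an indexer would see in html."""
    parser = IndexTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", text)]


print(index_terms('<H1><FONT SIZE="+1">F</FONT>ortran</H1>'))
```

 With this, the Fortran heading above comes out as the single term
"fortran", while '<FONT SIZE="+1">F</FONT> ortran' (note the space)
still yields two separate terms.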


 Anybody working on a search engine?  How do you plan to handle stuff
like this?

-- Joseph

Received on Friday, 10 May 1996 14:05:49 UTC