Indexing the Web (was Re: What's wrong with <FONT>?)

T. Joseph W. Lazio
Fri, 10 May 1996 14:05:02 -0400

From: "T. Joseph W. Lazio" <>
Date: Fri, 10 May 1996 14:05:02 -0400
Subject: Indexing the Web (was Re: What's wrong with <FONT>?)

>>>>> "SEP" == Scott E Preece <> writes:

SEP> First, the danger of FONT lies not in what it does, but in how it
SEP> is used. [...]

 Among the uses of FONT is the following (to pick a random example):

<H1><FONT SIZE="+1">F</FONT>ortran</H1>

It's quite legit; it passes the KGV test.

 How is it supposed to be indexed?  I ask out of genuine ignorance.  I
just poked around Lycos and AltaVista (to pick just two search engines);
both proclaim that they index the Web, but neither offers any real
description of how they do it.

 I suppose it just means adding an additional conditional to one's
indexer, something like 

 if next character after </FONT> != whitespace
    ignore <FONT> and </FONT> rather than treating them as a word break
    append all non-whitespace characters following </FONT> to the last
       character of the text between <FONT> and </FONT>
    now index the joined word as normal
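
 For what it's worth, here is a minimal sketch in Python of one way the
conditional above might be realised.  It is only an assumption about how
an indexer could behave, not a description of what Lycos or AltaVista
actually do, and the names (index_terms, FONT_TAG, and so on) are made up
for illustration.  Instead of testing the character after </FONT>
explicitly, it deletes FONT tags without leaving a gap and turns every
other tag into a space, which gives the same result for the example.

```python
import re

# Hypothetical names, for illustration only.
FONT_TAG = re.compile(r"</?FONT\b[^>]*>", re.IGNORECASE)  # <FONT ...> and </FONT>
ANY_TAG = re.compile(r"<[^>]*>")                          # any remaining tag
WORD = re.compile(r"[A-Za-z0-9]+")                        # crude notion of a word


def index_terms(html):
    """Return the lowercased words an indexer might record for `html`."""
    # Delete FONT tags without inserting a space, so "F" and "ortran" join.
    text = FONT_TAG.sub("", html)
    # Replace every other tag with a space so it acts as a word boundary.
    text = ANY_TAG.sub(" ", text)
    return [word.lower() for word in WORD.findall(text)]


print(index_terms('<H1><FONT SIZE="+1">F</FONT>ortran</H1>'))  # ['fortran']
```

 If whitespace does follow </FONT>, that whitespace survives the
deletion, so the words on either side are still indexed separately.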

 Anybody working on a search engine?  How do you plan to handle stuff
like this?

-- Joseph