
Re: Embedded (inline) indexing tags

From: Orion Adrian <orion.adrian@gmail.com>
Date: Sat, 8 Jan 2005 10:24:16 -0500
Message-ID: <abd6c80105010807244df5748d@mail.gmail.com>
To: www-html@w3.org

What I'm saying, though, is that if you have to process the entire page
and not just the first X%, then the process becomes an order of
magnitude more resource-intensive. The question is what benefit we
get by making it that way. Second, what do we lose by forcing
metadata to exist in the file and not be in a predictable location?
(Think metadata-based file systems.)
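
A minimal Python sketch of the cost difference, assuming the metadata
sits in <head> as HTML 4 expects. The names HeadMetaParser and
read_head_metadata are made up for illustration, and the chunk size is
arbitrary; a real indexer would be more careful about malformed markup.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collects <meta name=... content=...> pairs until </head> is seen."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.head_closed = False

    def handle_starttag(self, tag, attrs):
        # Ignore any <meta> that appears after the head has closed.
        if tag == "meta" and not self.head_closed:
            attr = dict(attrs)
            name = attr.get("name")
            content = attr.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "head":
            self.head_closed = True


def read_head_metadata(page, chunk_size=256):
    """Feed the page in chunks and stop once </head> has been parsed.

    Returns the metadata found and the number of characters consumed.
    """
    parser = HeadMetaParser()
    consumed = 0
    while consumed < len(page) and not parser.head_closed:
        parser.feed(page[consumed:consumed + chunk_size])
        consumed += chunk_size
    return parser.meta, min(consumed, len(page))
```

With metadata in a predictable location, the reader can stop after the
first chunk or two regardless of how long the body is. Metadata
scattered inline through the body would force it to consume every
chunk.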

Orion Adrian

On Sat, 8 Jan 2005 17:22:50 +1100, Trejkaz Xaoza <trejkaz@trypticon.org> wrote:
> On Sat, 8 Jan 2005 07:53, you wrote:
> > > distinct.)  It seems to me that it's the search engine's problem if it
> > > somehow fails to find important information.
> >
> > Often such heuristics are defences against abuse by authors trying to
> > increase their rating.  Metadata, because it doesn't get displayed in
> > HTML 4/XHTML 1, is a good place for keyword stuffing by people who don't
> > really care about its true purpose.
> Nothing stops the search engine from stopping indexing of keywords after a
> certain point in the page either.
> Although in all honesty, you would get better results if you _did_ index the
> entire page.  Then you can trivially detect keyword abuse by counting the
> number of keywords in the page and penalising for large numbers.  I thought
> this was already how Google worked anyway.
> But like I said, if they skip _important_ metadata, then it's their own
> problem.  They would quickly get supplanted by superior search engines, just
> like Altavista did when their results started getting crap.
> TX
> --
>              Email: Trejkaz Xaoza <trejkaz@trypticon.org>
>           Web site: http://xaoza.net/
>          Jabber ID: trejkaz@jabber.zim.net.au
>    GPG Fingerprint: 9EEB 97D7 8F7B 7977 F39F  A62C B8C7 BC8B 037E EA73
Received on Saturday, 8 January 2005 15:24:48 UTC
