Re: 3 points about Text free (<30 words) sites

> img src ="http://www....

I've heard some sites say not to do it (it's not linking, it's inlining).
Others are only too keen.  The best thing is probably to write to them, or
to see whether they have a policy on their web page.  NB copyright is a
tricky thing on the Internet because the Internet is international; for
example, copyright law differs between Britain and America.

> 3   word counts in meta tags.

Do you mean word counts in links?  If you've got the page (and you need
the page to read its meta tags), you can run a word count yourself anyway.
I can also imagine these things suffering from "bit rot": people might
put the word counts in but fail to keep them up to date.
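Counting the words yourself is simple once you have the page, and it
avoids trusting a possibly stale meta tag.  A minimal sketch using only
Python's standard html.parser module follows.  The 30-word threshold comes
from the subject line; ignoring <script> and <style> content is my
assumption about what counts as "text", and word_count / is_text_free are
hypothetical names for illustration.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts whitespace-separated words in an HTML page's text.

    Text inside <script> and <style> is skipped (an assumption: a
    browser does not display it as page text).
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words += len(data.split())


def word_count(html):
    """Return the number of words in the page's (non-script) text."""
    parser = VisibleTextCounter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.words


def is_text_free(html, limit=30):
    # Hypothetical definition of "text-free": fewer than `limit` words,
    # following the "<30 words" in the thread's subject line.
    return word_count(html) < limit
```

For example, a page whose only text is a title "Hi" and a paragraph
"Back to home page" counts as 5 words, so it passes the 30-word test.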

> my vb spider had to stop counting somewhere

You wrote your own web crawler in VB to find text-free sites?  I'm not
surprised it ran into trouble: the web is *very* big, and running a
crawler is beyond the resources of most of us.

InfoSeek, AltaVista and Lycos show the document size (excluding images)
with their search results; this isn't a good indicator, but it might help
(e.g. a page that's 100k long will probably contain too much text).  You
could write to AltaVista [1] and ask them to add a feature that reports
the word count of fetched documents (or even lets you search for
documents below a certain word count [2]); they seem pretty keen to
implement all kinds of things (CJK, translation, etc.).

[1] Do it as an organisation, not as an individual.  Large companies
often pay attention to organisations with important-looking acronyms but
ignore individuals.  Writing on paper will probably help too.

[2] But you'll probably have problems with frames: you might get quite a
few hits on frames that just say "Back to home page"...
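One partial remedy, sketched under assumptions: when a crawler meets a
frameset page, it can collect the frame URLs so that the word counts of
all the frames are added together, instead of judging each frame on its
own.  This does not help when a search result points straight at a
navigation frame, since that frame's own HTML usually gives no sign that
it lives inside a frameset.  FrameDetector and frame_info are hypothetical
names; only the standard html.parser module is used.

```python
from html.parser import HTMLParser


class FrameDetector(HTMLParser):
    """Records whether a page is a frameset and which frames it loads."""

    def __init__(self):
        super().__init__()
        self.has_frameset = False
        self.frame_sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "frameset":
            self.has_frameset = True
        elif tag == "frame":
            # attrs is a list of (name, value) pairs; value may be None.
            src = dict(attrs).get("src")
            if src:
                self.frame_sources.append(src)


def frame_info(html):
    """Return (is_frameset, [frame src URLs]) for an HTML page."""
    detector = FrameDetector()
    detector.feed(html)
    detector.close()
    return detector.has_frameset, detector.frame_sources
```

A crawler would then fetch each returned URL (resolved against the
frameset page's address) and sum their word counts before deciding
whether the site is text-free.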

Regards

-- Silas S Brown, St John's College Cambridge UK http://epona.ucam.org/~ssb22/

"Anyone inexperienced puts faith in every word, but the shrewd one
 considers his steps" - Proverbs 13:16

Received on Sunday, 14 March 1999 03:51:45 UTC