Re: HTML and Search Engines

J.Wallis (cm1906@scitsc.wlv.ac.uk)
Fri, 25 Aug 1995 08:46:22 +0100 (BST)


Message-Id: <m0sltT4-0007lxC@scitsc.wlv.ac.uk>
From: cm1906@scitsc.wlv.ac.uk (J.Wallis)
Subject: Re: HTML and Search Engines
To: murray.altheim@nttc.edu
Date: Fri, 25 Aug 1995 08:46:22 +0100 (BST)
Cc: www-html@w3.org
In-Reply-To: <v02110106ac628d9cb142@[192.188.119.193]> from "murray.altheim@nttc.edu" at Aug 24, 95 04:38:31 pm

> A common way of expressing a set of keywords or other searchable info is
> with the META element. Here's what I consider a good example from one of my
> users:[1]
> 
>    <META name="resource-type" content=document>
>    <META name="description" content="The ArtMetal Project is a volunteer
[snip] 
>    <META name="keywords" content="art, metal, sculpture, furniture,
>    jewelry, casting, lighting, designer, gallery, blacksmith, artist,
>    architect, iron, forging">
> 
>    <META name="distribution" content=global>
> 
> I've seen this type of implementation used on a number of search engines.
> As META is extensible and HTML has no other *explicitly stated* method of
> including document meta-information, this seems the most likely method,
> although others are feasible. This is at least a true HTML 2.0 compatible
> (as per the latest draft) method.
> [1] ArtMetal Home Page: http://wuarchive.wustl.edu/edu/arts/metal/ArtMetal.html

This would also be a way of including a more formal method of document
classification, such as the Dewey Decimal Classification (DDC) or the
Universal Decimal Classification (UDC).

e.g.,      <META name="class" content="667">
or better, <META name="DDC" content="796.1">
since it would almost certainly be necessary to specify the classification
system in use (both to enable the right sort of search and to allow
translation between classification systems).
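To make the point about naming the scheme concrete, here is a rough Python sketch of how an indexer might pull classmarks out of a page. It assumes the META name is simply the scheme's abbreviation ("DDC" or "UDC"); that naming convention is an assumption for illustration, not anything in the HTML 2.0 draft. A bare name="class" tag is skipped because it doesn't say which scheme it uses.

```python
from html.parser import HTMLParser

# Schemes this sketch recognises. Using "DDC"/"UDC" as META names is an
# assumed convention for illustration, not part of any HTML specification.
KNOWN_SCHEMES = {"DDC", "UDC"}


class ClassmarkParser(HTMLParser):
    """Collect (scheme, classmark) pairs from <META name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.classmarks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> works too.
        if tag != "meta":
            return
        fields = dict(attrs)
        scheme = (fields.get("name") or "").upper()
        content = fields.get("content")
        # name="class" does not identify a scheme, so it is ignored here.
        if scheme in KNOWN_SCHEMES and content:
            self.classmarks.append((scheme, content.strip()))


def extract_classmarks(html):
    parser = ClassmarkParser()
    parser.feed(html)
    parser.close()
    return parser.classmarks


page = """<HEAD>
<META name="class" content="667">
<META name="DDC" content="796.1">
</HEAD>"""
print(extract_classmarks(page))  # [('DDC', '796.1')]
```

Unquoted attribute values, as in the quoted ArtMetal example, are handled too: <META name=UDC content=667> yields ('UDC', '667').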

Search engines could then catalogue pages by *context* (i.e., subject) as
well as by content.
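A minimal sketch of cataloguing by subject, assuming the indexer has already collected each page's DDC classmarks as plain strings (the catalogue and URLs below are hypothetical). Because DDC notation is hierarchical, with three digits before any decimal point, a string-prefix match on a classmark also picks up its subdivisions, so asking for 796 finds pages filed under 796.1 and 796.7.

```python
# Hypothetical catalogue: URL -> DDC classmarks already extracted from META tags.
CATALOGUE = {
    "http://example.org/colour.html": ["667"],
    "http://example.org/p1.html": ["796.1"],
    "http://example.org/cars.html": ["796.7"],
}


def pages_in_class(catalogue, classmark):
    """Return the URLs filed under `classmark` or any subdivision of it.

    Assumes plain DDC strings (three digits, optionally followed by a
    decimal part), so a string prefix is also a hierarchical prefix.
    """
    return sorted(
        url
        for url, marks in catalogue.items()
        if any(mark.startswith(classmark) for mark in marks)
    )


print(pages_in_class(CATALOGUE, "796"))
# ['http://example.org/cars.html', 'http://example.org/p1.html']
```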

The nice thing about a classification system is that it's both more abstract
and more precise than simple keywords (which can be *very* ambiguous). It can
also help resolve spelling differences: you put the keyword "color" in your
META tag, but I search for "colour". That could be a problem for a text
search, but in DDC both are classmark 667. Differences in terminology can be
resolved too: you may say "automobile" and I may look for "car" (problem!),
but in DDC both may be classified as 796.7.
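The spelling and terminology point can be sketched as a small lookup table from search words to classmarks. The table and pages are hypothetical and hold only the pairs mentioned above; a real service would need a proper subject index behind it. A plain keyword match misses the page, while going via the classmark finds it.

```python
# Hypothetical entry vocabulary: search words -> DDC classmarks.
# Only the two examples from the text are included.
TERM_TO_CLASSMARK = {
    "color": "667",
    "colour": "667",
    "automobile": "796.7",
    "car": "796.7",
}

# Hypothetical pages: URL -> (META keywords, DDC classmarks).
PAGES = {
    "http://example.org/paint.html": (["color", "paint"], ["667"]),
    "http://example.org/motoring.html": (["automobile"], ["796.7"]),
}


def keyword_search(term):
    """Plain text match against each page's META keywords."""
    term = term.lower()
    return [url for url, (keywords, _) in PAGES.items() if term in keywords]


def class_search(term):
    """Translate the search word to a classmark, then match on classmarks."""
    mark = TERM_TO_CLASSMARK.get(term.lower())
    if mark is None:
        return []
    return [url for url, (_, marks) in PAGES.items() if mark in marks]


print(keyword_search("colour"))  # []
print(class_search("colour"))    # ['http://example.org/paint.html']
print(class_search("car"))       # ['http://example.org/motoring.html']
```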

Unlike "real" physical libraries, which can only classify a book under
one classmark (or else have two copies, each under a different classmark),
on the Web you can cross-classify a document as much as is useful.
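Cross-classification costs nothing in an index: a page simply carries several DDC META tags and gets filed under each. A minimal sketch, assuming classmarks have already been extracted per page (the URLs are made up for illustration):

```python
from collections import defaultdict


def build_index(pages):
    """Invert URL -> classmarks into classmark -> URLs.

    A page with several classmarks is filed under every one of them.
    """
    index = defaultdict(list)
    for url, marks in pages.items():
        for mark in marks:
            index[mark].append(url)
    return dict(index)


# Hypothetical pages; the first is cross-classified under two classmarks.
pages = {
    "http://example.org/car-colours.html": ["667", "796.7"],
    "http://example.org/other.html": ["796.1"],
}
index = build_index(pages)
print(index["667"])    # ['http://example.org/car-colours.html']
print(index["796.7"])  # ['http://example.org/car-colours.html']
```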

For an example of this sort of system (though without the META tag part, as
no one is doing this sort of thing with META yet), please take a look
at:

     <URL:http://www.scit.wlv.ac.uk/wwlib>

which is a classified catalogue of c. 2,000 UK web sites.
(NB - it will be out of action from 18.00 BST on 25 Aug to midnight on 
28 Aug, due to major electrical work at our site).

Comments and constructive criticism are very welcome.

-- 
Jon Wallis
------------------------------------------------
The School of Computing & Information Technology
University of Wolverhampton
Wulfruna Street
Wolverhampton	             Tel : (0902) 322203
WV1 1SB	                     Fax : (0902) 322680
UK                Internet : jw@scitsc.wlv.ac.uk
------------------------------------------------
N.B. Opinions are mine and not the University's!
------------------------------------------------