- From: Thomas Hedden <thomas@hedden.org>
- Date: Fri, 31 Dec 2004 06:00:20 +0000
- To: www-html@w3.org
I have spent some time trying to find whether the following topic has been discussed, and have not been able to find anything about it. However, I am new to this list, so if this is old hat please excuse me.

I have always thought that there should be some way of tagging words, phrases, sentences, graphics (actually anything) with an indexing tag that can be used to generate a proper index. This is distinct from META data, since META data is in the header, and can only be used to find WEB PAGES, not individual parts of web pages, while what I have in mind would be in tags embedded in the text: "inline" indexing tags, if you will.

Here is an example of what I have in mind. This is not very well thought out, and I don't really know the spec, so if someone has a better idea, all the better.

    <index level="1" term="Rocky and Bullwinkle", term="Bullwinkle the Moose"; level="2" term="Bullwinkle the Moose">Bullwinkle</index>

    <index level="1" term="Rocky and Bullwinkle", term="Rocky the Flying Squirrel"; level="2" term="Rocky the Flying Squirrel">Rocky the Flying Squirrel</index>

(Another thing which needs to be done is to specify the level 1 term under which a level 2 term should appear, and I'm having trouble thinking of the best way to do that right now.)

A program could be run on the markup page to generate an index that would look something like this:

    Bullwinkle the Moose
    Rocky and Bullwinkle
        Bullwinkle the Moose
        Rocky the Flying Squirrel
    Rocky the Flying Squirrel

There could be defaults to make it simpler to write the tags, for example if no term is specified then the term would default to the tagged word/phrase, etc., the default level would be "1", etc.

IMHO, the entire w3 community hasn't paid proper attention to indexing for the simple reason that whole text searching is now free, very quick, and is adequate for many purposes.
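As a rough illustration of the index-generating program described above, here is a minimal Python sketch. It does not parse the example tags as written; it assumes a simplified, hypothetical variant in which each `<index>` tag carries a single `entries` attribute, with entries separated by `;` and a level 2 term attached to its level 1 term by `/`. The attribute name, the separators, and the names `IndexCollector` and `build_index` are all assumptions made for this sketch, not part of any spec.

```python
from collections import defaultdict
from html.parser import HTMLParser


class IndexCollector(HTMLParser):
    """Collect index entries from hypothetical <index entries="..."> tags."""

    def __init__(self):
        super().__init__()
        self.entries = defaultdict(set)  # level 1 term -> set of level 2 terms
        self._attrs = None               # attributes of the currently open <index>
        self._text = []                  # text seen inside the open <index>

    def handle_starttag(self, tag, attrs):
        if tag == "index":
            self._attrs = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "index" or self._attrs is None:
            return
        tagged = "".join(self._text).strip()
        # Default suggested in the post: with no term given, use the tagged text.
        raw = self._attrs.get("entries") or tagged
        for entry in raw.split(";"):
            # "main / sub" files sub under main; a bare "main" is level 1 only.
            main, _, sub = (part.strip() for part in entry.partition("/"))
            if not main:
                continue
            subterms = self.entries[main]  # creates the level 1 term if new
            if sub:
                subterms.add(sub)
        self._attrs = None


def build_index(markup):
    """Return a two-level, alphabetically sorted index as plain text."""
    collector = IndexCollector()
    collector.feed(markup)
    collector.close()
    lines = []
    for main in sorted(collector.entries):
        lines.append(main)
        for sub in sorted(collector.entries[main]):
            lines.append("    " + sub)
    return "\n".join(lines)


markup = (
    '<p><index entries="Rocky and Bullwinkle / Bullwinkle the Moose; '
    'Bullwinkle the Moose">Bullwinkle</index> and '
    '<index entries="Rocky and Bullwinkle / Rocky the Flying Squirrel; '
    'Rocky the Flying Squirrel">Rocky the Flying Squirrel</index></p>'
)
print(build_index(markup))
```

Writing the level 1 term and its level 2 term together in one entry is one possible way to say which level 1 term a level 2 term belongs under; other encodings would work just as well.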
However, after we get over the initial euphoria of being able to perform whole-text searching, we should realize that at the end of the day it's really not very good: It requires searching for synonyms, since one author might use one term and another author might use a synonym, and it finds all manner of unrelated rubbish. Not only that, but a particular passage might be of interest for a certain topic even if it does not contain the term under which it should properly be indexed.

Making a proper index takes time and effort on the part of a human indexer, and to facilitate this I think a tag should be made available which authors can use, or if they are not inclined to do this, one which an indexer could go back and add later, with the goal of generating a GOOD-QUALITY index. This would be very simple to do once the content is properly tagged.

Of course anyone is free to do this on his/her own, but it would only be really useful if there was some standardization so that true indexing engines could produce true indexes, rather than having whole text search engines give us everything including the kitchen sink.

Thank you for your time.

Thomas Hedden

--
--------------------------------------------------------------
| Thomas Hedden           | Voice & fax +1 (978) 371-2126    |
| 98 East Riding Drive    | Skype callto://thomashedden      |
| Carlisle, MA 01741-1602 | Cell +1 (978) 930-0462           |
| U.S.A.                  | E-mail thomas AT hedden DOT org  |
| Planet Earth            | WWW http://www.hedden.org        |
--------------------------------------------------------------
| Linux Counter registration # 203894, http://counter.li.org |
--------------------------------------------------------------
Received on Thursday, 6 January 2005 22:34:25 UTC