- From: Edward Lass <elass@goer.state.ny.us>
- Date: Tue, 11 Jan 2005 14:32:01 -0500
- To: <thomas@hedden.org>,<www-html@w3.org>
You could store the information in an XML document and use XSLT[1] to output it in (X)HTML. Using XML this way is very similar to a database-driven website, but arguably more flexible. The XML document might look something like this:

    <show>
      <title>Rocky and Bullwinkle</title>
      <character>
        <name>Bullwinkle the Moose</name>
        ...insert any other information here...
      </character>
      <character>
        <name>Rocky the Squirrel</name>
        ...ditto...
      </character>
    </show>

This sort of data could be transformed to make all the show titles into headers and all the character names into list items in an unordered list. Each character could then get his or her own page, with the name as a header and the other information in paragraphs.

This is good for content management, especially for reusing and indexing objects within a particular site. Search engines, however, obviously wouldn't see what's going on behind the scenes.

You could also transform the data into RDF/XML syntax[2] and attach it to the (X)HTML document as follows:

    <link rel="alternate" type="application/rdf+xml" href="...URL here..." />

This could help the search engines, but major search engines don't index RDF data in any meaningful way, or at least not yet.

In general, Thomas, I would suggest reading up on the W3C's efforts for a so-called Semantic Web[3].

Ed.

[1] http://www.w3.org/TR/xslt
[2] http://www.w3.org/TR/rdf-syntax-grammar
[3] http://www.w3.org/2001/sw/

>>> Thomas Hedden <thomas@hedden.org> 12/31/2004 1:00:20 AM >>>

I have spent some time trying to find whether the following topic has been discussed, and have not been able to find anything about it. However, I am new to this list, so if this is old hat please excuse me.

I have always thought that there should be some way of tagging words, phrases, sentences, graphics (actually anything) with an indexing tag that can be used to generate a proper index.
This is distinct from META data, since META data is in the header, and can only be used to find WEB PAGES, not individual parts of web pages, while what I have in mind would be in tags embedded in the text: "inline" indexing tags, if you will.

Here is an example of what I have in mind. This is not very well thought out, and I don't really know the spec, so if someone has a better idea, all the better.

    <index level="1" term="Rocky and Bullwinkle", term="Bullwinkle the Moose"; level="2" term="Bullwinkle the Moose">Bullwinkle</index>

    <index level="1" term="Rocky and Bullwinkle", term="Rocky the Flying Squirrel"; level="2" term="Rocky the Flying Squirrel">Rocky the Flying Squirrel</index>

(Another thing which needs to be done is to specify the level 1 term under which a level 2 term should appear, and I'm having trouble thinking of the best way to do that right now.)

A program could be run on the markup page to generate an index that would look something like this:

    Bullwinkle the Moose
    Rocky and Bullwinkle
        Bullwinkle the Moose
        Rocky the Flying Squirrel
    Rocky the Flying Squirrel

There could be defaults to make it simpler to write the tags, for example if no term is specified then the term would default to the tagged word/phrase, etc., the default level would be "1", etc.

IMHO, the entire w3 community hasn't paid proper attention to indexing for the simple reason that whole text searching is now free, very quick, and is adequate for many purposes. However, after we get over the initial euphoria of being able to perform whole-text searching, we should realize that at the end of the day it's really not very good: It requires searching for synonyms, since one author might use one term and another author might use a synonym, and it finds all manner of unrelated rubbish. Not only that, but a particular passage might be of interest for a certain topic even if it does not contain the term under which it should properly be indexed.
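The index-generating program described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the proposed tag syntax repeats attributes and so is not well-formed XML, so the sketch uses a hypothetical variant in which a single `term` attribute holds entries separated by `;`, and `Parent / Child` files a level 2 term under a level 1 term. The names `SAMPLE`, `build_index`, and `format_index` are invented for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, well-formed variant of the proposed <index> tag.
# "A; B / C" means: level 1 entry A, plus level 2 entry C under level 1 entry B.
SAMPLE = """<p>
<index term="Bullwinkle the Moose; Rocky and Bullwinkle / Bullwinkle the Moose">Bullwinkle</index>
and
<index term="Rocky the Flying Squirrel; Rocky and Bullwinkle / Rocky the Flying Squirrel">Rocky the Flying Squirrel</index>
</p>"""


def build_index(markup):
    """Map each level 1 term to the set of level 2 terms filed under it."""
    index = defaultdict(set)
    for element in ET.fromstring(markup).iter("index"):
        # Default from the proposal: with no term given, use the tagged text.
        raw = element.get("term") or "".join(element.itertext())
        for entry in raw.split(";"):
            parent, _, child = (part.strip() for part in entry.partition("/"))
            if not parent:
                continue
            subterms = index[parent]  # creates the level 1 entry if missing
            if child:
                subterms.add(child)
    return dict(index)


def format_index(index):
    """Render the index alphabetically, indenting level 2 terms."""
    lines = []
    for term in sorted(index):
        lines.append(term)
        lines.extend("    " + sub for sub in sorted(index[term]))
    return "\n".join(lines)


print(format_index(build_index(SAMPLE)))
```

Putting the parent term inside the entry itself is one possible answer to the open question of how to say which level 1 term a level 2 term belongs under; other encodings would work equally well.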
Making a proper index takes time and effort on the part of a human indexer, and to facilitate this I think a tag should be made available which authors can use, or if they are not inclined to do this, one which an indexer could go back and add later, with the goal of generating a GOOD-QUALITY index. This would be very simple to do once the content is properly tagged.

Of course anyone is free to do this on his/her own, but it would only be really useful if there was some standardization so that true indexing engines could produce true indexes, rather than having whole text search engines give us everything including the kitchen sink.

Thank you for your time.

Thomas Hedden

--
--------------------------------------------------------------
| Thomas Hedden           | Voice & fax +1 (978) 371-2126    |
| 98 East Riding Drive    | Skype callto://thomashedden      |
| Carlisle, MA 01741-1602 | Cell +1 (978) 930-0462           |
| U.S.A.                  | E-mail thomas AT hedden DOT org  |
| Planet Earth            | WWW http://www.hedden.org        |
--------------------------------------------------------------
| Linux Counter registration # 203894, http://counter.li.org |
--------------------------------------------------------------
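For readers who want to try the suggestion in the reply at the top of this message (store the data as XML, then turn show titles into headers and character names into list items), here is a minimal sketch. It uses Python's standard `xml.etree.ElementTree` module rather than XSLT, since the Python standard library has no XSLT processor; an XSLT stylesheet would express the same mapping declaratively. The names `SHOW_XML` and `show_to_html`, and the choice of `div`, `h1`, and `ul` for the output, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data in the shape suggested in the reply (extra fields omitted).
SHOW_XML = """<show>
  <title>Rocky and Bullwinkle</title>
  <character><name>Bullwinkle the Moose</name></character>
  <character><name>Rocky the Squirrel</name></character>
</show>"""


def show_to_html(xml_text):
    """Turn a <show> document into a header plus an unordered list of names."""
    show = ET.fromstring(xml_text)
    section = ET.Element("div")

    # The show title becomes a header.
    heading = ET.SubElement(section, "h1")
    heading.text = show.findtext("title")

    # Each character name becomes a list item.
    names = ET.SubElement(section, "ul")
    for character in show.findall("character"):
        item = ET.SubElement(names, "li")
        item.text = character.findtext("name")

    return ET.tostring(section, encoding="unicode")


print(show_to_html(SHOW_XML))
```

Per-character pages could be produced the same way, by looping over the `character` elements and writing one document for each.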
Received on Tuesday, 11 January 2005 19:32:59 UTC