- From: Michele Bassan <michele@pdigi3.igi.pd.cnr.it>
- Date: Wed, 13 Dec 1995 10:47:19 +0100
- To: www-html@w3.org
Dear colleagues,

I would like to contribute some ideas to the HTML community. I have just subscribed to this list at the suggestion of an eminent contributor to HTML development, and I hope this is the right place for my comments. If it is not, please let me know.

Below is a list of problems I have identified and the solutions I am proposing.

Problems identified:

A  A document's content is often outdated, and the date on which it would become outdated could have been known at the very moment the document was produced.

B  A document is often moved to another location, and the only way to learn that it has moved is to read a warning message that the mover was kind enough to leave, if we are lucky with a link to the new location; this does happen, but not very often.

Consequences:

A
1  bandwidth is wasted transferring the outdated document
2  human time is lost retrieving the document and discovering, often after reading through part of it, that it is no longer valid or useful
3  document databases will grow by piling new information on top of old, instead of replacing the old data
4  database replies to searches will become (they already are) increasingly unmanageable, whatever restrictive conditions one tries to apply
5  (second-level consequence) the validity of the net as a convenient source of reliable information can be questioned

B
1  valuable human time is lost manually following broken links, jumping around the net
2  database replies will keep giving incorrect document locations for a long time

With some thinking, more bad effects could be added to this list, but I have probably made the idea clear already.

Maybe (I'm not a Web techie, I write this just based on common sense) the document aging problem has already been addressed by the various Web crawlers, which repeatedly check for what has been thrown away and for what is new, but I still think that such an approach is not the right solution.
Proposed solutions:

Preface

All the aging and location information I am writing about is information about the document itself. Therefore the correct place for it is within the META elements. As a provider of Web pages, I am personally interested in fixing these problems, or in finding out whether and how they have already been fixed.

A 1  The document has a single definite expiry date; its contents are useless on any later date.

I'm proposing to use the following syntax:

<META NAME="EXPIRY" CONTENT="DD MMM YYYY">

With this information, document databases can perform the following actions:
- when replying to a database search before the expiry date, display the expiry date together with the relevant document info
- trash any document info after the expiry date
- avoid adding to the database any newly discovered document that has already expired

Web browsers can highlight this information for the user (together with the title?)

A 2  The document contains some information with a finite lifetime, but will likely be updated at some later time (e.g. the program of a theatre).

I'm proposing to use the following syntax:

<META NAME="NEXT_UPDATE" CONTENT="DD MMM YYYY">

With this information, document databases can perform the following actions:
- when replying to a database search before the next update date, display that date together with the relevant document info
- retrieve the document info again after the update date (e.g. the index words might have changed, because Puccini has also been added to the theatre program)

Web browsers can highlight this information for the user (together with the title?)

Of course the writer is not forced to provide this info, but by using these META tags he can be sure that the document info will always be updated regularly and right on time, with no indefinite delays.

B 1  A new small 'placeholder' document shall replace the old one, keeping the same name.
I'm proposing to use the following syntax:

<META NAME="MOVED" CONTENT="http:etc etc">

Instead of "http:...", "ftp:..." or whatever else is applicable can be used.

With this information, document databases can perform the following action:
- update the document links

Web browsers can automatically follow the new link (highlighting the event to the user?) and, if the old link was also a bookmark, update it.

If a document is expected to be moved at some time in the future, combining a NEXT_UPDATE META element in the original document with a MOVED META element in the 'placeholder' document will provide for an immediate update of database pointers after the date indicated.

Final remarks

I believe that http servers should also provide two tables for the crawlers, keeping track on a daily schedule of what is appearing on and disappearing from the site (do they do that already?). This will at least filter out some of the noise related to changes in old documents or in documents that do not follow this recommendation. Still, these tables will not be able to suggest the removal of outdated information when a file remains 'forgotten' on the site, and they will not guarantee a 'right on time' update of the databases when the scheduled document update takes place.

What if the update does not take place as scheduled? Well, I can generate more ideas to trap oddities and codify the behaviour, but I think I've already written too much. If this thread goes on, everything will eventually be ironed out.

Please consider that while these problems and their consequences are annoying but still manageable today, the number of Web documents is growing 'exponentially', and we are going to face an incredible amount of junk information.

Thanks for your attention,
yours faithfully,

Michele Bassan
Via XXIV maggio, 10
35010 Vigonza - Padova - Italy
michele@pdigi3.igi.pd.cnr.it
i3@intercity.shiny.it
http://intercity.shiny.it/i3
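
For illustration only, the database-side behaviour proposed above for the EXPIRY, NEXT_UPDATE and MOVED META elements could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the behaviour of any existing crawler: the names MetaCollector, parse_date and crawler_action, the action labels, the order in which the three elements are checked, and the choice to re-fetch on (not only after) the update date are all hypothetical choices made for the example.

```python
from datetime import date
from html.parser import HTMLParser

# English month abbreviations for the proposed "DD MMM YYYY" format.
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


class MetaCollector(HTMLParser):
    """Collect <META NAME=... CONTENT=...> pairs, keyed by upper-cased NAME."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.upper()] = content


def parse_date(text):
    """Parse 'DD MMM YYYY' (e.g. '13 Dec 1995'); return None if malformed."""
    parts = text.split()
    if len(parts) != 3:
        return None
    try:
        return date(int(parts[2]), MONTHS[parts[1][:3].upper()], int(parts[0]))
    except (KeyError, ValueError):
        return None


def crawler_action(html, today):
    """Return (action, detail) for one document, given today's date.

    Hypothetical action labels:
      'relink'  - MOVED present; detail is the new location
      'drop'    - EXPIRY date has passed; detail is the expiry date
      'refetch' - NEXT_UPDATE date reached; detail is that date
      'keep'    - nothing to do; detail is the date to show the user, if any
    """
    collector = MetaCollector()
    collector.feed(html)
    meta = collector.meta

    # Assumption: a placeholder's MOVED entry takes precedence over dates.
    if "MOVED" in meta:
        return ("relink", meta["MOVED"])

    expiry = parse_date(meta.get("EXPIRY", ""))
    if expiry and today > expiry:
        return ("drop", expiry)

    next_update = parse_date(meta.get("NEXT_UPDATE", ""))
    # Assumption: re-fetch on the update date itself as well as after it.
    if next_update and today >= next_update:
        return ("refetch", next_update)

    return ("keep", expiry or next_update)
```

A database could call crawler_action(page_text, date.today()) for each indexed document and act on the returned label; a document without any of the three META elements simply yields ('keep', None).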
Received on Wednesday, 13 December 1995 04:48:50 UTC