- From: CyberWeb <web@sowebo.CHARM.NET>
- Date: Thu, 29 Sep 1994 09:26:47 -0400 (EDT)
- To: secret@www5.cern.ch
- Cc: www-vlib@www0.cern.ch
> However, I was thinking of setting up an automated tool,
> that could provide an idea of the quality of a document.
>
> I intend to base this tool on many features:
>
> - the number of links in the document
> - the proportion of text per link
> - the number of different icons used in it
> - does it provide several types of classification ?
> - how many times was it accessed last month ?
> - the usage of HTML tags
> ..

Arthur, I think the idea of quality indication is excellent! But I have
strong doubts about the feasibility of automating its determination; the
`quality' of a WWW document depends upon so many factors, in such complex
ways (which I believe haven't yet been studied?), that your list above can
only scratch the surface..

I believe we should either ask you not to do this, because inaccurate
quality indicators would be detrimental - or we should help you to
identify those factors, their relationships to each other, and how they
can be mapped to a single dimension in any useful way. Let's explore it a
little.. here's my list:

1) The quality of each link. A document that points to other documents of
low quality is poorer than one which refers only to other documents of
high quality. Thus your tool has to traverse the WWW recursively, unless
those other documents carry quality indicators that can be trusted.

2) Indicators of quality. It will help the user select the most useful
documents to explore if they are somehow marked with quality indicators -
exactly as you are proposing! But annotation could also be useful,
especially if a search facility is provided as well.

3) Quantity. The more the merrier.. until the user is overwhelmed. This
leads into..

4) Structure. 525 links on one page would be hard for the user to navigate
unless there is a TOC at the top; but the size of the document would cause
slow loading for some people - splitting it into separate files, probably
in a hierarchy, might be better.
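The recursive idea in point 1 - and the relevance weighting suggested in
point 6 below - could be sketched roughly as follows. Everything here is
hypothetical: the `Document` structure, the depth limit, and the 70/30
blend of local features against link quality are illustrative assumptions,
not anything Arthur has actually specified.

```python
# Sketch: a document's quality blends its own feature-based score with
# the relevance-weighted quality of the documents it links to.
# All names and weights are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Document:
    url: str
    own_score: float       # score from local features (links, icons, tags...)
    # (Document, relevance in [0, 1]) pairs
    links: list = field(default_factory=list)

def quality(doc, depth=2, _seen=None):
    """Combine a document's own score with the relevance-weighted
    quality of its linked documents, to a limited depth (a full
    recursive traversal of the WWW being impractical). The _seen
    set stops cycles of documents linking back to each other."""
    if _seen is None:
        _seen = set()
    if depth == 0 or doc.url in _seen or not doc.links:
        return doc.own_score
    _seen.add(doc.url)
    linked = sum(rel * quality(d, depth - 1, _seen) for d, rel in doc.links)
    weight = sum(rel for _, rel in doc.links)
    link_score = linked / weight if weight else doc.own_score
    # Arbitrary blend: 70% local features, 30% quality of what it points to.
    return 0.7 * doc.own_score + 0.3 * link_score
```

Even this toy version shows the difficulty: the depth limit, the blend
ratio, and the relevance weights are all free parameters with no obvious
principled values.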
But this structure itself is subject to quality considerations - it can be
very hard to identify the most useful partitioning. For the WWW
Development section I chose to model it after the Usenet newsgroups:
Providers, Users, and Misc. This works well most of the time.. but it may
not always be evident to a user which path to traverse. This can be
alleviated by providing a..

5) Search facility. This helps the user to find subjects that cross the
partitions, or aren't categorised by others exactly as you chose.

6) Relevance. This may be subsumed in the previous considerations, but is
probably worth explicit mention. Your computation of quality has to take
account of the quality of a link, and then reduce it if the link is not
very relevant.

7) Aesthetics & Ergonomics. A user will get a lot more out of a well
laid-out page than one which is cluttered and ugly. Small icons may help,
but if there are very many then the load time increases.

--

Well, those are just a few quick ideas. One thing to beware of is any
assumption of linearity - e.g. you mention that you will count the number
of links, icons, etc., but I would expect there to be some optimum number
for each, beyond which you overwhelm the user, impact load time, or
degrade some other aspect.

You have chosen an ambitious project!

Alan.
_____________________________________________________________________
http://www.charm.net/~web/Vlib.html   The WWW Virtual Library section
on WWW Development ranges from how to develop WWW pages, to setting up
servers, to the evolution of the WWW.
_____________________________________________________________________
http://guinan.gsfc.nasa.gov/Alan/Richmond.html   WWW Systems Engineer
Received on Thursday, 29 September 1994 13:02:05 UTC