
Re: News...



> However, I was thinking of setting up an automated tool,
> that could provide an idea of the quality of a document.
> 
> I intend to base this tool on many features:
> 
> - the number of links in the document
> - the proportion of text per link
> - the number of different icons used in it
> - does it provide several types of classification?
> - how many times was it accessed last month?
> - the usage of HTML tags
> ..
Arthur,
	I think the idea of quality indication is excellent!
	But I have strong doubts about the feasibility of automating its
	determination; the `quality' of a WWW document depends upon so
	many factors, interacting in such complex ways (which I believe
	haven't yet been studied), that your list above can only be
	scratching the surface..

	I believe we should either ask you not to do this, because inaccurate
	quality indicators would be detrimental - or we should help you to
	identify those factors and their relationships to each other, and 
	how they can be mapped to a single dimension in any useful way. Let's
	explore it a little.. here's my list:

  1)	The quality of each link. A document that points to other documents
	that are of low quality is poorer than one which only refers to other
	documents of high quality. Thus your tool has to recursively traverse
	the WWW, unless those other documents have quality indicators that
	can be trusted.

  2)	Indicators of quality. It will help the user select the most useful
	documents to explore, if they are somehow marked with quality
	indicators - exactly as you are proposing! But annotation could 
	also be useful, especially if a search facility is also provided.

  3)	Quantity. The more the merrier.. until the user is overwhelmed.
	This leads into..

  4)	Structure. 525 links on one page would be hard for the user to navigate
	unless there is a TOC at the top; and the size of the document would
	cause slow loading for some people. Splitting it into separate files,
	probably in a hierarchy, might be better. But this structure itself is
	subject to quality considerations - it can be very hard to identify
	the most useful partitioning. For the WWW Development section I chose
	to model after the Usenet newsgroups: Providers, Users, and Misc. This
	works well most of the time.. but it may not always be evident to a
	user which path to traverse. This can be alleviated by providing a..

  5)	Search facility. This helps the user to find subjects that cross the
	partitions, or aren't categorised by others exactly as you chose.

  6)	Relevance. This may be subsumed in the previous considerations, but
	is probably worth explicit mention. Your computation of quality has
	to take account of the quality of a link, and then reduce it if it's
	not very relevant.

  7)	Aesthetics & Ergonomics. A user will get a lot more out of a well laid
	out page than one which is cluttered and ugly. Small icons may help,
	but if there are very many then the load time is increased.
	--
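	As a rough illustration of items 1 and 6 together, here is a minimal
	Python sketch, not a real tool. It assumes a hypothetical in-memory
	link graph, hand-assigned relevance weights between 0 and 1, and a
	table of already trusted quality indicators; the names link_quality,
	graph, trusted, depth and default are all invented for the example.
	Recursion stops at a trusted indicator, at a depth limit, or when a
	page has already been visited.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical link graph: page -> list of (target page, relevance in [0, 1]).
Graph = Dict[str, List[Tuple[str, float]]]


def link_quality(
    page: str,
    graph: Graph,
    trusted: Dict[str, float],
    depth: int = 2,
    default: float = 0.5,
    _seen: Optional[FrozenSet[str]] = None,
) -> float:
    """Estimate a page's quality from the quality of the pages it links to.

    Assumptions made for this sketch only: quality is a number in [0, 1],
    each link's quality is multiplied by its relevance, and a page's score
    is the plain average over its links.
    """
    # A trusted indicator ends the traversal immediately.
    if page in trusted:
        return trusted[page]

    seen = _seen or frozenset()
    # Stop at the depth limit, or on a cycle, with a neutral default.
    if depth == 0 or page in seen:
        return default

    links = graph.get(page, [])
    if not links:
        return default

    seen = seen | {page}
    total = 0.0
    for target, relevance in links:
        quality = link_quality(target, graph, trusted, depth - 1, default, seen)
        # Reduce the link's contribution if it is not very relevant.
        total += quality * relevance
    return total / len(links)
```

	For example, with graph = {"index": [("good", 1.0), ("poor", 1.0)]}
	and trusted = {"good": 0.9, "poor": 0.1}, link_quality("index", graph,
	trusted) averages the two and gives 0.5. The plain average is only one
	possible choice; whether it is the right one is exactly the open
	question.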
	Well, those are just a few quick ideas. I think one thing to beware
	of is any assumption of linearity - e.g. you mention that you will
	count the number of links, icons, etc.. but I would expect there to
	be some optimum number for each, beyond which you overwhelm the user
	or impact load time, or degrade some other aspect. You have chosen
	an ambitious project!
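
	To make the non-linearity point concrete, here is a small Python
	sketch of a score that rises up to some optimum count and then falls
	away again. The curve x * exp(1 - x) is just one convenient choice,
	and the function name peaked_score and any optimum value passed to it
	are assumptions for illustration; nothing here says what the real
	optimum is.

```python
import math


def peaked_score(count: int, optimum: float) -> float:
    """Score a raw count so that it peaks at `optimum` instead of growing forever.

    Returns 0.0 for a count of zero, exactly 1.0 when count == optimum,
    and decays towards 0 as the count grows far beyond the optimum.
    """
    if optimum <= 0:
        raise ValueError("optimum must be positive")
    if count <= 0:
        return 0.0
    x = count / optimum
    # x * e^(1 - x) has its single maximum (value 1) at x = 1.
    return x * math.exp(1.0 - x)
```

	With a hypothetical optimum of 20 links, peaked_score(20, 20) is 1.0,
	while peaked_score(525, 20) is very close to zero, so a page with 525
	links scores far worse than one with 20 rather than far better.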
Alan.
	_____________________________________________________________________
		    http://www.charm.net/~web/Vlib.html
	The WWW Virtual Library section on WWW Development ranges from how to
	develop WWW pages, to setting up servers, to the evolution of the WWW.
	_____________________________________________________________________
	http://guinan.gsfc.nasa.gov/Alan/Richmond.html   WWW Systems Engineer
