> However, I was thinking of setting up an automated tool,
> that could provide an idea of the quality of a document.
> I intend to base this tool on many features:
> - the number of links in the document
> - the proportion of text per link
> - the number of different icons used in it
> - does it provide several types of classification ?
> - how many times was it accessed last month ?
> - the usage of HTML tags
I think the idea of quality indication is excellent!
But I have strong doubts about the feasibility of automating its
determination; the `quality' of a WWW document is dependent upon so
many factors, in such complex ways (which I believe haven't yet been
studied?), that your list above can only be scratching the surface..
I believe we should either ask you not to do this, because inaccurate
quality indicators would be detrimental - or we should help you to
identify those factors, their relationships to each other, and
how they can be mapped to a single dimension in any useful way. Let's
explore it a little.. here's my list:
1) The quality of each link. A document that points to other documents
that are of low quality is poorer than one which only refers to other
documents of high quality. Thus your tool has to recursively traverse
the WWW, unless those other documents have quality indicators that
can be trusted.
2) Indicators of quality. It will help the user select the most useful
documents to explore, if the links are somehow marked with quality
indicators - exactly as you are proposing! But annotation could
also be useful, especially if a search facility is also provided.
3) Quantity. The more the merrier.. until the user is overwhelmed.
This leads into..
4) Structure. 525 links on one page would be hard for the user to navigate
unless there is a TOC at the top; but the size of the document would
cause slow loading for some people - splitting it into separate files,
probably in a hierarchy, might be better. But this structure itself is
subject to quality considerations - it can be very hard to identify
the most useful partitioning. For the WWW Development section I chose
to model it after the Usenet newsgroups: Providers, Users, and Misc. This
works well most of the time.. but it may not always be evident to a
user which path to traverse. This can be alleviated by providing a..
5) Search facility. This helps the user to find subjects that cross the
partitions, or aren't categorised by others exactly as you chose.
6) Relevance. This may be subsumed in the previous considerations, but
is probably worth explicit mention. Your computation of quality has
to take account of the quality of a link, and then reduce it if it's
not very relevant.
7) Aesthetics & Ergonomics. A user will get a lot more out of a well laid
out page than one which is cluttered and ugly. Small icons may help,
but if there are very many then the load time is increased.
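
To make points 1 and 6 a bit more concrete, here is a rough sketch of how
a link's quality might be folded into a document's score. Everything in it
is an assumption for illustration only: the in-memory `links` table, the
`base` scores, the relevance weights, the depth cut-off and the `blend`
factor are all hypothetical, and a real tool would have to fetch and judge
the documents somehow.

```python
def link_quality(doc, links, base, depth=2, blend=0.5, _seen=None):
    """Blend a document's own score with the relevance-discounted
    quality of the documents it links to, down to a fixed depth.

    links: {doc: [(target, relevance between 0 and 1), ...]}  (hypothetical)
    base:  {doc: own quality between 0 and 1}                 (hypothetical)
    """
    # Track the current path so that cycles (A -> B -> A) terminate.
    seen = (_seen or frozenset()) | {doc}
    own = base.get(doc, 0.0)

    # Ignore links that lead back to a document already on this path.
    outgoing = [(t, r) for t, r in links.get(doc, []) if t not in seen]
    if depth == 0 or not outgoing:
        return own

    # Point 6: a link's quality is reduced if it is not very relevant.
    discounted = [
        r * link_quality(t, links, base, depth - 1, blend, seen)
        for t, r in outgoing
    ]
    linked = sum(discounted) / len(discounted)

    # Point 1: poor links drag the document's score down, good ones lift it.
    return (1 - blend) * own + blend * linked


if __name__ == "__main__":
    links = {"a": [("b", 1.0), ("c", 0.5)], "b": [("a", 1.0)], "c": []}
    base = {"a": 0.6, "b": 0.8, "c": 0.4}
    print(link_quality("a", links, base))
```

The depth cut-off stands in for "quality indicators that can be trusted":
where a trusted indicator already exists, the recursion could simply stop
and use it.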
Well, those are just a few quick ideas. I think one thing to beware
of is any assumption of linearity - e.g. you mention that you will
count the number of links, icons, etc., but I would expect there to
be some optimum number for each, beyond which you overwhelm the user
or impact load time, or degrade some other aspect. You have chosen
an ambitious project!
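
As a sketch of what a non-linear measure might look like, the function
below rises towards a peak at some optimum count and falls away beyond it.
The particular curve and the optimum values are purely assumptions of mine
for illustration; finding the real optimum for each feature is exactly the
part that would need study.

```python
import math


def feature_score(count, optimum):
    """Score a raw count (links, icons, ...) on a 0..1 scale that
    peaks at `optimum` and declines on either side of it."""
    if count <= 0:
        return 0.0
    ratio = count / optimum
    # ratio * e^(1 - ratio) is 1.0 when ratio == 1 and smaller elsewhere.
    return ratio * math.exp(1.0 - ratio)


if __name__ == "__main__":
    # Hypothetical optimum of 20 links per page.
    for n in (5, 20, 80, 525):
        print(n, round(feature_score(n, optimum=20), 3))
```

A plain count would rate the 525-link page highest; with a curve like this
it rates far below a page near the assumed optimum.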
The WWW Virtual Library section on WWW Development ranges from how to
develop WWW pages, to setting up servers, to the evolution of the WWW.
http://guinan.gsfc.nasa.gov/Alan/Richmond.html WWW Systems Engineer
- From: firstname.lastname@example.org (Arthur Secret)