
News...



Dear maintainers,

Some of you will receive mail from the www-vlib for the first time.
Welcome! This is the list dedicated to the administration of the 
World-Wide Web Virtual Library.

Some news:

1/ the stars

At the current stage, they are not based on anything serious.
I only looked over the VL quickly, and my aim was also to embellish it.
However, I am thinking of setting up an automated tool
that could give an idea of the quality of a document.

I intend to base this tool on many features:

- the number of links in the document
- the proportion of text per link
- the number of different icons used in it
- whether it provides several types of classification
- how many times it was accessed last month
- the usage of HTML tags
...
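
To make the list above more concrete, here is a minimal sketch of how a few
of these features could be measured on one page. It is only an illustration,
not the tool itself: the names FeatureCounter and page_features are
hypothetical, "text per link" is assumed to mean non-whitespace characters
divided by the number of links, and every distinct image source is assumed
to count as one icon. Access counts and the final star rating are left out.

```python
from html.parser import HTMLParser


class FeatureCounter(HTMLParser):
    """Collects a few raw features from one HTML page (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.links = 0        # number of <A HREF=...> anchors
        self.icons = set()    # distinct <IMG SRC=...> values
        self.tags = set()     # distinct HTML tag names used
        self.text_chars = 0   # non-whitespace characters of text

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        self.tags.add(tag)
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links += 1
        elif tag == "img" and attrs.get("src"):
            self.icons.add(attrs["src"])

    def handle_data(self, data):
        # Count text without whitespace so layout does not change the score.
        self.text_chars += len("".join(data.split()))


def page_features(html):
    """Return a dict of raw features; turning them into stars is left open."""
    parser = FeatureCounter()
    parser.feed(html)
    parser.close()
    return {
        "links": parser.links,
        # Assumption: None when the page has no links at all.
        "text_per_link": parser.text_chars / parser.links if parser.links else None,
        "icons": len(parser.icons),
        "tags": len(parser.tags),
    }
```

How these numbers should be weighted against each other is exactly the
part that still needs discussion.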

When a document is split into several parts:

On the top-level document .../Overview.html, parts that are not
referred to individually will be considered as part of the document.

On the summary .../Overview2.html, the document will be analyzed by itself.

I'm still thinking about this, so please give me your opinion on it.

If your service doesn't have as many stars as you feel it should,
forgive me, this is only a temporary situation :)


2/ The classification

Thanks to your comments,
I've looked at several types of classification, such as Dewey,
Library of Congress, Universal Decimal Classification...

The main problem with these well-known classifications is that they
are a century old, and occidental.

+ Occidental

Take Dewey, section 2: Religion
 Sections 21 to 28 are dedicated to Christianity
 Section 29 is for other religions

Not to mention the geography of the United States compared to that
of Rwanda :)

+ A century old

Computers didn't exist, so they are relegated to some obscure part.
As for recent technologies...

+ Based on letters and numbers

Their goal is to provide a short sequence of figures (and letters
for the Library of Congress), so that people can find their book quickly
by saying: I want books on 791.430.944 (Dewey), or JK 9661-9993 (LC)

To me, this looks a little bit like the configuration file of sendmail.

But we could (should ?) take bits here and there.

I found two rules that will guide me on this:
- Look at what people access most, and provide a short path to
these documents
- Classification should occur when there is enough content to justify it
(so I don't think we should start applying a whole classification with
billions of sub-parts, most of them empty :)
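
The two rules above could be sketched roughly as follows. This is only an
illustration under assumptions: the threshold MIN_ENTRIES, the function
names, and the input shapes (pairs of title and candidate section, and a
mapping from document path to last month's hits) are all hypothetical.

```python
from collections import Counter

# Hypothetical threshold: a section is created only once it has this many entries.
MIN_ENTRIES = 5


def sections_worth_creating(entries, min_entries=MIN_ENTRIES):
    """entries: iterable of (title, candidate_section) pairs.

    Returns the sections that already hold enough entries to justify
    existing; everything else stays in the parent document for now.
    """
    counts = Counter(section for _, section in entries)
    return sorted(s for s, n in counts.items() if n >= min_entries)


def most_accessed(access_counts, top=3):
    """access_counts: mapping of document path -> hits last month.

    Returns the paths that deserve a short way from the top document.
    """
    return [path for path, _ in Counter(access_counts).most_common(top)]
```

The point of the threshold is simply that empty sub-parts never appear;
the right value would have to come from looking at the real documents.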

I have discussed this with several librarians in Geneva, and
will meet UN experts tomorrow.


3/ The Form

I've written a sample form at
http://info.cern.ch/hypertext/DataSources/bySubject/NewReg.html

It will send mail messages directly to you.

I'm also testing a way to analyze these mail messages and
automatically add the new entries to some part of the document,
between a <!-- begin_adds --> and a <!-- end_adds -->.

So, at least in my documents, there will be a part like

<H3>Automatic Registration</H3>
Below, you will find new entries that are not yet included
...

You will be welcome to copy my script once it is ready.
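
This is not that script, but the insertion step between the two markers
could look roughly like the sketch below. It assumes the markers appear
exactly as written above, that new entries are appended just before the end
marker, and that parsing the mail message has already produced one line of
HTML; the function name add_entry is hypothetical.

```python
BEGIN = "<!-- begin_adds -->"
END = "<!-- end_adds -->"


def add_entry(document, entry_html):
    """Return document with entry_html inserted just before the end marker.

    Raises ValueError when the marker pair is missing, so a document
    without an automatic-registration part is never modified by accident.
    """
    begin = document.find(BEGIN)
    end = document.find(END, begin + len(BEGIN)) if begin != -1 else -1
    if begin == -1 or end == -1:
        raise ValueError("begin_adds/end_adds markers not found")
    # Everything outside the marker pair is left untouched.
    return document[:end] + entry_html.rstrip("\n") + "\n" + document[end:]
```

Calling it again with another entry appends after the previous one, so the
newest entries end up at the bottom of the automatic part.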

The special place given to commercial entries comes from the fact that
they will be added automatically to my document without prior moderation
from me (I will only moderate them afterwards). I'll do this because
I'm sure plenty of commercial companies want to get in, and I
don't want to have to add them one by one.

Once we all agree on the form, I would like this form to become the preferred
way of adding new entries, because it collects keywords given by the authors
that will allow us to build an efficient index. 

4/ info.cern.ch

We have two new computers that will help info.cern.ch support its load.

I am hopeful that in the next few days info.cern.ch will suddenly become
much, much faster, as we are also switching from NFS to AFS!


Any comments are appreciated, either on this list or directly to me.


Regards,

Arthur
