Web site definitions
From: Johan Hjelm (hjelm@w3.org)
Date: Fri, Feb 12 1999
Message-Id: <4.1.19990212144259.00b8cee0@127.0.0.1>
Date: Fri, 12 Feb 1999 15:05:29 +0100
To: www-wca@w3.org
Cc: lavoie@oclc.org
Subject: Web site definitions
Hi Brian,
I finally got around to reading your paper (we should link it from the WCA
page; is it already in the repository?), and the Alexa "content area"
definition strikes me as a very reasonable concept, especially from a user
experience standpoint. One problem, which you rightly point out, is actually
identifying the "web publication". I can, for instance, think of several
possible examples of your point 6 (content at the same location
representing two separate content areas), e.g. a small publishing house
that has web editions of its print magazines in separate subtrees, but
under the same IP address and logical address.
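A naive heuristic for the publishing-house case is to treat each top-level directory on a host as a candidate content area. This is a minimal sketch, assuming the site's directory layout mirrors its publications; `group_by_subtree` and its `depth` parameter are hypothetical, and a site that mixes sections within one directory would defeat it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit


def group_by_subtree(urls: Iterable[str], depth: int = 1) -> Dict[Tuple[str, str], List[str]]:
    """Group URLs by hostname plus their first `depth` directory segments.

    Hypothetical heuristic: each directory subtree on a host is treated as
    a candidate content area.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        # Drop the leading "" and the final segment (the file name, or ""
        # when the path ends in "/"), keeping only directory names.
        dirs = [s for s in parts.path.split("/")[1:-1] if s]
        prefix = "/" + "/".join(dirs[:depth])  # "/" for files at the root
        groups[(parts.hostname or "", prefix)].append(url)
    return dict(groups)
```

With `depth=1`, `http://pub.example/magA/issue1.html` and `http://pub.example/magA/2/index.html` land in the same `("pub.example", "/magA")` group, while `/magB/` forms its own.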
It may be that we are looking too high up in the stack here. A lot of the
information about aliases, multiple IP addresses, etc. could be gleaned from
the DNS servers before we even start looking at the content.
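A minimal sketch of the DNS step, assuming we already have a list of hostnames from a crawl: resolve each one with Python's `socket.gethostbyname_ex` (which returns the canonical name, the alias list and the address list) and merge names that share a canonical name, alias or address using union-find. The function name `group_hosts_by_dns` is hypothetical, and the resolver is a parameter so the grouping can be exercised offline.

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Same shape as socket.gethostbyname_ex: (canonical name, aliases, addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def group_hosts_by_dns(hosts: Iterable[str],
                       resolve: Resolver = socket.gethostbyname_ex) -> List[Set[str]]:
    """Group hostnames that share a canonical name, an alias or an IP address."""
    hosts = list(hosts)
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for host in hosts:
        try:
            canonical, aliases, addresses = resolve(host)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            find(host)   # unresolvable names stay in a group of their own
            continue
        # Prefix addresses so an IP string can never collide with a hostname.
        for node in [canonical, *aliases, *("ip:" + a for a in addresses)]:
            union(host, node)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for host in hosts:
        groups[find(host)].add(host)
    return list(groups.values())
```

Merging on shared addresses over-groups name-based virtual hosting, where many unrelated sites sit on one IP address, so the result is a list of candidate sites to refine, not content areas.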
The question in my mind is: To what degree is it really possible to
automate the identification of content areas? For instance, are there
clusters where the content is interlinked? How are those related to content
areas? (And, following on from that, how does this relate to metadata
characterisation and the use of Dublin Core and RDF?)
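One way to make "clusters where the content is interlinked" concrete is to take the strongly connected components of the link graph: sets of pages that can all reach one another by following links. This sketch assumes links are given as (source, target) page pairs and uses Kosaraju's algorithm; `strongly_connected_clusters` is a hypothetical name, and whether such components line up with content areas is exactly the open question.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def strongly_connected_clusters(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Return the strongly connected components of a directed link graph.

    Kosaraju's algorithm, written iteratively to avoid recursion limits.
    """
    graph: Dict[str, List[str]] = defaultdict(list)
    reverse: Dict[str, List[str]] = defaultdict(list)
    nodes: Set[str] = set()
    for src, dst in links:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    visited: Set[str] = set()
    order: List[str] = []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in visited:
                    visited.add(child)
                    stack.append((child, iter(graph[child])))
                    break
            else:  # all children explored: node is finished
                stack.pop()
                order.append(node)

    # Pass 2: DFS on the reversed graph, in reverse finishing order.
    assigned: Set[str] = set()
    components: List[Set[str]] = []
    for start in reversed(order):
        if start in assigned:
            continue
        component = {start}
        assigned.add(start)
        stack2 = [start]
        while stack2:
            node = stack2.pop()
            for pred in reverse[node]:
                if pred not in assigned:
                    assigned.add(pred)
                    component.add(pred)
                    stack2.append(pred)
        components.append(component)
    return components
```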
If it turns out to be impossible to map content areas identified
automatically onto content areas identified manually, we might want to
enlist the assistance of some librarians. In the Swedish Swesök project,
they have actually talked about having a number of librarians go through
and characterise sites. Even a sample that outlines the problems would be
helpful.
It would be interesting if this could be used to identify social
interaction patterns on the web. For instance, corporate relationships
might become visible through relations that cross between sites.
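A sketch of how cross-site relations might be tallied, assuming a list of (source URL, target URL) links and, as a crude stand-in for a properly identified content area, using the URL hostname as the site. `cross_site_link_counts` is hypothetical; site pairs with many links in one or both directions would be the candidates for a closer look.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlsplit


def cross_site_link_counts(links: Iterable[Tuple[str, str]]) -> Counter:
    """Count links between different sites, keyed by (source site, target site).

    A "site" is approximated by the URL's hostname (lower-cased by urlsplit);
    links within one site and links without a hostname are ignored.
    """
    counts: Counter = Counter()
    for src, dst in links:
        src_site = urlsplit(src).hostname
        dst_site = urlsplit(dst).hostname
        if src_site and dst_site and src_site != dst_site:
            counts[(src_site, dst_site)] += 1
    return counts
```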
Johan
************************************************************
Johan HJELM
Ericsson RCUR T/K & Cyberlab NY
Currently visiting engineer at the W3C
The World Wide Web Consortium
hjelm@w3.org
http://www.w3.org/People/W3Cpeople.html#Hjelm
Fax +1-617-258 5999, Phone +1-617-263-9630
MIT/LCS, 545 Tech. Sq. Cambridge MA 02139 USA
opinions are personal, always my own,
and not necessarily those of Ericsson or the W3C.
============================================================