Re: Web sites as resources from Robert Hahn on 2004-01-15 (www-tag@w3.org from January 2004)

From: Robert Hahn <rhahn@quarry.com>
Date: Thu, 15 Jan 2004 09:53:25 -0500
To: www-tag@w3.org
Cc: Tim Bray <tbray@textuality.com>
Message-Id: <922BF47C-476A-11D8-A9A0-000A9577390A@quarry.com>

Greetings,

Further to Tim's post, I'd like to take a few moments to outline some  
ideas that I have developed since reading Tim's October "ongoing"  
post[1] on the topic.

First, though, since this is my first post to the list, I feel  
obligated to introduce myself briefly. My name is Robert Hahn, and I've  
been working as a web developer since 1995.  Unlike many people, I
think, I came into the field with a Fine Arts background (University of
Waterloo); over time, my interests shifted from client-side
presentation (which I still do a lot of) to server-side development.
For more information, see my site[2].

Over the course of November and part of December, I tried to
methodically break down the requirements that Tim set out in his
"There’s Still No Such Thing as a Web Site" post and to devise a
syntax that met all of them.  I also wanted to see if I could re-use
an existing syntax, as that would dramatically reduce the ramp-up time
for people adopting it.  You can follow the entire development starting
at this page[3].

In brief, what I worked out was that it's possible to do a pretty good
job of describing site metadata with XHTML.  I used the <object /> tag
(for defining resources) and the <div /> tag (for grouping) to do
pretty much all the heavy lifting.  You can obtain this file by
dereferencing a <link> tag I've set up on my web pages or, since it's
URI-dereferenceable, via the Website: header that I saw proposed
earlier on this list (Note: I haven't implemented the Website:
namespace version).  If you're interested, the justification for using
XHTML is here[4].
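As an illustration of how a program might consume such a file, here is
a minimal Python sketch using only the standard library.  The sample
document is hypothetical and only approximates the format: it assumes
each resource is an XHTML <object> whose data attribute holds its URI,
and each group is a <div> identified by its title attribute.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical site-description file: <div> elements group resources,
# <object> elements describe individual resources (URI in @data).
SITE_FILE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Site description</title></head>
  <body>
    <div title="articles">
      <object data="http://example.com/articles/index.html" title="Articles"/>
      <object data="http://example.com/articles/one.html" title="First article"/>
    </div>
    <div title="about">
      <object data="http://example.com/about/index.html" title="About"/>
    </div>
  </body>
</html>"""


def resources_by_group(xhtml_text):
    """Map each grouping <div>'s title to the URIs of its child <object>s."""
    root = ET.fromstring(xhtml_text)
    groups = {}
    # XHTML elements live in a namespace, so tags are matched in
    # ElementTree's "{namespace-uri}localname" form.
    for div in root.iter(f"{{{XHTML}}}div"):
        uris = [obj.get("data") for obj in div.findall(f"{{{XHTML}}}object")]
        groups[div.get("title")] = uris
    return groups
```

Calling resources_by_group(SITE_FILE) yields a dictionary keyed by group
title ("articles", "about"), each mapped to the list of resource URIs
in document order.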

To validate this design, I also wrote a very primitive but usable
search engine that leverages the metadata found in
this file.  It's not actually running on my site due to the constraints  
of the web hosting account I'm using, but the code is available for  
download should you want to take it for a spin.
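The following is not the search engine described above, just a
minimal sketch of the general idea: a keyword search over the
metadata, assuming (hypothetically) that each <object> carries
searchable text in its title attribute and its URI in data.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical metadata file; <object title="..."> carries the searchable text.
SITE_FILE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <div title="articles">
    <object data="http://example.com/articles/xml.html" title="Notes on XML namespaces"/>
    <object data="http://example.com/articles/css.html" title="CSS layout tricks"/>
  </div>
</body></html>"""


def search(xhtml_text, query):
    """Return URIs of resources whose title contains every word of the query.

    Matching is case-insensitive substring matching; there is no ranking.
    """
    words = query.lower().split()
    root = ET.fromstring(xhtml_text)
    hits = []
    for obj in root.iter(f"{{{XHTML}}}object"):
        title = (obj.get("title") or "").lower()
        if all(word in title for word in words):
            hits.append(obj.get("data"))
    return hits
```

For example, search(SITE_FILE, "xml namespaces") returns only the URI of
the XML article.  Because the metadata already names every resource,
the sketch never has to crawl the site.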

More recently, Tim Bray published a more formal article[5] that reads
to me like a specification for possible implementations.  For the most
part, that document and my proposed solution seem to agree quite
well.  However, it contains some interesting requirements that I'm not
sure I've been able to meet, and if the response from the group here
seems promising, I'm more than willing to make whatever changes are
needed to make it work.

In the "Defining a Web Site" article, towards the end, Tim writes:

"3. It should contain assertions that groups of resources identified by  
URI prefix are members of the site."
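To make "membership by URI prefix" concrete, here is a minimal Python
sketch of how a client might test whether a URI belongs to a site.
The prefix list is hypothetical, and the sketch assumes each prefix
ends in "/" so that ".../articles/" does not also match
".../articlesX/".

```python
from urllib.parse import urlsplit

# Hypothetical URI prefixes that a site description asserts are members.
SITE_PREFIXES = [
    "http://example.com/articles/",
    "http://example.com/about/",
]


def is_member(uri, prefixes=SITE_PREFIXES):
    """True if the URI falls under one of the asserted prefixes.

    Scheme and host are compared case-insensitively; the path is compared
    as-is, because paths are case-sensitive in general.  Default ports are
    not normalized (example.com and example.com:80 are treated as different).
    """
    target = urlsplit(uri)
    for prefix in prefixes:
        p = urlsplit(prefix)
        if (target.scheme.lower() == p.scheme.lower()
                and target.netloc.lower() == p.netloc.lower()
                and target.path.startswith(p.path)):
            return True
    return False
```

With these prefixes, is_member("http://EXAMPLE.com/articles/one.html")
is true, while a URI under /private/ or on https is not.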

What I found interesting about this requirement was the  
URI-addressability of groups of resources.  Tim:  when you refer to  
identifying groups by URI prefix, do you mean that the groups must be  
accessible by XPath when a program wishes to make use of the data, or  
do you intend that there should be a URI attribute of some kind in a  
grouping tag?

If it's the latter, I don't think such a URI would make sense, because  
as I understand the notion of a group, it doesn't actually have a  
'physical' manifestation.  What I mean by this is that if you visit  
http://example.com/foo/, assuming that foo/ is a group, the server uses
some processing logic to substitute an index.html file in its place.
By that reasoning, it seems better to denote which resource within a
group is the representation for the group, and that means you'd want
to determine its address using XPath.
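The substitution described above is a server convention (Apache's
DirectoryIndex, which defaults to index.html, is one example), not
something the URI itself says.  A minimal Python sketch of that
convention, assuming a fixed default-document name, shows why the
representation of a group is implicit unless the metadata states it:

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default-document name; real servers make this configurable.
DEFAULT_DOCUMENT = "index.html"


def representation_for(uri, default=DEFAULT_DOCUMENT):
    """Return the URI a server would typically serve for a group URI.

    A path ending in '/' (or an empty path) names a group, and the server
    substitutes a default document.  Other URIs are returned unchanged.
    """
    parts = urlsplit(uri)
    path = parts.path or "/"
    if path.endswith("/"):
        path += default
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

So representation_for("http://example.com/foo/") gives
"http://example.com/foo/index.html", but only because of a guessed
server setting; a site description that names the group's
representation explicitly removes the guess.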

If it's the former, I'd like to suggest that the solution is already
implied by the 4th requirement, quoted here:

"4. It should contain the identification of per-site metadata, probably  
identified by “Nature” and “Purpose” in the style of RDDL."

Although I haven't done this as yet, I think that I could simply blend
these (namespace-qualified) attributes into my proposed format.  I'd
further propose that, where required, we develop new 'purposes' to
describe the particular and unique characteristics of web sites.
Examples of such purposes would be:

rddl:purpose="home page"
rddl:purpose="section-level home page"
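A minimal Python sketch of blending such attributes into the XHTML
format with the standard library follows.  Two assumptions are worth
flagging: the RDDL namespace URI used here should be checked against
the RDDL specification, and published RDDL uses URIs as purpose
values, whereas the plain-text values below follow the proposal above.
The add_resource helper is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
# Assumed RDDL namespace URI; verify against the RDDL specification.
RDDL = "http://www.rddl.org/"

# Serialize XHTML as the default namespace and RDDL with an "rddl" prefix.
ET.register_namespace("", XHTML)
ET.register_namespace("rddl", RDDL)


def add_resource(group, uri, title, purpose=None):
    """Append an <object> to a grouping <div>, optionally tagged with rddl:purpose."""
    obj = ET.SubElement(group, f"{{{XHTML}}}object", {"data": uri, "title": title})
    if purpose is not None:
        obj.set(f"{{{RDDL}}}purpose", purpose)
    return obj


body = ET.Element(f"{{{XHTML}}}body")
section = ET.SubElement(body, f"{{{XHTML}}}div", {"title": "foo"})
add_resource(section, "http://example.com/foo/index.html", "Foo", "section-level home page")
add_resource(section, "http://example.com/foo/bar.html", "Bar")
print(ET.tostring(body, encoding="unicode"))
```

The serialized output declares both namespaces on <body>, and only the
first <object> carries an rddl:purpose attribute, so existing XHTML
tools can still read the file while RDDL-aware ones pick up the extra
information.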

I chose these examples for a reason: they help address, I think, the
third requirement, that a group of resources be addressable using XPath
(at least).  If you were looking for a section home page, you could
write something like the following XPath:

//div[@title="foo"]/*[contains(@rddl:purpose, "home page")]

to obtain the nodeset containing the URI needed for a representation.  
(Note: I haven't tested this XPath to see if it works)
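Since Python's standard ElementTree does not support the XPath
contains() function, here is a minimal sketch that performs the same
selection by hand, treating rddl:purpose as an attribute (as in the
rddl:purpose="..." examples).  Note that in XPath 1.0 an unprefixed
name such as div matches only elements in no namespace, so a real
XPath processor run against namespaced XHTML would need a prefix bound
to the XHTML namespace.  The sample file and the RDDL namespace URI are
assumptions.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
RDDL = "http://www.rddl.org/"  # assumed RDDL namespace URI

# Hypothetical site file using the proposed rddl:purpose attributes.
SITE_FILE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"><body>
  <div title="foo">
    <object data="http://example.com/foo/index.html" rddl:purpose="section-level home page"/>
    <object data="http://example.com/foo/bar.html"/>
  </div>
</body></html>"""


def group_home_pages(xhtml_text, group_title):
    """Equivalent of //div[@title=T]/*[contains(@rddl:purpose, "home page")]."""
    root = ET.fromstring(xhtml_text)
    uris = []
    for div in root.iter(f"{{{XHTML}}}div"):  # '//div' in any position
        if div.get("title") != group_title:
            continue
        for child in div:  # '/*': direct children only
            # A missing attribute behaves like "", for which contains() is false.
            if "home page" in child.get(f"{{{RDDL}}}purpose", ""):
                uris.append(child.get("data"))
    return uris
```

Here group_home_pages(SITE_FILE, "foo") selects only the resource
marked as the section-level home page.  Because contains() is a
substring test, "home page" also matches "section-level home page";
an exact match would use = instead.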

Thank you for reading and considering this proposal.

-rh

~~~
[1] http://tbray.org/ongoing/When/200x/2003/10/15/StillNoWebSite
[2] http://www.tenletters.com/rhahn/
[3] http://www.tenletters.com/rhahn/Internet/Web/WSDF/whatIsAWebSite.html
[4] http://www.tenletters.com/rhahn/Internet/Web/WSDF/XHTMLjustification.html
[5] http://tbray.org/ongoing/When/200x/2004/01/08/WebSite36

---
Robert Hahn,
http://www.tenletters.com/rhahn
Received on Thursday, 15 January 2004 09:58:14 UTC