- From: <connolly@w3.org>
- Date: Thu, 02 Jan 2003 22:39:07 -0600
- To: www-rdf-interest@w3.org
- cc: connolly@w3.org, em@w3.org
I started working on my church's web site. Since I don't know everything
that's on it, I'd like to get an overview -- a site map, say.
Surely somebody has done this before, but I couldn't
find it, and it was such an obvious hack that I just
wrote it:
http://www.w3.org/2000/10/swap/util/sitemap.py
sitemap.py,v 1.3 2003/01/03 04:18:32
It's 185 lines, including comments and debug-print-statements.
(that's in addition to the python standard urllib stuff,
DV's HTML parser and xpath implementation,
and the swap RDF store and serializer)
It took just a few hours to develop. Fun stuff!
You invoke it like so:
python sitemap.py http://www.fellowshipofgrace.org/ 100 >sitemap.rdf
(you need the swap stuff in your PYTHONPATH)
and it crawls the site (up to 100 pages) and records
the titles of the pages (using dc:title) and
the links (using dc:relation). For example:
<rdf:Description rdf:about="http://www.fellowshipofgrace.org/about_us.html">
  <dc:relation rdf:resource="http://www.efca.org"/>
  <dc:relation rdf:resource="http://www.fellowshipofgrace.org/about_us.html"/>
  <dc:relation rdf:resource="http://www.fellowshipofgrace.org/contact.html"/>
  <dc:relation rdf:resource="http://www.fellowshipofgrace.org/god_s_plan.html"/>
  <dc:relation rdf:resource="http://www.fellowshipofgrace.org/index.html"/>
  <dc:relation rdf:resource="http://www.fellowshipofgrace.org/jan1.html"/>
  <dc:relation rdf:resource="http://www.fellowshipofgrace.org/ministries.html"/>
  <dc:relation rdf:resource="http://www.fellowshipofgrace.org/pastors.html"/>
  <dc:relation rdf:resource="http://www.fellowshipofgrace.org/statement.html"/>
  <dc:title>About Us</dc:title>
  <dc:type>text/html</dc:type>
  <label>about_us</label>
</rdf:Description>
That's an excerpt from
http://www.fellowshipofgrace.org/2003/maint/sitemap.rdf
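For anyone who wants the gist without the swap and libxml dependencies,
here is a rough sketch of the crawl step using only the Python standard
library. It is not what sitemap.py actually does; the names PageInfo,
fetch and crawl are made up for illustration, and staying on the start
URL's host is my assumption about what "the site" means. There is no
error handling for pages that fail to load.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageInfo(HTMLParser):
    """Collect the <title> text and the href of every <a> in one page."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.title = ""
        self.links = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page and drop #fragments.
                self.links.add(urldefrag(urljoin(self.base, href))[0])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch(url):
    """Hypothetical helper: return the page body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start, limit, fetch=fetch):
    """Breadth-first crawl of pages on start's host, visiting at most `limit`."""
    host = urlparse(start).netloc
    queue, seen, site = deque([start]), {start}, {}
    while queue and len(site) < limit:
        url = queue.popleft()
        page = PageInfo(url)
        page.feed(fetch(url))
        # title would become dc:title, each link a dc:relation
        site[url] = {"title": page.title.strip(), "links": sorted(page.links)}
        for link in sorted(page.links):
            # Assumption: only follow links that stay on the same host.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return site
```

Off-site links (like the one to efca.org above) are recorded but not
followed. Passing fetch as a parameter makes it easy to try the crawler
on a dictionary of canned pages before pointing it at a live site.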
Then I used the circles and arrows tools
http://www.w3.org/2001/02pd/
specifically, these rules
http://www.w3.org/2001/02pd/sitemap-style.n3
to produce a diagram
http://www.fellowshipofgrace.org/2003/maint/sitemapFig.svg
http://www.fellowshipofgrace.org/2003/maint/sitemapFig.ps
Even using short labels, the diagram is busier than I had
hoped/expected, but that's just because there are, in fact, a lot of
links. This is a pretty small web site; we'd clearly need better
visualization tools for anything larger.
Bonus points to anybody who can make a nicer picture
from the sitemap.rdf file.
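As a possible starting point, here is a minimal sketch that turns the
dc:title and dc:relation statements into Graphviz DOT text. It is
separate from the circles and arrows rules above, and it assumes two
things the excerpt doesn't show: that the whole file uses the plain
rdf:Description form, and that the dc: prefix is bound to the Dublin
Core 1.1 elements namespace. The function name rdf_to_dot is made up.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the dc: prefix in sitemap.rdf is bound to this namespace.
DC = "http://purl.org/dc/elements/1.1/"


def rdf_to_dot(rdf_xml):
    """Turn dc:title / dc:relation statements into Graphviz DOT text."""
    root = ET.fromstring(rdf_xml)
    lines = ["digraph sitemap {", "  rankdir=LR;"]
    for desc in root.iter("{%s}Description" % RDF):
        subject = desc.get("{%s}about" % RDF)
        if subject is None:
            continue
        title = desc.findtext("{%s}title" % DC)
        if title:
            # Label each page node with its title instead of its URL.
            label = title.replace('"', '\\"')
            lines.append('  "%s" [label="%s"];' % (subject, label))
        for rel in desc.findall("{%s}relation" % DC):
            target = rel.get("{%s}resource" % RDF)
            # Skip self-links; they add clutter without adding information.
            if target and target != subject:
                lines.append('  "%s" -> "%s";' % (subject, target))
    lines.append("}")
    return "\n".join(lines)
```

Save the result as sitemap.dot and something like
dot -Tsvg sitemap.dot -o sitemap.svg should render it. Dropping the
self-links and the links every page shares (index.html, say) is one
cheap way to make the picture less busy.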
p.s. I'm using a mailer I don't usually use, so
apologies for weird From: headers and such.
Also note that I'm not subscribed to www-rdf-interest,
so please copy me on replies.
--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Thursday, 2 January 2003 23:39:29 UTC