- From: <connolly@w3.org>
- Date: Thu, 02 Jan 2003 22:39:07 -0600
- To: www-rdf-interest@w3.org
- cc: connolly@w3.org, em@w3.org
I started working on my church's web site. Since I don't know what all is there, I'd like to get a feel for it -- a site map, say.

Surely somebody has done this before, but I couldn't find it, and it was such an obvious hack that I just wrote it:

  http://www.w3.org/2000/10/swap/util/sitemap.py
  sitemap.py,v 1.3 2003/01/03 04:18:32

It's 185 lines, including comments and debug print statements. (That's in addition to the python standard urllib stuff, DV's HTML parser and xpath implementation, and the swap RDF store and serializer.) It took just a few hours to develop. Fun stuff!

You invoke it like this:

  python sitemap.py http://www.fellowshipofgrace.org/ 100 >sitemap.rdf

(you need the swap stuff in your PYTHONPATH) and it crawls the site (up to 100 pages) and records the titles of the pages (using dc:title) and the links (using dc:relation). For example:

  <rdf:Description rdf:about="http://www.fellowshipofgrace.org/about_us.html">
    <dc:relation rdf:resource="http://www.efca.org"/>
    <dc:relation rdf:resource="http://www.fellowshipofgrace.org/about_us.html"/>
    <dc:relation rdf:resource="http://www.fellowshipofgrace.org/contact.html"/>
    <dc:relation rdf:resource="http://www.fellowshipofgrace.org/god_s_plan.html"/>
    <dc:relation rdf:resource="http://www.fellowshipofgrace.org/index.html"/>
    <dc:relation rdf:resource="http://www.fellowshipofgrace.org/jan1.html"/>
    <dc:relation rdf:resource="http://www.fellowshipofgrace.org/ministries.html"/>
    <dc:relation rdf:resource="http://www.fellowshipofgrace.org/pastors.html"/>
    <dc:relation rdf:resource="http://www.fellowshipofgrace.org/statement.html"/>
    <dc:title>About Us</dc:title>
    <dc:type>text/html</dc:type>
    <label>about_us</label>
  </rdf:Description>

That's an excerpt from

  http://www.fellowshipofgrace.org/2003/maint/sitemap.rdf

Then I used the circles and arrows tools

  http://www.w3.org/2001/02pd/

specifically, these rules

  http://www.w3.org/2001/02pd/sitemap-style.n3

to produce a diagram:

  http://www.fellowshipofgrace.org/2003/maint/sitemapFig.svg
  http://www.fellowshipofgrace.org/2003/maint/sitemapFig.ps

Even using short labels, the diagram is busier than I had hoped/expected, but that's just because there are, in fact, a lot of links. This is a pretty small web site; we'd clearly need better visualization tools for anything larger.

Bonus points to anybody who can make a nicer picture from the sitemap.rdf file.

p.s. I'm using a mailer I don't usually use, so apologies for weird From: headers and such. Also note that I'm not subscribed to www-rdf-interest, so please copy me on replies.

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
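
The crawl described in the message above (follow links up to a page limit, record each page's title as dc:title and its links as dc:relation) can be sketched roughly as follows. This is not sitemap.py itself, which relied on the swap RDF store and a separate HTML parser; it is a minimal sketch using only the Python 3 standard library. The names PageParser, fetch, crawl and to_rdf are hypothetical, the Dublin Core namespace URI is an assumption (the excerpt does not show its namespace declarations), and the dc:type and label properties are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen
from xml.sax.saxutils import escape, quoteattr


class PageParser(HTMLParser):
    """Collects the <title> text and every <a href> target of one page."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.title = ""
        self.links = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL, drop #fragments.
                self.links.add(urldefrag(urljoin(self.base, href))[0])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch(url):
    """Default fetcher (hypothetical helper): returns the page body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start, limit, fetch=fetch):
    """Breadth-first crawl of up to `limit` pages on the start URL's host."""
    host = urlparse(start).netloc
    queue, seen, site = [start], {start}, {}
    while queue and len(site) < limit:
        url = queue.pop(0)
        try:
            parser = PageParser(url)
            parser.feed(fetch(url))
        except OSError:
            continue  # skip pages that cannot be fetched
        links = sorted(parser.links)
        site[url] = (parser.title.strip(), links)
        for link in links:
            # Off-site links are recorded above but never followed.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return site


def to_rdf(site):
    """Serializes the crawl result as RDF/XML (namespace URIs assumed)."""
    out = ['<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
           '         xmlns:dc="http://purl.org/dc/elements/1.1/">']
    for url, (title, links) in sorted(site.items()):
        out.append("  <rdf:Description rdf:about=%s>" % quoteattr(url))
        for link in links:
            out.append("    <dc:relation rdf:resource=%s/>" % quoteattr(link))
        out.append("    <dc:title>%s</dc:title>" % escape(title))
        out.append("  </rdf:Description>")
    out.append("</rdf:RDF>")
    return "\n".join(out)
```

Under these assumptions, print(to_rdf(crawl("http://www.fellowshipofgrace.org/", 100))) would write a comparable file to standard output. Passing the fetcher as a parameter is just a convenience so the crawl can be exercised against canned pages without network access.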
Received on Thursday, 2 January 2003 23:39:29 UTC