swebscrape: scraping RDF from the minutes

From: Brian McBride <bwm@hplb.hpl.hp.com>
Date: Mon, 14 Apr 2003 15:31:57 +0100
Message-Id: <5.1.0.14.0.20030414150043.060d7848@localhost>
To: RDF Core <w3c-rdfcore-wg@w3.org>

For longer than I care to admit, I've been manually copying actions from the 
minutes of telecons to an RDF-based actions list manager.  For various 
reasons that I won't go into, I've decided to automate that process, but 
doing so would benefit from some help from the scribes.

At

   http://www.w3.org/2001/sw/RDFCore/scripts/minutes2n3.py

is a short script that will extract RDF based data from the minutes 
provided they conform to some simple rules described below.

I have a servlet, swebscrape (currently behind our firewall, but I'll see 
if I can do something about that), which, given some input (a post or a 
URL), looks for a line of the form:

swebscrape:N3:python: http://www.w3.org/2001/sw/RDFCore/scripts/minutes2n3.py

I'd appreciate folks including this line at the bottom of the minutes.

If it finds one, it retrieves the script given by the URL and runs it on 
the input data to extract some RDF.  The 'N3' indicates the output format 
and the 'python' indicates the script language.  There are some mechanisms 
to constrain this apparently dangerous running of arbitrary code (hint: 
that's why the script is in python).
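
As an illustration only, here is a minimal sketch of how a directive line of 
that form might be recognised.  It is not the servlet's code: the name 
find_directive and the exact matching rules (three colon-separated fields 
followed by a URL) are assumptions made for this example.

```python
import re

# Hypothetical pattern for the directive line described above. The real
# servlet's matching rules are not given in this message.
DIRECTIVE = re.compile(
    r"^swebscrape:(?P<fmt>[^:\s]+):(?P<lang>[^:\s]+):\s*(?P<url>\S+)\s*$"
)


def find_directive(text):
    """Return (output_format, script_language, script_url), or None."""
    for line in text.splitlines():
        match = DIRECTIVE.match(line.strip())
        if match:
            return match.group("fmt"), match.group("lang"), match.group("url")
    return None
```

Fetching and running the script named by the URL is deliberately left out, 
since the constraints the servlet applies to that step are not described 
here.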

I'm interested if anyone knows of anything similar to this - e.g. tags in 
HTML head elements that point to, say, XSLT scripts to extract the data.

Do let me know if I'm off my trolley again.

Brian

Format for actions in the minutes

o ACTIONS are introduced by lines of the form:

ACTION 20030411#1 bwm a description that fits on this line only

o 'ACTION ' must be at the beginning of a line.  'ACTION: ' also 
works.  This introduces a new action.

o there must be a line of the form

date: 20030411

or whatever the right date is, before any new ACTIONs

o ACTIONs with an ID that doesn't match the date string are ignored, at 
present

o Action IDs are optional - the script will generate one if it's missing

o Scribes, please be careful if you do an action summary not to create a 
duplicate of each action.

o Only one line is processed for each action - so please keep the action 
description to one line
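
The rules above can be sketched roughly as follows.  This is not 
minutes2n3.py itself, just a minimal illustration under some assumptions: 
the token after the optional ID is taken to be the action owner, generated 
IDs use a made-up date#gN scheme, and the result is a list of plain 
dictionaries rather than N3.

```python
import re

# 'date: 20030411' on a line of its own (assumed to start at column 0).
DATE_LINE = re.compile(r"^date:\s*(\d{8})\s*$")
# 'ACTION ' or 'ACTION: ' at the start of a line, an optional ID of the
# form date#n, then an owner token (an assumption) and the description.
ACTION_LINE = re.compile(r"^ACTION:? (?:(\d{8})#(\d+)\s+)?(\S+)\s+(.*)$")


def extract_actions(minutes):
    """Return a list of dicts with 'id', 'owner' and 'description'."""
    date = None
    generated = 0
    actions = []
    for line in minutes.splitlines():
        match = DATE_LINE.match(line)
        if match:
            date = match.group(1)
            continue
        match = ACTION_LINE.match(line)
        if not match or date is None:
            continue  # not an action, or no date line seen yet
        id_date, id_num, owner, description = match.groups()
        if id_date is None:
            # No ID given: generate one (hypothetical scheme).
            generated += 1
            action_id = f"{date}#g{generated}"
        elif id_date != date:
            continue  # ID does not match the date string: ignored
        else:
            action_id = f"{id_date}#{id_num}"
        actions.append(
            {"id": action_id, "owner": owner, "description": description.strip()}
        )
    return actions
```

Only the line that starts with ACTION is read, so any continuation lines of 
a longer description are silently dropped.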
Received on Monday, 14 April 2003 10:32:18 EDT
