Mailing list archives and the Semantic Web

SIOC-mail: a potential configuration of element http://element.rubyforge.org/

philosophy is best summed up here: (i love when ppl read my mind, then write it down!)

http://groups.google.com/group/whits/browse_thread/thread/36b4011de5dca63f/678137e3f6085d52


Nondestructive (to original files) Ad-hoc Post-hoc of RDF is the name (aka nap)

it's designed so you can rm -rf whatever you want whenever you want without choking. i'd recommend not moving the mail originals, and using hardlinks or symlinks instead


::config

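first, remove any procmail rules you have and use this as your .procmailrc:

# one directory per day, e.g. ~/m/2010/12/11
D=$HOME/m/`date +%Y/%m/%d`
# this assignment only exists to run the command: create today's directory if missing
MKDIR=`test -d $D || mkdir -p $D`
# DEFAULT is a directory, so each message lands in its own msg.XXXX file
DEFAULT=$D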


with this, (_,p,o) pattern matches are additionally subsorted by date, as discussed in http://lists.w3.org/Archives/Public/public-semweb-ui/2010Jun/0005.html
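a minimal sketch of why the date layout gives you date subsorting for free: the path components are zero-padded, so a plain string sort of the paths is already a chronological sort. `day_dir` is a hypothetical helper, not something from element; it just mirrors the `date +%Y/%m/%d` layout above.

```ruby
require 'date'

# Hypothetical helper: the per-day directory for a date, using the same
# %Y/%m/%d layout as the procmail config.
def day_dir(date, root = 'm')
  File.join(root, date.strftime('%Y/%m/%d'))
end

dates = [Date.new(2010, 12, 11), Date.new(2010, 6, 3), Date.new(2010, 12, 10)]
dirs  = dates.map { |d| day_dir(d) }

# zero-padded components mean lexicographic order equals date order
puts dirs.sort
```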


sh/rc in the git repo contains an alias for indexing; cd to the server root before running it
(it generates some paths in the filesystem triplestore, and the fulltext index)

..assuming your mailserver isn't local, you'll want to fetch the mail too:

cd ~/
getmail
e.mail

throw that in cron or whatever suits you (personally i getmail once a day, when i've found some wifi)

another good CLI mailer compatible with this path structure is notmuch http://notmuchmail.org/

::access

you inquired particularly about list archiving, which implies a static server, as a gmail alternative (access anywhere?)

it's just a filesystem, so there are numerous ways to make this accessible anywhere:
 AFS, NFS, Dropbox, (rsync | scp) * cron.

i use debian on a nokia for my mail archive. it works well on the N900. other tablets/phones on OMAP can run it too. busybox on android can probably be supplemented with procmail/ruby as well


UI is HTTP/HTML so that already runs anywhere. 

::model
triples are generated via a tripleStream Source in Ruby using TMail to parse messages

to get at this source, pass a block to :mail:

irb(main):006:0> E('/m/2010/12/10').c[4].mail &::E::Show

s String AANLkTi=8CzehDt9M+KBL1Mh7MeVrbObNo6ywP4cu4HMy@mail.gmail.com
p String http://www.w3.org/1999/02/22-rdf-syntax-ns#type
o E http://rdfs.org/sioc/types#MailMessage

s String AANLkTi=8CzehDt9M+KBL1Mh7MeVrbObNo6ywP4cu4HMy@mail.gmail.com
p String http://purl.org/dc/terms/date
o String 2010-12-10T00:41:20+00:00


s String AANLkTi=8CzehDt9M+KBL1Mh7MeVrbObNo6ywP4cu4HMy@mail.gmail.com
p String http://purl.org/dc/terms/title
o String Re: [LAU] [LAA] Mixxx 1.9.0 beta1 and Mixxx 1.8.2 released

s String AANLkTi=8CzehDt9M+KBL1Mh7MeVrbObNo6ywP4cu4HMy@mail.gmail.com
p String http://rdfs.org/sioc/ns#addressed_to
o E gabrbedd@gmail.com
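to make the block-passing shape concrete, here's a rough sketch of a triple source that yields (s,p,o) per header. it is not element's code: `mail_triples` is a hypothetical name, and it does naive header parsing with the stdlib instead of TMail (no display-name or encoded-word handling). only the predicate URIs are taken from the output above.

```ruby
require 'time'

SIOC     = 'http://rdfs.org/sioc/ns#'
SIOCT    = 'http://rdfs.org/sioc/types#'
DC       = 'http://purl.org/dc/terms/'
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

# Hypothetical stand-in for the real source: yields subject, predicate, object
# for a raw RFC 2822 message, or returns an Enumerator when no block is given.
def mail_triples(raw)
  return enum_for(:mail_triples, raw) unless block_given?

  head = raw.split(/\r?\n\r?\n/, 2).first.to_s
  headers = {}
  # unfold continuation lines, then split each "Name: value" pair
  head.gsub(/\r?\n[ \t]+/, ' ').each_line do |line|
    name, value = line.chomp.split(/:\s*/, 2)
    headers[name.downcase] = value if value
  end

  id = headers['message-id'].to_s.delete('<>')
  yield id, RDF_TYPE, SIOCT + 'MailMessage'
  yield id, DC + 'date', Time.rfc2822(headers['date']).utc.xmlschema if headers['date']
  yield id, DC + 'title', headers['subject'] if headers['subject']
  headers['to'].to_s.split(/\s*,\s*/).each { |addr| yield id, SIOC + 'addressed_to', addr }
end

raw = "Message-ID: <abc@example.org>\nDate: Fri, 10 Dec 2010 00:41:20 +0000\n" \
      "Subject: hello\nTo: a@example.org, b@example.org\n\nbody\n"
mail_triples(raw) { |s, p, o| puts "s #{s}", "p #{p}", "o #{o}", '' }
```

the consumer decides what to do with each triple (print it, index it, collect it), which is the point of handing the source a block.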

you could insert into a SPARQL-database using some library from here. method 'uri' is available on E instances in object position
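if you don't want a library at all, one option is to serialize the stream into a SPARQL 1.1 Update `INSERT DATA` request yourself. a sketch under assumptions: `Res` is a hypothetical stand-in for E (anything responding to `uri` is treated as a resource), subjects are message-ids wrapped in a `mid:` URI, and actually POSTing the string to your store's update endpoint is left out.

```ruby
# Hypothetical stand-in for E: an object-position value that responds to #uri.
Res = Struct.new(:uri)

# Escape a plain string as a SPARQL literal.
def literal(value)
  escaped = value.to_s.gsub(/["\\]/) { |c| "\\#{c}" }.gsub("\n") { '\n' }
  %("#{escaped}")
end

# Build an INSERT DATA request from [subject, predicate, object] triples.
# Subjects are assumed to be message-ids and become mid: URIs.
def insert_data(triples)
  lines = triples.map do |s, p, o|
    obj = o.respond_to?(:uri) ? "<#{o.uri}>" : literal(o)
    "  <mid:#{s}> <#{p}> #{obj} ."
  end
  "INSERT DATA {\n#{lines.join("\n")}\n}"
end

puts insert_data([
  ['abc@example.org', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
   Res.new('http://rdfs.org/sioc/types#MailMessage')],
  ['abc@example.org', 'http://purl.org/dc/terms/title', 'Re: hello']
])
```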

complex processing of tripleStreams can be achieved in Ruby without coroutine/stream addons:
http://lists.w3.org/Archives/Public/public-rdf-ruby/2010Sep/0001.html

::views

so basically, i paginate thru particular p/o patterns, and check out new stuff

 /mail

this path always forwards to today's messages

 curl -I http://m/mail
HTTP/1.1 303 See Other
Location: /m/2010/12/11/*?
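the redirect itself is tiny. a sketch of the idea as a Rack-style response triple (status, headers, body), assuming the date is taken in UTC; `mail_redirect` is a hypothetical name and this is not the server's actual handler.

```ruby
# Hypothetical handler: /mail answers 303 with today's glob path,
# anything else gets a 404 in this sketch.
def mail_redirect(path, today = Time.now.utc)
  if path == '/mail'
    [303, { 'Location' => today.strftime('/m/%Y/%m/%d/*?') }, []]
  else
    [404, {}, []]
  end
end

status, headers, _body = mail_redirect('/mail', Time.utc(2010, 12, 11))
puts status, headers['Location']
```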


yep, the server supports globs. what can i say, i'm a shell user at heart

curl -H "Accept: text/rdf+n3" http://m/m/2010/12/11/*

</m/2010/12/11/msg.7PLT> <fs:size> "15844"^^<http://www.w3.org/2001/XMLSchema#integer> .
</m/2010/12/11/msg.7PLT> <fs:ftype> "file" .
</m/2010/12/11/msg.7PLT> <fs:mtime> "2010-12-11 02:31:16 +0000" .

i saw presbrey asking on IRC about a schema for fs meta-data, so the fs: prefix is unexpanded atm. will check up on his datawiki to see if he's come up with one
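for the curious, glob-to-triples can be sketched with nothing but the stdlib: expand the glob, stat each file, emit size/ftype/mtime. `fs_triples` and `n3` are hypothetical names, and the `fs:` predicates are left unexpanded just like in the output above.

```ruby
# Hypothetical sketch: expand a glob under root and describe each match.
def fs_triples(root, pattern)
  Dir.glob(File.join(root, pattern)).sort.flat_map do |path|
    st = File.stat(path)
    # the subject is the path relative to the server root
    s = '/' + path.sub(/\A#{Regexp.escape(root)}\/?/, '')
    [[s, 'fs:size',  st.size],
     [s, 'fs:ftype', st.ftype],
     [s, 'fs:mtime', st.mtime.getutc.strftime('%Y-%m-%d %H:%M:%S %z')]]
  end
end

# Serialize one triple as an N3 line; integers get an xsd:integer datatype.
def n3(s, p, o)
  obj = o.is_a?(Integer) ? %("#{o}"^^<http://www.w3.org/2001/XMLSchema#integer>) : %("#{o}")
  "<#{s}> <#{p}> #{obj} ."
end

# prints nothing unless the glob matches something under the current directory
fs_triples(Dir.pwd, 'm/2010/12/11/*').each { |t| puts n3(*t) }
```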

message and thread views are the main two, and many ways exist to populate them:

globbing, pattern-match, thread-reconstruct and search

 , is a special qs keyword, meaning 'use URI root with value for p,o pattern'

http://i574.photobucket.com/albums/ss187/ix9/hyper/2010-12-11-155539_1366x768_scrot.png 

 as shown opening Martin's messages in a window (everything is bookmarkable)


 q is a special qs keyword, for a custom 'query'. eg 'thread' is this lambda

 ->{d.walk SIOC+'reply_of',m}  (populates request-time model with a recursive walk of SIOC-ized 'references' and 'reply-to') 
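a sketch of what such a recursive walk might look like over a plain in-memory model. assumptions: the model is a hypothetical `{subject => {predicate => [objects]}}` hash rather than element's own structure, and the walk follows reply_of in both directions so the whole thread is collected from any message in it.

```ruby
require 'set'

REPLY_OF = 'http://rdfs.org/sioc/ns#reply_of'

# Hypothetical walk: collect every node reachable from start along predicate,
# following edges both to parents and from replies.
def walk(graph, predicate, start, seen = Set.new)
  return seen unless seen.add?(start) # already visited

  parents = graph.fetch(start, {}).fetch(predicate, [])
  replies = graph.select { |_, po| po.fetch(predicate, []).include?(start) }.keys
  (parents + replies).each { |node| walk(graph, predicate, node, seen) }
  seen
end

model = {
  'b@example.org' => { REPLY_OF => ['a@example.org'] },
  'c@example.org' => { REPLY_OF => ['b@example.org'] },
  'x@example.org' => { REPLY_OF => ['y@example.org'] } # a different thread
}
puts walk(model, REPLY_OF, 'b@example.org').to_a
```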

anything page-able generates next/prev resources containing links to more; these are the arrows seen on various screens
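the next/prev idea in miniature: slice a result list into pages and hand back links to the neighbours. the `?page=` query parameter and the `page` helper are made up for illustration, and n is assumed to be a non-negative page index; the real server's URLs may look nothing like this.

```ruby
# Hypothetical pager: returns one page of items plus prev/next links
# (nil when there is no neighbour in that direction).
def page(items, n, per = 10)
  pages = items.each_slice(per).to_a
  {
    'items' => pages[n] || [],
    'prev'  => n > 0 ? "?page=#{n - 1}" : nil,
    'next'  => n + 1 < pages.size ? "?page=#{n + 1}" : nil
  }
end

p page((1..25).to_a, 1)
```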

the du view is useful for finding messages with large attachments; just tap on them:

http://i574.photobucket.com/albums/ss187/ix9/hyper/2010-12-11-153930_1366x768_scrot.png

the images view looks exhaustively thru subject, predicate, object URIs and even inside sioc:content fields, using a beautiful-soup markup parser, for img tags and a[href]s pointing to images

http://blog.whats-your.name/public/cas.png

also shown is a 'p' view which generates a list of RDFa attributes found in the document to disable/enable particular field contents
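a rough idea of the image hunt inside content fields, described for the images view above. note this sketch swaps in plain regexes for the markup parser so it runs on the stdlib alone, which means it only copes with quoted attributes; `image_uris` and the extension list are my own assumptions.

```ruby
# assumed set of image extensions, optionally followed by a query string
IMG_EXT = /\.(?:png|jpe?g|gif)(?:\?\S*)?\z/i

# Hypothetical helper: img[src] values plus a[href] values that look like images.
def image_uris(markup)
  srcs  = markup.scan(/<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i).flatten
  hrefs = markup.scan(/<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/i).flatten
  (srcs + hrefs.select { |h| h =~ IMG_EXT }).uniq
end

html = '<p><img src="http://example.org/a.png"> ' \
       '<a href="http://example.org/b.JPG">b</a> ' \
       '<a href="http://example.org/page.html">page</a></p>'
puts image_uris(html)
```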


i'm using Groonga for keyword search, as it's a mmapped file that doesn't require a daemon running and is fairly efficient on my phone
http://blog.whats-your.name/post/2010/10/09/Groonga


all the UI works for Feeds too, as they're using the same SIOC. see http://blog.whats-your.name/post/2010/10/25/aint-no-winer

Received on Saturday, 11 December 2010 18:57:04 UTC