- From: Dan Connolly <connolly@w3.org>
- Date: Fri, 14 Jul 2000 16:21:53 -0500
- To: www-rdf-interest@w3.org
There are very few data formats I trust... when I use the computer to capture my knowledge, I pretty much stick to plain text, XML (esp. XHTML, or at least HTML that tidy can turn into XHTML for me), RCS/CVS, and RFC822/MIME. I use JPG, PNG, and PDF if I must, but not for capturing knowledge for exchange, revision, etc.

I'm having pretty good luck extracting RDF from XML/XHTML stuff using XSLT, e.g.

  http://www.w3.org/People/Connolly/smart-home.xsl
  http://www.w3.org/People/Connolly/home-smart.rdf
  http://www.w3.org/People/Connolly/events/events-smart.rdf

But I still mostly use messy perl/grep stuff for dealing with my email, because email is so messy to parse. All the perl and python libraries I've seen for email sort of work, except for a few hundred weirdly formatted messages in my archive.

Then I found this fantastic resource:

  Internet mail message header format
  by D. J. Bernstein
  http://cr.yp.to/immhf.html

that has a wealth of knowledge about how to parse email. (See also: anything written by jwz, esp. the comments in the Grendel source code, btw: http://www.mozilla.org/projects/grendel/ )

That, and a particular query I wanted to run over my whole email archive, inspired me to write a little perl script to extract RDF from my email -- at least from a little mid/date/from/to/subject log I keep of my incoming mail:

  http://www.w3.org/2000/04/maillog2rdf/log2rdf.pl
  $Id: log2rdf.pl,v 1.2 2000/07/14 20:28:21 connolly Exp $

Of course, to encode stuff in RDF, I had to make up a schema:

  Email Fields, an RDF Schema
  http://www.w3.org/2000/04/maillog2rdf/email#
  $Revision: 1.2 $ of $Date: 2000/07/14 20:29:32 $

I'm still wrestling with a few things, especially the case of

  Message-Id: 23@example.org
  To: Fred <fred@example.org>, Bob <bob@example.com>

Should that be

  mid:23@example.org
    -- to --> mailto:fred@example.org
      --called--> "Fred"
    -- to --> mailto:bob@example.com
      --called--> "Bob"

i.e. is the mailbox called Fred?
I wouldn't think so, and RFC822 agrees: "The name reference is optional and is usually used to indicate the human name of a recipient."

That suggests:

  mid:23@example.org
    -- to --> [recip1]
      --phrase--> "Fred"
      --addr-spec--> mailto:fred@example.org
    -- to --> [recip2]
      --phrase--> "Bob"
      --addr-spec--> mailto:bob@example.com

And that doesn't capture that there were no other (stated) recipients. For that, I should model it a la:

  mid:23@example.org
    -- to --> [bag1]
      --first--> [recip1] (with phrase/addr-spec as above)
      --rest--> [bag2]
        --first--> [recip2] (as above)
        --rest--> empty

(I use first/rest/empty rather than _1 _2 to model lists. See http://www.w3.org/2000/07/12-lists# )

That's sort of a mouthful... but I suppose I can use convenience rules/properties a la

  toAddr(?msg, ?addr) :-
      to(?msg, ?recips),
      includes(?recips, ?recip),
      addr-spec(?recip, ?addr).

  includes(?lst, ?item) :- first(?lst, ?item).
  includes(?lst, ?item) :-
      rest(?lst, ?lst2),
      includes(?lst2, ?item).

I wish I had an RDF model for rules that I was happy with.

--
Dan Connolly, W3C http://www.w3.org/People/Connolly/
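
The first/rest/empty recipient list and the toAddr/includes convenience rules from the message can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the log2rdf.pl script: triples are plain tuples, blank nodes are strings such as _:bag1, and the names recipients_to_triples, objects, includes and to_addr are invented here. Header splitting is left to the standard library's email.utils.getaddresses, which may well stumble on the oddly formatted messages mentioned earlier.

```python
from email.utils import getaddresses

EMPTY = "empty"  # list terminator; the spelling is an assumption for this sketch


def recipients_to_triples(mid, to_header):
    """Turn one To: header into (subject, property, object) triples
    using the first/rest/empty list model."""
    # getaddresses returns [(phrase, addr-spec), ...]; drop unparsable entries
    pairs = [(p, a) for p, a in getaddresses([to_header]) if a]
    if not pairs:
        return [(mid, "to", EMPTY)]
    triples = [(mid, "to", "_:bag1")]
    for i, (phrase, addr) in enumerate(pairs, start=1):
        bag, recip = f"_:bag{i}", f"_:recip{i}"
        triples.append((bag, "first", recip))
        if phrase:  # the name reference is optional
            triples.append((recip, "phrase", phrase))
        triples.append((recip, "addr-spec", "mailto:" + addr))
        rest = f"_:bag{i + 1}" if i < len(pairs) else EMPTY
        triples.append((bag, "rest", rest))
    return triples


def objects(triples, subject, prop):
    """All objects o such that (subject, prop, o) is in the triple list."""
    return [o for s, p, o in triples if s == subject and p == prop]


def includes(triples, lst):
    # includes(?lst, ?item) :- first(?lst, ?item).
    yield from objects(triples, lst, "first")
    # includes(?lst, ?item) :- rest(?lst, ?lst2), includes(?lst2, ?item).
    for lst2 in objects(triples, lst, "rest"):
        yield from includes(triples, lst2)


def to_addr(triples, msg):
    # toAddr(?msg, ?addr) :- to(?msg, ?recips),
    #     includes(?recips, ?recip), addr-spec(?recip, ?addr).
    for recips in objects(triples, msg, "to"):
        for recip in includes(triples, recips):
            yield from objects(triples, recip, "addr-spec")


triples = recipients_to_triples(
    "mid:23@example.org",
    "Fred <fred@example.org>, Bob <bob@example.com>",
)
print(list(to_addr(triples, "mid:23@example.org")))
# ['mailto:fred@example.org', 'mailto:bob@example.com']
```

Because the last list cell points at empty, a consumer can tell that Fred and Bob were the only stated recipients, which the flat "to" triples cannot express.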
Received on Friday, 14 July 2000 17:22:25 UTC