Parser bug found in cwm RDF/XML parser

From: Tim Berners-Lee <timbl@w3.org>
Date: Thu, 22 Jul 2010 16:14:11 -0400
Message-Id: <016D0491-6E17-4B8A-871C-0F21224BDBAA@w3.org>
Cc: Kanghao Lu <kennylu@MIT.EDU>, DIG group <diggers@csail.mit.edu>
To: public-cwm-talk talk <public-cwm-talk@w3.org>

It seems that cwm's RDF/XML parser had a bug from the early days when 
a bunch of RDF used strange or no namespaces for the attributes about= etc.

The offending code is quoted in the IRC snippet below.
The file /sax2rdf.py 
had the bug, in which a property attribute xx:about is assumed to be rdf:about even though in fact it is sioc:about. It had been commented as a hack in the code.

That was what was messing up the reading of the tabulator issues list (data wiki version).
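
To illustrate the effect, here is a minimal sketch of the namespace handling with and without the kludge. It is not the actual sax2rdf.py code: the function name resolve_attribute, the legacy_kludge flag and the set KLUDGE_NAMES are hypothetical, and the set membership test is a simplification of the original string.find() check. The SIOC namespace URI is assumed to be the usual one.

```python
RDF_NS_URI = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SIOC_NS_URI = "http://rdfs.org/sioc/ns#"  # assumed namespace for sioc:about

# Local names the old kludge treated as RDF syntax attributes
# (hypothetical set; the original code searched a string instead).
KLUDGE_NAMES = {"ID", "about", "aboutEachPrefix", "bagID", "type"}


def resolve_attribute(ns, ln, legacy_kludge=False):
    """Return the (namespace, local name) pair the parser would act on.

    With legacy_kludge=True, an attribute such as sioc:about is forced
    into the RDF namespace, which is the behaviour described above.
    With legacy_kludge=False, the namespace is left as declared.
    """
    if legacy_kludge and ns and ln in KLUDGE_NAMES and ns != RDF_NS_URI:
        ns = RDF_NS_URI  # the kludge: rewrite any namespace to rdf:
    return ns, ln


# Old behaviour: sioc:about is misread as rdf:about.
old = resolve_attribute(SIOC_NS_URI, "about", legacy_kludge=True)

# New behaviour: sioc:about stays an ordinary property attribute.
new = resolve_attribute(SIOC_NS_URI, "about")
```

With the kludge removed, only attributes that really are in the RDF namespace are treated as RDF syntax, so sioc:about is read as a plain property on the node.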

I found I could edit the file in place and did so.

I have edited the cwm source and checked it in, as it seems to run quite a lot of the test suite.

Tim BL

from irc://irc.freenode.net/swig

You are now known as timbl.
[22:33]timbl:Found the problem -- it is with cwm's parser
[22:33]timbl:An old kludge
[22:33]timbl:# The following section was a kludge to work with presumably old bad RDF
[22:33]timbl:# files while RDF was being defined way back when.
[22:33]timbl:#            if ns:              # Removed 2010 as this is a kludge which creaks with sioc:about - timbl 2010-07-19
[22:33]timbl:#                if string.find("ID about aboutEachPrefix bagID type", ln)>0:
[22:33]timbl:#                    if ns != RDF_NS_URI:
[22:33]timbl:#                      print ("# Warning -- %s attribute in %s namespace not RDF NS." %
[22:33]timbl:#                              name, ln)
[22:33]timbl:#                      ns = RDF_NS_URI  # Allowed as per dajobe: ID, bagID, about, resource, parseType or type
[22:33]timbl:That whole clause should be commented out
[22:34]timbl:in /afs/csail.mit.edu/group/dig/www/data/TAMI/2007/cwmrete/tmswap/sax2rdf.py
[22:38]timbl:Have commented the lines out in that file
[22:39]timbl:on v slow connection so difficult to test
[22:40]timbl:Seems to work better!
[22:40]kennyluck:timbl, that was for proof sent along with the Updated triple.
[22:40]kennyluck:s/Updated triple/updated triples/
[22:40]kennyluck:Sorry if I have messed things up.
[22:41]•timbl heads off for the night
[22:41]timbl:I notice I get a justify pane
[22:41]kennyluck:The system allows you to send SPARUL with proof.
[22:42]kennyluck:It's a dirty hack and a try on CWM + Linked Data.
[22:42]timbl:Tomorrow .. gtg - the proof is stored in the file an/?
[22:42]kennyluck:Anyway, we should maintain the wiki well. I'm sorry I didn't do it right.
[22:42]kennyluck:in the file.
[22:42]timbl:maybe explain in email to tabulator@
[22:42]kennyluck:I think we shouldn't use Algae anymore though.
[22:42]timbl:Algae doesn't use cwm
[22:43]timbl:Maybe we should use SWobjects
[22:43]kennyluck:Yeah, we should have more SPARUL wikis
[22:43]•timbl gtg - tomorrow
[22:49] kennyluck left
Received on Thursday, 22 July 2010 20:14:27 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 20:01:06 UTC