
rdf:datatype problem when merging files using cwm

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Tue, 06 Mar 2007 23:05:14 -0500
Message-ID: <45EE39FA.9000706@ibiblio.org>
To: public-cwm-talk@w3.org

Not sure where this error is coming from. I have a bunch of autogenerated
RDF files, scraped from various patent databases, that I'm trying to
merge into one large RDF file.

This is the error:

 @@@@@@ toXML.py 382:  (0, u'http://www.w3.org/2001/XMLSchema#string'

This produces weird output:
...
 <pat:dateFiled rdf:datatype="(0,
u&#39;http://www.w3.org/2001/XMLSchema#dateTime&#39;)">1990-06-29</pat:dateFiled>
...
Ouch, I don't want that (0,blahblah) in my output.  I'm assuming somehow
the problem is in my scraper, but those seem like valid URIs to me.
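
For what it's worth, that (0, u'...') looks like the repr of a Python
tuple, so my guess is that a (number, URI) pair is being turned into a
string somewhere instead of having the URI pulled out of it first. A
minimal sketch of that failure mode follows. It is not cwm's actual code:
`datatype_attr` and `pair` are hypothetical names made up for
illustration, and under Python 3 the u prefix does not appear.

```python
from xml.sax.saxutils import escape

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def datatype_attr(datatype):
    """Render an attribute value, escaping apostrophes as &#39;.

    Hypothetical helper: it blindly stringifies whatever it is given.
    """
    return escape(str(datatype), {"'": "&#39;"})


# Assumed internal representation: a (flag, uri) pair, not a bare URI.
pair = (0, XSD_DATETIME)

broken = datatype_attr(pair)     # stringifies the whole tuple
fixed = datatype_attr(pair[1])   # pulls the URI out first

print(broken)
print(fixed)
```

If that guess is right, the URIs from the scraper are fine and the pair
is only being mangled on the way out to RDF/XML.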

Input:
http://www.ibiblio.org/hhalpin/homepage/notes/5161193.rdf
http://www.ibiblio.org/hhalpin/homepage/notes/4130865.rdf
http://www.ibiblio.org/hhalpin/homepage/notes/4203154.rdf

Output:
http://www.ibiblio.org/hhalpin/homepage/notes/example.rdf

Example command line:
cwm --rdf 5161193.rdf 4130865.rdf 4203154.rdf --rdf --pipe > example.rdf

Rapper does not seem to give me any errors on these files, but I'm not
sure how to merge lots of files using rapper.

Also, what's a good command-line utility for merging bunches of small
RDF/XML files into very large ones? We have about 6,000 files, each
containing an average of about 70 triples, that we want to make one
big graph out of. I was planning on just putting cwm in my pipeline...
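
To make the pipeline idea concrete, here is a rough sketch of how I
would drive cwm over that many files: build one command per batch of
inputs, using the same flags as the example command line above. The
function name `cwm_merge_commands`, the batch size, and the file names
are all made up for illustration, and whether batching is needed at all
is an assumption on my part. The sketch only builds the argument lists;
it does not run cwm.

```python
def cwm_merge_commands(paths, batch_size=500):
    """Yield one cwm argument list per batch of input files.

    The flags mirror the example command line:
    cwm --rdf <inputs...> --rdf --pipe
    """
    paths = [str(p) for p in paths]
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        yield ["cwm", "--rdf", *batch, "--rdf", "--pipe"]


if __name__ == "__main__":
    # Hypothetical file names, just to show the shape of the output.
    demo = ["a.rdf", "b.rdf", "c.rdf"]
    for cmd in cwm_merge_commands(demo, batch_size=2):
        print(" ".join(cmd))
```

Each argument list could then be handed to subprocess.run with stdout
redirected to a per-batch file, and the per-batch files merged in one
final pass the same way.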



-- 

		-harry

Harry Halpin,  University of Edinburgh 
http://www.ibiblio.org/hhalpin 6B522426
Received on Wednesday, 7 March 2007 04:05:42 GMT
