- From: Danny Ayers <danny@panlanka.net>
- Date: Fri, 18 May 2001 20:04:35 +0600
- To: "Chris Reickenbacker" <chrisr@sunsetdirect.com>, <www-rdf-interest@w3.org>
- Message-ID: <EBEPLGMHCDOJJJPCFHEFIEGPDHAA.danny@panlanka.net>
> I am attempting to load the content and structure rdf files from http://dmoz.org/rdf.html into a SQL Server database - I have read the RDF spec at w3 but have found no solutions that even properly parse the rdf files.

There are several Java RDF parsers available (e.g. SiRPAC); there are links from the W3C RDF pages. There is a snag - the RDF in DMOZ is slightly non-standard; if you look at the top lines of the sample you'll see how. Not by much, but enough to choke a parser. There are some unix scripts available for cleaning it up.

I was trying to do more or less what you're talking about: doing the clean-up inline first, then using Megginson's RDF filter [1] (which is essentially a parser). I ran into trouble with Java though - try streaming a file that size through - it broke the bank on my machine, though there wasn't meant to be more than a few k buffered. I haven't had time to look at this recently, but intend to go back to it at the first opportunity - if you have success in the meantime, please let me know.

> I am using the Java programming language and am very familiar with XML - but I have no idea how to get the content / structure files from http://dmoz.org/rdf.html into CSV format so that it can be loaded into a relational database (in this case SQL Server.) If you have code to create a CSV file from RDF or can give me a link to download an RDF parser that understands the structure / content files on http://dmoz.org/rdf.html then I would be very very appreciative. Thanks for your time.

Well, if you're ok on Java & XML then you should be able to tweak SAX to do what you're after, with or without RDFFilter. Why do you want it in CSV? Surely it would be easier to use JDBC directly?

Good luck ;-)

[1] http://www.megginson.com
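For what it's worth, the "tweak SAX" route might look roughly like the sketch below. It is only an illustration, not tested against the real dump. It assumes the input has already been cleaned up into well-formed XML, and that each page is an ExternalPage element with an about attribute and a d:Title child (check those names against the actual files). The names DmozToCsv, RowSink, parse and toCsvRows are made up for the example. Each record is handed to a callback as soon as its end tag arrives, so nothing beyond the current record is held in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams DMOZ-style records out of an XML file with SAX.
class DmozToCsv {

    /** Receives one record at a time, so nothing accumulates while streaming. */
    interface RowSink {
        void row(String url, String title);
    }

    /** Quote a CSV field, doubling any embedded double quotes. */
    static String csv(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    /** Stream the input through SAX and pass each (url, title) pair to the sink. */
    static void parse(InputStream in, RowSink sink) throws Exception {
        DefaultHandler handler = new DefaultHandler() {
            private String url;                 // non-null while inside a record
            private StringBuilder title = new StringBuilder();
            private boolean inTitle;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Element and attribute names are assumptions about the dump.
                if (qName.equals("ExternalPage")) {
                    url = atts.getValue("about");
                    title.setLength(0);
                } else if (qName.equals("d:Title") && url != null) {
                    inTitle = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver text in several chunks, so append each one.
                if (inTitle) title.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("d:Title")) {
                    inTitle = false;
                } else if (qName.equals("ExternalPage") && url != null) {
                    sink.row(url, title.toString().trim());
                    url = null;
                }
            }
        };
        // The default factory is not namespace-aware, so qName is the raw tag name.
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
    }

    /** Convenience for small in-memory samples only: collects CSV lines in a list. */
    static List<String> toCsvRows(String xml) {
        List<String> rows = new ArrayList<>();
        try {
            parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                  (url, title) -> rows.add(csv(url) + "," + csv(title)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse sample", e);
        }
        return rows;
    }
}
```

For the full dump you would call parse with a FileInputStream and a sink that writes each line straight to a file. To skip CSV altogether, the same sink could instead set parameters on a JDBC PreparedStatement and use addBatch / executeBatch, which is the direct-to-database route.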
Received on Friday, 18 May 2001 10:09:20 UTC