
RE: RDF question

From: Danny Ayers <danny@panlanka.net>
Date: Fri, 18 May 2001 20:04:35 +0600
To: "Chris Reickenbacker" <chrisr@sunsetdirect.com>, <www-rdf-interest@w3.org>

  I am attempting to load the content and structure rdf files from
http://dmoz.org/rdf.html into a SQL Server database - I have read the RDF
spec at w3 but have found no solutions that even properly parse the rdf

  There are several Java RDF parsers available (e.g. SiRPAC); there are
links from the W3C RDF pages. There is a snag: the RDF in DMOZ is slightly
non-standard. If you look at the top lines of the sample you'll see how. Not
much, but enough to choke a parser. There are some Unix scripts available
for cleaning it up. I was trying to do more or less what you're talking about,
doing the clean-up inline first and then using Megginson's RDF filter [1]
(which is essentially a parser). I ran into trouble with Java though: try
streaming a file that size through. It broke the bank on my machine,
though there wasn't meant to be more than a few KB buffered.

  I haven't had time to look at this recently, but intend to go back to it
at the first opportunity. If you have success in the meantime, please let me
know.

    I am using the Java programming language and am very familiar with XML,
but I have no idea how to get the content / structure files from
http://dmoz.org/rdf.html into CSV format so that they can be loaded into a
relational database (in this case SQL Server). If you have code to create a
CSV file from RDF or can give me a link to download an RDF parser that
understands the structure / content files on http://dmoz.org/rdf.html then I
would be very appreciative. Thanks for your time.

  Well, if you're OK with Java & XML then you should be able to tweak a SAX
handler to do what you're after, with or without RDFFilter. Why do you want
it in CSV? Surely it would be easier to use JDBC directly?
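
  To give a rough idea of the SAX-plus-JDBC route, here is a minimal sketch,
not a tested loader. Everything specific in it is an assumption on my part:
the element and attribute names (ExternalPage, about, d:Title,
d:Description) are guesses at the shape of the content dump, the table
"pages" and the class name DmozLoader are hypothetical, and it presumes the
file has already been cleaned up enough to be well-formed. The parser is left
in its default non-namespace-aware mode, so elements are matched on their
prefixed names.

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class DmozLoader {

    /** Emits one {url, title, description} row per ExternalPage element (assumed names). */
    static class PageHandler extends DefaultHandler {
        private final Consumer<String[]> sink;
        private final StringBuilder text = new StringBuilder();
        private String url, title, description;

        PageHandler(Consumer<String[]> sink) { this.sink = sink; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes attrs) {
            text.setLength(0);                      // start collecting text afresh
            if (qName.equals("ExternalPage")) {
                url = attrs.getValue("about");      // assumed attribute name
                title = "";
                description = "";
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);         // may be called several times per element
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            switch (qName) {
                case "d:Title":       title = text.toString().trim(); break;
                case "d:Description": description = text.toString().trim(); break;
                case "ExternalPage":  sink.accept(new String[] {url, title, description}); break;
                default: break;
            }
            text.setLength(0);
        }
    }

    /** Streams the document through SAX; only the current record is held in memory. */
    static void parse(InputStream in, Consumer<String[]> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new PageHandler(sink));
    }

    /** Inserts rows through JDBC in batches of 1000; "pages" is a hypothetical table. */
    static void load(InputStream in, Connection conn) throws Exception {
        String sql = "INSERT INTO pages (url, title, description) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int[] pending = {0};
            parse(in, row -> {
                try {
                    for (int i = 0; i < 3; i++) ps.setString(i + 1, row[i]);
                    ps.addBatch();
                    if (++pending[0] % 1000 == 0) ps.executeBatch();
                } catch (SQLException e) {
                    throw new RuntimeException(e);  // the callback cannot throw checked exceptions
                }
            });
            ps.executeBatch();                      // flush whatever is left over
        }
    }
}
```

  Passing the rows to a callback keeps the parsing separate from the
database work, so the same handler could just as well write CSV lines if
that is really what you need.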

  Good luck ;-)

  [1] http://www.megginson.com
Received on Friday, 18 May 2001 10:09:20 UTC
