
Re: Java DMOZ cleaner

From: Alberto Reggiori <alberto.reggiori@jrc.it>
Date: Tue, 27 Mar 2001 19:52:27 +0200
Message-ID: <3AC0D35B.483C2E4A@jrc.it>
To: Danny Ayers <danny@panlanka.net>
CC: RDF-Interest <www-rdf-interest@w3.org>
Hello Danny

> I've put together a little utility for making the (unzipped) DMOZ dumps
> readable by David Megginson's RDFFilter. Unfortunately, this turned out to
> be a bit more problematic than I thought. Using buffered streams, I would
> have thought this would be straightforward, but no, usually it crashes out
> due to lack of memory not long after 1M lines. I did have one run that went
> to completion (about 5M lines - the dump I'm playing with is perhaps 9
> months old). Of course I tried again and wiped my result. I've played with
> various parameters - tried it on Win2k & Linux, pretty much the same
> behaviour. Looks like there's some fundamental aspect of Java I wasn't aware
> of...

I am not sure Java is the right tool for this. What about using UNIX
sed-like commands? :-)

http://www-diglib.stanford.edu/diglib/ginf/download/dmoz/
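A sed-like pass can be sketched in Java as well. What follows is only a minimal sketch, assuming the clean-up amounts to dropping characters that XML 1.0 does not allow. The class name StreamClean and that filtering rule are illustrative assumptions, not what Danny's utility or the scripts at the URL above actually do. It reads one line, filters it, writes it, and keeps nothing else around, so memory use should not grow with the number of lines.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical name; a line-at-a-time filter in the spirit of sed.
public class StreamClean {

    // True for UTF-16 code units that may appear in XML 1.0 text.
    // Simplification: surrogates are passed through unchecked, so
    // pairing of supplementary characters is not validated here.
    static boolean isXmlChar(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || Character.isSurrogate(c);
    }

    // Copies in to out one line at a time, dropping disallowed characters.
    // Line endings are normalised to '\n'. Returns the number of lines seen.
    static long clean(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        StringBuilder kept = new StringBuilder(); // reused for every line
        long lines = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            kept.setLength(0);
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (isXmlChar(c)) {
                    kept.append(c);
                }
            }
            writer.write(kept.toString());
            writer.write('\n');
            lines++;
        }
        writer.flush();
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the dump is UTF-8; reads stdin, writes stdout.
        long n = clean(
            new InputStreamReader(System.in, StandardCharsets.UTF_8),
            new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
        System.err.println(n + " lines processed");
    }
}
```

It would be run like a shell filter, for example `java StreamClean < dump.rdf > cleaned.rdf` (both file names are placeholders). Whether this avoids the out-of-memory crash depends on what was holding the memory in the first place, which the description above does not tell us.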


regards

Alberto
Received on Tuesday, 27 March 2001 12:49:37 GMT
