- From: Alberto Reggiori <alberto.reggiori@jrc.it>
- Date: Tue, 27 Mar 2001 19:52:27 +0200
- To: Danny Ayers <danny@panlanka.net>
- CC: RDF-Interest <www-rdf-interest@w3.org>
Hello Danny

> I've put together a little utility for making the (unzipped) DMOZ dumps
> readable by David Megginson's RDFFilter. Unfortunately, this turned out to
> be a bit more problematic than I thought. Using buffered streams, I would
> have thought this would be straightforward, but no, usually it crashes out
> due to lack of memory not long after 1M lines. I did have one run that went
> to completion (about 5M lines - the dump I'm playing with is perhaps 9
> months old). Of course I tried again and wiped my result. I've played with
> various parameters - tried it on Win2k & Linux, pretty much the same
> behaviour. Looks like there's some fundamental aspect of Java I wasn't aware
> of...

I am not sure Java is the right tool for this; what about using UNIX sed-like commands? :-)

http://www-diglib.stanford.edu/diglib/ginf/download/dmoz/

regards
Alberto
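For comparison, here is a minimal sketch of a line-at-a-time filter in Java, assuming the conversion can be done one line at a time, the way a sed script works. The fixLine rule (escaping '&' characters that do not already start an entity reference) is a hypothetical stand-in: the post does not say what RDFFilter actually rejects in the dumps. The class name DmozLineFilter is also made up for illustration.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class DmozLineFilter {

    // Hypothetical fix: escape '&' characters that do not already begin an
    // entity or character reference. The real fixes RDFFilter needs are not
    // stated in the post.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String fixLine(String line) {
        return BARE_AMP.matcher(line).replaceAll("&amp;");
    }

    // Copies input to output one line at a time. Only the current line is
    // held in memory, so heap use stays flat however long the dump is.
    static long filter(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        long count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(fixLine(line));
            writer.newLine();
            count++;
        }
        writer.flush();
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DmozLineFilter <in.rdf> <out.rdf>");
            System.exit(1);
        }
        // Explicit UTF-8 so the result does not depend on the platform default.
        try (Reader in = new InputStreamReader(
                 new FileInputStream(args[0]), StandardCharsets.UTF_8);
             Writer out = new OutputStreamWriter(
                 new FileOutputStream(args[1]), StandardCharsets.UTF_8)) {
            long n = filter(in, out);
            System.err.println(n + " lines written");
        }
    }
}
```

Usage (file names hypothetical): java DmozLineFilter dump.rdf fixed.rdf. Since only one line is live at a time, a program of this shape should not run out of heap on a multi-million-line dump; if one does, a likely (but unconfirmed, since the original utility isn't shown) cause is something holding on to every line, such as appending to a StringBuffer or building a tree of the whole document.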
Received on Tuesday, 27 March 2001 12:49:37 UTC