- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Tue, 27 Mar 2001 19:43:22 +0100
- To: Danny Ayers <danny@panlanka.net>
- cc: www-rdf-interest <www-rdf-interest@w3.org>
>>>Danny Ayers said:
> <- However, the resulting files generally aren't usually legal Unicode
> <- or thus legal XML, so probably your XML/RDF parser will crash and
> <- burn afterwards on the output anyway if it doesn't get blown away by
> <- memory leaks/growth.
>
> This really seems like a productive area ;-)

Well, some parsers, such as the C ones (Jason Diamond's Repat and my
Rapier), can handle all the data up to the illegal character sequences
and are limited only by I/O speed, not memory.  The Java ones all tend
to leak until they collapse, unless your machine has oodles of memory.

>
> <- Small enough to enclose below (also deletes Adult area for less
> <- embarassing demos!)
>
> Damn fine idea. I don't speak Perl, what's going on with the 3 values?

As it says:
> <- # 0 - before first Adult topic
> <- # 1 - during Adult topics
> <- # 2 - afterwards

So it has three states in processing.  Not really too important... but
best not to create an RDF pr0n database.

Incidentally, Alberto pointed out the original sed code I converted:
  http://www-diglib.stanford.edu/diglib/ginf/download/dmoz/
but that runs 10-20x slower than the Perl.

Dave
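
For readers who don't speak Perl, the three states quoted above (0, 1, 2) could be sketched roughly as follows in Python. This is not the script from the earlier message, only an illustration of the idea. It assumes that each topic starts on a line of the form <Topic r:id="Top/...">, that the Adult topics form one contiguous run, and that everything between the first Adult topic and the next non-Adult topic can be dropped. The names strip_adult and TOPIC_START are hypothetical.

```python
import re

# The three states, numbered as in the quoted comments.
BEFORE_ADULT, IN_ADULT, AFTER_ADULT = 0, 1, 2

# Assumption: a topic begins on a line like <Topic r:id="Top/Arts">.
TOPIC_START = re.compile(r'<Topic r:id="([^"]*)"')


def strip_adult(lines):
    """Yield the input lines, skipping the contiguous run of Adult topics."""
    state = BEFORE_ADULT
    for line in lines:
        # Once past the Adult area, no further matching is needed.
        if state != AFTER_ADULT:
            m = TOPIC_START.search(line)
            if m:
                topic = m.group(1)
                is_adult = topic == "Top/Adult" or topic.startswith("Top/Adult/")
                if state == BEFORE_ADULT and is_adult:
                    state = IN_ADULT      # 0 -> 1: first Adult topic seen
                elif state == IN_ADULT and not is_adult:
                    state = AFTER_ADULT   # 1 -> 2: first topic after the Adult area
        # Lines are passed through in states 0 and 2, dropped in state 1.
        if state != IN_ADULT:
            yield line
```

Because the function is a generator over lines, it can be fed an open file or standard input and written out line by line, so memory use stays flat however large the dump is.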
Received on Tuesday, 27 March 2001 13:43:35 UTC