Re: N-Triples Parser for Python from Phil Dawes on 2004-10-20 (www-rdf-interest@w3.org from October 2004)

From: Phil Dawes <pdawes@users.sourceforge.net>
Date: Wed, 20 Oct 2004 10:45:34 +0000
To: Chris Purcell <cjp39@cam.ac.uk>
Cc: www-rdf-interest@w3.org
Message-ID: <16758.9646.396431.3448@gargle.gargle.HOWL>

Hi Chris,

Chris Purcell writes:
 > 
 > How are you inputting the triples in the first place? This is where the 
 > MySQL limit bit me, and while I did some poking around to speed things 
 > up, I haven't yet put much time into it.
 > 	[poking around] 
 > http://www.srcf.ucam.org/~cjp39/Current/KritTer:2004-08-02+WebLog
 > 
 > Cheers,
 > Chris
 > 

It depends on what is being input. If it's an insert/update of a small
set of assertions, it just uses SQL INSERTs. If it's a large job
(e.g. a batch import of millions of statements), it writes them to a
file and then uses 'LOAD DATA LOCAL INFILE' to bulk import them.
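
A minimal sketch of the bulk path described above, assuming a
three-column table. The table name `triples`, the column names, and
the helper names are hypothetical, not taken from the actual code. It
writes tab-separated rows using the escaping that LOAD DATA expects by
default (tab-terminated fields, newline-terminated lines, backslash
escapes) and builds the statement text; executing it against MySQL is
left out.

```python
def escape_field(value):
    # Escape the backslash first, then the field and line terminators,
    # so the file matches LOAD DATA's default FIELDS/LINES settings.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))


def write_bulk_file(triples, path):
    """Write (subject, predicate, object) string tuples as TSV rows.

    Returns the number of rows written.
    """
    count = 0
    # newline="\n" stops Python from translating line endings on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        for triple in triples:
            fh.write("\t".join(escape_field(v) for v in triple) + "\n")
            count += 1
    return count


def load_statement(path, table="triples",
                   columns=("subject", "predicate", "object")):
    # Table and column names are placeholders for illustration only.
    quoted = path.replace("\\", "\\\\").replace("'", "\\'")
    return "LOAD DATA LOCAL INFILE '%s' INTO TABLE %s (%s)" % (
        quoted, table, ", ".join(columns))
```

The statement returned by load_statement() would then be passed to
whatever MySQL client library is in use, with local infile support
enabled on both client and server.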

I had a quick look at your weblog post, and I assume from it that you
are bulk importing as well. I attempt to solve the duplicate id
problem by pre-loading the existing ids into memory, along with hashes
of their values. I can then check each asserted literal/uri value
against the hashes to see if it already exists in the database. N.B.
you need to lock the table while doing this, otherwise you can easily
get consistency problems.
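
A rough sketch of that pre-loading idea, under some assumptions: the
class name `IdCache`, the use of SHA-1 digests as the hash, and the
(id, value) row shape are all illustrative, not the actual
implementation. It keeps a digest-to-id map in memory, returns the
existing id for a known value, and allocates a fresh id otherwise,
collecting the new rows for a later bulk import. In MySQL the lock
could be taken with LOCK TABLES ... WRITE before the pre-load and
released with UNLOCK TABLES after the import; that step is not shown.

```python
import hashlib


def value_hash(value):
    # A fixed-size digest stands in for the full value to save memory.
    return hashlib.sha1(value.encode("utf-8")).digest()


class IdCache:
    """In-memory map from value hash to id (hypothetical helper)."""

    def __init__(self, existing_rows):
        # existing_rows: iterable of (id, value) pairs read from the
        # database while the table is locked.
        self.ids = {}
        self.next_id = 1
        self.new_rows = []  # (id, value) pairs not yet in the database
        for row_id, value in existing_rows:
            self.ids[value_hash(value)] = row_id
            self.next_id = max(self.next_id, row_id + 1)

    def get_id(self, value):
        # Reuse the id of a known value, otherwise allocate a new one.
        key = value_hash(value)
        row_id = self.ids.get(key)
        if row_id is None:
            row_id = self.next_id
            self.next_id += 1
            self.ids[key] = row_id
            self.new_rows.append((row_id, value))
        return row_id


cache = IdCache([(1, "http://example.org/a"), (2, "hello")])
known = cache.get_id("hello")         # already stored, id is reused
fresh = cache.get_id("a new literal") # not stored, new id allocated
again = cache.get_id("a new literal") # same id as the previous call
```

Only the entries in cache.new_rows need to be written to the bulk
import file, so a value asserted many times still ends up with one id.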

Cheers,

Phil

Received on Thursday, 21 October 2004 10:46:58 UTC