
Re: N-Triples Parser for Python

From: Phil Dawes <pdawes@users.sourceforge.net>
Date: Wed, 20 Oct 2004 10:45:34 +0000
Message-ID: <16758.9646.396431.3448@gargle.gargle.HOWL>
To: Chris Purcell <cjp39@cam.ac.uk>
Cc: www-rdf-interest@w3.org

Hi Chris,

Chris Purcell writes:
 > 
 > How are you inputting the triples in the first place? This is where the 
 > MySQL limit bit me, and while I did some poking around to speed things 
 > up, I haven't yet put much time into it.
 > 	[poking around] 
 > http://www.srcf.ucam.org/~cjp39/Current/KritTer:2004-08-02+WebLog
 > 
 > Cheers,
 > Chris
 > 

It depends on what is being input. For an insert/update of a small
set of assertions, it just uses SQL INSERTs. For a large job
(e.g. a batch import of millions of statements) it writes them to a
file and then uses 'LOAD DATA LOCAL INFILE' to bulk import them.
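
A rough sketch of the write-to-file step, for illustration only and not
the actual code: the table name "triples", its column names, and the
helper names are all hypothetical. It assumes MySQL's default LOAD DATA
text format (tab-separated fields, newline-terminated lines, backslash
as the escape character) and a file path containing no quote characters.

```python
import os
import tempfile

def escape_field(value):
    """Escape a value for MySQL's default LOAD DATA text format."""
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))

def write_bulk_file(rows, path):
    """Write one tab-separated line per row; return the number of rows."""
    count = 0
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for row in rows:
            out.write("\t".join(escape_field(str(v)) for v in row) + "\n")
            count += 1
    return count

def load_statement(path, table, columns):
    """Build the bulk-import statement (table/columns are hypothetical)."""
    return ("LOAD DATA LOCAL INFILE '%s' INTO TABLE %s (%s)"
            % (path.replace("\\", "/"), table, ", ".join(columns)))

# Each row is a (subject id, predicate id, object id) tuple in this sketch.
rows = [(1, 2, 3), (1, 4, 5)]
fd, bulk_path = tempfile.mkstemp(suffix=".tsv")
os.close(fd)
written = write_bulk_file(rows, bulk_path)
sql = load_statement(bulk_path, "triples", ["subject", "predicate", "object"])
# With a DB-API connection one would then run: cursor.execute(sql)
```

The statement is only built here, not executed, so the sketch runs
without a database; the server and client must both permit LOCAL
loading for the real thing to work.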

I had a quick look at your weblog post, and I assume from it that you
are bulk importing as well. I attempt to solve the duplicate id
problem by pre-loading the existing ids into memory, along with hashes
of their values. I can then check each asserted literal/URI value
against the hashes to see whether it already exists in the database.
N.B. you need to lock the table while doing this, otherwise you can
easily get consistency problems.
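
The id check might look something like the following minimal sketch.
The class and function names are hypothetical, MD5 is just an assumed
choice of hash, and hash collisions are ignored. In real use the
existing rows would be read from the database while holding a table
lock (in MySQL, LOCK TABLES ... WRITE, released with UNLOCK TABLES
after the import).

```python
import hashlib

def value_hash(value):
    """Hash a literal/URI string (MD5 is an assumption of this sketch)."""
    return hashlib.md5(value.encode("utf-8")).digest()

class IdCache:
    """In-memory map from value hash to the id already in the database."""

    def __init__(self, existing_rows):
        # existing_rows: iterable of (id, value) pairs pre-loaded from the
        # database; the table should stay locked until the import finishes.
        self.by_hash = {}
        self.next_id = 1
        for node_id, value in existing_rows:
            self.by_hash[value_hash(value)] = node_id
            self.next_id = max(self.next_id, node_id + 1)

    def lookup_or_assign(self, value):
        """Return (id, is_new); allocate a fresh id for unseen values."""
        key = value_hash(value)
        node_id = self.by_hash.get(key)
        if node_id is not None:
            return node_id, False
        node_id = self.next_id
        self.next_id += 1
        self.by_hash[key] = node_id
        return node_id, True

existing = [(1, "http://example.org/a"), (2, "hello")]
cache = IdCache(existing)
incoming = ["hello", "http://example.org/b", "http://example.org/b"]
results = [cache.lookup_or_assign(v) for v in incoming]
```

Only values flagged as new would need a row written to the bulk file;
the rest reuse the id that is already stored.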

Cheers,

Phil
Received on Thursday, 21 October 2004 10:46:58 GMT
