Re: "loading multiple .rdf files into a local virtuoso server"

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 12 Mar 2013 19:39:11 -0400
Message-ID: <513FBC9F.2000908@openlinksw.com>
To: public-lod@w3.org

On 3/12/13 6:33 PM, Barry Norton wrote:
> On 12/03/13 21:49, Kingsley Idehen wrote:
>> On 3/12/13 4:53 PM, Barry Norton wrote:
>>>
>>> Such questions really belong on the Virtuoso list, but don't most 
>>> triplestores support the SPARQL Graph Store Protocol by now?
>>
>> Yes, but when you've got a massive collection of RDF files you still 
>> need to bulk load from a local directory, etc.
>
> As below.
>
>>
>>>
>>> Most of my (bash) load scripts look like this:
>>>
>>> for file in *; do curl -H "Content-Type: text/turtle" -T "$file" 
>>> "your-server/your-database/rdf-graphs/service?graph=your-graph"; done
>>
>> Yes for small files, no for a massive collection of files or a few 
>> very large files :-)
>
> I presume your objection is the set-up/shutdown overhead (for small 
> files), and that large files are not necessarily passed in 
> compressed form? Or have you other objections?

I don't actually have any objections. Just indicating that this is an 
option for specific scenarios.

>
> I've always meant to look into 'Content-Encoding: gzip', but I've 
> always been happy to walk over an uncompressed split of 
> NTriples/NQuads due to a combination of: the time spent on transfer 
> relative to canonicalisation and indexing; and the desire to split 
> very large files (e.g. Freebase) to localise errors.
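
A minimal sketch of the splitting approach described above, assuming a line-oriented dump (N-Triples or N-Quads, one statement per line) and a POSIX `split`. The function name `split_dump`, its arguments, and the `part_` prefix are hypothetical, chosen only for illustration.

```shell
# Split a line-oriented RDF dump into fixed-size chunks.
# N-Triples and N-Quads hold one statement per line, so cutting on line
# boundaries keeps each chunk parseable on its own.
# split_dump, its arguments and the "part_" prefix are hypothetical names.
split_dump() {
  local input="$1" outdir="$2" lines="${3:-1000000}"
  mkdir -p "$outdir" || return 1
  # -l: statements (lines) per chunk
  # -a 4: four-letter suffixes (part_aaaa, part_aaab, ...) so that very
  #       large dumps do not run out of chunk names
  split -l "$lines" -a 4 "$input" "$outdir/part_"
}
```

Each chunk can then be sent on its own, for example with the curl loop quoted above, so a rejected chunk points at a bounded range of lines. One caveat: blank node labels are scoped to a document, so a dump that relies on them may not survive splitting unchanged, depending on how the store handles blank nodes per upload.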

Basically, the more options the better. Maybe it's time to make a 
best-practices document for data loading scenarios, etc.
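
For the `ld_dir` error quoted further down, one likely cause (an assumption, since a mail client may have altered the characters) is that the string arguments are wrapped in typographic quotes (‘ ’) rather than ASCII single quotes, which the SQL parser does not treat as string delimiters. The sketch below generates the bulk-loader statements with plain quotes. `bulk_load_sql` and its arguments are hypothetical names; `ld_dir`, `rdf_loader_run` and `checkpoint` are taken from the VirtBulkRDFLoader page linked in the thread. The `*.rdf` mask in the example is also an assumption about the file names in the dump.

```shell
# Emit the SQL for one bulk-load run, using plain ASCII single quotes.
# bulk_load_sql and its arguments are hypothetical helper names.
# Arguments are assumed not to contain single quotes themselves.
bulk_load_sql() {
  local dir="$1" mask="$2" graph="$3"
  # Register every file in dir matching mask for loading into graph.
  printf "ld_dir('%s', '%s', '%s');\n" "$dir" "$mask" "$graph"
  # Load the registered files, then make the result durable.
  printf 'rdf_loader_run();\n'
  printf 'checkpoint;\n'
}

# Example (assumes isql on PATH, the default port 1111, and a password
# held in the hypothetical variable DBA_PASSWORD):
#   bulk_load_sql /home/kalpa/Virtuoso/data/datasets/DBLP-RKB/models \
#     '*.rdf' http://dblp-rkb.org | isql 1111 dba "$DBA_PASSWORD"
```

Generating the statements from a script avoids retyping quotes by hand. The directory also needs to be listed under `DirsAllowed` in virtuoso.ini before the server will read from it.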

Kingsley
>
> Barry
>
>
>
>>>
>>> On 12/03/13 20:36, Kalpa Gunaratna wrote:
>>>> Actually, I tried that (I used the procedure to load the DBpedia 
>>>> 3.8 dump, which was in gz format as far as I remember).
>>>>
>>>> But now I am trying to load a DBLP dump, which contains .rdf files 
>>>> once uncompressed, and I do not know how to load them. The 
>>>> following is what I get when running the bulk load procedure.
>>>>
>>>> SQL> ld_dir(‘/home/kalpa/Virtuoso/data/datasets/DBLP-RKB/models’, 
>>>> ‘*.*’, ‘http://dblp-rkb.org’);
>>>> Connected to OpenLink Virtuoso
>>>> Driver: 06.01.3127 OpenLink Virtuoso ODBC Driver
>>>>
>>>> *** Error 37000: [Virtuoso Driver][Virtuoso Server]SQ074: Line 1: 
>>>> syntax error at '.' before '*'
>>>> at line 1 of Top-Level:
>>>> ld_dir(‘/home/kalpa/Virtuoso/data/datasets/DBLP-RKB/models’, ‘*.*’, 
>>>> ‘http://dblp-rkb.org’)
>>>>
>>>>
>>>>
>>>> On Tue, Mar 12, 2013 at 8:27 PM, Francisco Cifuentes 
>>>> <francisco.cifuentes@weso.es> wrote:
>>>>
>>>>     Take a look here:
>>>>
>>>>     http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtBulkRDFLoader
>>>>
>>>>     Regards
>>>>
>>>>     Francisco.
>>>>
>>>>
>>>>     2013/3/12 Kalpa Gunaratna <kalpagunaratna@gmail.com>
>>>>
>>>>         Hi,
>>>>            I have an rdf dump that has data in the form of .rdf
>>>>         files. I want to load them into a local Virtuoso server so
>>>>         that I can query them using the local sparql endpoint. But
>>>>         I see that it is possible to load one RDF/XML file at a
>>>>         time using the command "DB.DBA.RDF_LOAD_RDFXML_MT". Since
>>>>         the dump has many files, executing this command many times
>>>>         is not going to work. What are the other alternatives I
>>>>         have in loading them to the server? Thank you in advance
>>>>         for any help!
>>>>
>>>>         Regards
>>>>         Kalpa Gunaratna
>>>>
>>>>
>>>>
>>>>
>>>>     -- 
>>>>     Francisco Cifuentes-Silva
>>>>     ------------------------------------
>>>>     WESO Research Group
>>>>     Facultad de Ciencias
>>>>     Universidad de Oviedo
>>>>     Tel: +34 985103397
>>>>     http://www.weso.es
>>>>     http://twitter.com/fcifuentes
>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Regards
>>>> Kalpa Gunaratna
>>>
>>
>>
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Tuesday, 12 March 2013 23:39:39 UTC