Subject: Re: Bringing fast triples to Node.js with HDT

From: Ruben Verborgh <ruben.verborgh@ugent.be>
Date: Wed, 1 Oct 2014 09:48:25 +0200
To: Luca Matteis <lmatteis@gmail.com>
Cc: public-rdfjs@w3.org
Message-Id: <4DE92282-27B1-403E-8368-2E957DF0BAA5@ugent.be>

Hi Luca,

> As you say JavaScript
> typed arrays provide lots of functionality for reading files but
> you'll eventually hit a hard limit when it comes to memory space.
> However I think it would still be an interesting solution because even
> only 30MB in memory could amount to 10 million triples which is a lot
> of data.

I heard that the original HDT authors have plans for a pure JavaScript implementation. It shouldn't be too hard, given the existing code bases in C++ and Java, but it would take some effort. The specification should give the necessary clues.

At the moment, most use cases we see for HDT are on the server, so the Node.js solution is sufficient. But with clients becoming more and more powerful, that might change quickly :-)

> (no indexing required if it's in HDT already)

Just a small addition here: the first time the library opens an HDT file, it generates an implementation-specific .hdt.index file to allow faster searches. This additional index is stored next to the data and reused on every subsequent load. In later versions of the library, I might offer the option to load the file without creating an index, but then operations would be slower.
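
To make the reuse concrete, here is a minimal sketch of how one could check whether such an index is already present before loading. The helper names (indexPathFor, hasReusableIndex) are hypothetical, and the assumption that the index path is simply the HDT path plus ".index" is only for illustration; none of this is the library's API.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumption: the index sits next to the HDT file and is named by appending
// ".index" (giving "data.hdt.index"); the real name may differ per version.
function indexPathFor(hdtPath) {
  return hdtPath + '.index';
}

// Returns true if a previously generated index file exists and could be reused.
function hasReusableIndex(hdtPath) {
  return fs.existsSync(indexPathFor(hdtPath));
}

// Usage: simulate a first and a second load in a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hdt-demo-'));
const hdtFile = path.join(dir, 'data.hdt');
fs.writeFileSync(hdtFile, '');               // placeholder, not a real HDT file
const firstLoad = hasReusableIndex(hdtFile); // no index has been written yet
fs.writeFileSync(indexPathFor(hdtFile), ''); // stand-in for the generated index
const secondLoad = hasReusableIndex(hdtFile);
console.log(firstLoad, secondLoad);
```

A check like this only tells you whether the slower first load is still ahead; generating the index itself remains the library's job.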

> This makes me want to write a Pubby version for
> Node.js that works using HDT's triple matching capabilities.

Keep us updated ;-)

Ruben

Received on Wednesday, 1 October 2014 07:48:53 UTC