- From: Javier D. Fernández <jfergar83@gmail.com>
- Date: Mon, 18 Aug 2014 21:44:59 +0200
- To: Michel Dumontier <michel.dumontier@gmail.com>
- Cc: Ruben Verborgh <ruben.verborgh@ugent.be>, Andy Seaborne <andy@seaborne.org>, SWIG Web <semantic-web@w3.org>
Hi all,

I'm Javier, from the HDT team.

In our experience, there is increasing interest in efficient, binary RDF management. Dataset backups, transfers between servers or processing nodes, RDF streaming and self-contained triplestores are just a few examples of real use cases for which we receive feedback requests. Certainly, these scenarios have very different requirements, and the choice of RDF serialization has to take into account parameters such as compactness, processing speed and the supported retrieval operations, to name but a few of the most important ones for these cases.

In this sense, I am really glad to see more work on binary RDF such as the RDF Apache Thrift proposal, which puts the focus on simplicity and write/parse speed. I totally agree with Ruben regarding the differences with HDT (BTW thanks for all the references): HDT addresses the complementary problem of providing a highly compressed, indexed binary format that serves fast retrieval operations.

While I acknowledge that people are not massively publishing HDT files, it is also true that HDT is gaining its place as a self-contained compressed repository in the way Ruben uses it, able to solve queries efficiently and with a reduced memory footprint. In addition to our C++ and Java libraries for managing HDT, one can deploy HDT files within Jena/Jena Fuseki (http://www.rdfhdt.org/manual-of-hdt-integration-with-jena/), making use of all their well-known features.

Besides the very interesting LD fragments proposal, I personally also see great potential for HDT as a self-contained engine to retrieve RDF information on mobile devices. We will present a demo at ISWC'14 in this respect (http://dataweb.infor.uva.es/wp-content/uploads/2014/08/iswc14.pdf).

Finally, I would like to point to the RDF Stream Processing Community Group (RSP), in which we have started to look at efficient RDF serializations, including the binary ones (https://www.w3.org/community/rsp/wiki/RSP_Serialization_Group). Any feedback is also welcome!

All the best,

Javier D. Fernández
Postdoc at Sapienza - Università di Roma

On Mon, Aug 18, 2014 at 7:06 PM, Michel Dumontier <michel.dumontier@gmail.com> wrote:
> On Mon, Aug 18, 2014 at 2:16 AM, Ruben Verborgh <ruben.verborgh@ugent.be> wrote:
>> Hi Andy,
>>
>>> How much is HDT used for real?
>>
>> We use it to enable client-side SPARQL query execution with 99.9% availability.
>> Here is an online demo: http://client.linkeddatafragments.org/.
>>
>> The HDT files are used to run the server at http://data.linkeddatafragments.org/.
>> Details on why HDT is a good format for this are here [1].
>>
>>> By whom?
>>
>> We (Ghent University – iMinds) use it to host high-availability queryable datasets.
>> The software that enables this is available as open source [2],
>> so anybody else can use it to do the same.
>>
>>> I couldn't find HDT files.
>>
>> For the same reason you won't find Virtuoso db files: we use it on the server.
> actually, you can! The Bio2RDF project makes their indexed Virtuoso
> dbs available.
>
> http://download.bio2rdf.org/release/3/
>
> we also provide gzipped nquads, and we'd be interested in providing an
> alternative binary, indexed format.
>
> m.
>
>> As you said, Thrift and HDT have different design goals.
>> Thrift files are meant to be “found”, HDT files not necessarily.
>>
>> BTW you can find HDT files here: http://www.rdfhdt.org/datasets/
>> And the tools to make them yourself: http://www.rdfhdt.org/download/
>>
>> Ruben
>>
>> PS I might be interested to look at a JavaScript/Node.js implementation of Thrift.
>> Are there any plans (or code) in that direction already? Pointers to start?
>>
>> [1] http://linkeddatafragments.org/publications/iswc2014.pdf
>> [2] https://github.com/LinkedDataFragments/
>

--
Javier D. Fernández García
jfergar83(at)gmail.com

Received on Tuesday, 19 August 2014 08:31:29 UTC