- From: Paul Houle <ontology2@gmail.com>
- Date: Thu, 13 Mar 2014 18:17:26 -0400
- To: "semantic-web@w3.org" <semantic-web@w3.org>
For a long time I've heard feedback from people who find it challenging to download large files such as the Freebase data dump and :BaseKB.

That's why Gold Snapshots of :BaseKB are now available via BitTorrent:

http://basekb.com/now/

By simply loading a small torrent file into a program like uTorrent or Transmission, you can efficiently and reliably get a copy of :BaseKB without the risk of data corruption.

We plan to release Gold Snapshots on a quarterly basis and to retain the data files indefinitely. This is a big plus for academics, who can use Gold Snapshots for research and expect that others will be able to reproduce their results. It's also great for people who don't want to deal with the hassle of getting an AWS key or keeping up with weekly updates.

Since most BitTorrent clients let you select which files to download, you can further speed things up by selecting only the files you need for a particular project.

If you need Freebase data from the past week, you can still access it on a requester-pays basis in AWS and, better yet, read the data in S3 directly with Amazon Elastic MapReduce for parallel processing.

:BaseKB data is compatible with industry-standard triple stores; recently I found it was not only possible but easy to load :BaseKB into Virtuoso 7.1:

https://groups.google.com/forum/#!topic/infovore-basekb/m7FL5nqVDbI

:BaseKB contains all relevant data from the Freebase RDF dump, but it removes large amounts of repetitive and irrelevant information, corrects problems with literal formats, and subdivides the dump into portions, so that a working database can easily be half the size of the complete Freebase RDF dump. Put this together with the advances in triple store performance over the past two years, and now anybody who wants to work with Freebase data in a triple store can do so.

-- 
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype    ontology2@gmail.com
Received on Thursday, 13 March 2014 22:17:53 UTC