- From: Hugh Glaser <hg@ecs.soton.ac.uk>
- Date: Thu, 9 Jul 2009 11:08:00 +0100
- To: Peter Ansell <ansell.peter@gmail.com>, Juan Sequeda <juanfederico@gmail.com>
- CC: Linked Data community <public-lod@w3.org>
On 09/07/2009 07:56, "Peter Ansell" <ansell.peter@gmail.com> wrote:

> 2009/7/9 Juan Sequeda <juanfederico@gmail.com>:
>> On Jul 9, 2009, at 2:25 AM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
>> <snip hash URI comments>
>>> Mind you, it does mean that you should make sure that you don't put
>>> too many LD URIs in one document.
>>> If dbpedia decided to represent all the RDF in one document, and then
>>> use hash URIs, it would be somewhat problematic.
>>
>> Could you explain why???
>
> Does it seem reasonable to have to trawl through millions (or billions)
> of RDF triples resolved from a large database that only used one base
> URI with fragment identifiers for everything else, if you don't need to,
> considering that 100 specific RDF triples in a compact document might
> have been all you needed to see?
>
> Peter

As a concrete example: for dblp we split the data into year models before
asserting it into the triplestore, so that we can serve RDF for each URI by
doing a sort of DESCRIBE.

The paper
  http://dblp.rkbexplorer.com/id/journals/expert/ShadboltGGHS04
comes from the model file
  http://dblp.rkbexplorer.com/models/dblp-publications-2004.rdf
which is 155MB.

Using hash URIs would require a file of that size to be served for every
access, although if we were actually doing it that way we would of course
change our model file granularity to avoid it. So there is both a possible
network overhead and a processing overhead, and either can be got wrong.
(There are a couple of rough sketches of what I mean at the end of this
mail.)

In fact, large FOAF files already give you quite a lot of extra stuff if
all you wanted was some personal details. When you want to know about
timbl, and all you wanted was his blog address, you don't necessarily want
to download and process 30-odd KB of RDF, much of it details of the people
he knows (such as Tom Ilube's URI).

Just something to be aware of when serving linked data with hash URIs.

And to add something else to the mix: this is another reason semantic
sitemaps are so important for search engines like Sindice. Sindice can
index our model file, but on receiving a request for a URI in it, without
the sitemap all it could easily do would be to point the requester at the
155MB model file. Because of the sitemap, it can much more easily work out
for itself what it needs to know about the URI and point the user at the
linked data URI - all without spidering our whole triplestore, which would
be unacceptable.

Ah, the rich tapestry of life that is linked data!

Best
Hugh
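P.S. To make the client-side overhead concrete, here is a rough sketch of
what resolving a hash URI over one big model file would entail (illustrative
only, not RKBExplorer's actual code; it assumes Python with rdflib, and the
hash-style identifier is hypothetical):

    from rdflib import Graph, URIRef

    # With a hash URI such as
    #   http://dblp.rkbexplorer.com/models/dblp-publications-2004.rdf#ShadboltGGHS04
    # the client strips the fragment before making the HTTP request (RFC 3986),
    # so the server can only hand back the whole document.
    doc = "http://dblp.rkbexplorer.com/models/dblp-publications-2004.rdf"
    paper = URIRef(doc + "#ShadboltGGHS04")  # hypothetical hash-style identifier

    g = Graph()
    g.parse(doc, format="xml")               # fetches and parses ~155MB of RDF/XML

    # ...all of that work, just to pick out the handful of triples we wanted:
    wanted = list(g.triples((paper, None, None)))

With the slash URI http://dblp.rkbexplorer.com/id/journals/expert/ShadboltGGHS04
the server does that selection itself and returns only the small description,
so the client never has to touch the 155MB model.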
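And here is the server side of the slash-URI arrangement, again only as a
sketch of the idea (the real RKBExplorer setup serves from a triplestore,
not rdflib, and the describe() helper is made up for illustration):

    from rdflib import Graph, URIRef

    # The year model is loaded once, ahead of time.
    store = Graph()
    store.parse("dblp-publications-2004.rdf", format="xml")

    # Each request for an /id/... URI gets back only the triples that mention
    # that resource - the "sort of DESCRIBE" mentioned above.
    def describe(uri):
        uri = URIRef(uri)
        g = Graph()
        for t in store.triples((uri, None, None)):   # URI as subject
            g.add(t)
        for t in store.triples((None, None, uri)):   # URI as object
            g.add(t)
        return g.serialize(format="xml")             # a small document, not 155MB

    describe("http://dblp.rkbexplorer.com/id/journals/expert/ShadboltGGHS04")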
Received on Thursday, 9 July 2009 10:09:00 UTC