- From: Danny Ayers <danny.ayers@gmail.com>
- Date: Sun, 23 Oct 2005 22:38:34 +0200
- To: SWIG <semantic-web@w3.org>
- Cc: general@simile.mit.edu, rickard.oberg@senselogic.se
I was wondering if anyone had come up with any strategies that might be useful in a scenario that came up on the SIMILE list [1]. Rickard is using Piggy Bank's scraper to harvest moderately large amounts of data into its store (30,000 items, 10 properties each), and is running into performance issues. I'm not sure, but he mentioned Wikipedia earlier, so that may be the data source.

I think it's reasonable to consider a triplestore as merely a cache of a certain chunk of the Semantic Web at large. So in a case like this, maybe it makes more sense to forget trying to cache /everything/, and just grab things into the working model as required.

But say there's a setup like a SPARQL interface to a store, and a scraper (HTTP GET + whatever translation is appropriate). How might you figure out what's needed to fulfil the query and what joins are required, especially where there isn't any direct subject-object kind of connection to the original data (i.e. where there are lots of bnodes)? Querying Wikipedia as-is via SPARQL is probably a good use case.

I can't help thinking there might be something akin to CBDs [2] that might work, but I'm not sure offhand how one would delegate the path-walking down to a scraper. Or maybe someone has an approach to cross-triplestore querying that will work (a SPARQL-squared kind of trick might be useful [3], but I suspect there might not be enough linkage).

Thoughts?

Incidentally, there is some schadenfreudenesque comfort in knowing that these kinds of problems aren't solely SW issues. From the same list thread:

[[
> Glad you're pushing it to the limit :-) Just curious, have you tried
> plotting 30,000 items on Google Maps?!

Yes. Doesn't work :-)
]]

Cheers,
Danny.

[1] http://simile.mit.edu/mail/ReadMsg?listName=General&msgNo=1155
[2] http://www.w3.org/Submission/2004/SUBM-CBD-20040930/
[3] http://dannyayers.com/archives/2005/10/01/sparql-squared/

--
http://dannyayers.com
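
To make the "grab things into the working model as required" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `LazyStore`, `fake_scrape` and the `ex:` data are hypothetical, the scraper is a stand-in for an HTTP GET plus translation, and the query is reduced to a simple property path rather than real SPARQL. It only shows one possible way of delegating path-walking to a scraper, not how Piggy Bank or any existing store does it.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class LazyStore:
    """Working model that asks the scraper about a resource only on first use."""

    def __init__(self, fetch: Callable[[str], Iterable[Triple]]) -> None:
        self._fetch = fetch                  # hypothetical scraper callback
        self._triples: Set[Triple] = set()   # the local cache of triples
        self._fetched: Set[str] = set()      # subjects already scraped

    def describe(self, subject: str) -> Set[Triple]:
        """Return cached triples about subject, scraping it first if needed."""
        if subject not in self._fetched:
            self._fetched.add(subject)
            self._triples.update(self._fetch(subject))
        return {t for t in self._triples if t[0] == subject}

    def follow(self, start: str, path: List[str]) -> Set[str]:
        """Walk a property path from start, scraping each node on arrival."""
        frontier = {start}
        for prop in path:
            frontier = {
                o
                for s in frontier
                for (_, p, o) in self.describe(s)
                if p == prop
            }
        return frontier


# Toy data standing in for the remote source (purely illustrative).
REMOTE = {
    "ex:a": [("ex:a", "ex:knows", "ex:b")],
    "ex:b": [("ex:b", "ex:name", "B")],
    "ex:c": [("ex:c", "ex:name", "C")],
}
calls: List[str] = []  # records which resources were actually scraped


def fake_scrape(subject: str) -> Iterable[Triple]:
    calls.append(subject)
    return REMOTE.get(subject, [])


store = LazyStore(fake_scrape)
names = store.follow("ex:a", ["ex:knows", "ex:name"])
print(names, calls)
```

Only the resources the path actually touches get scraped; `ex:c` is never fetched. The sketch depends on every intermediate node being a URI the scraper can dereference, so it does nothing for the bnode-heavy case raised above, where there is no identifier to hand to the scraper.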
Received on Sunday, 23 October 2005 20:38:54 UTC