- From: Giovanni Tummarello <giovanni.tummarello@deri.org>
- Date: Fri, 9 Jul 2010 02:24:29 +0200
- To: Semantic Web at W3C <semantic-web@w3.org>
- Message-ID: <AANLkTinFiB3cBlx9IXXPAppPhmOEUa9ufkSz-HxJmw74@mail.gmail.com>
Apologies for cross-posting.

---------

Dear all,

So far, semantic web search engines and semantic aggregation services have either inserted datasets by hand or relied on "random walk"-style crawls with no guarantees of data completeness or freshness.

After quite some work, we are happy to announce that Sindice now supports effective large-scale data acquisition with *efficient syncing* capabilities, based on already existing standards (a specific use of the sitemap protocol).

For example, if you publish 300000 products using RDFa or whatever else you prefer (microformats, 303s, etc.) and you comply with the proposed method, Sindice will now guarantee to:

a) crawl your dataset completely (this might take some time, since we do it "politely");
b) ...but crawl it only once, and from then on fetch just the updated URLs on a daily basis (hence a guarantee of timely data updates).

So this is no longer "crawling", but rather a live, "DB-like" connection between remote, diverse datasets, all based on HTTP. In our opinion, this is a *very* important step forward for semantic web data aggregation infrastructures.

The specification we support (and how to make sure you're being properly indexed) is published here (pretty simple stuff, actually!):
http://sindice.com/developers/publishing

Results can be seen from websites that are already implementing it (indeed, you might already be doing so without knowing it):
http://sindice.com/search?q=domain:www.scribd.com+date:last_week&qt=term

Why not make sure today that your site can be effectively kept in sync?

As always, we look forward to comments, suggestions and ideas on how to better serve your data needs (e.g. yes, we'll also support the OpenLink dataset sync proposal once the specs are finalized). Feel free to ask specific questions about this or any other Sindice-related issue on our dev forum: http://sindice.com/main/forum

Giovanni, on behalf of the Sindice team
http://sindice.com/main/about
Special credit for this goes to Tamas Benko and Robert Fuller.

P.S. We're hiring.
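The post describes the mechanism only as "a specific use of the sitemap protocol"; the authoritative requirements are on the publishing page linked above. As a rough illustration of why a sitemap enables "crawl once, then fetch only updated URLs", here is a minimal sketch that assumes nothing beyond the standard sitemaps.org `<loc>` and `<lastmod>` elements. The function names and the example.org URLs are hypothetical, and any Sindice-specific conventions are not modelled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Standard sitemaps.org namespace; Sindice-specific extensions (if any) are not modelled.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap for a small product dataset.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/product/1</loc><lastmod>2010-07-01</lastmod></url>
  <url><loc>http://example.org/product/2</loc><lastmod>2010-07-08T14:30:00+00:00</lastmod></url>
  <url><loc>http://example.org/product/3</loc></url>
</urlset>"""


def parse_lastmod(value: str) -> datetime:
    """Parse a <lastmod> value (e.g. "2010-07-08" or "2010-07-08T14:30:00+00:00")
    into a timezone-aware datetime."""
    dt = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Assumption: date-only or offset-less values are interpreted as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def urls_changed_since(sitemap_xml: str, last_sync: datetime) -> list[str]:
    """Return the <loc> URLs whose <lastmod> is later than last_sync.
    URLs without <lastmod> are also returned, since their freshness is unknown."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc is None:
            continue  # malformed entry: nothing to fetch
        if lastmod is None or parse_lastmod(lastmod) > last_sync:
            changed.append(loc.strip())
    return changed


if __name__ == "__main__":
    last_sync = datetime(2010, 7, 7, tzinfo=timezone.utc)
    for loc in urls_changed_since(SAMPLE, last_sync):
        print(loc)
    # prints:
    # http://example.org/product/2
    # http://example.org/product/3
```

On the first visit a consumer would fetch every `<loc>`; on later daily visits it re-fetches only the entries returned above. Note that a single sitemap file is limited to 50,000 URLs by the sitemaps.org protocol, so a 300000-product dataset would be split across several sitemaps referenced from a sitemap index.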
Received on Friday, 9 July 2010 00:24:57 UTC