- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Sun, 18 Oct 2009 14:03:04 -0400
- CC: "public-lod@w3.org" <public-lod@w3.org>
Hugh Glaser wrote:
> Hi Guys,
> I am puzzled by the whole discussion, so will try to summarise to find out
> if I have some misunderstanding.
>
> It really is "just" about finding where the URIs are, and search engines are
> the game in town. We need to make it really easy for people to find the
> Linked Data URIs they need. Wrappers make things a bit harder.
>
> Juan asked if "Sindice crawled the whole regular web and checked the
> Spongers for each URL (sic!)".
> I read this as: "Can I use Sindice to find Linked Data URIs provided by the
> Spongers?" Or to put it yet another way, "Does Sindice index the part of the
> Semantic Web provided by the Spongers?"
>
> One way to do this would be to do what Juan suggests - model what the
> Spongers are doing, and then infer what the Linked Data URIs would be, based
> on the URLs of the underlying web pages, having crawled them.
>
> But there seems to me a much simpler and more principled way - the Sponger
> should do it.
> Spongers should provide Semantic Sitemaps (and of course voiD descriptions),
> so that Sindice can index (not *crawl*, which I think has led to some of
> the confusion) the sites.
>
> How might this be done?
> Well, certainly where the Sponger is connected to a particular site which
> has an ordinary Sitemap, it could/should process it as part of the
> connection with a site, and then re-publish the Semantic Sitemap. For sites
> that don't have Sitemaps, it may/will be somewhat harder.
> I may be misunderstanding Spongers as well, but it all seems pretty clean
> and straightforward to me.
>
> Great stuff, of course.
>

Hugh,

Quick Sponger Glossary:

Sponger -- The Data Access Manager Layer
Basic Cartridges/Drivers/Providers -- The components that perform the extraction and transformation into RDF model based Linked Data graphs
Meta Cartridges -- Smarter Cartridges that perform Lookups and leverage Inference Rules etc.,
which are added to the basic Linked Data graphs (*these aren't part of the Open Source Edition of Virtuoso*).

A Sponger generated Linked Data graph does optionally include VoiD descriptions (why wouldn't it, bearing in mind our proximity to VoiD?). We just disabled them while updating the Sponger Engine etc. You must have seen VoiD graphs in earlier Sponger proxy URIs, right? They will be re-enabled very soon :-)

All sponged data ends up in the Quad Store of its host Virtuoso instance (so <http://uriburner.com/sparql> and <http://uriburner.com/fct> are in place as per usual re. Virtuoso instances).

A Virtuoso Sponger instance can optionally ping PTSW each time it makes a Linked Data graph from its Web Resource RDFization activity, so Sindice and other engines that already subscribe to PTSW also have access to the Sponger generated Linked Data.

Sponger proxy/wrapper URIs are Data Source Names. If you look at the graph closely you should see how we express with clarity what we are doing, i.e., note the "owl:sameAs" assertion to the original data source, and the "owl:shameAs" pattern which is about minting a hint URI back to the data source we've sponged. What we will be adding is some additional Provenance Data now that we have a good shared ontology in place.

As stated above, the existence of <http://uriburner.com/fct> implies that the faceted search and find engine is also alive, and any client application can use the REST or SOAP services it provides to perform disambiguated search and find queries. This means Sindice can look up Sponger instances in the same way the Sponger looks up Sindice via its Web Services.

When you use Virtuoso, with the Sponger Middleware Enabled, Web Document URLs basically become Named Graph IRIs re. SPARQL (the point Martin emphasized in his post).
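
To illustrate the Named Graph point, here is a minimal sketch of how a client might build a SPARQL Protocol GET request that reads from the graph named by a sponged Web Document URL. The endpoint is the one mentioned in this message; the helper names and the example document URL are hypothetical, and the sketch only constructs the request URL rather than sending it. Whether a given document has actually been sponged into the Quad Store is an assumption the caller has to check.

```python
from urllib.parse import urlencode

# SPARQL endpoint mentioned in the message.
SPARQL_ENDPOINT = "http://uriburner.com/sparql"


def build_graph_query(document_url: str, limit: int = 10) -> str:
    """Hypothetical helper: SPARQL query over the named graph whose IRI
    is the Web Document URL itself."""
    # Reject characters that are not allowed inside a SPARQL <IRI> reference.
    if any(ch in document_url for ch in '<>"{}|^`\\ '):
        raise ValueError("document_url is not usable as an IRI reference")
    return (
        "SELECT ?s ?p ?o\n"
        f"FROM <{document_url}>\n"
        "WHERE { ?s ?p ?o }\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(document_url: str, limit: int = 10) -> str:
    """Hypothetical helper: encode the query as a SPARQL Protocol GET URL
    using the standard 'query' parameter."""
    query = build_graph_query(document_url, limit)
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Example document URL chosen purely for illustration.
    print(build_request_url("http://www.example.com/some/page"))
```

The resulting URL can be fetched with any HTTP client; the result format is typically negotiated through the Accept header.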

When you use the OpenLink Data Explorer (ODE) [5], you can also bind the Browser to any of the Sponger-enabled instances [2][3][4], and exploit the effect of sponging by simply invoking the View Page Metadata option (main or context menu or via the URIBurner Bookmarklet).

Links:

1. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger
2. http://uriburner.com -- instance of Virtuoso with Sponger Middleware enabled
3. http://bbc.openlinksw.com -- instance of Virtuoso with Sponger Middleware enabled
4. http://lod.openlinksw.com -- instance of Virtuoso with Sponger Middleware enabled
5. http://ode.openlinksw.com -- OpenLink Data Explorer
6. http://trdf.sourceforge.net/provenance/ns-20090825.html -- Provenance Ontology

Kingsley

> Best
> Hugh
>
>
>>> On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda <juanfederico@gmail.com> wrote:
>>>
>>>
>>>> But Sindice could at least crawl Amazon.
>>>> It would be great to use sig.ma to create a "meshup" with the amazon data.
>>>>
>>>>
>>>> Juan Sequeda, Ph.D Student
>>>> Dept. of Computer Sciences
>>>> The University of Texas at Austin
>>>> www.juansequeda.com
>>>> www.semanticwebaustin.org
>>>>
>>>>
>>>> On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
>>>> <hepp@ebusiness-unibw.org> wrote:
>>>>
>>>>
>>>>> I don't think so, because this would require that Sindice crawled the
>>>>> whole regular web and checked the Spongers for each URL (sic!).
>>>>>
>>>>> Juan Sequeda wrote:
>>>>>
>>>>> Does Sindice crawl this (or any other semantic web search engines)?
>>>>> Juan Sequeda, Ph.D Student
>>>>> Dept. of Computer Sciences
>>>>> The University of Texas at Austin
>>>>> www.juansequeda.com
>>>>> www.semanticwebaustin.org
>>>>>
>>>>>
>
>
>

--
Regards,

Kingsley Idehen
Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com

Received on Sunday, 18 October 2009 18:03:33 UTC