Re: The Power of Virtuoso Sponger Technology from Hugh Glaser on 2009-10-18 (public-lod@w3.org from October 2009)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Sun, 18 Oct 2009 07:57:55 +0100
To: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <EMEW3|0ca0a8ef515d687234cad3d15c8c3846l9H7w702hg|ecs.soton.ac.uk|10B%hg@ecs.sot>

Hi Guys,
I am puzzled by the whole discussion, so will try to summarise to find out
if I have some misunderstanding.

It really is "just" about finding where the URIs are, and search engines are
the only game in town. We need to make it really easy for people to find the
Linked Data URIs they need. Wrappers make things a bit harder.

Juan asked if "Sindice crawled the whole regular web and checked the
Spongers for each URL (sic!)".
I read this as: "Can I use Sindice to find Linked Data URIs provided by the
Spongers?" Or to put it yet another way, "Does Sindice index the part of the
Semantic Web provided by the Spongers?"

One way to do this would be to do what Juan suggests - model what the
Spongers are doing, and then infer what the Linked Data URIs would be, based
on the URLs of the underlying web pages, having crawled them.

But there seems to me a much simpler and more principled way - the Sponger
should do it.
Spongers should provide Semantic Sitemaps (and of course voiD descriptions),
so that Sindice can index (not *crawl*, which I think has led to some of
the confusion) the sites.

How might this be done?
Well, certainly where the Sponger is connected to a particular site which
has an ordinary Sitemap, it could/should process it as part of the
connection with a site, and then re-publish the Semantic Sitemap. For sites
that don't have Sitemaps, it may/will be somewhat harder.
I may be misunderstanding Spongers as well, but it all seems pretty clean
and straightforward to me.
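
To make the Sitemap case a bit more concrete, here is a rough sketch in
Python of the kind of processing I have in mind. It is only an illustration
under assumptions: the wrapper prefix (sponger.example.org) and the rule of
appending the percent-encoded page URL to it are invented for the example and
are not the URI scheme of any real Sponger, and the function names are mine.
It reads an ordinary Sitemap, maps each page URL to the URI the wrapper would
serve, and writes the result out as a new Sitemap. The dataset-level elements
that a full Semantic Sitemap would add are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace of the ordinary Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical wrapper prefix, invented for this sketch.
WRAPPER_PREFIX = "http://sponger.example.org/about/rdf/"


def page_urls(sitemap_xml):
    """Return the <loc> values listed in a Sitemap document."""
    root = ET.fromstring(sitemap_xml)
    tag = "{%s}loc" % SITEMAP_NS
    return [loc.text.strip() for loc in root.iter(tag) if loc.text]


def wrapped_uri(page_url, prefix=WRAPPER_PREFIX):
    """Map a page URL to the (assumed) URI under which the wrapper describes it."""
    return prefix + quote(page_url, safe="")


def republish(sitemap_xml, prefix=WRAPPER_PREFIX):
    """Build a new Sitemap whose <loc> entries are the wrapper URIs."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for page in page_urls(sitemap_xml):
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        loc = ET.SubElement(url, "{%s}loc" % SITEMAP_NS)
        loc.text = wrapped_uri(page, prefix)
    return ET.tostring(urlset, encoding="unicode")
```

The point of the sketch is only that the mapping lives with the wrapper, which
already knows its own URI scheme, so the search engine does not have to infer
it.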

Great stuff, of course.

Best
Hugh

>> 
>> On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda <juanfederico@gmail.com> wrote:
>>   
>>> But Sindice could at least crawl Amazon.
>>> It would be great to use sig.ma to create a "meshup" with the amazon data.
>>> 
>>> 
>>> Juan Sequeda, Ph.D Student
>>> Dept. of Computer Sciences
>>> The University of Texas at Austin
>>> www.juansequeda.com
>>> www.semanticwebaustin.org
>>> 
>>> 
>>> On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
>>> <hepp@ebusiness-unibw.org> wrote:
>>>     
>>>> I don't think so, because this would require that Sindice crawled the
>>>> whole regular web and checked the Spongers for each URL (sic!).
>>>> 
>>>> Juan Sequeda wrote:
>>>> 
>>>> Does Sindice crawl this (or any other semantic web search engines)?
>>>> Juan Sequeda, Ph.D Student
>>>> Dept. of Computer Sciences
>>>> The University of Texas at Austin
>>>> www.juansequeda.com
>>>> www.semanticwebaustin.org
>>>>

Received on Sunday, 18 October 2009 06:58:39 UTC