- From: Giovanni Tummarello <g.tummarello@gmail.com>
- Date: Sun, 18 Oct 2009 14:56:17 +0100
- To: Hugh Glaser <hg@ecs.soton.ac.uk>
- Cc: "public-lod@w3.org" <public-lod@w3.org>, Sindice general discussions list <sindice-general@lists.deri.org>
Hi Hugh,

thanks for your contribution. It turns out this discussion is in fact very important, and such feedback is indeed very useful.

If I just get a sitemap from a sponger (which is wrapping a sitemap from another site), then all I can really do is crawl that sitemap, which would cause the sponger to be banned from the remote site that is being wrapped.

A way to avoid that is to implement a mechanism by which the sitemap tells me what it is doing: "hey, I wrap Amazon, so I can be invoked with names of people and tell you books they might have written, or with names of books and give you prices, or something else". Sindice could then use that on the fly when a request comes in.

This makes sense, but we're back to semantic web services, are we not? :-) I mean, to be able to express the above sentence to the point where Sindice, or whoever else, knows when to invoke that wrapper, we'd have to come up with such a complicated description language that it would simply never be adopted (see the SWS lesson).

Search engines, on the other hand, are indeed allowed to crawl Amazon and other sites, and I could do my own sponging; why not, Google does it.

So I guess it could boil down to:

a) Got native RDFa? OK, we crawl you, but we can't be that up to date.

b) Got native RDFa and understand that it is valuable for you that engines are very up to date? Then provide a dump. But we should not forget what the actual web is doing either :-) e.g. we should probably start implementing ASAP this Pubsubhubbub thingie:
http://www.readwriteweb.com/archives/real-time_web_protocol_pubsubhubbub_explained.php

c) Got nothing? Well, we might crawl you normally with some spongers on top? (But this is still something that puzzles me. One thing about Sindice is that everything in there has been explicitly stated by data producers; if I started to do this I'd lose the ability to say this.)
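To make the three cases concrete, here is a rough sketch in Python. The names Site, Strategy and choose_strategy are made up for illustration, and the two boolean flags are an assumed simplification; nothing here reflects how Sindice is actually implemented.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    """Hypothetical indexing strategies, one per case in the message."""
    CRAWL = "crawl native RDFa; freshness limited by crawl rate"
    DUMP = "load the published dump; as fresh as the dump"
    SPONGE = "crawl normally and run a sponger on top"


@dataclass
class Site:
    # Both fields are hypothetical flags used only for this illustration.
    has_native_rdfa: bool
    provides_dump: bool


def choose_strategy(site: Site) -> Strategy:
    """Map a site onto cases (a), (b) and (c)."""
    if site.has_native_rdfa and site.provides_dump:
        return Strategy.DUMP    # case (b): producer wants engines up to date
    if site.has_native_rdfa:
        return Strategy.CRAWL   # case (a): plain crawling, less fresh
    # case (c): the data is no longer explicitly stated by the producer
    return Strategy.SPONGE


if __name__ == "__main__":
    for site in (Site(True, False), Site(True, True), Site(False, False)):
        print(site, "->", choose_strategy(site).name)
```

The sketch only picks a strategy; a push mechanism such as Pubsubhubbub would sit alongside case (b) and is left out here.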
Again, I am not sure this really matters, but you might lose the ability to claim fair use by saying the entire system is automated (AFAIU the main defense Google has for collecting and showing all the material, e.g. in the previews, is that they collect it all automatically, with no human intervention).

cheers
Giovanni

On Sun, Oct 18, 2009 at 7:57 AM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
> Hi Guys,
> I am puzzled by the whole discussion, so will try to summarise to find out
> if I have some misunderstanding.
>
> It really is "just" about finding where the URIs are, and search engines are
> the game in town. We need to make it really easy for people to find the
> Linked Data URIs they need. Wrappers make things a bit harder.
>
> Juan asked if "Sindice crawled the whole regular web and checked the
> Spongers for each URL (sic!)".
> I read this as: "Can I use Sindice to find Linked Data URIs provided by the
> Spongers?" Or to put it yet another way, "Does Sindice index the part of the
> Semantic Web provided by the Spongers?"
>
> One way to do this would be to do what Juan suggests - model what the
> Spongers are doing, and then infer what the Linked Data URIs would be, based
> on the URLs of the underlying web pages, having crawled them.
>
> But there seems to me a much simpler and more principled way - the Sponger
> should do it.
> Spongers should provide Semantic Sitemaps (and of course voiD descriptions),
> so that Sindice can index (not *crawl*, which I think has led to some of
> the confusion) the sites.
>
> How might this be done?
> Well, certainly where the Sponger is connected to a particular site which
> has an ordinary Sitemap, it could/should process it as part of the
> connection with a site, and then re-publish the Semantic Sitemap. For sites
> that don't have Sitemaps, it may/will be somewhat harder.
> I may be misunderstanding Spongers as well, but it all seems pretty clean
> and straightforward to me.
>
> Great stuff, of course.
>
> Best
> Hugh
>
>>>
>>> On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda <juanfederico@gmail.com> wrote:
>>>
>>>> But Sindice could at least crawl Amazon.
>>>> It would be great to use sig.ma to create a "meshup" with the amazon data.
>>>>
>>>>
>>>> Juan Sequeda, Ph.D Student
>>>> Dept. of Computer Sciences
>>>> The University of Texas at Austin
>>>> www.juansequeda.com
>>>> www.semanticwebaustin.org
>>>>
>>>>
>>>> On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
>>>> <hepp@ebusiness-unibw.org> wrote:
>>>>
>>>>> I don't think so, because this would require that Sindice crawled the
>>>>> whole regular web and checked the Spongers for each URL (sic!).
>>>>>
>>>>> Juan Sequeda wrote:
>>>>>
>>>>> Does Sindice crawl this (or any other semantic web search engines)?
>>>>> Juan Sequeda, Ph.D Student
>>>>> Dept. of Computer Sciences
>>>>> The University of Texas at Austin
>>>>> www.juansequeda.com
>>>>> www.semanticwebaustin.org
>>>>>
>
Received on Sunday, 18 October 2009 13:57:12 UTC