- From: Danny Ayers <danny.ayers@gmail.com>
- Date: Mon, 8 Dec 2008 01:56:34 +0100
- To: marko@lanl.gov
- Cc: "Giovanni Tummarello" <giovanni.tummarello@deri.org>, semantic-web <semantic-web@w3.org>, public-lod <public-lod@w3.org>
2008/12/8 Marko A. Rodriguez <marko@lanl.gov>:
> Google does have a limit and it was hit the moment it was created. Google
> (the search engine) doesn't solve all my problems---it only solves the
> keyword index and rank problem. This is a very specific computation for a
> very specific problem.

...which happens to cover 90% of the space on a doc-oriented Web. That
people use. (A lot of people also use Yahoo! etc, but Google is the
poster child)

> If I had Google's dataset, there are other
> algorithms I would like to execute.

Yup.

> But given that Google is the
> gatekeeper to their data, and only has so many clock cycles they can
> spend executing computations, it appears that I will have to download
> the web myself if I wish to run my desired algorithms.

Nah, just the bits you're interested in. Delegate the number crunching.

> Let us now map this over to a seemingly analogous service, Sindice (note
> that I have only seen this service now from your email). While Sindice is
> nice in terms of providing an index for RDF data, the service doesn't have
> the processing power (nor man power) to run (and implement) all the
> algorithms that people will want to run on that data---semantic network
> page rank, betweenness centrality, calculate eigenvectors, spreading
> activation, metadata propagation, etc. etc. -- and all those algorithms
> that are still to be designed. While it might provide Linked Data access
> to its data (I don't know, but let's say it does), I would have to
> pull/download all that information to my local machine to compute on it.
> Sindice doesn't provide me a general-purpose computing environment to
> interact with its data on its servers.

Wow, fighting talk. I think the originally-Ancona guys are going to
have to build this. What I want someone to find for me is a little
algorithm in which you've got a bunch of nodes, they shoot out creepy
tendrils. When a tendril from one creepy thing meets another, all hell
breaks loose (but otherwise dull & rainsome). That seems to be where we
are.

> This is the problem with the concept of "Internet giants" (or "web
> giants") in the Semantic Web world. The RDF data model is too rich to be
> left to keyword search and too vast to be contained and processed by a
> single service. The point is that a distributed process infrastructure
> would befit this wonderful distributed data structure.

That was almost Biblical, shame is I agree.

> Take care,

Ditto.
Danny.

> Marko A. Rodriguez
> http://markorodriguez.com
>
>>
>> In its own small way, Sindice has no theoretical limit in the amount of
>> triples it can index, maintaining its query speed, given sufficient
>> hardware (approx linear in size of the increase in query number and/or
>> data size) and well known software plumbing tricks.
>>
>> So the problem is really not native datasets (bring them on!) but wrappers.
>>
>> On the Semantic web there can be countless useful wrappers and data
>> transformers which can produce billions of virtual triples as
>> transformation of some other data sources.
>> These should not be indexed in brute force mode, probably, as in
>> "triples" but probably identified as such and indexed for the service
>> they provide "I give you some pictures in RDF using flickr.. so don't
>> harvest me even if I look like linked data but invoke me when needed".
>>
>> But then this becomes complex and somewhat feels arbitrary and one
>> quickly starts thinking of some other matter.
>>
>> Giovanni
>>
>> On Sun, Dec 7, 2008 at 11:57 PM, Danny Ayers <danny.ayers@gmail.com>
>> wrote:
>>>
>>> Abstract looks excellent, though personally I'd drop the hyphens ('-').
>>> Now to read a paper!
>>>
>>> 2008/12/8 Marko A. Rodriguez <marko@lanl.gov>:
>>>> Hi all,
>>>>
>>>> Here is a short column that I wrote that is in line with this thread of
>>>> thought:
>>>>
>>>> http://arxiv.org/abs/0807.3908
>>>>
>>>> It addresses the importance of a distributed computing infrastructure
>>>> for the Linked Data cloud, where the "download and index" philosophy of
>>>> the World Wide Web won't so easily port over.
>>>>
>>>> Take care,
>>>> Marko A. Rodriguez
>>>> http://markorodriguez.com
>>>>
>>>>
>>>>> 2008/12/7 Sw-MetaPortal-ProjectParadigm <metadataportals@yahoo.com>:
>>>>>> The next Internet giant company will be linking open data and
>>>>>> providing open access to repositories, in the process seamlessly
>>>>>> combining both paid for subscriptions, Creative Commons or similar
>>>>>> license based or open source software schemes.
>>>>>>
>>>>>> Revenues will be generated among other things from online advertising
>>>>>> streams currently not utilized by Google or Yahoo!
>>>>>
>>>>> ..and the other things, not advertising, can you describe them?
>>>>>
>>>>>> In the big scheme of things this company will redefine the concept of
>>>>>> internet search to provide access to deep(er) web levels of data and
>>>>>> information for which users will be willing to pay an annual flat fee
>>>>>> subscription.
>>>>>
>>>>> ..and the other things, not search, can you describe them?
>>>>>
>>>>> Sorry. Seriously I haven't a clue what revenue models we'll be seeing
>>>>> in 10 or 20 years. I suspect I'd be surprised.
>>>>>
>>>>>> Sound improbable? Non-profit organizations dedicated to providing
>>>>>> global open access will soon start exploring just such business
>>>>>> schemes to determine if it is feasible to fund and maintain the
>>>>>> server farms, hard and software to do just that.
>>>>>
>>>>> Cool.
>>>>>
>>>>> But the Rainbow Warrior was the Greenpeace yacht right?
>>>>> So how do I know you're not just trying to subvert things here? It
>>>>> happens.
>>>>> Usually in boats.
>>>>>
>>>>> Cheers,
>>>>> Danny.
>>>>>
>>>>> --
>>>>> http://danny.ayers.name
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> http://danny.ayers.name
>>>
>>
>>
>
>

--
http://danny.ayers.name
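
One possible reading of the "tendrils" request in the reply above is a multi-source breadth-first search: every seed node grows outward one hop at a time, and a "meeting" is recorded whenever an edge joins territory claimed by two different seeds. The sketch below is only that reading, under stated assumptions: the graph is a plain in-memory adjacency dict, and the names `grow_tendrils` and `EXAMPLE_GRAPH` are hypothetical and do not come from the thread or from any Sindice or Linked Data API.

```python
from collections import deque

# Hypothetical toy graph (adjacency lists, both directions listed explicitly).
EXAMPLE_GRAPH = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}


def grow_tendrils(graph, seeds):
    """Multi-source BFS: grow a 'tendril' from each seed and report collisions.

    graph: dict mapping a node to an iterable of its neighbours.
    seeds: iterable of starting nodes.
    Returns (owner, meetings) where owner maps every reached node to the seed
    that claimed it first, and meetings is a set of frozensets, each holding
    the two seeds whose territories touch along some edge.
    """
    owner = {seed: seed for seed in seeds}  # each seed claims itself
    queue = deque(owner)                    # all tendrils start at once
    meetings = set()
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in owner:
                # Unclaimed node: the current tendril extends into it.
                owner[neighbour] = owner[node]
                queue.append(neighbour)
            elif owner[neighbour] != owner[node]:
                # Node already claimed by another seed: two tendrils meet.
                meetings.add(frozenset((owner[node], owner[neighbour])))
    return owner, meetings
```

Calling `grow_tendrils(EXAMPLE_GRAPH, ["a", "d"])` grows from both ends of the chain; what should happen on a meeting ("all hell breaks loose") is left open in the thread, so the sketch only records which pairs of seeds touched.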
Received on Monday, 8 December 2008 00:57:09 UTC