Re: The next Internet giant: linking open data, providing open access to repositories

From: Danny Ayers <danny.ayers@gmail.com>
Date: Mon, 8 Dec 2008 01:56:34 +0100
Message-ID: <1f2ed5cd0812071656v3b9fe6f3s4d38d3920b446982@mail.gmail.com>
To: marko@lanl.gov
Cc: "Giovanni Tummarello" <giovanni.tummarello@deri.org>, semantic-web <semantic-web@w3.org>, public-lod <public-lod@w3.org>

2008/12/8 Marko A. Rodriguez <marko@lanl.gov>:

> Google does have a limit and it was hit the moment it was created. Google
> (the search engine) doesn't solve all my problems---it only solves the
> keyword index and rank problem. This is a very specific computation for a
> very specific problem.

...which happens to cover 90% of the space on a doc-oriented Web. That
people use.
(A lot of people also use Yahoo! etc., but Google is the poster child.)

> If I had Google's dataset, there are other
> algorithms I would like to execute. But given that Google is the
> gatekeeper to its data, and only has so many clock cycles it can
> spend executing computations, it appears that I will have to download
> the web myself if I wish to run my desired algorithms.

Nah, just the bits you're interested in. Delegate the number crunching.

> Let us now map this over to a seemingly analogous service, Sindice (note
> that I have only seen this service now from your email). While Sindice is
> nice in terms of providing an index for RDF data, the service doesn't have
> the processing power (nor manpower) to run (and implement) all the
> algorithms that people will want to run on that data---semantic network
> page rank, betweenness centrality, calculate eigenvectors, spreading
> activation, metadata propagation, etc. etc. -- and all those algorithms
> that are still to be designed. While it might provide Linked Data access
> to its data (I don't know, but let's say it does), I would have to
> pull/download all that information to my local machine to compute on it.
> Sindice doesn't provide me a general-purpose computing environment to
> interact with its data on its servers.
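
For a sense of what running one of the algorithms named above on a
local copy of the data involves, here is a minimal spreading-activation
sketch in Python. It is only an illustration under stated assumptions:
the function name, the adjacency-dict graph format, and the decay and
steps parameters are all hypothetical, and nothing here reflects how
Sindice or any other service actually works.

```python
def spread_activation(graph, seeds, decay=0.5, steps=3):
    """Push activation outward from seed nodes along outgoing links.

    graph: dict mapping node -> list of neighbour nodes (assumed format)
    seeds: dict mapping node -> initial activation
    decay: fraction of a node's energy passed on at each hop (assumed)
    """
    activation = dict(seeds)   # running total per node
    frontier = dict(seeds)     # energy still travelling
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue  # dead end: this energy is dropped
            # Split the decayed energy evenly among the neighbours.
            share = energy * decay / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + share
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


# Toy graph; in practice the adjacency would be built from downloaded triples.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": []}
result = spread_activation(toy_graph, {"a": 1.0}, decay=0.5, steps=2)
```

The whole graph has to sit in local memory before the loop can start,
which is the download cost being complained about.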

Wow, fighting talk. I think the originally-Ancona guys are going to
have to build this.

What I want someone to find for me is a little algorithm in which you've
got a bunch of nodes, they shoot out creepy tendrils. When a tendril
from one creepy thing meets another, all hell breaks loose (but
otherwise dull & rainsome). That seems to be where we are.
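
One way to read that, as a rough sketch only: grow a breadth-first
"tendril" from every seed node at the same pace and note which pairs of
seeds end up touching. This is ordinary multi-source breadth-first
search, not any particular published algorithm, and the function name
and graph format below are made up for illustration. The graph is
assumed to be an undirected adjacency dict.

```python
from collections import deque


def tendril_meetings(graph, seeds):
    """Return the pairs of seeds whose breadth-first tendrils touch.

    graph: dict mapping node -> list of neighbours (assumed undirected)
    seeds: iterable of starting nodes
    """
    owner = {seed: seed for seed in seeds}  # node -> seed that reached it first
    queue = deque(owner)
    meetings = set()
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in owner:
                # Unclaimed node: the tendril grows into it.
                owner[neighbour] = owner[node]
                queue.append(neighbour)
            elif owner[neighbour] != owner[node]:
                # Two different tendrils touch here.
                meetings.add(frozenset((owner[node], owner[neighbour])))
    return meetings


# A chain a-b-c-d-e plus a separate pair x-y.
toy_graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"],
    "x": ["y"], "y": ["x"],
}
met = tendril_meetings(toy_graph, ["a", "e", "x"])
```

Here the tendrils from "a" and "e" meet in the middle of the chain,
while "x" never reaches either of them.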

> This is the problem with the concept of "Internet giants" (or "web
> giants") in the Semantic Web world. The RDF data model is too rich to be
> left to keyword search and too vast to be contained and processed by a
> single service. The point is that a distributed process infrastructure
> would befit this wonderful distributed data structure.

That was almost Biblical, shame is I agree.

> Take care,


> Marko A. Rodriguez
> http://markorodriguez.com
>> In its own small way, Sindice has no theoretical limit on the number of
>> triples it can index while maintaining its query speed, given sufficient
>> hardware (approx. linear in the size of the increase in query number and/or
>> data size) and well-known software plumbing tricks.
>> So the problem is really not native datasets (bring them on!) but wrappers.
>> On the Semantic Web there can be countless useful wrappers and data
>> transformers which can produce billions of virtual triples as
>> transformations of some other data sources.
>> These should probably not be indexed in brute-force mode, as
>> "triples", but identified as such and indexed for the service
>> they provide: "I give you some pictures in RDF using Flickr, so don't
>> harvest me even if I look like linked data, but invoke me when needed".
>> But then this becomes complex and somewhat feels arbitrary and one
>> quickly starts thinking of some other matter.
>> Giovanni
>> On Sun, Dec 7, 2008 at 11:57 PM, Danny Ayers <danny.ayers@gmail.com> wrote:
>>> Abstract looks excellent, though personally I'd drop the hyphens ('-').
>>> Now to read a paper!
>>> 2008/12/8 Marko A. Rodriguez <marko@lanl.gov>:
>>>> Hi all,
>>>> Here is a short column that I wrote that is in line with this thread of
>>>> thought:
>>>> http://arxiv.org/abs/0807.3908
>>>> It addresses the importance of a distributed computing infrastructure for
>>>> the Linked Data cloud, where the "download and index" philosophy of the
>>>> World Wide Web won't so easily port over.
>>>> Take care,
>>>> Marko A. Rodriguez
>>>> http://markorodriguez.com
>>>>> 2008/12/7 Sw-MetaPortal-ProjectParadigm <metadataportals@yahoo.com>:
>>>>>> The next Internet giant company will be linking open data and providing
>>>>>> open access to repositories, in the process seamlessly combining both
>>>>>> paid for subscriptions, Creative Commons or similar license based or
>>>>>> open source software schemes.
>>>>>> Revenues will be generated among other things from online advertising
>>>>>> streams currently not utilized by Google or Yahoo!
>>>>> ..and the other things, not advertising, can you describe them?
>>>>>> In the big scheme of things this company will redefine the concept of
>>>>>> internet search to provide access to deep(er) web levels of data and
>>>>>> information for which users will be willing to pay an annual flat fee
>>>>>> subscription.
>>>>> ..and the other things, not search, can you describe them?
>>>>> Sorry. Seriously I haven't a clue what revenue models we'll be seeing
>>>>> in 10 or 20 years. I suspect I'd be surprised.
>>>>>> Sound improbable? Non-profit organizations dedicated to providing global
>>>>>> open access will soon start exploring just such business schemes to
>>>>>> determine if it is feasible to fund and maintain the server farms, hard
>>>>>> and software to do just that.
>>>>> Cool.
>>>>> But the Rainbow Warrior was the Greenpeace yacht right?
>>>>> So how do I know you're not just trying to subvert things here? It
>>>>> happens.
>>>>> Usually in boats.
>>>>> Cheers,
>>>>> Danny.
>>>>> --
>>>>> http://danny.ayers.name
>>> --
>>> http://danny.ayers.name

Received on Monday, 8 December 2008 00:57:09 UTC
