Re: The next Internet giant: linking open data, providing open access to repositories from Giovanni Tummarello on 2008-12-08 (public-lod@w3.org from December 2008)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Mon, 8 Dec 2008 00:17:15 +0000
To: "Danny Ayers" <danny.ayers@gmail.com>
Cc: marko@lanl.gov, semantic-web <semantic-web@w3.org>, public-lod <public-lod@w3.org>
Message-ID: <210271540812071617t311edf46hf2baa6f68ba003c1@mail.gmail.com>

I tend to disagree with this:

" the scale of the data sets that currently exist and will ultimately
grow to become, the "download and index" philosophy of the World Wide
Web will not so easily map over to the Semantic Web."

Does Google have any visible limits? Using the proper indexing
technology for the Semantic Web gives you the same results. (Note:
forget global SPARQL, of course.)

In its own small way, Sindice has no theoretical limit on the number of
triples it can index while maintaining its query speed, given sufficient
hardware (growing approximately linearly with the increase in query
volume and/or data size) and well-known software plumbing tricks.

So the problem is really not native datasets (bring them on!) but wrappers.

On the Semantic Web there can be countless useful wrappers and data
transformers that produce billions of virtual triples as
transformations of other data sources.
These probably should not be indexed by brute force as "triples", but
rather identified as wrappers and indexed for the service they
provide: "I give you some pictures in RDF using Flickr, so don't
harvest me even if I look like linked data, but invoke me when needed".
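
A minimal sketch of that distinction, purely illustrative: native
triples are harvested into an ordinary term index, while a wrapper is
registered only by a description of its service and invoked at query
time. Every name here (HybridIndex, Wrapper, fake_photo_wrapper, the
example.org URIs) is hypothetical and not taken from Sindice or any
real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Wrapper:
    """A source that produces 'virtual' triples on demand (hypothetical)."""
    name: str
    keywords: Set[str]                    # describes the service offered
    fetch: Callable[[str], List[Triple]]  # called only at query time


@dataclass
class HybridIndex:
    # term -> triples harvested from native datasets
    harvested: Dict[str, List[Triple]] = field(default_factory=dict)
    # wrappers are indexed by service description, never crawled
    wrappers: List[Wrapper] = field(default_factory=list)

    def add_native(self, triples: List[Triple]) -> None:
        """Brute-force index: every term of every triple points to it."""
        for t in triples:
            for term in {x.lower() for x in t}:
                self.harvested.setdefault(term, []).append(t)

    def register_wrapper(self, wrapper: Wrapper) -> None:
        """Record what the wrapper offers; do not harvest its output."""
        self.wrappers.append(wrapper)

    def query(self, term: str) -> List[Triple]:
        key = term.lower()
        results = list(self.harvested.get(key, []))
        for w in self.wrappers:
            if key in w.keywords:
                results.extend(w.fetch(term))  # invoke when needed
        return results


def fake_photo_wrapper(term: str) -> List[Triple]:
    # Stand-in for a Flickr-style RDF wrapper; returns one made-up triple.
    return [(f"http://example.org/photo/{term}/1", "rdf:type", "foaf:Image")]


index = HybridIndex()
index.add_native([("ex:alice", "foaf:knows", "ex:bob")])
index.register_wrapper(Wrapper("photos", {"sunset"}, fake_photo_wrapper))
print(index.query("ex:alice"))  # served from the harvested index
print(index.query("sunset"))    # served by invoking the wrapper
```

The hard part is the one this sketch skips: deciding which sources
count as wrappers and how to describe what they offer.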

But then this becomes complex and feels somewhat arbitrary, and one
quickly starts thinking about something else.

Giovanni

On Sun, Dec 7, 2008 at 11:57 PM, Danny Ayers <danny.ayers@gmail.com> wrote:
>
> Abstract looks excellent, though personally I'd drop the hyphens ('-').
> Now to read a paper!
>
> 2008/12/8 Marko A. Rodriguez <marko@lanl.gov>:
>> Hi all,
>>
>> Here is a short column that I wrote that is in line with this thread of
>> thought:
>>
>> http://arxiv.org/abs/0807.3908
>>
>> It addresses the importance of a distributed computing infrastructure for
>> the Linked Data cloud, where the "download and index" philosophy of the
>> World Wide Web won't so easily port over.
>>
>> Take care,
>> Marko A. Rodriguez
>> http://markorodriguez.com
>>
>>
>>> 2008/12/7 Sw-MetaPortal-ProjectParadigm <metadataportals@yahoo.com>:
>>>> The next Internet giant company will be linking open data and providing
>>>> open access to repositories, in the process seamlessly combining both
>>>> paid for subscriptions, Creative Commons or similar license based or
>>>> open source software schemes.
>>>>
>>>> Revenues will be generated among other things from online advertising
>>>> streams currently not utilized by Google or Yahoo!
>>>
>>> ..and the other things, not advertising, can you describe them?
>>>
>>>> In the big scheme of things this company will redefine the concept of
>>>> internet search to provide access to deep(er) web levels of data and
>>>> information for which users will be willing to pay an annual flat fee
>>>> subscription.
>>>
>>> ..and the other things, not search, can you describe them?
>>>
>>> Sorry. Seriously I haven't a clue what revenue models we'll be seeing
>>> in 10 or 20 years. I suspect I'd be surprised.
>>>
>>>> Sound improbable? Non-profit organizations dedicated to providing global
>>>> open access will soon start exploring just such business schemes to
>>>> determine if it is feasible to fund and maintain the server farms, hard
>>>> and software to do just that.
>>>
>>> Cool.
>>>
>>> But the Rainbow Warrior was the Greenpeace yacht right?
>>> So how do I know you're not just trying to subvert things here? It
>>> happens.
>>> Usually in boats.
>>>
>>> Cheers,
>>> Danny.
>>>
>>> --
>>> http://danny.ayers.name
>>>
>>>
>>
>>
>
>
>
> --
> http://danny.ayers.name
>

Received on Monday, 8 December 2008 00:17:51 UTC