Re: Size matters -- How big is the danged thing from Juan Sequeda on 2008-11-22 (public-lod@w3.org from November 2008)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Fri, 21 Nov 2008 18:28:26 -0600
To: "Aldo Bucchi" <aldo.bucchi@gmail.com>
Cc: "Kingsley Idehen" <kidehen@openlinksw.com>, public-lod@w3.org, "Olaf Hartig" <hartig@informatik.hu-berlin.de>
Message-ID: <f914914c0811211628i6f79b000of795055e073735b8@mail.gmail.com>
I can't keep quiet either.

http://squin.sourceforge.net/

We have been keeping this quiet for a while, but we should have a
working demo in the next week or so!

On 11/21/08, Aldo Bucchi <aldo.bucchi@gmail.com> wrote:
>
> On Fri, Nov 21, 2008 at 7:51 PM, Kingsley Idehen <kidehen@openlinksw.com>
> wrote:
>>
>> Yves Raimond wrote:
>>>
>>> On Fri, Nov 21, 2008 at 8:08 PM, Giovanni Tummarello
>>> <giovanni.tummarello@deri.org> wrote:
>>>
>>>>>
>>>>> Overall, that's about 17 billion.
>>>>>
>>>>>
>>>>
>>>> IMO considering MySpace's 12 billion triples as part of LOD is quite a
>>>> stretch (same with other wrappers) unless they are provided by the
>>>> entity itself. E.g. I WOULD count in the LiveJournal FOAF files, on the
>>>> other hand. OK, they're not linked, but they're not less useful than the
>>>> MySpace wrapper, are they? (In fact they are linked quite well if you
>>>> use the Google social API.)
>>>>
>>>
>>> Actually, I don't think I can agree with that. Whether we want it or
>>> not, most of the data we publish (all of it, apart from specific cases
>>> e.g. review) is provided by wrappers of some sort, e.g. Virtuoso, D2R,
>>> P2R, web services wrapper etc. Hence, it makes no sense trying to
>>> distinguish datasets on the basis they're published through a
>>> "wrapper" or not.
>>>
>>> Within LOD, we only segregate datasets for inclusion in the diagram on
>>> the basis they are published according to linked data principles. The
>>> stats I sent reflect just that: some stats about the datasets
>>> currently in the diagram.
>>>
>>> The origin of the data shouldn't matter. The fact that it is published
>>> according to linked data principles and linked to at least one dataset
>>> in the cloud should matter.
>>>
>>>
>>>
>>>>
>>>> Giovanni
>>>>
>>>>
>>>
>>>
>>>
>>
>> Yves,
>>
>> I agree. But I am sure you can also see the inherent futility in pursuing
>> the size of the pure Linked Data Web :-)  The moment you arrive at a
>> number, it will be obsolete :-)
>>
>> I would frame the question this way: is the LOD hub now dense enough for basic
>> demonstrations of Linked Data Web utility to everyday Web users? For
>> example, can we "Find" stuff on the Web with levels of precision and
>> serendipity erstwhile unattainable? Can we now tag stuff on the Web in a
>> manner that makes tagging useful? Can we alleviate the daily costs of Spam
>> on mail inboxes? Can all of the aforementioned provide the basis for
>> relevant discourse discovery and participation?
>
> Sorry, this is getting too interesting to stay in lurker mode ;)
>
> Kingsley, absolutely. We have got to that point. The fun part has begun.
>
> To quote Jim, who started this thread:
>
> http://blogs.talis.com/nodalities/2008/03/jim_hendler_talks_about_the_se.php
>
> Go to approximately minute 28 (I can't listen to it here, I just blocked mp3s).
> Jim touches on how a geo corpus can be used to disambiguate tags on Flickr.
> This is one such use, low-hanging fruit wrt the huge amount of linked
> data, and a first in terms of IT.
>
> This was not possible last year!
> It is now.
>
> I guess that is THE question now: What can we do this year that we
> couldn't do last year?
> ( thanks to the massive amount of available LOD ).
>
> Best,
> A
>
>>
>> --
>>
>>
>> Regards,
>>
>> Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
>> President & CEO OpenLink Software     Web: http://www.openlinksw.com
>>
>>
>>
>>
>>
>>
>
>
>
> --
> Aldo Bucchi
> U N I V R Z
> Office: +56 2 795 4532
> Mobile:+56 9 7623 8653
> skype:aldo.bucchi
> http://www.univrz.com/
> http://aldobucchi.com
>
> PRIVILEGED AND CONFIDENTIAL INFORMATION
> This message is only for the use of the individual or entity to which it is
> addressed and may contain information that is privileged and confidential.
> If you are not the intended recipient, please do not distribute or copy this
> communication, by e-mail or otherwise. Instead, please notify us immediately
> by return e-mail.
>
>


-- 
Juan Sequeda, Ph.D Student

Research Assistant
Dept. of Computer Sciences
The University of Texas at Austin
http://www.cs.utexas.edu/~jsequeda
jsequeda@cs.utexas.edu

http://www.juansequeda.com/

Semantic Web in Austin: http://juansequeda.blogspot.com/
Received on Saturday, 22 November 2008 00:29:02 UTC