Re: The king is dressed in void from Giovanni Tummarello on 2008-06-12 (public-lod@w3.org from June 2008)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Fri, 13 Jun 2008 00:27:33 +0100
To: al@jku.at
Cc: "Hausenblas, Michael" <michael.hausenblas@joanneum.at>, public-lod@w3.org, "Semantic Web" <semantic-web@w3.org>
Message-ID: <210271540806121627r52b62d90k5935c8d5ab93b9c6@mail.gmail.com>
Andreas,

Licence: yes, I agree. It will be added to the sitemap extension, much
as already happens in microformats.
If you want to use RDF, I believe this is what you're looking for:
http://validator.creativecommons.org/

Statistics:

I'd tend to see this use case as a low-level one that concerns the
implementation of distributed SPARQL (an interesting aspect, however!).
It seems strange, to say the least, to ask people to write some triples
saying how many triples they have when a SPARQL endpoint is there
precisely to answer any query you might want.

If I understand correctly, you would have to create terms for each of
the possible queries that you might want to run to gather statistics,
or model these queries in RDF itself to be able to say "and this
query returns X results".

In practice, correct me if I am wrong, but doesn't pretty much every
SPARQL implementation support aggregates already (e.g. Virtuoso)? So if
one is really unlucky, one might have to try a couple of syntaxes,
but that's about it.
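
To make "try a couple of syntaxes" concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: run_query and
fake_endpoint are hypothetical stand-ins for whatever actually talks to an
endpoint, and the two query spellings are examples (the first is the form
later standardised in SPARQL 1.1, the second a pre-standard spelling that
some engines accept as an extension; check the documentation of the
endpoint in question).

```python
from typing import Callable, Iterable, Optional

# Candidate spellings of "count all triples".
# First: the form later standardised in SPARQL 1.1.
# Second: a pre-standard, vendor-style spelling (an assumption for
# illustration; not every engine accepts it).
COUNT_QUERIES = (
    "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }",
    "SELECT COUNT(*) WHERE { ?s ?p ?o }",
)


def count_triples(run_query: Callable[[str], int],
                  candidates: Iterable[str] = COUNT_QUERIES) -> Optional[int]:
    """Return the count from the first spelling the endpoint accepts.

    run_query is a hypothetical caller-supplied function: it sends one
    query string to an endpoint and returns the count, raising ValueError
    when the endpoint rejects the syntax. Returns None if every
    candidate is rejected.
    """
    for query in candidates:
        try:
            return run_query(query)
        except ValueError:
            continue  # syntax not understood, try the next spelling
    return None


def fake_endpoint(query: str) -> int:
    """Stand-in endpoint that only understands the second spelling."""
    if query.startswith("SELECT COUNT(*)"):
        return 1234  # made-up number, purely for the demonstration
    raise ValueError("parse error")
```

Calling count_triples(fake_endpoint) falls through the first spelling and
returns the stand-in's count from the second. A real run_query would issue
the query over HTTP and parse the result; that part is left out on purpose.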

Giovanni

On Thu, Jun 12, 2008 at 4:39 PM, Andreas Langegger <alangegger@mac.com> wrote:
> Hi there,
>
> I've done data integration based on SPARQL in a "restricted" domain, not
> web-scale (see SemWIQ presentation at ESWC08 [1]). But the issues are
> similar. We need some descriptions about sites, owner, license, etc. In our
> case this is provided upon registration of data sources at the mediator
> which is maintaining a site catalog.
>
> For me there are two points that count for voiD:
> 1. we just need those meta-data about the maintainer of endpoints
> 2. we need some simple "pre-compiled" statistical information just because
> of performance (I think so)
>  Of course, data should be self-describing and you could fetch all data and
> collect your own stats using SPARQL, but this will produce unnecessary load
> to servers.
>  Additionally, SPARQL currently does not support aggregate functions (at
> least not in the spec), which would allow you to retrieve already
> aggregated statistics.
>
> One possible way to achieve this is to provide voiD data as part of the
> actual graph exposed by the SPARQL endpoint. SPARQL can be used to retrieve
> meta data without the need for an additional meta-description layer.
> However, I think the real problem is, that sometimes this is not (easily)
> possible. In such cases a simple file resource could add the required
> information. But how should the search engine, client, etc. know where to
> find meta data? First try using SPARQL, then file location by convention? I
> don't know...
>
> For voiD there are two things:
> 1. definition of a metaLOD vocabulary
> 2. specifying a convention of "where to find meta data" (like "/robots.txt"
> or "/sitemap.xml")
>
> 1. is easier than 2.
>
> Regarding statistics: I'm working on a statistics monitor which can be
> attached to a SPARQL endpoint (at the same host or at least in the subnet).
> It will periodically generate stats for the data stored behind the SPARQL
> endpoint. Because it works via SPARQL, it can be used regardless of the
> implementation (it may actually be a wrapper like D2R-Server). I basically need
> this for query optimization in SemWIQ.
>
> It would be great if I could use outcomes from the voiD approach. That's
> why I'd like to get involved.
>
> Regards
> AndyL
>
> [1] http://semwiq.faw.uni-linz.ac.at
>
>
> On Jun 12, 2008, at 8:49 AM, Hausenblas, Michael wrote:
>
>>
>>
>> Giovanni,
>>
>> I think I see your argument here and I tend to agree up to a certain
>> point. What makes me wonder is that it is *you* stating this ;)
>>
>> Seriously, I very much believe in self-descriptive documents, etc. I do
>> prefer simple things that work. However, voiD is just the next logical
>> step after semantic sitemaps (it actually is thought to extend it in
>> terms of using the sc:datasetURI as the entry point, see also [1]). So,
>> just in case you want to argue against your own proposal, please tell
>> me so ;)
>>
>> I guess you're right that many things can be done already and I'm
>> positive that we should use the current layer, then advance to the next.
>> But what if, say, the current layer is missing something. To whom is it
>> up to decide when we are done? I guess it is up to the people using it.
>> So, let's not judge a book by its cover, please.
>>
>> voiD intends to formalise what is already used in practice. I myself
>> have built some applications that exploit the LOD datasets and others
>> certainly have done as well. As it seems, there is a certain need to do
>> what we have done up to now mainly in our brains, in a more automated
>> way. There we are: a clear demand for something, a proposal to solve it.
>> It is as simple as that. If it turns out that LOD dataset providers
>> don't use it - fine. They might use other methods, then, or nothing at
>> all.
>>
>> I see two issues with what you propose, however - granularity &
>> scalability. Currently we have identified two use cases for voiD:
>>
>> 1. automatic creation of a map (such as http://sindice.com/map)
>> 2. topic-based selection of LOD datasets
>>
>> I guess you're kinda familiar with (1). Now, think about scalability.
>> Today we have a bunch of LOD data sets or other sources -  tomorrow we
>> may have 10k and next year maybe a million. Next, when looking at (2),
>> I'd like to have a reliable, simple method to determine a 'good' entry
>> point into the LOD cloud. As soon as I'm in, I can follow my nose using
>> basically what you propose.
>>
>> Finally, the reactions so far tell us that voiD seems to be what people
>> were waiting for: easy to use, yet powerful enough to provide
>> added value.
>>
>> Concluding, it is not 'Giovanni vs. voiD', it is Giovanni + voiD for a
>> better, finally a *real* Semantic Web.
>>
>> Cheers,
>>        Michael
>>
>>
>> [1] http://sw.joanneum.at/voiD/img/void_discovery.png
>>
>> ----------------------------------------------------------
>> Michael Hausenblas, MSc.
>> Institute of Information Systems & Information Management
>> JOANNEUM RESEARCH Forschungsgesellschaft mbH
>>
>> http://www.joanneum.at/iis/
>> ----------------------------------------------------------
>>
>>
>>> -----Original Message-----
>>> From: g.tummarello@gmail.com [mailto:g.tummarello@gmail.com]
>>> On Behalf Of Giovanni Tummarello
>>> Sent: Thursday, June 12, 2008 12:08 AM
>>> To: Hausenblas, Michael
>>> Cc: public-lod@w3.org; Semantic Web
>>> Subject: The king is dressed in void
>>>
>>> Wasn't RDF all about being self-describing?
>>>
>>> If I say "giovanni works in research", do I really need a
>>> vocabulary that says "this RDF contains information that describes
>>> what people claim to be working on"? That's suicide. If this is the
>>> case (which I totally don't believe), then the king is seriously naked
>>> and there is no hope whatsoever that RDF is going to have any
>>> relevance (and there, I said it).
>>>
>>> To find one such file, instead of having to invent, agree on, and mark up
>>> something new, I'd say it's much easier to do something like [1] or [2].
>>> This is not marketing. It's a plea NOT to jump onto more layers of stuff
>>> when the previous layers still have to show their value and
>>> adoptability. Solve some simple use cases first, then jump to the
>>> more complex ones.
>>>
>>> Giovanni
>>>
>>> [1]
>>> http://demo.sindice.com/search?q=*+%3Chttp%3A%2F%2Fwww.w3.org%2F2006%2Fvcard%2Fns%23title%3E+%27research%27&qt=advanced
>>>
>>> or
>>> http://sindice.com/search?q=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fknows&qv=http%3A%2F%2Frichard.cyganiak.de%2Ffoaf.rdf%23cygri&qt=ifp
>>> (documents which contain statements in which someone claims to
>>> know Richard)
>>>
>>> [2] http://forum.sindice.com/showthread.php?t=10
>>>
>>>
>>> On Wed, Jun 11, 2008 at 8:54 AM, Hausenblas, Michael
>>> <michael.hausenblas@joanneum.at> wrote:
>>>>
>>>>
>>>> Dear interested people in linked datasets,
>>>>
>>>> As you may have gathered, we have recently initiated a discussion on how
>>>> to discover the linked dataset cloud [1]. The result of our impromptu
>>>> kick-off meeting at the ESWC08 is literally voiD - the 'vocabulary of
>>>> interlinked datasets' (see notes at [2]). This is a proposal for a
>>>> vocabulary and a mechanism how it should be deployed and used. We have
>>>> some first slides available at [3] as well.
>>>>
>>>> Please consider commenting on it either by replying to this message
>>>> and/or sharing your thoughts with us at the Wiki [2].
>>>>
>>>> Cheers,
>>>>     Michael
>>>>
>>>> [1] http://richard.cyganiak.de/2007/10/lod/
>>>> [2]
>>>> http://community.linkeddata.org/MediaWiki/index.php?MetaLOD#Kick-off_meeting_at_ESWC08
>>>> [3]
>>>> http://www.slideshare.net/mediasemanticweb/full-eswc08-lightning-talk
>>>>
>>>> ----------------------------------------------------------
>>>> Michael Hausenblas, MSc.
>>>> Institute of Information Systems & Information Management
>>>> JOANNEUM RESEARCH Forschungsgesellschaft mbH
>>>> Steyrergasse 17, A-8010 Graz, AUSTRIA
>>>>
>>>> <office>
>>>> phone: +43-316-876-1193 (fax:-1191)
>>>> mobile: +43-699-1876-1165
>>>> e-mail: michael.hausenblas@joanneum.at
>>>> skype: mhausenblas
>>>>  web: http://www.joanneum.at/iis/
>>>>
>>>> <see also>
>>>>       http://sw-app.org/about.html
>>>>       http://riese.joanneum.at
>>>> ----------------------------------------------------------
>>>>
>>>>
>>>
>>
>
>
> ----------------------------------------------------------------------
> Dipl.-Ing.(FH) Andreas Langegger
> Institute for Applied Knowledge Processing
> Johannes Kepler University Linz
> A-4040 Linz, Altenberger Straße 69
> http://www.langegger.at
>
>
Received on Thursday, 12 June 2008 23:28:10 UTC