Re: The king is dressed in void from Andreas Langegger on 2008-06-12 (public-lod@w3.org from June 2008)

From: Andreas Langegger <alangegger@mac.com>
Date: Thu, 12 Jun 2008 17:39:05 +0200
To: "Hausenblas, Michael" <michael.hausenblas@joanneum.at>
Cc: Giovanni Tummarello <giovanni.tummarello@deri.org>, public-lod@w3.org, Semantic Web <semantic-web@w3.org>
Message-id: <3A8EAC58-722E-4B43-8250-77FFFC63BE8C@mac.com>
Hi there,

I've done data integration based on SPARQL in a "restricted" domain,  
not web-scale (see SemWIQ presentation at ESWC08 [1]). But the issues  
are similar. We need some descriptions about sites, owner, license,  
etc. In our case this is provided upon registration of data sources at  
the mediator which is maintaining a site catalog.

For me there are two points that count for voiD:
1. we simply need metadata about the maintainers of endpoints
2. we need some simple "pre-compiled" statistical information, mainly
for performance reasons (I think)
   Of course, data should be self-describing, and you could fetch all
data and collect your own stats using SPARQL, but this would put
unnecessary load on the servers.
   Additionally, SPARQL currently does not support aggregate functions
(at least not in the spec), which would allow you to retrieve already
aggregated stat data.

One possible way to achieve this is to provide voiD data as part of  
the actual graph exposed by the SPARQL endpoint. SPARQL can be used to  
retrieve meta data without the need for an additional meta-description  
layer. However, I think the real problem is that sometimes this is
not (easily) possible. In such cases a simple file resource could add
the required information. But how should the search engine, client,  
etc. know where to find meta data? First try using SPARQL, then file  
location by convention? I don't know...
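
The "first SPARQL, then file location by convention" order could be
sketched roughly as follows. This is only an illustration in Python: the
endpoint path /sparql, the file names /void.ttl and /void.rdf, and the
two callables query_endpoint and fetch_file are assumptions made for the
example; no such convention has been agreed on for voiD.

```python
from typing import Callable, Iterable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical conventional locations; the thread does not fix any names.
CANDIDATE_PATHS = ("/void.ttl", "/void.rdf")


def discover_metadata(
    site: str,
    query_endpoint: Callable[[str], Optional[str]],
    fetch_file: Callable[[str], Optional[str]],
    endpoint_path: str = "/sparql",  # assumed endpoint location
    candidate_paths: Iterable[str] = CANDIDATE_PATHS,
) -> Optional[Tuple[str, str]]:
    """Return (source_url, metadata), or None if nothing was found."""
    # Step 1: ask the SPARQL endpoint, in case the metadata is part of
    # the graph it exposes.
    endpoint_url = urljoin(site, endpoint_path)
    metadata = query_endpoint(endpoint_url)
    if metadata is not None:
        return endpoint_url, metadata

    # Step 2: fall back to file locations agreed on by convention,
    # in the spirit of /robots.txt or /sitemap.xml.
    for path in candidate_paths:
        file_url = urljoin(site, path)
        metadata = fetch_file(file_url)
        if metadata is not None:
            return file_url, metadata

    return None
```

Both callables are expected to return None when nothing is found, so a
client can plug in whatever HTTP or SPARQL library it already uses.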

For voiD there are two things:
1. definition of a metaLOD vocabulary
2. specifying a convention of "where to find meta data" (like
"/robots.txt" or "/sitemap.xml")

1. is easier than 2.

Regarding statistics: I'm working on a statistics monitor which can be  
attached to a SPARQL endpoint (at the same host or at least in the  
subnet). It will periodically generate stats for the data stored  
behind the SPARQL endpoint. Because it works via SPARQL, it can be  
used regardless of the implementation (it may actually be a wrapper like
D2R-Server). I basically need this for query optimization in SemWIQ.
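
To illustrate the kind of stats collection meant here (this is not the
actual monitor), a minimal Python sketch: since the SPARQL spec has no
aggregates, the client pages through the triples with LIMIT/OFFSET and
counts predicate usage itself. run_query is a hypothetical callable that
sends a query string to the endpoint and returns the ?p values of the
result rows; the page size is likewise an assumption.

```python
from collections import Counter
from typing import Callable, List


def page_query(limit: int, offset: int) -> str:
    """Build one page of a plain SPARQL query that needs no aggregates."""
    # ORDER BY keeps the paging stable across requests.
    return (
        "SELECT ?p WHERE { ?s ?p ?o } "
        f"ORDER BY ?p LIMIT {limit} OFFSET {offset}"
    )


def predicate_counts(
    run_query: Callable[[str], List[str]],  # hypothetical endpoint client
    page_size: int = 1000,
) -> Counter:
    """Count how often each predicate occurs, aggregating on the client."""
    counts: Counter = Counter()
    offset = 0
    while True:
        rows = run_query(page_query(page_size, offset))
        counts.update(rows)
        # A short (or empty) page means the result set is exhausted.
        if len(rows) < page_size:
            break
        offset += page_size
    return counts
```

Each page is a separate request that ships raw rows to the client, which
is exactly the server load that pre-compiled statistics would avoid.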

It would be great if I could use the outcomes of the voiD approach.
That's why I'd like to get involved.

Regards
AndyL

[1] http://semwiq.faw.uni-linz.ac.at


On Jun 12, 2008, at 8:49 AM, Hausenblas, Michael wrote:

>
>
> Giovanni,
>
> I think I see your argument here and I tend to agree up to a certain
> point. What makes me wonder is that it is *you* stating this ;)
>
> Seriously, I very much believe in self-descriptive documents, etc. I do
> prefer simple things that work. However, voiD is just the next logical
> step after semantic sitemaps (it is actually meant to extend them in
> terms of using the sc:datasetURI as the entry point, see also [1]). So,
> just in case you want to argue against your own proposal, please tell
> me so ;)
>
> I guess you're right that many things can be done already and I'm
> positive that we should use the current layer, then advance to the next.
> But what if, say, the current layer is missing something? Who gets to
> decide when we are done? I guess it is up to the people using it.
> So, let's not judge a book by its cover, please.
>
> voiD intends to formalise what is already used in practice. I myself
> have built some applications that exploit the LOD datasets and others
> certainly have done so as well. As it seems, there is a certain need to
> do what we have done up to now mainly in our brains in a more automated
> way. There we are: a clear demand for something, a proposal to solve it.
> It is as simple as that. If it turns out that LOD dataset providers
> don't use it - fine. They might use other methods, then, or nothing at
> all.
>
> I see two issues with what you propose, however - granularity &
> scalability. Currently we have identified two use cases for voiD:
>
> 1. automatic creation of a map (such as http://sindice.com/map)
> 2. topic-based selection of LOD datasets
>
> I guess you're kinda familiar with (1). Now, think about scalability.
> Today we have a bunch of LOD datasets or other sources - tomorrow we
> may have 10k and next year maybe a million. Next, when looking at (2),
> I'd like to have a reliable, simple method to determine a 'good' entry
> point into the LOD cloud. As soon as I'm in, I can follow my nose using
> basically what you propose.
>
> Finally, the reactions so far tell us that voiD seems to be what people
> were waiting for in terms of being easy to use and powerful enough to
> have an added value.
>
> Concluding, it is not 'Giovanni vs. voiD', it is Giovanni + voiD for a
> better, finally a *real* Semantic Web.
>
> Cheers,
> 	Michael
>
>
> [1] http://sw.joanneum.at/voiD/img/void_discovery.png
>
> ----------------------------------------------------------
> Michael Hausenblas, MSc.
> Institute of Information Systems & Information Management
> JOANNEUM RESEARCH Forschungsgesellschaft mbH
>
> http://www.joanneum.at/iis/
> ----------------------------------------------------------
>
>
>> -----Original Message-----
>> From: g.tummarello@gmail.com [mailto:g.tummarello@gmail.com]
>> On Behalf Of Giovanni Tummarello
>> Sent: Thursday, June 12, 2008 12:08 AM
>> To: Hausenblas, Michael
>> Cc: public-lod@w3.org; Semantic Web
>> Subject: The king is dressed in void
>>
>> Wasn't RDF all about being self-describing?
>>
>> if i say "giovanni works in research" .. do i really need a
>> vocabulary that says "this rdf contains information that describes
>> what people claim to be working on"? that's suicide. If this is the
>> case (which i totally don't believe) then the king is seriously naked
>> and there is no hope whatsoever that RDF is going to have any
>> relevance (and there i say it)
>>
>> to find one such file, instead of having to invent, agree on and mark
>> up, i'd say it's much easier to do something like [1] or [2].
>> this is not marketing. it's a plea to NOT jump on more layers of stuff
>> when the previous layers still have to show their value and
>> adoptability. Solve some simple use cases first, then jump to the
>> more complex ones.
>>
>> Giovanni
>>
>> [1]
>> http://demo.sindice.com/search?q=*+%3Chttp%3A%2F%2Fwww.w3.org%2F2006%2Fvcard%2Fns%23title%3E+%27research%27&qt=advanced
>>
>> or
>> http://sindice.com/search?q=http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fknows&qv=http%3A%2F%2Frichard.cyganiak.de%2Ffoaf.rdf%23cygri&qt=ifp
>> (documents which contain statements in which someone claims to
>> know richard)
>>
>> [2] http://forum.sindice.com/showthread.php?t=10
>>
>>
>> On Wed, Jun 11, 2008 at 8:54 AM, Hausenblas, Michael
>> <michael.hausenblas@joanneum.at> wrote:
>>>
>>>
>>> Dear interested people in linked datasets,
>>>
>>> As you may have gathered, we have recently initiated a discussion on how
>>> to discover the linked dataset cloud [1]. The result of our impromptu
>>> kick-off meeting at the ESWC08 is literally voiD - the 'vocabulary of
>>> interlinked datasets' (see notes at [2]). This is a proposal for a
>>> vocabulary and a mechanism how it should be deployed and used. We have
>>> some first slides available at [3] as well.
>>>
>>> Please consider commenting on it either by replying to this message
>>> and/or sharing your thoughts with us at the Wiki [2].
>>>
>>> Cheers,
>>>      Michael
>>>
>>> [1] http://richard.cyganiak.de/2007/10/lod/
>>> [2] http://community.linkeddata.org/MediaWiki/index.php?MetaLOD#Kick-off_meeting_at_ESWC08
>>> [3] http://www.slideshare.net/mediasemanticweb/full-eswc08-lightning-talk
>>>
>>> ----------------------------------------------------------
>>> Michael Hausenblas, MSc.
>>> Institute of Information Systems & Information Management
>>> JOANNEUM RESEARCH Forschungsgesellschaft mbH
>>> Steyrergasse 17, A-8010 Graz, AUSTRIA
>>>
>>> <office>
>>> phone: +43-316-876-1193 (fax:-1191)
>>> mobile: +43-699-1876-1165
>>> e-mail: michael.hausenblas@joanneum.at
>>> skype: mhausenblas
>>>   web: http://www.joanneum.at/iis/
>>>
>>> <see also>
>>>        http://sw-app.org/about.html
>>>        http://riese.joanneum.at
>>> ----------------------------------------------------------
>>>
>>>
>>
>


----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
Institute for Applied Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69
http://www.langegger.at
Received on Thursday, 12 June 2008 16:50:30 UTC