Re: Using Linking Open Data datasets from Peter Ansell on 2008-05-30 (public-lod@w3.org from May 2008)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Fri, 30 May 2008 11:19:53 +1000
To: "Giovanni Tummarello" <giovanni.tummarello@deri.org>
Cc: "Hugh Glaser" <hg@ecs.soton.ac.uk>, "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <a1be7e0e0805291819w5a1abbei4ba934f05e8fac85@mail.gmail.com>
I wasn't really asking whether Sindice would probe the SPARQL
endpoint, just whether Sindice would publish these endpoints, or
links to the known sitemaps where that information can be found,
along with the extra information about the DBpedia topics related to
each endpoint, to promote discovery. For the moment the static web
page approach taken by http://www.sindice.com/map is okay, although
as I say it would be nice to also publish it as RDF in the linked
data way.

Sindice has limited resources, I figure, and being able to use each
producer's SPARQL endpoint for queries instead of Sindice would be
valuable to everyone. Is the idea always going to be that people just
choose sites and discover the SPARQL endpoints on that set of sites
via the sitemaps listed in robots.txt?
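
As a rough sketch of that discovery path (robots.txt, then sitemap,
then endpoint), something like the following could work on the client
side. It only parses text that has already been fetched. The
example.org robots.txt and the sitemap wrapper are made up for
illustration, and the sc: namespace URI and the sc:dataset element
are my assumptions about the sitemap extension; only the
linkedDataPrefix and sparqlEndpointLocation values come from this
thread.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the Semantic Sitemap extension elements (sc:...).
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def sitemap_urls(robots_txt):
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL keeps its own "http:".
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def sparql_endpoints(sitemap_xml):
    """Return one dict per sc:dataset that declares a SPARQL endpoint."""
    root = ET.fromstring(sitemap_xml.strip())
    found = []
    for dataset in root.iter("{%s}dataset" % SC_NS):
        endpoint = dataset.findtext("{%s}sparqlEndpointLocation" % SC_NS)
        if endpoint is None:
            continue  # dataset without an endpoint: nothing to report
        prefix = dataset.find("{%s}linkedDataPrefix" % SC_NS)
        has_prefix = prefix is not None and prefix.text
        found.append({
            "endpoint": endpoint.strip(),
            "prefix": prefix.text.strip() if has_prefix else None,
            "slicing": prefix.get("slicing") if prefix is not None else None,
        })
    return found


# Hypothetical inputs; a real client would fetch these over HTTP.
ROBOTS = """User-agent: *
Disallow: /private/
Sitemap: http://example.org/sitemap.xml
"""

SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:linkedDataPrefix slicing="scbd">http://dblp.rkbexplorer.com/id/</sc:linkedDataPrefix>
    <sc:sparqlEndpointLocation>http://dblp.rkbexplorer.com/sparql/</sc:sparqlEndpointLocation>
  </sc:dataset>
</urlset>
"""

if __name__ == "__main__":
    print(sitemap_urls(ROBOTS))
    print(sparql_endpoints(SITEMAP))
```

A directory published by a search engine would save every client
from repeating these steps site by site, which is really the point
of my question.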

For larger databases I don't see the data dump idea being useful to
the entire semantic web, but then again Google made things work
beyond expectations...

Btw, the map says something about billions of pieces of information
but the index that is used says: "Now searching index V1 (around 7.02
million documents and counting). Also try index V0 (26.9 M)". How much
of a dump is actually utilised in the search index in order to
compress the billions down to millions this way?

Cheers,

Peter

2008/5/30 Giovanni Tummarello <giovanni.tummarello@deri.org>:
> Hi Peter,
>
> Sindice will not go and probe your sparql endpoint. The Sitemap
> directive for exposing sparql endpoints is therefore mostly to be used
> by clients or other forms of integration which do not involve indexing
> the entire RDF model (or models) you offer.
>
> Sindice might return you SPARQL endpoints in the future (why not); it
> will certainly use the dump files you might want to provide and slice
> them instead of crawling when a sitemap (and a dump!) is available.
> The end result is that you serve a single file (the dump) but you can
> find your resolvable URIs (URLs) returned as results when a matching
> query is answered. ( e.g.
> http://www.sindice.com/search?q=semantic+sitemaps&qt=keyword the first
> result (the talk) and the last (the paper) were indexed without
> crawling the site)
>
> hope this helps
>
> Giovanni
>
>
> On Fri, May 30, 2008 at 1:39 AM, Peter Ansell <ansell.peter@gmail.com> wrote:
>> Does Sindice utilise the SPARQL-related pieces in any way for internal
>> processing? Does it understand or replicate the slicing mode?
>>
>> <sc:linkedDataPrefix
>> slicing="scbd">http://dblp.rkbexplorer.com/id/</sc:linkedDataPrefix>
>> <sc:sparqlEndpointLocation>http://dblp.rkbexplorer.com/sparql/</sc:sparqlEndpointLocation>
>>
>> If my understanding is correct, this is aimed at a search engine
>> mostly... so it should publish this information when it finds it in a
>> directory of sorts to be most useful. Does sindice republish this
>> information in some form to allow directory based access to the
>> different linked data endpoints/sites?
>>
>> Cheers,
>>
>> Peter
>>
>> 2008/5/30 Giovanni Tummarello <giovanni.tummarello@deri.org>:
>>>
>>> A validator in sindice is possible and has been discussed but the list
>>> of things to do is now quite scary :-)
>>>
>>> poor man's validator: please post about your sitemap here
>>> http://forum.sindice.com/index.php . Free report and quick indexing to
>>> those who do.
>>>
>>> Giovanni
>>>
>>>> Mind you, Giovanni says that a lot of sitemaps are broken, so they fix them
>>>> and cache the fixed ones for Sindice purposes :-)
>>>>
>>>>
>>>> On 30/05/2008 00:02, "Peter Ansell" <ansell.peter@gmail.com> wrote:
>>>>
>>>> ...
>>>>>>
>>>>>> Richard
>>>>>>
>>>>>> [1] http://sw.deri.org/2007/07/sitemapextension/
>>>>>
>>>>> That looks very usable to me. Has anyone used it for linked data? How
>>>>> do you discover these sitemaps as a linked data user, as opposed to
>>>>> sitemaps which are traditionally submitted to search engines for
>>>>> searching? In either case, it would be nice to have an RDF description
>>>>> submitted as part of a sitemap to a semantic search engine so it might
>>>>> be good to standardise that mechanism based around these ideas.
>>>>>
>>>>> Also, there is a reference in that document to N-Quad format, what is
>>>>> that exactly? [2] is a bit sparse on examples so it is hard to
>>>>> understand what is meant by the syntax.
>>>>>
>>>>> Also, is the slicing declaration attempting to make up for a deficit
>>>>> in the SPARQL protocol w.r.t. DESCRIBE? Why not utilise SELECT if you
>>>>> had an idea of what pieces of information you desire, although I guess
>>>>> the server is in the best position to recommend information to you
>>>>> with DESCRIBE queries. I think slicing mechanisms should be defined
>>>>> outside of that context, although the lack of progress with CBD [3] is
>>>>> a little worrying with respect to that bit.
>>>>>
>>>>> [2] http://sw.deri.org/2008/02/nx/
>>>>> [3] http://www.w3.org/Submission/CBD/
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
Received on Friday, 30 May 2008 01:20:29 UTC