Re: Semantic Web Search from Giovanni Tummarello on 2008-06-22 (public-lod@w3.org from June 2008)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Sun, 22 Jun 2008 22:32:06 +0100
To: "Hugh Glaser" <hg@ecs.soton.ac.uk>
Cc: "public-lod@w3.org" <public-lod@w3.org>, "Sindice general discussions list" <sindice-general@lists.deri.org>
Message-ID: <210271540806221432s79b0abb4l8a9cf50d7bd1cb0@mail.gmail.com>

Hi Hugh,

As far as Sindice is concerned, please just post your message on
http://forum.sindice.com and we'll be able to follow your data case
closely.

As far as large datasets are concerned, the indexing is currently
"manual": we must personally know of the dataset (e.g. from a
post in the above forum) and insert it ourselves. Otherwise, if the
dataset is linked from the outside, the crawler might eventually pick
it up, but that's pretty suboptimal.

As for Sindice itself, I have not yet announced the beta1 on the mailing
lists because we're still finishing a few things, among which is indexing
some datasets. Unfortunately we only have 12 cores in the indexing
cluster, and that limits us to less than 2 million documents per day
(reasoning is performed on each one independently), so it might take a bit.
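
To put the figures above in perspective, here is a rough
back-of-the-envelope sketch. It assumes the 2 million documents per day
is an upper bound spread evenly across the 12 cores; the function names
and the 10 million document backlog are hypothetical, chosen only for
illustration.

```python
# Rough throughput estimate from the figures quoted in the message.
# Assumption: the daily limit is shared evenly by all cores.
CORES = 12
DOCS_PER_DAY = 2_000_000  # upper bound quoted above
SECONDS_PER_DAY = 24 * 60 * 60


def per_core_rate(docs_per_day: int = DOCS_PER_DAY, cores: int = CORES) -> float:
    """Documents per second handled by one core, under an even split."""
    return docs_per_day / SECONDS_PER_DAY / cores


def days_to_index(num_documents: int, docs_per_day: int = DOCS_PER_DAY) -> float:
    """Days needed to index a backlog at the given daily rate."""
    return num_documents / docs_per_day


if __name__ == "__main__":
    # Hypothetical backlog size, not a figure from the message.
    backlog = 10_000_000
    print(f"{per_core_rate():.2f} docs/sec per core")
    print(f"{days_to_index(backlog):.1f} days for {backlog} documents")
```

Under these assumptions each core gets through roughly two documents
per second, so a backlog of several million documents takes days
rather than hours.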

*The pings*, however, should really work right away. If they don't, a
message on the above forum will get our attention right away.

Last thing: the entry with 18946779 triples might be left out because we
have a hard limit. I am not sure whether we should have such a hard limit;
maybe it's unneeded. We could discuss this.

Giovanni

On Sun, Jun 22, 2008 at 7:09 PM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
>
> Just how good are the Search services - are they as good as they claim?
>
> I've been worrying about this for quite a while now, running round the
> engines I know about, submitting URIs, etc. I usually get assured that
> things will be really good soon, but I'm not sure how much things are
> improving. Of course some are better than others, but every now and then I
> see documents that do analysis of the Semantic Web that appear to be based
> on the need to find RDF documents.
> Here is an example (taken from the stuff we publish, of course - sorry, but
> the other dblp links are not much better).
>
> http://dblp.rkbexplorer.com/id/conf/otm/JaffriGM07
> Or
> http://dblp.rkbexplorer.com/data/conf/otm/JaffriGM07
>
> This is a URI that is available off a web page
> (http://dblp.rkbexplorer.com/), and from
> http://dblp.L3S.de/d2r/resource/publications/conf/otm/JaffriGM07
> (owl:sameAs), that is an entry to 18946779 triples and 5725785 symbols.
> None of it has changed for a few months.
>
> It seems to me that a Linked Data site of this sort should be quite obvious
> to people querying search engines, and that something must be wrong if it is
> not.
> Even more worrying, I find, is that if I know of 50+ million triples lying
> around in Linked Data sites built from a few thousand rdf files, what else
> is out there?
>
> Sorry if this sounds a little negative, but it is because I think that being
> able to find things is so important that I ask the question, and the related
> questions:
> Am I doing something wrong?
> Do we (Linked Data) need to do something else (apart from sitemap, void, etc.)?
> In fact, what are the claims?
>
> I would be delighted to find I am getting this all wrong, and driving the
> engines wrong, or missing some of them or...
>
> Best
> Hugh
>
>
>

Received on Sunday, 22 June 2008 21:32:43 UTC