Re: LOD Cloud Cache Stats from glenn mcdonald on 2011-04-06 (public-lod@w3.org from April 2011)

From: glenn mcdonald <glenn@furia.com>
Date: Wed, 6 Apr 2011 10:27:58 -0400
To: Kingsley Idehen <kidehen@openlinksw.com>
Cc: "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <BANLkTikMVA+y70yVjgfUqGj2p_VWf1VztQ@mail.gmail.com>
>
> Are you not able to use the public instance for intelligent faceting across
> the massive datasets that it hosts?
>

I think it's fair to say "Yes, I am not able to use the public instance for
anything I would consider 'intelligent faceting' of the dbpedia dataset". As
I said before, I don't mean this as a criticism of the technical
infrastructure of Virtuoso. But I do mean it as a criticism of dbpedia as a
specific dataset, of your data-explorer UI, of the granularity of RDF and
the expressiveness of SPARQL, and of the premise of trying to build one big
unified database of all the world's data.

> First off, I've discussed a lot of these matters with Jeff, and guess what,
> correct numbers isn't the point at all.
>

Nonsense. See http://jeffjonas.typepad.com/IRAHSS_Expert_Counting.pdf, which
summarizes itself like this: "This article suggests that the single most
fundamental capability required to make a sensemaking system is the system’s
ability to recognise when multiple references to the same entity (often from
different source systems) are in fact the same entity." dbpedia as a dataset
fails this test badly.
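To make the counting point concrete, here is a minimal sketch. It assumes the only resolution evidence is a set of owl:sameAs links between URIs, and it counts distinct entities by merging linked references with a union-find structure. The URIs and links are hypothetical. The sketch only shows why a raw reference count and an entity count can diverge; it is not a full entity-resolution system.

```python
# Minimal sketch: count distinct entities when several references
# (e.g. URIs from different source datasets) are linked by owl:sameAs.
# All URIs below are hypothetical examples, not real dataset contents.

def count_entities(references, same_as_pairs):
    """Return the number of distinct entities after merging sameAs links."""
    parent = {ref: ref for ref in references}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in same_as_pairs:
        union(a, b)

    # One root per connected group of references = one entity.
    return len({find(ref) for ref in references})


refs = [
    "http://dbpedia.org/resource/New_York_City",
    "http://example.org/geo/nyc",
    "http://example.org/census/NewYorkCity",
    "http://dbpedia.org/resource/Boston",
    "http://example.org/geo/boston",
]
links = [
    ("http://dbpedia.org/resource/New_York_City", "http://example.org/geo/nyc"),
    ("http://example.org/geo/nyc", "http://example.org/census/NewYorkCity"),
]

print(len(refs), count_entities(refs, links))  # 5 references, 3 entities
```

Five references collapse to three entities. The true answer here would be two, because the two Boston URIs have no sameAs link and so are still counted separately. Every missing link inflates the count.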

> Not sure what you mean by "exhibit" here. Your queries time out, so unless
> the needle happens to be in the first page of the haystack, you're not going
> to find it.
>
>
> No they don't and that's where we just will not connect. You've already
> seen our browser pages that do just that, and your next response will
> ultimately take us back to arguing about page aesthetics.
>

Sorry, I can't follow this response. By "no they don't" do you mean that
your queries *don't* time out? They certainly do when I try them.

>> Exhibit #2 -- how do we leverage faceted exploration and navigation of
>> massive data sets at Web Scale?
>>
>
>  I thought I knew what "faceted exploration" meant, but your "facet"
> example has nothing I recognize as a facet, so I'm not sure what your claim
> is here.
>
> What are you talking about? Using group aggregates to pivot data across
> various dimensions is about what?
>

The link you said demonstrated faceting was this:

http://lod.openlinksw.com/fct/facet.vsp?sid=35044&cmd=refresh

In this view I see a filter in effect on dbpedia-owl:HistoricPlace, but
"facet" usually means a little parallel list of counts along some dimension,
not just a filter. E.g., the "narrow your results" sidebar here:
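The following is a minimal sketch of facet counts in that sense, over a small in-memory list of triples. It first filters the entities (here, by the dbpedia-owl:HistoricPlace type). It then counts, for each value of one property, how many of the filtered entities have that value. The ex:state property and the data are hypothetical. An endpoint could compute the same thing with a grouped aggregate query.

```python
from collections import Counter

# Minimal sketch of a "facet": for the entities that pass the current
# filter, count how many have each value of one property.
# The triples and the ex:state property are hypothetical illustrations.

triples = [
    ("ex:place1", "rdf:type", "dbpedia-owl:HistoricPlace"),
    ("ex:place1", "ex:state", "New York"),
    ("ex:place2", "rdf:type", "dbpedia-owl:HistoricPlace"),
    ("ex:place2", "ex:state", "Ohio"),
    ("ex:place3", "rdf:type", "dbpedia-owl:HistoricPlace"),
    ("ex:place3", "ex:state", "New York"),
    ("ex:place4", "rdf:type", "dbpedia-owl:Building"),
    ("ex:place4", "ex:state", "Ohio"),
]

def filter_subjects(triples, predicate, value):
    """Subjects having the given predicate/value pair (the 'filter')."""
    return {s for s, p, o in triples if p == predicate and o == value}

def facet_counts(triples, subjects, predicate):
    """(value, count) pairs along one dimension, most frequent first."""
    # Deduplicate (subject, value) so each entity counts once per value.
    pairs = {(s, o) for s, p, o in triples if s in subjects and p == predicate}
    return Counter(o for _, o in pairs).most_common()

historic = filter_subjects(triples, "rdf:type", "dbpedia-owl:HistoricPlace")
print(facet_counts(triples, historic, "ex:state"))
# [('New York', 2), ('Ohio', 1)]
```

The filter alone yields three places. The facet is the extra list of per-value counts, which tells you where you could narrow next.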

http://www.bestbuy.com/site/Digital-Cameras/Digital-SLR-Cameras/abcat0401005.c?id=abcat0401005

> Instead of arguing, can you simply respond with a link to an example of an
> endpoint that provides access to a massive data corpus re. declarative
> queries.
>

See, this response just makes it seem like you didn't read my note for
comprehension.

> What can I say to you, you just won't accept the point.
>
> RPI have published RDF datasets. We loaded them. That's it.
>

Exactly. You loaded an artificially bloated dataset and then bragged about
its size.

> Simple example queries that you can perform against the LOD Cloud Cache [1]
> that leverage faceted navigation and scrollable cursors:
>
> 1. Find all Entities associated with the Pattern: "New York" -- from the
> results page use Types or other Attribute filters to seek your particular
> disambiguated needle in this massive haystack
>

I go to the link you provided (lod.openlinksw.com), I type "New York" in the
box and hit Enter. The query times out with no results.

> 2. Repeat the above with owl:sameAs inference context enabled
> 3. Repeat the above with owl:sameAs + a fuzzy InverseFunctional property
> rule e.g. using foaf:name
>
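The following is a minimal sketch of what a fuzzy inverse-functional-property rule on foaf:name might do. It assumes a simple normalization (case folding, accent stripping, punctuation removal) and merges resources whose names normalize to the same key. It covers only the fuzzy-IFP half of #3, is not Virtuoso's actual inference rule, and uses hypothetical resource names. FOAF does not declare foaf:name inverse functional, so treating it that way is a heuristic.

```python
import re
import unicodedata
from collections import defaultdict

# Minimal sketch of a "fuzzy" inverse-functional-property rule: resources
# whose foaf:name values normalize to the same key are treated as one
# entity. The normalization is an assumption chosen for illustration.

def normalize(name):
    """Strip accents, casefold, and turn punctuation into single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    spaced = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(spaced.split())

def smush_by_name(names):
    """Group resource IDs whose normalized foaf:name values match."""
    groups = defaultdict(set)
    for resource, name in names:
        groups[normalize(name)].add(resource)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical (resource, foaf:name) pairs.
names = [
    ("ex:a", "Jane Doe-Smith"),
    ("ex:b", "jane doe smith"),
    ("ex:c", "José Pérez"),
    ("ex:d", "Jose Perez"),
    ("ex:e", "Alex Kim"),
]
print(smush_by_name(names))
# [['ex:a', 'ex:b'], ['ex:c', 'ex:d'], ['ex:e']]
```

The trade-off follows directly from this design. A looser normalization merges more spelling variants, but it also merges more distinct people who happen to share a name.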

Moot given that #1 produces no results, but since you already said that you
have inferencing turned off on this public instance, how did you expect me
to do these?
Received on Thursday, 7 April 2011 01:30:40 UTC