Re: LOD Cloud Cache Stats from Kingsley Idehen on 2011-04-06 (semantic-web@w3.org from April 2011)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Wed, 06 Apr 2011 10:55:43 -0400
To: glenn mcdonald <glenn@furia.com>
CC: "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <4D9C7EEF.8070505@openlinksw.com>
On 4/6/11 10:27 AM, glenn mcdonald wrote:
>
>     Are you not able to use the public instance for intelligent faceting
>     across the massive datasets that it hosts?
>
>
> I think it's fair to say "Yes, I am not able to use the public 
> instance for anything I would consider 'intelligent faceting' of the 
> dbpedia dataset".

You can, but you refuse to see how.

> As I said before, I don't mean this as a criticism of the technical 
> infrastructure of Virtuoso.

I am demonstrating and talking about what Virtuoso infrastructure enables.

> But I do mean it as a criticism of dbpedia as a specific dataset, of 
> your data-explorer UI, of the granularity of RDF and the 
> expressiveness of SPARQL, and of the premise of trying to build one 
> big unified database of all the world's data.
>
>     First off, I've discussed a lot of these matters with Jeff, and
>     guess what, correct numbers aren't the point at all.
>
>
> Nonsense. See http://jeffjonas.typepad.com/IRAHSS_Expert_Counting.pdf,

Again, I've discussed these matters with Jeff and this is not about 
perfect numbers.

> which summarizes itself like this: "This article suggests that the 
> single most fundamental capability required to make a sensemaking 
> system is the system’s ability to recognise when multiple references 
> to the same entity (often from different source systems) are in fact 
> the same entity." dbpedia as a dataset fails this test badly.

And how on earth does that have anything to do with Counting?

That's a comment about how you figure out that one or more Identifiers 
share a common Referent.

Again, I have not only talked to Jeff about this, I have also 
demonstrated the wonderment of OWL to him via this particular instance 
in relation to these matters.

The only difference here is that Jeff doesn't approach these matters via 
OWL and RDF, naturally, since he started some of his work before the 
Semantic Web.

>
>>     Not sure what you mean by "exhibit" here. Your queries time out,
>>     so unless the needle happens to be in the first page of the
>>     haystack, you're not going to find it.
>
>     No they don't and that's where we just will not connect. You've
>     already seen our browser pages that do just that, and your next
>     response will ultimately take us back to arguing about page
>     aesthetics.
>
>
> Sorry, I can't follow this response. By "no they don't" do you mean 
> that your queries /don't/ time out? They certainly do when I try them.

You can actually issue SPARQL with timeouts. Do you not remember the 
conversation about partial aggregates in ad-hoc queries using SPARQL or 
SQL? That's what I am talking about: what we call "Anytime Query" [1], 
a critical technique for ad-hoc queries at infinite scale [2].
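To make "SPARQL with timeouts" concrete, here is a minimal sketch that builds a SPARQL protocol GET URL carrying a time budget. It assumes the endpoint lives at /sparql on the host mentioned in this thread and that it honours a "timeout" request parameter given in milliseconds; both are assumptions to check against the server's documentation, and the function name and the query are purely illustrative. The sketch only constructs the URL and sends nothing over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_anytime_query_url(endpoint: str, query: str, timeout_ms: int) -> str:
    """Return a SPARQL protocol GET URL that asks for a bounded-time answer.

    "timeout" (milliseconds) is assumed to be the parameter the endpoint
    uses to cap query time and return whatever it has computed so far.
    """
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    params = {"query": query, "timeout": str(timeout_ms)}
    return endpoint + "?" + urlencode(params)


# Illustrative aggregate: count entities per type, largest groups first.
EXAMPLE_QUERY = """SELECT ?type (COUNT(*) AS ?n)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?n)
LIMIT 20"""

# Hypothetical endpoint path; only the host name appears in this thread.
url = build_anytime_query_url("http://lod.openlinksw.com/sparql", EXAMPLE_QUERY, 5000)

# Decode the URL again to confirm the budget and the query survive encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["timeout"][0])
```

A client would fetch this URL, show whatever rows come back, and offer a retry with a larger budget when the answer is flagged as partial.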

>
>>         Exhibit #2 -- how do we leverage faceted exploration and
>>         navigation of massive data sets at Web Scale?
>>
>>
>>     I thought I knew what "faceted exploration" meant, but your
>>     "facet" example has nothing I recognize as a facet, so I'm not
>>     sure what your claim is here.
>     What are you talking about? Using group aggregates to pivot data
>     across various dimensions is about what?
>
>
> The link you said demonstrated faceting was this:
>
> http://lod.openlinksw.com/fct/facet.vsp?sid=35044&cmd=refresh
>
> In this view I see a filter in effect on dbpedia-owl:HistoricPlace, 
> but "facet" usually means a little parallel list of counts along some 
> dimension, not just a filter. E.g., the "narrow your results" sidebar 
> here:
>
> http://www.bestbuy.com/site/Digital-Cameras/Digital-SLR-Cameras/abcat0401005.c?id=abcat0401005
>
>     Instead of arguing, can you simply respond with a link to an
>     example of an endpoint that provides access to a massive data
>     corpus re. declarative queries.
>
>
> See, this response just makes it seem like you didn't read my note for 
> comprehension.
>
>     What can I say to you, you just won't accept the point.
>
>     RPI have published RDF datasets. We loaded them. That's it.
>
>
> Exactly. You loaded an artificially bloated dataset and then bragged 
> about its size.

I loaded a bloated dataset and bragged about the size. Hmm. Anyway, you 
are 100% entitled to your opinion. I am not going to expend any 
energy on your opinions as you refuse to work within any kind of context.

The folks on this mailing list understand why RPI's dataset is loaded into 
the LOD cloud and what it means. I can't burn any cycles explaining that 
to you; your comments are 100% self-explanatory re. context infidelity.


>
>     Simple example queries that you can perform against the LOD Cloud
>     Cache [1] that leverage faceted navigation and scrollable cursors:
>
>     1. Find all Entities associated with the Pattern: "New York" --
>     from the results page use Types or other Attribute filters to seek
>     your particular disambiguated needle in this massive haystack
>
>
> I go to the link you provided (lod.openlinksw.com 
> <http://lod.openlinksw.com>), I type "New York" in the box and hit 
> Enter. The query times out with no results.

"Retry" means you have a partial result set; if you can't pivot from 
here, try again. These are DBMS basics for result handling when data is 
partitioned horizontally. Our tweak is the support for partial 
aggregates in this context.
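A toy model of that retry behaviour, as a sketch only: group counts are accumulated across horizontal partitions until a work budget runs out, the partial aggregate is returned with a flag saying whether it is complete, and "Retry" simply asks again with a larger budget. This is not Virtuoso's implementation. The budget is counted in rows scanned rather than wall-clock time so that the example is deterministic, and every name here is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def anytime_count(partitions: Sequence[Sequence[str]], budget: int) -> Tuple[Counter, bool]:
    """Count keys across partitions, stopping once `budget` rows are scanned.

    Returns (counts, complete). When complete is False the counts are a
    partial aggregate: correct for the rows seen so far, but not final.
    """
    counts: Counter = Counter()
    scanned = 0
    for partition in partitions:
        for key in partition:
            if scanned >= budget:
                return counts, False  # budget exhausted, hand back what we have
            counts[key] += 1
            scanned += 1
    return counts, True


def retry_until_complete(partitions: Sequence[Sequence[str]], budget: int) -> List[Tuple[Counter, bool]]:
    """Mimic pressing "Retry": double the budget until the answer is complete."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    attempts = []
    while True:
        result = anytime_count(partitions, budget)
        attempts.append(result)
        if result[1]:
            return attempts
        budget *= 2


# Two "partitions" of type labels, standing in for horizontally split data.
data = [["Place", "Person", "Place"], ["Person", "Place"]]
attempts = retry_until_complete(data, budget=2)
for counts, complete in attempts:
    print(dict(counts), "complete" if complete else "partial")
```

Each partial answer is already usable for pivoting on the groups it contains; later attempts only refine the counts.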

>
>     2. Repeat the above with owl:sameAs inference context enabled
>     3. Repeat the above with owl:sameAs + a fuzzy InverseFunctional
>     property rule e.g. using foaf:name
>
>
> Moot given that #1 produces no results, but since you already said 
> that you have inferencing turned off on this public instance, how did 
> you expect me to do these?

Hit the "Retry" button, take a deep breath, then try to fix your 
context infidelity problem :-)
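For readers who want to try steps 1 and 2 quoted above as plain SPARQL rather than through the browser pages, a hedged sketch follows. It assumes two Virtuoso-specific extensions as I understand them: the bif:contains full-text predicate and the DEFINE input:same-as "yes" pragma that switches on owl:sameAs inference for a single query. Verify both against the Virtuoso documentation before relying on them. The helper names are hypothetical, and step 3 (the fuzzy InverseFunctional property rule) needs a server-side rule set, so it is left out.

```python
def text_search_query(pattern: str, limit: int = 20) -> str:
    """Build a query for entities with any literal matching `pattern`.

    Quotes and backslashes are rejected rather than escaped, to keep the
    sketch simple and avoid producing a malformed query.
    """
    if any(ch in pattern for ch in "\"'\\"):
        raise ValueError("pattern must not contain quotes or backslashes")
    return (
        "SELECT DISTINCT ?s ?type WHERE {\n"
        "  ?s ?p ?o .\n"
        f"  ?o bif:contains '\"{pattern}\"' .\n"  # assumed Virtuoso full-text predicate
        "  OPTIONAL { ?s a ?type }\n"
        "}\n"
        f"LIMIT {limit}"
    )


def with_same_as(query: str, enabled: bool = True) -> str:
    """Prefix the query with the (assumed) per-query owl:sameAs pragma."""
    if not enabled:
        return query
    return 'DEFINE input:same-as "yes"\n' + query


step1 = text_search_query("New York")  # step 1: plain pattern search
step2 = with_same_as(step1)            # step 2: same search, sameAs context on
print(step2)
```

The ?type column is what a client would use to narrow the results by Types, as step 1 suggests.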




-- 

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Wednesday, 6 April 2011 14:56:08 UTC