Re: LOD Cloud Cache Stats from Kingsley Idehen on 2011-04-06 (public-lod@w3.org from April 2011)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Wed, 06 Apr 2011 13:20:06 -0400
To: glenn mcdonald <glenn@furia.com>
CC: "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <4D9CA0C6.306@openlinksw.com>
On 4/6/11 1:05 PM, glenn mcdonald wrote:
>
>     I am demonstrating and talking about what Virtuoso infrastructure
>     enables.
>
>
> You're talking about it, and you're *trying* to demonstrate it. But 
> your demonstrations are consistently undermined by other factors you 
> consider irrelevant.

To you.

You != my target audience, clearly.

My target audience is interested in DBMS scalability with regard to RDF 
data ingestion, indexing, and publication. You, as far as I can gather, 
are more interested in ideals such as:

1. Perfect Data
2. Perfect Visual Aesthetics

Hence, as I said earlier, your comments are skewed by context infidelity.

I think this might be your first post series to the LOD mailing list, 
and you make a quantum leap re. assumptions about what I am 
demonstrating or why I released the stats spreadsheets.

This isn't a newbie-oriented mailing list. There is a lot of context 
already in place re. my comments.

>
>>     Nonsense. See
>>     http://jeffjonas.typepad.com/IRAHSS_Expert_Counting.pdf,
>     Again, I've discussed these matters with Jeff and this is not
>     about perfect numbers.
>
>
> "Perfect" is hardly the issue here. I encourage people to read the paper.

Yes, and most of the folks you refer to on these mailing lists (esp. 
Semantic Web segment) already know the entire paper is about the stuff 
OWL handles very well. The historic challenge has been all about how you 
actually demonstrate the prowess of OWL against a massive Linked Data 
Corpus.

Transitivity, Inference Context, InverseFunctional Properties, 
owl:sameAs, etc. are all understood by the audience here (at least by 
the majority).
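
To make the link between owl:sameAs and counting concrete, here is a 
minimal sketch (not Virtuoso code) of counting distinct entities once 
owl:sameAs links are treated as transitive. The function name 
count_entities and the ex: URIs are hypothetical, chosen only for 
illustration; the merging is done with a plain union-find.

```python
def count_entities(uris, same_as_pairs):
    """Count distinct entities after merging URIs linked by owl:sameAs.

    uris: iterable of URI strings (hypothetical sample data).
    same_as_pairs: iterable of (a, b) pairs meaning "a owl:sameAs b".
    """
    parent = {u: u for u in uris}

    def find(u):
        # Follow parent links to the representative, halving the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in same_as_pairs:
        # A URI that only appears in a sameAs statement still counts.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(a)] = find(b)

    return len({find(u) for u in parent})


uris = ["ex:a", "ex:b", "ex:c", "ex:d"]
links = [("ex:a", "ex:b"), ("ex:b", "ex:c")]
print(count_entities(uris, links))
```

With the sample data, ex:a, ex:b and ex:c collapse into one entity via 
transitivity, so the naive count of four references becomes two 
entities.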

>
>>     which summarizes itself like this: "This article suggests that
>>     the single most fundamental capability required to make a
>>     sensemaking system is the system’s ability to recognise when
>>     multiple references to the same entity (often from different
>>     source systems) are in fact the same entity." dbpedia as a
>>     dataset fails this test badly.
>     And how on earth does that have anything to do with Counting?
>
>
> I encourage *you* to read the paper, too.

See my comments above.

As I told you, I've already long done a "show and tell" session with 
Jeff re. the prowess of OWL at the kind of scale our instance offers. If 
you do know Jeff, ask him this question: how did your session with 
Kingsley go re. "sense making at massive scales" based on his instance 
at lod.openlinksw.com?

>
>>>         Not sure what you mean by "exhibit" here. Your queries
>>>         timeout, so unless the needle happens to be in the first
>>>         page of the haystack, you're not going to find it.
>>
>>         No they don't and that's where we just will not connect.
>>         You've already seen our browser pages that do just that, and
>>         your next response will ultimately take us back to arguing
>>         about page aesthetics.
>>
>>
>>     Sorry, I can't follow this response. By "no they don't" do you
>>     mean that your queries /don't/ timeout? They certainly do when I
>>     try them.
>
>     You can actually issue SPARQL with timeouts. Do you not remember
>     the conversation about partial aggregates in ad-hoc queries using
>     SPARQL or SQL? That's what I am talking about. What we call
>     "Anytime Query" [1] as a critical technique for ad-hoc queries at
>     infinite scale [2].
>
>
> And as I've said before, this is an impressive technical 
> accomplishment. But it's not always helpful for people. If I have to 
> page through 20 straws of hay at a time, that's not what I mean by 
> searching the haystack.

If you have a massive amount of data in a query result set, the solution 
is to present the results in a page that is driven by a Scrollable 
Cursor. Depending on the kind of DBMS engine at hand and the model to 
which the query is posed, the Cursor can take any of the following forms:

1. Static / Snapshot
2. Keyset
3. Dynamic
4. Mixed

The above are old items from 30+ years of DBMS technology. They are not 
our invention, just things we've learned across our own many years of 
experience and applied to the emerging realm of Relational Property 
Graph based Linked Data.
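
As an illustration of the Keyset form listed above, here is a minimal 
sketch using Python's built-in sqlite3 module. It is not how Virtuoso 
implements cursors; the class name KeysetCursor, the table name straw 
and its columns are assumptions made up for the example. The idea shown 
is that membership and order are fixed when the cursor opens, while row 
values are re-read each time a page is fetched.

```python
import sqlite3


class KeysetCursor:
    """Keyset-style scrollable cursor over a hypothetical 'straw' table."""

    def __init__(self, conn, page_size):
        self.conn = conn
        self.page_size = page_size
        # Capture the ordered set of keys once, when the cursor is opened.
        self.keys = [row[0] for row in conn.execute("SELECT id FROM straw ORDER BY id")]

    def page(self, n):
        """Return page n (0-based); row values are read at fetch time."""
        chunk = self.keys[n * self.page_size:(n + 1) * self.page_size]
        if not chunk:
            return []
        marks = ",".join("?" * len(chunk))
        sql = f"SELECT id, label FROM straw WHERE id IN ({marks}) ORDER BY id"
        return self.conn.execute(sql, chunk).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE straw (id INTEGER PRIMARY KEY, label TEXT)")
    conn.executemany(
        "INSERT INTO straw (id, label) VALUES (?, ?)",
        [(i, f"straw-{i}") for i in range(1, 51)],
    )
    cursor = KeysetCursor(conn, page_size=20)
    # Changes made after the cursor is opened:
    conn.execute("UPDATE straw SET label = 'needle' WHERE id = 25")
    conn.execute("INSERT INTO straw (id, label) VALUES (51, 'late')")
    return cursor.page(1), cursor.page(2)


second_page, third_page = demo()
print(len(second_page), len(third_page))
```

In this sketch the updated row (id 25) shows its new value when its page 
is scrolled to, while the row inserted after the cursor was opened 
(id 51) does not appear, since it was never part of the key set. A 
Static cursor would instead snapshot the values too, and a Dynamic one 
would re-evaluate membership on every scroll.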


>
>     The folks on this mailing list understand why RPI's dataset is
>     loaded to the LOD cloud and what it means.
>
>
> Perhaps so. Somebody want to describe the practical use they're making 
> of this data?

Maybe, but that's for a different thread and totally different 
conversation. There are many folks interested in data quality matters. 
Thus, you will find company, and that includes me. Just understand that 
I started this thread with a specific purpose in mind that factored in 
the target audience. One size doesn't fit all.




-- 

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Wednesday, 6 April 2011 17:22:15 UTC