Re: LOD Cloud Cache Stats from Kingsley Idehen on 2011-04-06 (semantic-web@w3.org from April 2011)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Wed, 06 Apr 2011 07:52:34 -0400
To: glenn mcdonald <glenn@furia.com>
CC: "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <4D9C5402.8060405@openlinksw.com>
On 4/5/11 8:26 PM, glenn mcdonald wrote:
>
>     On the issue of Triple Counts, you can't make sense of Data if you
>     can't count it.
>
>
> And your public instance /can't/ count it, since all your non-trivial 
> queries time out.
Are you not able to use the public instance for intelligent faceting across 
the massive datasets that it hosts? You seem to forget that we actually 
have Web Services and live demonstrations of how that's achieved.

The SPARQL endpoint is configured to not allow the whole world to 
concurrently count to 21 Billion+. You can use OFFSET and LIMIT to 
cursor your way through the data. How do you think our cursor-based 
browser pages work? You did see the examples I've posted, right?
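
A minimal sketch of the OFFSET/LIMIT cursoring described above, in Python. 
The helper name paged_queries, the example query, and the page size are 
illustrative assumptions, not the queries behind the browser pages. The 
example query carries an ORDER BY because, without a stable order, 
successive pages are not guaranteed to be consistent.

```python
from typing import Iterator


def paged_queries(base_query: str, page_size: int, max_pages: int) -> Iterator[str]:
    """Yield copies of base_query, each with a LIMIT/OFFSET window appended.

    base_query is assumed (hypothetically) to be a SELECT that already ends
    in an ORDER BY clause, so that successive pages do not overlap.
    """
    if page_size <= 0 or max_pages < 0:
        raise ValueError("page_size must be positive and max_pages non-negative")
    for page in range(max_pages):
        offset = page * page_size
        yield f"{base_query.strip()}\nLIMIT {page_size}\nOFFSET {offset}"


# Hypothetical query: every labelled entity, in a stable order.
EXAMPLE_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label
WHERE { ?s rdfs:label ?label }
ORDER BY ?s
"""

if __name__ == "__main__":
    for query in paged_queries(EXAMPLE_QUERY, page_size=100, max_pages=3):
        print(query)
        print("-" * 20)
```

Each generated query would be sent to the endpoint in turn; a client 
would normally stop once a page comes back with fewer than page_size rows.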

>
> Also, the point of Jeff Jonas' thing about counting is not producing 
> /some/ number, but producing /correct/ numbers.

First off, I've discussed a lot of these matters with Jeff, and guess 
what, correct numbers aren't the point at all. In short, Jeff even has 
presentations about the benefits of "bad data".  Jeff is referring to 
the value of aggregation, which I believe you understand.

> How many unique real-world things are represented by those billions of 
> triples? You have no idea. This is not a failing of Virtuoso or 
> SPARQL, but it's a terminal failing of dbpedia as a data set. And 
> until you can explain how anybody would create a "Global Linked Data 
> Space" that would actually /make sense /to query, it doesn't matter 
> much whether you or anybody else can query it.
>
>     Exhibit #1 -- how do we Find the proverbial needle in a haystack
>     via ad-hoc queries at Web Scale?
>
>
> Not sure what you mean by "exhibit" here. Your queries time out, so 
> unless the needle happens to be in the first page of the haystack, 
> you're not going to find it.

No, they don't, and that's where we just will not connect. You've already 
seen our browser pages that do just that, and your next response will 
ultimately take us back to arguing about page aesthetics.


>
>     Exhibit #2 -- how do we leverage faceted exploration and
>     navigation of massive data sets at Web Scale?
>
>
> I thought I knew what "faceted exploration" meant, but your "facet" 
> example has nothing I recognize as a facet, so I'm not sure what your 
> claim is here.

What are you talking about? Using group aggregates to pivot data across 
various dimensions is about what?
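
To make "group aggregates" concrete, here is a small Python sketch of a 
facet computed as a group-by count: find the entities whose text matches 
a pattern, then count them per value of one dimension (here rdf:type). 
The triples, the ex: names, and both function names are hypothetical toy 
values for illustration; this is not how the hosted faceted browser is 
implemented.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

RDF_TYPE = "rdf:type"

# Hypothetical toy data, not taken from the LOD Cloud Cache.
TRIPLES: Sequence[Triple] = [
    ("ex:NewYorkCity", RDF_TYPE, "ex:City"),
    ("ex:NewYorkCity", "rdfs:label", "New York City"),
    ("ex:NewYorkState", RDF_TYPE, "ex:State"),
    ("ex:NewYorkState", "rdfs:label", "New York"),
    ("ex:NewYorkTimes", RDF_TYPE, "ex:Newspaper"),
    ("ex:NewYorkTimes", "rdfs:label", "The New York Times"),
    ("ex:Rochester", RDF_TYPE, "ex:City"),
    ("ex:Rochester", "rdfs:label", "Rochester, New York"),
    ("ex:Albany", RDF_TYPE, "ex:City"),
    ("ex:Albany", "rdfs:label", "Albany"),
]


def matching_entities(triples: Sequence[Triple], pattern: str) -> Set[str]:
    """Subjects with a non-type object value containing pattern (case-insensitive)."""
    needle = pattern.lower()
    return {s for s, p, o in triples if p != RDF_TYPE and needle in o.lower()}


def facet_counts(
    triples: Sequence[Triple], pattern: str, facet_predicate: str = RDF_TYPE
) -> Counter:
    """Group the matching entities by facet_predicate and count each group."""
    entities = matching_entities(triples, pattern)
    return Counter(
        o for s, p, o in triples if p == facet_predicate and s in entities
    )


if __name__ == "__main__":
    for value, count in facet_counts(TRIPLES, "New York").most_common():
        print(value, count)
```

Choosing a different facet_predicate pivots the same matched set along 
another dimension, which is the sense of "facet" used here.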

>
>     Exhibit #3 -- how do we perform ad-hoc declarative queries (Join
>     and Aggregates variety) that used to be confined to a local
>     Oracle, SQL Server, DB2, Informix, MySQL, etc., at Web Scale, esp.
>     if the Web is now a Global Linked Data Space?
>
>
> Again, it sounds like your effective answer is "we don't". At least 
> not if we actually care about the results, and we want them in some 
> reasonable amount of time. I'm actually fine with this answer, but I 
> think you're claiming you have a different answer.

I am saying: how do we perform large-scale declarative queries across 
graphs rather than tables stored in a specific RDBMS engine at Web scale?
>
>     I've issued a challenge to all BigData players to show me a public
>     endpoint that allows me to perform any of the tasks above. Thus
>     far, the silence has been predictably deafening :-)
>
>
> I'm not sure which "BigData players" you're superciliously calling out 
> here (and certainly me and my project aren't among them), but I 
> suspect the "silence" is due to your challenge being both hard to 
> follow and wildly irrelevant to their concerns. They're not concerned 
> with public endpoints, they have very limited interest in ad-hoc-ness, 
> they certainly don't care about the sprawling mess of dbpedia, and 
> they can't tolerate queries that run for many seconds and still only 
> deliver partial results. You're not engaged in the same enterprise. 
> Or, more precisely, your tech and their tech may inhabit the same 
> category in some sense, but your public demos and their private 
> enterprise systems do not.

Instead of arguing, can you simply respond with a link to an example of 
an endpoint that provides access to a massive data corpus re. 
declarative queries?
>
>     Yes and No. As with all of these matters, utility lies in the eyes
>     and fingers of the data beholder.
>
>
> Seems like this pattern keeps reliably repeating: you post some 
> dbpedia-based demo that, to you, demonstrates some quality of Virtuoso 
> or some supposed virtue of Linked Data as a concept. Then somebody 
> actually bothers to look at the details of what you posted, and points 
> out some glaring lameness about it. Then you blame that lameness on 
> somebody else ("That's a question for the team at RPI :-)"), and 
> simultaneously insist on the subjectivity of all quality assessments. 
> I don't buy it. If you're going to include Hendler's 6.4 billion 
> CSV-cell triples in your 21-billion brag, then you have to stand up 
> for them and explain why they're valuable. If you're going to keep 
> holding up dbpedia as an example, you need to start showing some 
> actual uses of it. Show us a human use-case that it's actually good 
> for, which "union of all Attributes associated with Entities that are 
> associated with the pattern 'New York'" very much is not.

What can I say to you? You just won't accept the point.

RPI have published RDF datasets. We loaded them. That's it.

Why don't you stop arguing and start showing on an "Apples vs Apples" 
basis? If you have something that matches what I am trying to showcase 
-- to a much broader audience than you -- just post a link. Your 
arguments are missing too many points; sorry, but that's how I see this 
repetitive cycle.


Simple example queries that you can perform against the LOD Cloud Cache 
[1] that leverage faceted navigation and scrollable cursors:

1. Find all Entities associated with the Pattern: "New York" -- from the 
results page use Types or other Attribute filters to seek your 
particular disambiguated needle in this massive haystack

2. Repeat the above with owl:sameAs inference context enabled

3. Repeat the above with owl:sameAs + a fuzzy InverseFunctional property 
rule e.g. using foaf:name

4. Repeat 1-3 for anything that takes your interest where the thing in 
question is associated with a Text Pattern

5. Ditto using the Label lookup feature that lets you type in a Text 
Pattern which is then used to scope initial lookup to Entity Labels

6. Ditto using an actual Entity URI via the URI lookup tab.

When you see the "Retry" button it means: we have more records, so click 
if the filtered resultset doesn't currently show items of interest. 
Iterate.

Some links from an earlier post to the lotico mailing list re. the 
examples above (modulo inference context):

Links (some very basic demos that don't include exploitation of 
inference rules against this massive DBMS instance):

1. http://lod.openlinksw.com/fct/facet.vsp?cmd=load&fsq_id=29 -- 
distinct count of Entities associated with pattern "New York"

2. http://lod.openlinksw.com/fct/facet.vsp?cmd=load&fsq_id=30 -- types 
of Entities associated with the pattern "New York"

3. http://lod.openlinksw.com/fct/facet.vsp?cmd=load&fsq_id=31 -- union 
of all Attributes associated with Entities that are associated with the 
pattern "New York"

4. http://lod.openlinksw.com/fct/facet.vsp?cmd=load&fsq_id=32 -- facet 
based on Entities of type: Historic Places, associated with pattern "New 
York"

5. http://lod.openlinksw.com/fct/facet.vsp?cmd=load&fsq_id=33 -- simple 
Meshup (one-click on "Places") showing locations associated with 
Entities of type: Historic Places, associated with pattern "New York"

6. 
http://lod.openlinksw.com/describe/?url=http%3A//dbpedia.org/resource/23rd_Street_%2528Manhattan%2529 
-- description of 23rd Street, Manhattan

7. 
http://lod.openlinksw.com/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FCategory%3AStreets_in_Manhattan 
-- pages of information (via scrollable cursors) about streets of 
Manhattan.
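
For readers who want to build such description links themselves, a small 
Python sketch follows. It assumes, based only on the form of link 7 
above, that the page takes the percent-encoded entity URI in a "url" 
query parameter; the function name describe_url is hypothetical, and 
link 6 above uses a different encoding that this sketch does not try to 
reproduce.

```python
from urllib.parse import quote

# Base address taken from the links above.
DESCRIBE_BASE = "http://lod.openlinksw.com/describe/"


def describe_url(entity_uri: str) -> str:
    """Return a description-page link for entity_uri.

    safe="" makes quote() percent-encode ":" and "/" as well, which matches
    the encoding seen in link 7.
    """
    return f"{DESCRIBE_BASE}?url={quote(entity_uri, safe='')}"


if __name__ == "__main__":
    print(describe_url("http://dbpedia.org/resource/Category:Streets_in_Manhattan"))
```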


Links:

1. lod.openlinksw.com
2. 
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtFacetBrowserInstallConfig 
-- Faceted browser and navigation engine guide
>
> glenn




-- 

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

Received on Wednesday, 6 April 2011 11:53:00 UTC