Re: LOD Cloud Cache Stats

From: glenn mcdonald <glenn@furia.com>
Date: Tue, 5 Apr 2011 20:26:01 -0400
Message-ID: <BANLkTimU=NUB5VqcLkWi2my1KCCivKPbxA@mail.gmail.com>
To: lotico-list@googlegroups.com
Cc: Kingsley Idehen <kidehen@openlinksw.com>, "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
>
> On the issue of Triple Counts, you can't make sense of Data if you can't
> count it.


And your public instance *can't* count it, since all your non-trivial
queries time out.

Also, the point of Jeff Jonas' thing about counting is not producing *some*
number, but producing *correct* numbers. How many unique real-world things are
represented by those billions of triples? You have no idea. This is not a
failing of Virtuoso or SPARQL, but it's a terminal failing of dbpedia as a
data set. And until you can explain how anybody would create a "Global
Linked Data Space" that would actually *make sense* to query, it doesn't
matter much whether you or anybody else can query it.

> Exhibit #1 -- how do we Find the proverbial needle in a haystack via ad-hoc
> queries at Web Scale?
>

Not sure what you mean by "exhibit" here. Your queries time out, so unless
the needle happens to be in the first page of the haystack, you're not going
to find it.

> Exhibit #2 -- how do we leverage faceted exploration and navigation of
> massive data sets at Web Scale?
>

I thought I knew what "faceted exploration" meant, but your "facet" example
has nothing I recognize as a facet, so I'm not sure what your claim is here.

> Exhibit #3 -- how do we perform ad-hoc declarative queries (Join and
> Aggregates variety) that used to be confined to a local Oracle, SQL Server,
> DB2, Informix, MySQL etc.., at Web Scales esp. if the Web is now a Global
> Linked Data Space?
>

Again, it sounds like your effective answer is "we don't". At least not if
we actually care about the results, and we want them in some reasonable
amount of time. I'm actually fine with this answer, but I think you're
claiming you have a different answer.

> I've issued a challenge to all BigData players to show me a public endpoint
> that allows me to perform any of the tasks above. Thus far, the silence has
> been predictably deafening :-)
>

I'm not sure which "BigData players" you're superciliously calling out here
(and certainly me and my project aren't among them), but I suspect the
"silence" is due to your challenge being both hard to follow and wildly
irrelevant to their concerns. They're not concerned with public endpoints,
they have very limited interest in ad-hoc-ness, they certainly don't care
about the sprawling mess of dbpedia, and they can't tolerate queries that
run for many seconds and still only deliver partial results. You're not
engaged in the same enterprise. Or, more precisely, your tech and their tech
may inhabit the same category in some sense, but your public demos and their
private enterprise systems do not.

> Yes and No. As will all of these matter utility lies in the eyes and fingers
> of the data beholder.


Seems like this pattern keeps reliably repeating: you post some
dbpedia-based demo that, to you, demonstrates some quality of Virtuoso or
some supposed virtue of Linked Data as a concept. Then somebody actually
bothers to look at the details of what you posted, and points out some
glaring lameness about it. Then you blame that lameness on somebody else
("That's a question for the team at RPI :-)"), and simultaneously insist on the
subjectivity of all quality assessments. I don't buy it. If you're going to
include Hendler's 6.4 billion CSV-cell triples in your 21-billion brag, then
you have to stand up for them and explain why they're valuable. If you're
going to keep holding up dbpedia as an example, you need to start showing
some actual uses of it. Show us a human use-case that it's actually good
for, which "union of all Attributes associated with Entities that are
associated with the pattern 'New York'" very much is not.

glenn
Received on Thursday, 7 April 2011 01:30:40 GMT
