Re: Algorithm evaluation on the complete LOD cloud? from Paul Houle on 2015-04-24 (public-lod@w3.org from April 2015)

From: Paul Houle <ontology2@gmail.com>
Date: Fri, 24 Apr 2015 10:39:31 -0400
To: Gannon Dick <gannon_dick@yahoo.com>
Cc: "public-lod@w3.org" <public-lod@w3.org>, SW-forum Web <semantic-web@w3.org>, Laurens Rietveld <laurens.rietveld@vu.nl>
Message-ID: <CAE__kdSPBTqiwog2ESqX4kEgVmNWj=7nHM4tvZ4-r1f_RRwV8g@mail.gmail.com>

Here is my take.

The "Complete LOD cloud" is a stand-in for "any large collection of poorly
organized RDF data."  If you believe that RDF is a good model for
representing other sorts of data, you could imagine that some big
organization like Citibank or the U.S. Military has a large number of
different divisions, each holding all sorts of data of varying quality.  In
fact, looking at all the files on my own SOHO network, I would say the
same is true for individuals and small businesses too.

Then the right question to ask is "What methods would one use to
characterize such a data set with little prior knowledge?"

That is a carefully chosen phrase.  @Gannon rails against frequentism, and
there are a number of ways to reach a similar conclusion, such as

* the "grounding problem" in classical semantics
* the fact that any useful or interesting semantic system has to accomplish
some specific task, and do it in a way that is competitive with, or better
than, the alternatives (i.e. if you don't know where you are going you
are going to wind up nowhere)
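
To make the characterization question above a bit more concrete: one crude first pass, assuming the data arrives as N-Triples, is to count how often each predicate occurs.  This is only a sketch.  The function name and sample data are made up for illustration, and the whitespace split is not a real N-Triples parser.

```python
from collections import Counter
from typing import Iterable


def predicate_counts(lines: Iterable[str]) -> Counter:
    """Count predicate occurrences in N-Triples-style lines (naive split)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Subject and predicate contain no whitespace in well-formed
        # N-Triples, so the first two whitespace-separated fields suffice.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # malformed line, ignored in this sketch
        counts[parts[1]] += 1
    return counts


# Hypothetical sample data, not taken from any real dataset.
sample = [
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "A" .',
    '<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "B" .',
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .',
]

for predicate, n in predicate_counts(sample).most_common():
    print(n, predicate)
```

Because it reads line by line, a pass like this needs memory proportional to the number of distinct predicates rather than the number of triples.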

Also I find the "no special hardware requirements" criterion strange,
because it ought to be defined in terms of "I have a machine with
these specific specifications".  For instance, if you had a machine with
32GB of RAM (which is pretty affordable if you don't pay OEM prices) you
could load a billion triples into a triple store.  If your machine is a
hand-me-down laptop with just 4GB of RAM, inherited from a salesman who
couldn't sell, you are in a very different situation.
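
As a rough back-of-envelope check on those numbers (a sketch, not a benchmark): dividing the RAM by the number of triples gives the byte budget per triple that a store would have to fit within.  The function name is hypothetical, "GB" is read as GiB here, and whether any particular triple store meets the budget depends on its indexing and dictionary encoding, which this does not model.

```python
def bytes_per_triple(ram_gib: float, triples: int) -> float:
    """Average number of bytes of RAM available per triple."""
    return ram_gib * 2**30 / triples


one_billion = 1_000_000_000

print(round(bytes_per_triple(32, one_billion), 1))  # 34.4
print(round(bytes_per_triple(4, one_billion), 1))   # 4.3
```

Roughly 34 bytes per triple on the 32GB machine against roughly 4 on the 4GB laptop is the gap between the two situations.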

On Thu, Apr 23, 2015 at 1:14 PM, Gannon Dick <gannon_dick@yahoo.com> wrote:

> Hi Laurens,
>
> Ignore the hecklers, I know what you mean.
>
> Look at the two "solutions" to the German Tank Problem:
> http://en.wikipedia.org/wiki/German_tank_problem
>
> "The analyses illustrate the difference between frequentist inference and
> Bayesian inference.
> Estimating the population maximum based on a single sample yields
> divergent results, while the estimation based on multiple samples is an
> instructive practical estimation question whose answer is simple but not
> obvious."
>
> A complete LOD Cloud has "frequentist inference" labels, the LOD Cloud the
> hecklers want to build adds "Bayesian inference" (aka spam or spinning or
> semantic) labels.  So what's the right answer?  The right answer is that
> the Bayesian inference folks want you to speak
> <predicate>German</predicate> like them and frequentist inference folks
> just want to count Tanks correctly.
>
> The frequentist-istas are boring, with just a single answer
> (transformation) and they insist on spewing normative information all over
> the Universe.  No wonder semantic hipsters mock them.  Newton, Einstein,
> Fermi, Dirac, Feynman ... all losers ... not smart enough to make up their
> own labels for things.  Chaos and Informative data sets FOREVER!
>
> --Gannon
>
>
>
>
> --------------------------------------------
> On Thu, 4/23/15, Laurens Rietveld <laurens.rietveld@vu.nl> wrote:
>
>  Subject: Algorithm evaluation on the complete LOD cloud?
>  To: "public-lod@w3.org" <public-lod@w3.org>, "SW-forum Web" <
> semantic-web@w3.org>
>  Date: Thursday, April 23, 2015, 6:21 AM
>
>  Hi all,
>
>  I'm doing some research on evaluating algorithms on the complete
>  LOD cloud (via http://lodlaundromat.org), and am looking for
>  existing papers and algorithms to evaluate.
>
>  The criteria for such an algorithm are:
>
>  * It should be open source
>  * Domain independent
>  * No dependency on third data sources, such as query logs or a
>    gold standard
>  * No particular hardware dependencies (e.g. a cluster)
>  * The algorithm should take a dataset as input, and produce
>    results as output
>
>  Many thanks in advance for any suggestions
>
>  Best, Laurens
>
>
>  --
>  VU University Amsterdam
>  Faculty of Exact Sciences
>  Department of Computer Science
>  De Boelelaan 1081A
>  1081 HV Amsterdam
>  The Netherlands
>  www.laurensrietveld.nl
>  laurens.rietveld@vu.nl
>  Visiting address: De Boelelaan 1081
>  Science Building Room T312
>
>


-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   ontology2@gmail.com
https://legalentityidentifier.info/lei/lookup
Received on Friday, 24 April 2015 14:40:05 UTC