- From: Paul Houle <ontology2@gmail.com>
- Date: Fri, 24 Apr 2015 10:39:31 -0400
- To: Gannon Dick <gannon_dick@yahoo.com>
- Cc: "public-lod@w3.org" <public-lod@w3.org>, SW-forum Web <semantic-web@w3.org>, Laurens Rietveld <laurens.rietveld@vu.nl>
- Message-ID: <CAE__kdSPBTqiwog2ESqX4kEgVmNWj=7nHM4tvZ4-r1f_RRwV8g@mail.gmail.com>
Here is my take.

The "Complete LOD cloud" is a stand-in for "any large collection of poorly organized RDF data." If you believe that RDF is a good model for representing other sorts of data, you could imagine that some big organization like Citibank or the U.S. Military has a large number of different divisions that have all sorts of data of various quality. In fact if I look at all the files I have on my SOHO network you could say the same is true for individuals and small biz too.

Then the right question to ask is "What Methods would one use to characterize such a data set with little prior knowledge?"

That is a carefully chosen phrase. @Gannon rails against frequentism, and there are a number of ways to reach a similar conclusion, such as

* the "grounding problem" in classical semantics
* the fact that any useful or interesting semantic system has to do something or other that is competitive with some way of doing something that is better in some way (i.e. if you don't know where you are going you are going to wind up nowhere)

Also I find the "no special hardware requirements" thing to be strange, probably because it ought to be defined in terms of "I have a machine with these specific specifications".

For instance, if you had a machine with 32GB of RAM (which is pretty affordable if you don't pay OEM prices) you could load a billion triples into a triple store. If your machine is a hand-me-down laptop from a salesman who couldn't sell that has just 4GB of RAM you are in a very different situation.

On Thu, Apr 23, 2015 at 1:14 PM, Gannon Dick <gannon_dick@yahoo.com> wrote:

> Hi Laurens,
>
> Ignore the hecklers, I know what you mean.
>
> Look at the two "solutions" to the German Tank Problem:
> http://en.wikipedia.org/wiki/German_tank_problem
>
> "The analyses illustrate the difference between frequentist inference and
> Bayesian inference.
> Estimating the population maximum based on a single sample yields
> divergent results, while the estimation based on multiple samples is an
> instructive practical estimation question whose answer is simple but not
> obvious."
>
> A complete LOD Cloud has "frequentist inference" labels, the LOD Cloud the
> hecklers want to build adds "Bayesian inference" (aka spam or spinning or
> semantic) labels. So what's the right answer ? The right answer is that
> the Bayesian inference folks want you to speak
> <predicate>German</predicate> like them and frequentist inference folks
> just want to count Tanks correctly.
>
> The frequentist-istas are boring, with just a single answer
> (transformation) and they insist on spewing normative information all over
> the Universe. No wonder semantic hipsters mock them. Newton, Einstein,
> Fermi, Dirac, Feynman ... all losers ... not smart enough to make up their
> own labels for things. Chaos and Informative data sets FOREVER!
>
> --Gannon
>
> --------------------------------------------
> On Thu, 4/23/15, Laurens Rietveld <laurens.rietveld@vu.nl> wrote:
>
> Subject: Algorithm evaluation on the complete LOD cloud?
> To: "public-lod@w3.org" <public-lod@w3.org>, "SW-forum Web" <semantic-web@w3.org>
> Date: Thursday, April 23, 2015, 6:21 AM
>
> Hi all,
>
> I'm doing some research on evaluating algorithms on the complete LOD
> cloud (via http://lodlaundromat.org), and am looking for existing papers
> and algorithms to evaluate
>
> The criteria for such an algorithm are:
>
> * It should be open source
> * Domain independent
> * No dependency on third data sources, such as query logs or a gold
>   standard
> * No particular hardware dependencies (e.g. a cluster)
> * The algorithm should take a dataset as input, and produce results as
>   output
>
> Many thanks in advance for any suggestions
>
> Best, Laurens
>
> --
> VU University Amsterdam
> Faculty of Exact Sciences
> Department of Computer Science
> De Boelelaan 1081A
> 1081 HV Amsterdam
> The Netherlands
> www.laurensrietveld.nl
> laurens.rietveld@vu.nl
>
> Visiting address:
> De Boelelaan 1081
> Science Building Room T312

--
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems, Classification and Text Mining and Data Lakes*

(607) 539 6254
paul.houle on Skype
ontology2@gmail.com
https://legalentityidentifier.info/lei/lookup <http://legalentityidentifier.info/lei/lookup>
Received on Friday, 24 April 2015 14:40:05 UTC