Re: Algorithm evaluation on the complete LOD cloud? from Gannon Dick on 2015-04-24 (semantic-web@w3.org from April 2015)

From: Gannon Dick <gannon_dick@yahoo.com>
Date: Fri, 24 Apr 2015 09:23:07 -0700
To: Paul Houle <ontology2@gmail.com>
Cc: "public-lod@w3.org" <public-lod@w3.org>, SW-forum Web <semantic-web@w3.org>, Laurens Rietveld <laurens.rietveld@vu.nl>
Message-ID: <1429892587.73850.YahooMailBasic@web122905.mail.ne1.yahoo.com>
@Gannon here.

Apologies Paul, my sarcasm went a bit over the top.

If only new "creation" of list labels (data definitions) is considered, then there is only one choice of structure for "any large collection of *well* organized RDF data."

<rdf:list>
   <rdf:first>Sum partial fractions e.g. a Ground State</rdf:first>

   <rdf:rest>re-normalization group fraction</rdf:rest>
   <rdf:rest>re-normalization group fraction</rdf:rest>
   <rdf:rest>re-normalization group fraction</rdf:rest>
...
   <rdf:nil />
 </rdf:list>

Semantic data does not need ground state change (Bayesian inference) to be useful.  Inflation as homage to the "Open World Assumption" does much harm to insight.  There is no need to subject the dynamics to continuous compounding (change of Radix in LOG Space), because it is already there.

--Gannon 
--------------------------------------------
On Fri, 4/24/15, Paul Houle <ontology2@gmail.com> wrote:

 Subject: Re: Algorithm evaluation on the complete LOD cloud?
 To: "Gannon Dick" <gannon_dick@yahoo.com>
 Cc: "public-lod@w3.org" <public-lod@w3.org>, "SW-forum Web" <semantic-web@w3.org>, "Laurens Rietveld" <laurens.rietveld@vu.nl>
 Date: Friday, April 24, 2015, 9:39 AM
 
 Here is my take.
 
 The "Complete LOD cloud" is a stand-in for "any large collection of
 poorly organized RDF data."  If you believe that RDF is a good model
 for representing other sorts of data, you could imagine that some
 big organization like Citibank or the U.S. Military has a large
 number of different divisions that have all sorts of data of
 various quality.  In fact if I look at all the files I have on my
 SOHO network you could say the same is true for individuals and
 small biz too.
 
 Then the right question to ask is "What Methods would one use to
 characterize such a data set with little prior knowledge?"
 
 That is a carefully chosen phrase.  @Gannon rails against
 frequentism, and there are a number of ways to reach a similar
 conclusion, such as
 
 * the "grounding problem" in classical semantics
 * the fact that any useful or interesting semantic system has to do
   something or other that is competitive with some way of doing
   something that is better in some way (i.e. if you don't know
   where you are going you are going to wind up nowhere)
 
 Also I find the "no special
 hardware requirements" thing to be strange,  probably
 because it ought to be defined in terms of "I have a
 machine with these specific specifications".  For
 instance,  if you had a machine with 32GB of RAM (which is
 pretty affordable if you don't pay OEM prices) you could
 load a billion triples into a triple store.  If your
 machine is a hand-me-down laptop from a salesman who
 couldn't sell that has just 4GB of RAM you are in a very
 different situation.
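
The hardware point above can be put as back-of-envelope arithmetic. The sketch below only works out the per-triple memory budget implied by "32GB of RAM for a billion triples"; the function name `bytes_per_triple` is made up for illustration, and real triple stores differ widely in index and dictionary overhead, so treat the result as a budget, not a measurement.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def bytes_per_triple(ram_gib, triples):
    """Average number of bytes available per triple if all RAM is used."""
    return ram_gib * GIB / triples


if __name__ == "__main__":
    one_billion = 10 ** 9
    for ram_gib in (32, 4):
        budget = bytes_per_triple(ram_gib, one_billion)
        print(f"{ram_gib:>2} GiB for 1e9 triples: {budget:.2f} bytes per triple")
```

On the 32GB machine that is roughly 34 bytes per triple; on the 4GB laptop the same billion triples would have to fit in about 4 bytes each, which is why the two machines are in such different situations.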
 
 On Thu, Apr 23, 2015 at 1:14 PM, Gannon Dick <gannon_dick@yahoo.com> wrote:
 
 Hi Laurens,
 
 Ignore the hecklers, I know what you mean.
 
 Look at the two "solutions" to the German Tank Problem:
 http://en.wikipedia.org/wiki/German_tank_problem
 
 "The analyses illustrate the difference between frequentist
 inference and Bayesian inference.
 
 Estimating the population maximum based on a single sample yields
 divergent results, while the estimation based on multiple samples
 is an instructive practical estimation question whose answer is
 simple but not obvious."
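
For readers who have not seen the two "solutions" side by side, here is a minimal sketch of the point estimates described in the linked article. It is not code from anyone in this thread; the function names are hypothetical, and the Bayesian figure assumes the improper uniform prior on the population size, under which the posterior mean only exists for three or more observed serial numbers.

```python
def frequentist_estimate(serials):
    """Minimum-variance unbiased estimate: m + m/k - 1,
    where m is the largest serial seen and k the sample size."""
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


def bayesian_posterior_mean(serials):
    """Posterior mean (m - 1)(k - 1)/(k - 2) under an improper uniform
    prior on the population size. Returns None when k < 3, because the
    posterior mean is not finite for so few observations."""
    k = len(serials)
    m = max(serials)
    if k < 3:
        return None
    return (m - 1) * (k - 1) / (k - 2)


if __name__ == "__main__":
    sample = [19, 40, 42, 60]  # the worked example used in the article
    print("frequentist:", frequentist_estimate(sample))
    print("bayesian mean:", bayesian_posterior_mean(sample))
    print("bayesian mean, single tank:", bayesian_posterior_mean([60]))
```

With the article's sample the frequentist estimate is 74 and the Bayesian posterior mean is 88.5; with a single observed tank the Bayesian mean is undefined, which is the "divergent results" case the quote refers to.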
 
 
 
 A complete LOD Cloud has "frequentist inference" labels, the LOD
 Cloud the hecklers want to build adds "Bayesian inference" (aka
 spam or spinning or semantic) labels.  So what's the right answer?
 The right answer is that the Bayesian inference folks want you to
 speak <predicate>German</predicate> like them and frequentist
 inference folks just want to count Tanks correctly.
 
 The frequentist-istas are boring, with just a single answer
 (transformation) and they insist on spewing normative information
 all over the Universe.  No wonder semantic hipsters mock them.
 Newton, Einstein, Fermi, Dirac, Feynman ... all losers ... not
 smart enough to make up their own labels for things.  Chaos and
 Informative data sets FOREVER!
 
 --Gannon
 
 --------------------------------------------
 On Thu, 4/23/15, Laurens Rietveld <laurens.rietveld@vu.nl> wrote:
 
  Subject: Algorithm evaluation on the complete LOD cloud?
  To: "public-lod@w3.org" <public-lod@w3.org>, "SW-forum Web" <semantic-web@w3.org>
  Date: Thursday, April 23, 2015, 6:21 AM
 
  Hi all,
 
  I'm doing some research on evaluating algorithms on the complete
  LOD cloud (via http://lodlaundromat.org), and am looking for
  existing papers and algorithms to evaluate.
 
  The criteria for such an algorithm are:
 
  * It should be open source
  * Domain independent
  * No dependency on third data sources, such as query logs or a
    gold standard
  * No particular hardware dependencies (e.g. a cluster)
  * The algorithm should take a dataset as input, and produce
    results as output
 
  Many thanks in advance for any suggestions
 
  Best, Laurens
 
  --
  VU University Amsterdam
  Faculty of Exact Sciences
  Department of Computer Science
  De Boelelaan 1081 A
  1081 HV Amsterdam
  The Netherlands
  www.laurensrietveld.nl
  laurens.rietveld@vu.nl
 
  Visiting address:
  De Boelelaan 1081
  Science Building Room T312
 
 -- 
 Paul Houle
 
 Applying Schemas for Natural Language Processing, Distributed
 Systems, Classification and Text Mining and Data Lakes
 
 (607) 539 6254    paul.houle on Skype   ontology2@gmail.com
 https://legalentityidentifier.info/lei/lookup
Received on Friday, 24 April 2015 16:23:35 UTC