[ML] Re: Log Term In Entropy

Joshi, Mukul Madhukar wrote:
> 
> Hi,
>
> What is the intuition or reason behind using the log term in
> the entropy measure used in decision trees?  From a quick
> thought ... it gives a mapping to a value between 0 and 1.
> Any deep thought??
> Thanks
>
> ~ Mukul
> 
> Seek simplicity and Distrust it.
>
> --------------------------------------------------------------
> Mukul Madhukar Joshi.
> MTech Student,
> Computer Science Department.
> Room No. 69, Hostel No. 1
> Indian Institute Of Technology, Powai.
> --------------------------------------------------------------

¤~~~~~~~~~¤~~~~~~~~~¤~~~~~~~~~¤~~~~~~~~~¤~~~~~~~~~¤

Mukul,

Taking logarithms is merely a convenience: it converts what is
probably the more natural multiplicative measure of diversity
or variety into an additive measure, and it is the taking of
averages over the appropriate denominator that brings the range
of the entropy measure back to the interval [0, 1].
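The point can be sketched concretely.  In a minimal Python
illustration (the function name and the choice of base here are
mine, purely for demonstration), taking the log to the base of
the number of classes normalizes the entropy so that a uniform
class distribution scores 1 and a pure node scores 0:

```python
import math
from collections import Counter

def entropy(labels, base=2):
    """Shannon entropy of a sequence of class labels.

    With base equal to the number of distinct classes, the
    result is normalized to the interval [0, 1].
    """
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log(c / n, base)
                for c in counts.values())

# A 50/50 split of two classes: maximal uncertainty, entropy 1.
print(entropy(['a', 'b', 'a', 'b']))            # -> 1.0

# A pure node: no uncertainty, entropy 0.
print(entropy(['a', 'a', 'a', 'a']))            # -> 0.0

# Four equally likely classes, log taken to base 4:
# again normalized to 1.
print(entropy(list('aabbccdd'), base=4))        # -> 1.0
```

The averaging is visible in the weights c/n; the choice of base
only rescales the log, which is why it can be used to pin the
maximum at 1.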

The history of our ideas about information in relationship to
notions of entropy or uncertainty is really quite fascinating.
Most people are unaware that C.S. Peirce was lecturing on the
subject that he called the "Theory of Information" at Harvard
as early as 1865.  He initially distilled his earliest theory
from a matrix (raw material) of purely logical considerations,
if you count semiotic (the theory of signs) under the heading
of logic, and he frequently employed a multiplicative measure
of "multiplicity" as the simplest way to quantify uncertainty
with respect to a multitude or a variety of choices.  Many of
these multiplicities were generated or represented by counting
functions, say, of the form {f : X -> Y}, and the number of
functions in such a "function space" is |Y|^|X|, where |S| =
Card(S) = "Cardinality of S".  In simple problem settings it is
common to work over the same basis |Y| for extended periods of
time, treating the various pieces of "information" that arise
as "constraints" on how many options one has to consider.  It
was therefore natural to detach the exponent, namely, the
proportionate fraction of |X| that characterized the set of
possibilities one currently had to worry over in deciding the
answer to a question or the action to realize.  This is
tantamount to taking logarithmic images on the base |Y|.
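The function-counting picture is easy to check by enumeration.
A small Python sketch (the sets X and Y below are toy examples
of my own choosing):

```python
import math
from itertools import product

X = ['x1', 'x2', 'x3']   # domain, |X| = 3
Y = [0, 1]               # codomain (the "basis"), |Y| = 2

# Each function f : X -> Y is determined by its tuple of
# values on X, so enumerating the tuples enumerates f.
functions = list(product(Y, repeat=len(X)))

# The size of the function space is |Y|^|X| ...
assert len(functions) == len(Y) ** len(X)    # 2^3 = 8

# ... so the logarithm to the base |Y| recovers the exponent
# |X|, the additive measure detached from the multiplicity.
exponent = math.log(len(functions), len(Y))
print(round(exponent))                       # -> 3
```

Detaching the exponent turns products of multiplicities into
sums of exponents, which is the additive convenience mentioned
at the start.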

I hope this explanation is not too simple to distrust entirely!

As it happens, I was just discussing this very same subject
in one of my other e-fora, so I will forward you a copy of how
I put it there, under a "FYSMI" (Funny You Should Mention It!)
subject line cover.

Thanks For The Very Interesting Question!
May You Have Many Happy Gedankencounters!

Looking Forward Tuit,

Jon Awbrey

¤~~~~~~~~~¤~~~~~~~~~¤~~~~~~~~~¤~~~~~~~~~¤~~~~~~~~~¤

Received on Thursday, 25 January 2001 09:23:57 UTC