Re: this product seems to do most of what people talk about - mini review

I understand this is some kind of targeted spam posting, but being 
curious anyway, I took the time to try the thing.

My mini review:

ExploreXY is a client-side tool for finding correlations between two 
terms (textual strings, say "foo" and "bar"). Running on the client 
makes sense, since each correlation search is highly specialized and a 
server is very unlikely to be able to reuse any previous result, so no 
server is going to do this for you. While not many details are spelled 
out on how it works, I believe it does a bunch of initial Google 
queries (or similar) to find related pages, then uses NLP to extract 
triples in the form of

A - verb1 - B
B - verb2 - C
...
Y - verb3 - Z

to link A and Z, if these are the words you seek. But not only that: a 
lot of other results are provided, the logic of which is not 
immediately clear.
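
To make the chaining idea concrete, here is a minimal sketch of how 
extracted triples could be linked from one term to another. This is only 
my guess at the general approach, not ExploreXY's actual implementation; 
the function name find_chain and the sample triples are hypothetical.

```python
from collections import deque


def find_chain(triples, start, goal):
    """Breadth-first search for a shortest chain of triples from start to goal.

    Hypothetical helper: illustrates the guessed approach only.
    """
    # Index triples by subject so each hop is a dictionary lookup.
    by_subject = {}
    for subj, verb, obj in triples:
        by_subject.setdefault(subj, []).append((subj, verb, obj))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        term, path = queue.popleft()
        if term == goal:
            return path
        for triple in by_subject.get(term, []):
            nxt = triple[2]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [triple]))
    return None  # no chain links the two terms


# Made-up triples in the "A - verb - B" form used above.
triples = [
    ("A", "verb1", "B"),
    ("B", "verb2", "C"),
    ("C", "verb3", "Z"),
    ("A", "verb4", "D"),
]

chain = find_chain(triples, "A", "Z")
for subj, verb, obj in chain:
    print(f"{subj} - {verb} - {obj}")
```

Note that the search only follows triples in the subject-to-object 
direction and matches terms exactly; a real tool would presumably also 
have to normalize word forms before matching.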

The problem is obvious: the instability of the overall procedure (in 
control theory terms). In simple terms: NLP is amazingly error-prone 
per se, so each triple extracted will have, say, a 60% probability of 
making no sense.
Compound this over the length of the path and the probability of a 
path being meaningless is very close to 1.
But more often than not the results are pretty hilarious ;-)
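
To put rough numbers on this, here is a minimal sketch. It assumes the 
60% per-triple error rate guessed above and, as a further assumption, 
that the triples fail independently of each other; the function name 
p_meaningless is made up for illustration.

```python
def p_meaningless(p_bad_triple, path_length):
    """Probability that at least one triple in a path is nonsense.

    Assumes each triple is bad independently with probability p_bad_triple.
    """
    p_all_good = (1.0 - p_bad_triple) ** path_length
    return 1.0 - p_all_good


# 0.6 is the guessed per-triple error rate, not a measured value.
for n in range(1, 6):
    print(n, round(p_meaningless(0.6, n), 4))
```

Under these assumptions a 5-step path is meaningless with probability 
1 - 0.4^5, which is about 0.99.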
I tried to correlate "uranium" and "electricity". A typical result is 
qualitatively as follows:

uranium - is - interest
    http://www.webelements.com/webelements/elements/text/U/key.html
interests - of - engineers
    http://ieee-virtual-museum.org/
engineers - learned - ways
    http://www.energyquest.ca.gov/story/chapter02.html
way - is - charge
    http://www.sciencemadesimple.com/static.html
charge quantity - of - electricity coulomb
    http://en.wikipedia.org/wiki/Electricity

Other issues involve the time and resources taken while searching. 
Expect a rather simple search to take 10-15 minutes and to make your 
machine fairly unusable, probably due to bad programming practices 
(high-priority threads? too much work for the garbage collector?).
I won't go into more detail, as my impression is that it is of no use 
for now. I was able to spot a few not entirely wrong correlations, but 
I feel these come more from the initial web site search than from any 
understanding of what those sites said (definitely nothing surprising 
here).

One remark I feel like making, however, is that the version I tried is 
said to be a beta, so who knows, maybe some new trick will be 
implemented that dramatically raises the S/N ratio by the time 1.0 is 
reached? There is undoubtedly a great, great amount of work in here, 
and I think the idea is interesting. One can only hope that they keep 
financing the people who actually write and design the thing rather 
than whoever is doing the spam-like marketing (hopefully not the same 
person) :-)
Giovanni


Walter Henderson wrote:

>Apologies for cross posting. I have been reading the
>emails a colleague gets from this listserv.
>
>A company in Nevada has announced the release of a
>product that seems to do most of what people have been
>talking about on this list serv.
>
>It takes in unstructured data, recomposes it in new
>answers and displays it in easy format.
>
>In other words, you don't need to be an expert to be
>an expert.
>
>The purpose and goals of this BOF/list serv seems to
>have been bypassed.
>
>www.explorexy.com
>
>W. Henderson

Received on Monday, 6 March 2006 00:38:07 UTC