- From: Giovanni Tummarello <g.tummarello@gmail.com>
- Date: Mon, 06 Mar 2006 01:37:08 +0100
- To: Walter Henderson <walterhen2001@yahoo.com>
- CC: semantic-web@w3.org
I understand this is some kind of targeted spam posting, but being curious anyway I took the time to try the thing. My mini review:

ExploreXY is a client-side tool for finding correlations between 2 terms (textual strings, say "foo" and "bar"). Running on the client makes sense, since each correlation search is highly specialized and therefore no server is going to do this for you (it's very unlikely that it could reuse any previous result).

While not many details are spelled out on how it works, I believe it does a bunch of initial Google queries (or similar) to find related pages, then uses NLP to extract triples in the form

A - verb1 - B
B - verb2 - C
...
Y - verb3 - Z

to link A and Z, if these are the words you seek. But that is not all it does: a lot of other results are provided, the logic of which is not immediately clear.

The problem is obvious: the instability of the overall procedure (in control theory terms). In simple terms: NLP is amazingly error prone per se, so each triple extracted will have a... say 60% probability of making no sense. Compound this over the length of the path and the probability of a path being meaningless gets very close to 1. More often than not, though, the results are pretty hilarious ;-)

I tried to correlate "uranium" and "electricity". A typical result is qualitatively as follows:

uranium - is - interest
  http://www.webelements.com/webelements/elements/text/U/key.html
interests - of - engineers
  http://ieee-virtual-museum.org/
engineers - learned - ways
  http://www.energyquest.ca.gov/story/chapter02.html
way - is - charge
  http://www.sciencemadesimple.com/static.html
charge quantity - of - electricity coulomb
  http://en.wikipedia.org/wiki/Electricity

Other issues involve the time and resources taken while searching. Expect a rather simple search to take 10-15 minutes and to make your machine fairly unusable, probably due to bad programming practices (high priority threads? too much work for the garbage collector?).
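To put a rough number on the error-compounding argument, here is a minimal sketch. It assumes each triple is nonsense independently and with the same probability (the 60% figure is only the guess made in this message, not a measurement), and the function name is purely illustrative. Under those assumptions a path is meaningful only if every triple in it is, so the chance of a meaningless path is 1 - (1 - p)^n for a path of n triples.

```python
def p_path_meaningless(p_triple_bad: float, path_length: int) -> float:
    """Probability that at least one triple in a path is nonsense.

    Assumption (not from ExploreXY itself): every triple fails
    independently with the same probability p_triple_bad.
    """
    # The path only makes sense if all of its triples make sense.
    p_all_good = (1.0 - p_triple_bad) ** path_length
    return 1.0 - p_all_good


if __name__ == "__main__":
    # 0.6 is the guessed per-triple error rate; 5 is the length of the
    # uranium -> electricity example path.
    for n in range(1, 7):
        print(f"path length {n}: {p_path_meaningless(0.6, n):.3f}")
```

With the guessed 60% error rate, a five-triple path like the uranium/electricity one comes out meaningless about 99% of the time, which matches the "very close to 1" estimate.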
I won't go into more detail, as my impression is that the tool is of no use right now. I was able to spot a few not entirely wrong correlations, but I feel these come more from the initial web site search than from any understanding of what the pages said (definitely nothing surprising here).

One remark I do feel like making, however: the version I tried is said to be a beta, so who knows, maybe some new trick will be implemented that dramatically raises the S/N ratio by the time 1.0 is reached?

There is undoubtedly a great, great amount of work in here, and I think the idea is interesting. One can only hope that they keep financing the people who actually write and design the thing rather than whoever is doing the spam-like marketing (hopefully not the same person) :-)

Giovanni

Walter Henderson wrote:
>Apologies for cross posting. I have been reading the
>emails a colleague gets from this listserv.
>
>A company in Nevada has announced the release of a
>product that seems to do most of what people have been
>talking about on this list serv.
>
>It takes in unstructured data, recomposes it in new
>answers and displays it in easy format.
>
>In other words, you don't need to be an expert to be
>an expert.
>
>The purpose and goals of this BOF/list serv seems to
>have been bypassed.
>
>www.explorexy.com
>
>W. Henderson
Received on Monday, 6 March 2006 00:38:07 UTC