- From: adasal <adam.saltiel@gmail.com>
- Date: Mon, 6 Mar 2006 15:02:45 +0000
- To: semantic-web@w3.org
- Message-ID: <e8aa138c0603060702j7881cdb9h@mail.gmail.com>
I'm not convinced this was spam. I think it was a fit of enthusiasm, but only Walter knows!

Adam

On 06/03/06, Giovanni Tummarello <g.tummarello@gmail.com> wrote:
>
> I understand this is some kind of targeted spam posting, but being
> curious anyway I took the time to try the thing.
>
> My mini review:
>
> ExploreXY is a client-side tool for finding correlations between 2 terms
> (textual strings, say "foo" and "bar"). Running on the client makes
> sense, since each correlation search is highly specialized and
> therefore no server is going to do this for you (as it is very unlikely
> that it would reuse any previous result). While not many details are
> spelled out on how it works, I believe it does a bunch of initial
> Google queries (or similar) to find related pages, then uses NLP
> to extract triples in the form of
>
> A - verb1 - B
> B - verb2 - C
> ...
> Y - verb3 - Z
>
> to link A and Z if these are the words you seek. But not only that:
> many other results are provided as well, the logic of which is not
> immediately clear.
>
> The problem is obvious: the instability of the overall procedure (in
> control theory terms). In simple terms: NLP is amazingly error-prone
> per se, so each triple extracted will have, say, a 60% probability of
> making no sense.
> If you compound this over the length of the path, the probability of
> a path being meaningless is very close to 1.
> But more often than not the results are pretty hilarious ;-)
> I tried to correlate "uranium" and "electricity". A typical result is
> qualitatively as follows:
>
> uranium - is - interest
>   http://www.webelements.com/webelements/elements/text/U/key.html
> interests - of - engineers
>   http://ieee-virtual-museum.org/
> engineers - learned - ways
>   http://www.energyquest.ca.gov/story/chapter02.html
> way - is - charge
>   http://www.sciencemadesimple.com/static.html
> charge quantity - of - electricity coulomb
>   http://en.wikipedia.org/wiki/Electricity
>
> Other issues involve the time and resources taken while searching.
> Expect a rather simple search to take 10-15 minutes and make your
> machine fairly unusable, probably due to bad programming practices (high
> priority threads? too much work for the garbage collector?).
> I won't go into more detail, as my impression is that it is of no use
> for now. I was able to spot a few not entirely wrong correlations, but I
> feel this comes more from the initial web site search than from
> understanding what the pages said (definitely nothing surprising here).
>
> One remark I would like to make, however, is that the version I tried
> is said to be a beta, so who knows, maybe some new trick will be
> implemented that dramatically raises the S/N ratio by the time 1.0 is
> reached?
> There is undoubtedly a great, great amount of work in here, and the idea
> I think is interesting. One can only hope that they keep financing the
> people who actually write and design the thing rather than whoever is
> doing the spam-like marketing (hopefully not the same person) :-)
>
> Giovanni
>
> Walter Henderson wrote:
>
> >Apologies for cross posting. I have been reading the
> >emails a colleague gets from this listserv.
> >
> >A company in Nevada has announced the release of a
> >product that seems to do most of what people have been
> >talking about on this listserv.
> >
> >It takes in unstructured data, recomposes it in new
> >answers and displays it in easy format.
> >
> >In other words, you don't need to be an expert to be
> >an expert.
> >
> >The purpose and goals of this BOF/listserv seem to
> >have been bypassed.
> >
> >www.explorexy.com
> >
> >W. Henderson
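
The compounding-error point in Giovanni's review can be made concrete. The following is only a minimal sketch: it assumes each extracted triple is meaningless independently of the others, and it uses the 60% figure that the review offers as a guess ("say 60%"), not a measured error rate. The function name `p_path_meaningless` is made up for illustration and has nothing to do with how ExploreXY works internally.

```python
def p_path_meaningless(p_triple_bad: float, hops: int) -> float:
    """Probability that at least one triple in a chain of `hops` triples
    is meaningless, assuming errors are independent (an assumption)."""
    if not 0.0 <= p_triple_bad <= 1.0:
        raise ValueError("p_triple_bad must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    # The whole path makes sense only if every single triple does.
    p_path_ok = (1.0 - p_triple_bad) ** hops
    return 1.0 - p_path_ok


if __name__ == "__main__":
    # 0.6 is the illustrative per-triple error rate from the review.
    for hops in (1, 3, 5, 10):
        print(f"{hops:2d} hops: {p_path_meaningless(0.6, hops):.4f}")
```

Under these assumptions a five-hop path, like the uranium/electricity example, is meaningless about 99% of the time, which matches the "very close to 1" estimate. Note that the per-triple error compounds as a power of the path length rather than as a simple product with it.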
Received on Monday, 6 March 2006 15:02:55 UTC