- From: David Dailey <david.dailey@sru.edu>
- Date: Tue, 08 May 2007 13:48:20 -0400
- To: <chasen@chasenlehara.com>,<connolly@w3.org>,<ian@hixie.ch>
- Cc: <cwilso@microsoft.com>,<hyatt@apple.com>,<doug.schepers@vectoreal.com>, <www-archive@w3.org>,bhopgood@brookes.ac.uk
Hi folks,

A bit more digging has revealed the following. Robert Kosara (of UNCC), whose opinion on such matters I respect very much, writes:

"The InfoViz group (http://infoviz.pnl.gov/) at PNNL/NVAC has developed a tool called IN-SPIRE (http://in-spire.pnl.gov/) for visually analyzing large text corpora. It takes some training, but is very powerful. For emails, you would probably want to do some pre-processing to get rid of quoted text, signatures, and such, as that would fool the similarity metrics.... The cool thing about IN-SPIRE is that it's actually quite dumb, but that means you have a chance to know why it put certain documents close to each other."

IN-SPIRE's license agreement seems to be unclear regarding nonprofit organizations, which I assume W3C is. Since PNNL is a branch of the US government, I would have thought it was covered by section 105 of the statute (http://www.law.cornell.edu/uscode/uscode17/usc_sec_17_00000105----000-.html), but their copyright statement would indicate that the copyright may in fact be held by its employees, who are apparently not federal: "A unique feature of Battelle's contract with DOE allows our staff to work for private industry."

Unless someone knows otherwise, I suspect that tracking down the permissions and price (it appears to be high for profit-making corporations) and then learning how to use the thing might take longer than anyone here has.

Let me know if anyone would like me to look further into it.

David
Received on Tuesday, 8 May 2007 17:48:43 UTC