Re: Survey: issue tracking, summarization, and clustering from David Dailey on 2007-05-08 (www-archive@w3.org from May 2007)

From: David Dailey <david.dailey@sru.edu>
Date: Tue, 08 May 2007 13:48:20 -0400
To: <chasen@chasenlehara.com>,<connolly@w3.org>,<ian@hixie.ch>
Cc: <cwilso@microsoft.com>,<hyatt@apple.com>,<doug.schepers@vectoreal.com>, <www-archive@w3.org>,bhopgood@brookes.ac.uk
Message-Id: <6.2.5.6.1.20070508132100.03855c88@sru.edu>

Hi folks,

A bit more digging has revealed the following. Robert Kosara (of 
UNCC), whose opinion on such matters I respect very much, writes:

"The InfoViz group <http://infoviz.pnl.gov/> at PNNL/NVAC has 
developed a tool called IN-SPIRE <http://in-spire.pnl.gov/> for 
visually analyzing large text corpora.
It takes some training, but is very powerful. For emails, you would 
probably want to do some pre-processing to get rid of quoted text, 
signatures, and such, as that would fool the similarity metrics.... 
The cool thing about IN-SPIRE is that it's actually quite dumb, but 
that means you have a chance to know why it put certain documents 
close to each other"
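
To make the pre-processing Robert mentions concrete, here is a rough 
Python sketch of what stripping quoted text and signatures might look 
like. It is only an illustration under simple assumptions: quoted lines 
start with ">", attribution lines look like "On/At ... wrote:", and the 
signature follows a line containing only "--". The function name is 
hypothetical and has nothing to do with IN-SPIRE itself.

```python
import re

# Assumed attribution pattern, e.g. "At 10:00 AM 5/8/2007, Someone wrote:"
ATTRIBUTION = re.compile(r"^(On|At) .* (wrote|writes):\s*$")


def strip_quotes_and_signature(body: str) -> str:
    """Return the message body without quoted lines, attribution lines,
    or anything after a conventional "-- " signature separator."""
    kept = []
    for line in body.splitlines():
        # Everything after the signature separator is dropped.
        if line.rstrip() == "--":
            break
        # Skip quoted text from earlier messages.
        if line.lstrip().startswith(">"):
            continue
        # Skip the line that introduces the quoted text.
        if ATTRIBUTION.match(line):
            continue
        kept.append(line)
    return "\n".join(kept).strip()
```

Real mail is messier than this (top-posted replies, forwarded blocks, 
signatures without a separator), so anything like it would need tuning 
against the actual list archive.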

IN-SPIRE's license agreement seems to be unclear regarding non-profit 
organizations, which I assume W3C is. Since PNNL is a branch of the US 
government, I would have thought its work was covered by section 105 of 
the copyright statute 
<http://www.law.cornell.edu/uscode/uscode17/usc_sec_17_00000105----000-.html>, 
but their copyright statement suggests that the copyright may in fact 
be held by its employees, who are apparently not federal employees: "A 
unique feature of Battelle's contract with DOE allows our staff to 
work for private industry."

Unless someone knows otherwise, I suspect that tracking down the 
permissions and price (which appears to be high for profit-making 
corporations) and then learning how to use the thing might take 
more time than anyone here has. Let me know if anyone would like me to 
look further into it.

David

Received on Tuesday, 8 May 2007 17:48:43 UTC