Re: Survey: issue tracking, summarization, and clustering

+bhopgood@brookes.ac.uk

I was going to try to get some email out yesterday on this, but it's 
the last week of classes and students seem to be about in the 
hallways poking in and asking questions and the like.

I'm including Bob Hopgood, who writes that he is indeed interested, 
though he may not have much time to contribute. Neither he, Dave Hyatt, 
Chris, nor Doug Schepers actually signed up for this duty. So I 
suppose I should ask if anyone objects to including folks who have 
not formally checked the box in the survey
(http://lists.w3.org/Archives/Public/public-html/2007Apr/1399.html).

As I mentioned in my previous note
(http://lists.w3.org/Archives/Public/www-archive/2007Apr/0075.html), I 
don't have an email address for David McClure. Also I note that my 
last email to Chasen Le Hara seems to have bounced complaining that 
the mailbox was full.

Dan and I chatted a bit, and Hixie has mentioned his interests, which 
are most modest, it would seem:
http://lists.w3.org/Archives/Public/www-archive/2007May/0011.html 
(Those who weren't involved might not realize from the quote therein 
that the "provocation" referred to was all quite fun and that no ill 
feelings were had by any party, just in case anyone should worry 
otherwise. I can report favorably that Dave Hyatt does indeed have a 
sense of humor.)

Let me summarize a couple of the approaches and some reasoning 
associated with them and express my hope that someone here has some 
expertise that I do not.

1. My original idea was a two-stage process. The first stage would be 
"issue identification, summarization and clustering"; the second would 
be issue tracking. In this approach, we would view the 4000-odd 
email messages as a form of public testimony to which a 
content-analysis would be applied in a sort of social science kind of 
way, through some sort of computer assisted process using lots of 
human hands plowing through and clustering topics into categories. 
The categories would then be dealt with in a more traditional 
issue-tracking (help-desk, bug-reporting, or project management) 
system. That is, the identification of distinct issues would be done 
through a large scale manual effort (with software helping in the 
chore). The primary motivation here is that no one's issue gets 
"lost" and that the process by which it is handled demonstrates 
accountability in the "sunshine" sense of the word. How many of us 
have submitted a problem somewhere and watched as it seemed to 
disappear from sight, never to resurface? On the other hand, how 
could all those hands be trained so as to provide consistency? That 
could be a very large problem, though if that work were to waver, 
the editors should be able to undo it with minimal cost to the 
editorial process.

This could be seen as a sort of a large scale "customer needs 
analysis". The 4000+ email messages represent the customer's 
statement of interest. One uses some process to digest that into 
customer "specifications" and thereafter extract "issues".

I think Dan's reaction to that was that it was likely to be too 
labor-intensive and too slow; that we need something sooner and not 
quite so elaborate as what I was envisioning. I could see the 
numerous human hands part as helping the editors as they sift through 
the stuff, but Dan's alternative idea has pretty much persuaded me otherwise.

2. An alternative that I think Dan suggests would work something like 
this: assuming the WHATWG proposal is accepted as a starting place 
today, then discussion of that would begin, on a sort of 
issue-by-issue basis. This obviates the need for any fancy content 
analysis (more on this topic later) since the issues are identified 
already by the organization of that existing document. Thereafter it 
is just a matter of following the discussion forward from each 
already identified issue. It preserves the integrity of the 4000 
messages generated since March 7th in the sense that WG members would 
provide links to whatever opinions they have already stated on the 
subject (hopefully with use cases and examples and so forth). Those 
links between issues (as framed by the WHATWG document) and 
discussions (existing or new) would be manually provided by those who 
have an interest in providing those links.
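
To make the linking idea in #2 a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the class names, the issue identifiers, and the example.org URLs are hypothetical, and nothing here describes a tool that W3C or the WHATWG actually uses. It only shows issues fixed in advance by the starting document, with members attaching links to them by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    url: str          # archive URL of an existing or new message
    contributor: str  # WG member who supplied the link


class IssueLinks:
    """Hypothetical registry: issues come from the starting document's
    organization; links to discussions are added manually by members."""

    def __init__(self, issue_ids):
        # The set of issues is fixed up front; no content analysis is needed.
        self._links = {issue_id: [] for issue_id in issue_ids}

    def add(self, issue_id, url, contributor):
        if issue_id not in self._links:
            raise KeyError(f"unknown issue: {issue_id}")
        link = Link(url, contributor)
        # Ignore an exact duplicate from the same contributor.
        if link not in self._links[issue_id]:
            self._links[issue_id].append(link)

    def links_for(self, issue_id):
        return list(self._links[issue_id])

    def issues_without_links(self):
        # Issues nobody has linked any discussion to yet.
        return [i for i, links in self._links.items() if not links]
```

One could imagine something like `registry.add("section-a", "http://example.org/msg/1", "alice")` being the whole of what a member has to do; the point of the sketch is only that the bookkeeping is small once the issues are given.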

#2 makes sense to me as well.

Concerning content analysis: the bit of probing I've done suggests 
the field hasn't changed a whole lot since my last real look into it 
in 1984. The existing software does not seem to be able to aid much 
in the identification, processing and sorting of large quantities of 
topics (in which there might be a many-to-many relationship between 
topics and email messages). Unless you all have anything really 
whizzy to recommend, I don't think we would want to try to build 
something and I rather doubt that the tool I imagine being useful 
exists right now.
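
For what it's worth, the many-to-many bookkeeping itself is not the hard part; the hard part is the human judgment about which topics a message raises. A rough sketch of the bookkeeping, in Python, under the assumption that people assign topics by hand (the class and method names and the message identifiers are all hypothetical, not taken from any existing tool):

```python
from collections import defaultdict


class TopicIndex:
    """Hypothetical many-to-many index between messages and topics."""

    def __init__(self, message_ids):
        self._messages = list(message_ids)
        self._known = set(self._messages)
        self._topics_by_message = defaultdict(set)
        self._messages_by_topic = defaultdict(set)

    def tag(self, message_id, topic):
        # A human reader decides that this message touches this topic.
        if message_id not in self._known:
            raise KeyError(f"unknown message: {message_id}")
        self._topics_by_message[message_id].add(topic)
        self._messages_by_topic[topic].add(message_id)

    def topics_of(self, message_id):
        return sorted(self._topics_by_message.get(message_id, ()))

    def messages_about(self, topic):
        return sorted(self._messages_by_topic.get(topic, ()))

    def untagged(self):
        # Messages nobody has categorized yet, so that none gets "lost".
        return [m for m in self._messages
                if not self._topics_by_message.get(m)]
```

A message can carry several topics and a topic can gather several messages; `untagged()` is the piece that speaks to the accountability concern, since it lists whatever has not yet been looked at.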

Hence that puts us into a more conventional realm. I can see a 
variety of approaches that might fit the bill. All are subject to 
W3C's requirements about where the software and data can be housed. I 
gather or infer that open source solutions are preferable.

a. Help desk software. Again, the last time I dealt with this was a 
long time ago. We used Vaxnotes. It was painful. My friends at SRU 
tell me they're using a proprietary product and are not particularly 
happy with it. I suppose you all know a lot more about this than I do 
and could comment a lot better on what there is and whether or not it 
could be molded to the requirements of the chairs, the editors, and 
the WG membership.

b. Bug-tracking software. W3C has used Bugzilla in some places. I 
wondered if it would scale up; Dan, who knows more about it, expressed 
the concern that it might not scale down. The word on the streets is 
that it is probably too cumbersome for our needs.

c. Project management software for tracking progress on 
already identified issues. Trac has been mentioned 
(http://trac.edgewall.org/). Dan mentions Ping Roundup 
(http://zesty.ca/roundup.html). I have also heard people recommend 
Tracker (http://www.docu-track.com/), but I believe it is proprietary.

I have not used any of these approaches and I can report that my 
quick search for tools that might preserve my sense of electronic 
democracy through some sort of group-mediated issue identification 
has come up empty handed.

So, all this having been said, I think that about all I will be able 
to help with is maybe getting some discussions started. I have no 
concrete recommendations whatever, other than this nagging hope that 
the populace will "feel" properly represented by the process, that 
the W3C will be able to address all the issues it needs to, and that 
the editors will be aided rather than hindered by whatever software 
or process is used.

That said, it may be that whatever Hixie is already doing is in fact 
optimal, and that if there is a way for that to be housed at W3C, 
then that's the unique solution to the problem.

Cheers,
David

Received on Friday, 4 May 2007 16:58:27 UTC