Re: Survey: issue tracking, summarization, and clustering from David Dailey on 2007-05-04 (www-archive@w3.org from May 2007)

From: David Dailey <david.dailey@sru.edu>
Date: Fri, 04 May 2007 12:58:41 -0400
To: <chasen@chasenlehara.com>,<connolly@w3.org>,<ian@hixie.ch>
Cc: <cwilso@microsoft.com>,<hyatt@apple.com>,<doug.schepers@vectoreal.com>, <www-archive@w3.org>,bhopgood@brookes.ac.uk
Message-Id: <6.2.5.6.1.20070504110622.01ea9da8@sru.edu>

+bhopgood@brookes.ac.uk

I was going to try to get some email out yesterday on this, but it's
the last week of classes and students seem to be about in the
hallways poking in and asking questions and the like.

I'm including Bob Hopgood who writes that he is indeed interested,
though he may not have much time to contribute. Neither he, Dave Hyatt,
Chris, nor Doug Schepers actually signed up for this duty. So I
suppose I should ask if anyone objects to including folks who have
not formally checked the box in the survey (
http://lists.w3.org/Archives/Public/public-html/2007Apr/1399.html
).

As I mentioned in my previous note,
http://lists.w3.org/Archives/Public/www-archive/2007Apr/0075.html I
don't have an email address for David McClure. Also I note that my
last email to Chasen Le Hara seems to have bounced complaining that
the mailbox was full.

Dan and I chatted a bit and Hixie has mentioned his interests which
are most modest it would seem:
http://lists.w3.org/Archives/Public/www-archive/2007May/0011.html
(what those who weren't involved from the context provided here might
not realize from the quote therein is that the "provocation" referred
to was all quite fun and that no ill feelings were had by any party,
just in case anyone should worry otherwise. I can report favorably
that Dave Hyatt does indeed have a sense of humor.)

Let me summarize a couple of the approaches and some reasoning
associated with them and express my hope that someone here has some
expertise that I do not.

1. My original idea was a two-stage process. The first stage would be
"issue identification, summarization and clustering"; the second would
be issue tracking. In this approach, we would view the 4000 some odd
email messages as a form of public testimony to which a
content-analysis would be applied in a sort of social science kind of
way, through some sort of computer assisted process using lots of
human hands plowing through and clustering topics into categories.
The categories would then be dealt with in a more traditional
issue-tracking (help-desk, bug-reporting, or project management)
system. That is, the identification of distinct issues would be done
through a large scale manual effort (with software helping in the
chore). The primary motivation here is that no one's issue gets
"lost" and that the process by which it is handled demonstrates
accountability in the "sunshine" sense of the word. How many of us
have submitted a problem somewhere and watched as it seemed to
disappear from sight, never to resurface? On the other hand, how
could all those hands be trained so as to provide consistency? That
could be a very large problem, though if that work were to waver,
the editors should be able to undo it with minimal cost to the
editorial process.

This could be seen as a sort of a large scale "customer needs
analysis". The 4000+ email messages represent the customers'
statement of interest. One uses some process to digest that into
customer "specifications" and thereafter extract "issues".

I think Dan's reaction to that was that it was likely to be too
labor-intensive and too slow; that we need something sooner and not
quite so elaborate as what I was envisioning. I could see the
numerous human hands part as helping the editors as they sift through
the stuff, but Dan's alternative idea has pretty much persuaded me otherwise.

2. An alternative that I think Dan suggests would work something like
this: assuming the WHATWG proposal is accepted as a starting place
today, then discussion of that would begin, on a sort of
issue-by-issue basis. This obviates the need for any fancy content
analysis (more on this topic later) since the issues are identified
already by the organization of that existing document. Thereafter it
is just a matter of following the discussion forward from each
already identified issue. It preserves the integrity of the 4000
messages generated since March 7th in the sense that WG members would
provide links to whatever opinions they have already stated on the
subject (hopefully with use cases and examples and so forth). Those
links between issues (as framed by the WHATWG document) and
discussions (existing or new) would be manually provided by those who
have an interest in providing those links.
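A minimal sketch of the manual linking in #2, assuming issues are named after sections of the starting document and that links are simply recorded by whoever submits them. The class name, the issue names, the contributor names and the example.org URLs are all hypothetical placeholders, not anything the group has agreed on:

```python
from collections import defaultdict


class IssueLinks:
    """Hand-maintained links from an issue to archived messages (hypothetical)."""

    def __init__(self):
        # issue name -> list of (contributor, archive URL) pairs
        self._links = defaultdict(list)

    def add_link(self, issue, contributor, url):
        """Record that a contributor considers a message relevant to an issue."""
        entry = (contributor, url)
        if entry not in self._links[issue]:  # ignore exact duplicates
            self._links[issue].append(entry)

    def links_for(self, issue):
        """Return the links recorded for one issue, in submission order."""
        return list(self._links.get(issue, []))

    def issues_without_links(self, all_issues):
        """Return the issues that nobody has linked any discussion to yet."""
        return [i for i in all_issues if not self._links.get(i)]


# Placeholder data for illustration only.
registry = IssueLinks()
registry.add_link("section-a", "member-1", "http://example.org/archive/0001.html")
registry.add_link("section-a", "member-2", "http://example.org/archive/0002.html")
registry.add_link("section-a", "member-1", "http://example.org/archive/0001.html")
```

Something this small leaves the judgment about relevance entirely with the people supplying the links, which is the point of the approach.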

#2 makes sense to me as well.

Concerning content analysis: the bit of probing I've done suggests
the field hasn't changed a whole lot since my last real look into it
in 1984. The existing software does not seem to be able to aid much
in the identification, processing and sorting of large quantities of
topics (in which there might be a many-to-many relationship between
topics and email messages). Unless you all have anything really
whizzy to recommend, I don't think we would want to try to build
something and I rather doubt that the tool I imagine being useful
exists right now.
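To make the many-to-many relationship concrete, here is a rough sketch, assuming the topic assignments are still made by hand and the software only does the bookkeeping. The class, the message ids and the topic names are hypothetical, and this is not a description of any existing tool:

```python
from collections import defaultdict


class TopicIndex:
    """Two-way index between messages and topics (hypothetical sketch)."""

    def __init__(self, message_ids):
        self.message_ids = list(message_ids)  # keeps the archive order
        self._known = set(self.message_ids)
        self._topics_by_message = defaultdict(set)
        self._messages_by_topic = defaultdict(set)

    def tag(self, message_id, topic):
        """Assign a topic to a message; a message may carry many topics."""
        if message_id not in self._known:
            raise KeyError(f"unknown message: {message_id}")
        self._topics_by_message[message_id].add(topic)
        self._messages_by_topic[topic].add(message_id)

    def topics_of(self, message_id):
        return sorted(self._topics_by_message.get(message_id, set()))

    def messages_about(self, topic):
        return sorted(self._messages_by_topic.get(topic, set()))

    def untagged(self):
        """Messages nobody has assigned to a topic yet, so none get lost."""
        return [m for m in self.message_ids if not self._topics_by_message.get(m)]


# Placeholder ids and topics for illustration only.
index = TopicIndex(["msg-1", "msg-2", "msg-3"])
index.tag("msg-1", "topic-x")
index.tag("msg-1", "topic-y")
index.tag("msg-2", "topic-x")
```

The `untagged` check is the part that speaks to the accountability concern: it lists every message that has not yet been placed under any topic.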

Hence that puts us into a more conventional realm. I can see a
variety of approaches that might fit the bill. All are subject to
W3C's requirements about where the software and data can be housed. I
gather or infer that open source solutions are preferable.

a. Help desk software. Again the last time I dealt with this was a
long time ago. We used Vaxnotes. It was painful. My friends at SRU
tell me they're using a proprietary product and are not particularly
happy with it. I suppose you all know a lot more about this than I do
and could comment a lot better on what there is and whether or not it
could be molded into the requirements of the chairs, the editors, and
the WG membership.

b. Bug-tracking software. W3C has used Bugzilla in some places. I
wondered if it would scale up; Dan, who knows more about it, expressed
the concern that it might not scale down. The word on the streets is
that it is probably too cumbersome for our needs.

c. Software project management software for tracking progress on
already identified issues. Trac has been mentioned
http://trac.edgewall.org/. Dan mentions Ping Roundup
(http://zesty.ca/roundup.html). I also have heard people recommend
Tracker http://www.docu-track.com/, but I believe it is proprietary.

I have not used any of these approaches and I can report that my
quick search for tools that might preserve my sense of electronic
democracy through some sort of group-mediated issue identification
has come up empty handed.

So, all this having been said, I think that about all I will be able
to help with is maybe in getting some discussions started. I have no
concrete recommendations whatever, other than this nagging hope that
the populace will "feel" properly represented by the process; that
the W3C will be able to address all the issues it needs to, and that
the editors will be aided rather than hindered by whatever software
or process is used.

That said, it may be that whatever Hixie is already doing is in fact
optimal, and that if there is a way for that to be housed at W3C,
then that's the unique solution to the problem.

Cheers,
David

Received on Friday, 4 May 2007 16:58:27 UTC