- From: Henry Story <henry.story@bblfish.net>
- Date: Wed, 4 Apr 2007 23:18:27 +0200
- To: Semantic Web <semantic-web@w3.org>
- Cc: jena-dev@yahoogroups.com, Sesame discussion list Developer <sesame-devel@lists.sourceforge.net>
Hi,

I am currently working on Baetle, the Bug and Enhancement Tracking LanguagE, over on Google Code [1]. The project was started about a month ago and is advancing speedily. We have moved from a simple UML outline of a diagram to a sketch of an RDF ontology. But much more importantly, we are starting to test this with real data, to get people to play with the information and, in an iterative Agile programming way, help us improve the ontology with real use cases, thereby allowing us to develop the use cases themselves...

In the last couple of weeks we have extracted over 5 million relations from the NetBeans bug database using a D2RQ mapping, and 1 million relations from the CVS repository [2]. We have put up a SPARQL endpoint using Sesame 2.0beta2 inside of Sun, with the hope of releasing the data as one large N-Triples file. Large organisations being what they are, and as I am not a great organizational man but rather a coder and semantic web evangelist, this may take a little more time to come out than it should. Be that as it may, it is great fun to play with such a large database of facts.

But the Semantic Web all by oneself is no fun. Having a SPARQL endpoint for just the bugs in NetBeans is a great database experiment, much easier to put together, it is true, because of the clarity of RDF, but not yet quite a full-fledged SemWeb experience. Furthermore, focusing on NetBeans is probably skewing our ontology towards CVS repositories and Bugzilla-like bug databases, so I am calling on other open source software projects to join in and open up their bug databases and version control repositories to a SPARQL endpoint so that we can all sing together to the Baetles. :-) The work is not that much, and I am more than willing to help get things going.

My idea is that the best candidates initially for this would be Semantic Web open source software projects.
The first two that came to mind were Jena and Sesame, as they have a large set of code, have been very active, and as this would give them data to help test their own frameworks; so I am cc'ing them here. But the forum is open to everyone, of course.

From my experience opening up NetBeans I now know that:

- extracting bugs from a database with D2RQ is very easy.
- extracting commit messages and source files is easy with tools such as StatCVS [3].
- linking bugs to source code was very easy on the NetBeans project, because the developers there stuck to a very simple convention of annotating their commits, in an easily parseable way, with the numbers of the bugs they were fixing.
- linking source code to the binaries it is built into should also be easy going. (I am just about to embark on this.) This type of relation could, I believe, be the best way to abstract away the differences between version control systems, btw [4].

With a few extra databases opened up we would be able to improve both the ontology and the use cases quite a bit, and become a real World Wide Web Semantic Project.

Please don't hesitate to contact me, join the list, ask questions, contribute SQL dumps of your bug database, or information about how to extract information from your repositories, ...

Join early and be famous :-)

Henry Story

[1] http://code.google.com/p/baetle/
[2] see the "first sparql endpoint" thread on the mailing list: http://groups.google.com/group/baetle/browse_thread/thread/c2244b838e84c4fc
[3] http://statcvs.sf.net/
[4] "Does one need to tag Source Code?" http://groups.google.com/group/baetle/browse_thread/thread/0bea606c0f7c5626

Home page: http://bblfish.net/
Sun Blog: http://blogs.sun.com/bblfish/
Foaf name: http://bblfish.net/people/henry/card#me
Received on Wednesday, 4 April 2007 21:18:28 UTC