Baetling away


I am currently working on Baetle, the Bug and Enhancement Tracking  
LanguagE, over on Google Code [1].

The project was started about a month ago and is advancing speedily. We  
have moved from a simple UML diagram to a sketch of an  
RDF ontology. But much more importantly, we are starting to test this  
with real data, to get people to play with the information and, in an  
iterative, agile way, help us improve the ontology with  
real use cases, thereby allowing us to develop the use cases further.

In the last couple of weeks we have extracted over 5 million  
relations from the NetBeans bug database using a D2RQ mapping, and 1  
million relations from the CVS repository [2]. We have put up a SPARQL  
endpoint using Sesame 2.0beta2 inside of Sun, with the hope of  
releasing the data as one large N-Triples file. Large organisations  
being what they are, and as I am not a great organizational man, but  
rather a coder and semantic web evangelist, this may take a little  
more time to come out than it should. Be that as it may, it is great  
fun to play with such a large database of facts.

But the Semantic Web all by oneself is no fun. Having a SPARQL  
endpoint for just the bugs in NetBeans is a great database  
experiment, much easier to put together, it is true, because of the  
clarity of RDF, but not yet quite a full-fledged SemWeb experience.  
Furthermore, focusing on NetBeans is probably skewing our ontology  
towards CVS repositories and Bugzilla-like bug databases, so I am  
calling on other open source software projects to join in and open up  
their bug databases and version control repositories to a SPARQL  
endpoint so that we can all sing together to the Baetles. :-)

The work is not that much and I am more than willing to help get  
things going. My idea is that the best candidates initially for this  
would be Semantic Web open source software projects. The first two  
that came to mind were Jena and Sesame, as they have large code  
bases, have been very active, and as this would give them data to help  
test their own frameworks; so I am cc'ing them here. But the forum is  
open to everyone, of course.

From my experience opening up NetBeans I now know that:

   - extracting bugs from a database with D2RQ is very easy.
   - extracting commit messages and source files is easy with tools  
such as StatCVS [3]
   - linking bugs to source code was very easy on the NetBeans  
project because the developers there stuck to a very simple  
convention to annotate their commits with the bug numbers they were  
fixing in an easily parseable way.
   - linking source code to the binaries they are built into should  
also be easy (I am just about to embark on this). This  
type of relation could, I believe, be the best way to abstract away the  
differences between version control systems, btw [4].
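
To illustrate the third point, here is a minimal sketch of how a commit
annotation convention can be turned into bug-to-commit links. The exact
NetBeans convention is not spelled out in this message, so the pattern
("#12345", "issue 12345" or "bug 12345") is an assumption, and the
namespaces and the "fixes" property are placeholders, not actual Baetle
terms.

```python
import re

# Assumed convention (hypothetical): bugs are cited in commit messages
# as "#12345", "issue 12345" or "bug 12345".
BUG_REF = re.compile(r"(?:#|\b(?:issue|bug)\s+#?)(\d+)", re.IGNORECASE)

# Placeholder URIs for illustration only; these are not real Baetle terms.
BUG_BASE = "http://example.org/bug/"
COMMIT_BASE = "http://example.org/commit/"
FIXES = "http://example.org/baetle#fixes"


def bug_ids(message):
    """Return the bug numbers cited in a commit message, in order, without duplicates."""
    seen = []
    for match in BUG_REF.finditer(message):
        bug = match.group(1)
        if bug not in seen:
            seen.append(bug)
    return seen


def link_triples(commit_id, message):
    """Build one N-Triples line per bug cited in the commit message."""
    return [
        f"<{COMMIT_BASE}{commit_id}> <{FIXES}> <{BUG_BASE}{bug}> ."
        for bug in bug_ids(message)
    ]


if __name__ == "__main__":
    for line in link_triples("r1", "Fixes #98765: NPE in editor"):
        print(line)
```

Feeding every commit message through link_triples() and concatenating
the results would give an N-Triples file that can be loaded next to the
bug and repository data. A project with a different convention would
only need to change the regular expression.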

With a few extra databases opened up we would be able to improve both  
the ontology and the use cases quite a bit, and to  
become a real World Wide Web Semantic Project.

Please don't hesitate to contact me, join the list, ask questions,  
contribute SQL dumps of your bug database, or information about how  
to extract information from your repositories, ... Join early and be  
famous :-)

Henry Story

[2] see "first sparql endpoint" thread on the mailing list 
[4] "Does one need to tag Source Code"? 

Home page:
Sun Blog:
Foaf name:


Received on Wednesday, 4 April 2007 21:18:28 UTC