Re: Baetling away

Thanks a lot for the reply, Christoph.

I wish I had noticed your work beforehand! Your EvoOnt ontology [1]  
seems to cover most of what we are doing. Luckily we have not spent a  
lot of time yet working on baetle, and in any case it really helps to  
roll one's own, as far as getting an understanding of the issues goes.

So now we have two groups coming at the same issue from slightly  
different angles. This could help us hammer out issues that each of  
us might have overlooked individually. Perhaps that is what is needed  
to help forge a successful standard? In procedural software  
engineering it does, but I am not sure how this works in the  
relational world.

Here are some thoughts that come to mind as I am looking at BOM in  
more detail.

1. One namespace

   BOM uses one namespace. With Baetle I have opted to use other well  
known ontologies from the start: foaf, sioc, doap, and others. Which  
is better? No idea.

   + Working with other ontologies has the advantage of increasing  
recognition of people working in the field. People know the foaf  
ontology well, so they can use their knowledge here as elsewhere.
   - On the other hand it could make querying a SPARQL endpoint more  
verbose, as one has to declare all the extra namespaces.
   - Working with multiple ontologies means one is somewhat less in  
control of the whole domain, and one is a little at the mercy of the  
interest the authors of the other ontology may put in your work.
   + Working with multiple ontologies distributes the workload

I notice that you have nicely separated the different topics in your  
ontology such as the revision control system and the bom ontologies.  
So at that level you do make use of namespaces.

2. Revision Control Systems

   Your ontology pretty much covers the classes I have named in my  
UML diagram as DocumentVersion, SoftwarePackage and Version.

   baetle:Version is very similar to your rcs:Revision class. The  
difference is that you think of revisions as being revisions of  
something, whereas I think of a Version as being closer to an  
interface that Documents, among other things, can have. This is  
probably influenced by my work on AtomOwl [3].

   So for example you would have

   <> a rcs:File;
               rcs:hasRevison <> .

   The question here is, what is the revision going to point to? And  
what is the file? The obvious thing to do is to have it point to  
another rcs:File, as above, but you have made rcs:File, rcs:Revision  
and rcs:Releases disjoint classes I think. So would that force one to  
have a revision be a blank node?

   <> a rcs:File;
               rcs:hasRevison [ rcs:number 1;
                                rcs:isRevisionOf <> ].

This creates more indirection than it is worth, in my opinion (but  
see the next section for illumination).
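
To make the worry concrete, here is a minimal sketch. It assumes the  
three rcs classes really are declared pairwise disjoint, which I have  
not verified. The rcs: namespace URI and the ex: resources are  
placeholders of mine, not the published ones; the property names are  
spelled as in the snippets above.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rcs: <http://example.org/rcs#> .    # placeholder namespace (assumption)
@prefix ex:  <http://example.org/repo/> .   # hypothetical resources

# Assumed axioms: the three classes are pairwise disjoint.
rcs:File     owl:disjointWith rcs:Revision, rcs:Release .
rcs:Revision owl:disjointWith rcs:Release .

# Under those axioms a resource typed as both rcs:File and rcs:Revision
# would be inconsistent, so every revision needs a node of its own,
# whether named (as here) or blank.
ex:foo a rcs:File ;
    rcs:hasRevison ex:foo-r1 .

ex:foo-r1 a rcs:Revision ;
    rcs:number 1 ;
    rcs:isRevisionOf ex:foo .
```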

Here is how baetle does it with a real NetBeans example.

@prefix as: < 

     a foaf:Document, baetle:Version;
     baetle:id as:org/netbeans/modules/apisupport/project/queries/;
     baetle:updated "2006-10-31T23:34:26+00:00"^^xsd:dateTime;
     baetle:previous as:org/netbeans/modules/apisupport/project/ 

Arguably the id should be an xsd:anyURI as it is in AtomOwl. That  
would remove the need to work out what type of thing it should point  
to. If it is to point to a resource, it seems best to me that it  
point to a full history of the file, but that is something I have not  
yet determined.

The idea of giving files ids is getting to be very popular. In Sun  
for example we have a Content Management system that adds such an id  
URI into every document, allowing it thereby to be tracked as it  
moves from one project to another, or its history within the same  

The nice thing about working this way is that a file can become a  
versioned file, if it is changed, without introducing a new  
indirection. Versioning a file is just adding new relations from it  
to other files.
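
A minimal sketch of that idea follows. The baetle: namespace URI and  
the ex: file names are placeholders I made up for illustration; only  
baetle:Version, baetle:updated and baetle:previous come from the  
example above.

```turtle
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix baetle: <http://example.org/baetle#> .   # placeholder namespace (assumption)
@prefix ex:     <http://example.org/src/> .      # hypothetical files

# Before any change: just a document, nothing says it is versioned.
ex:Query-v1 a foaf:Document .

# After a change: the new file is the same kind of resource, with a few
# extra relations added. No intermediate revision node is introduced.
ex:Query-v2 a foaf:Document, baetle:Version ;
    baetle:updated  "2006-10-31T23:34:26Z"^^xsd:dateTime ;
    baetle:previous ex:Query-v1 .
```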

Revision may be a better word than Version though.

I like the idea of having a latest version relation, though we can  
get the same thing indirectly without introducing a temporally  
indexed relation by ordering on the updated time stamp.

The name Release is also better chosen than SoftwarePackage.

But again here you have made a Release exclusive of a File. Whereas  
it seems to me that a Release just is a file, a special type of file  
perhaps, but nevertheless just a file...


There is another way of looking at your Revision class, and this  
would be to think of it as more like a commit. In Subversion commits  
are atomic, and so they affect the whole repository. Here indeed I  
would understand that Revisions are distinct from Files, since a  
Revision here can affect the whole state of the repository. I noticed  
this by seeing how your bom:Issue has a relation bom:isResolvedIn  
with a domain of Revision, which is similar to the relation between a  
Committing and an Issue in baetle.

Here again though I would think one could easily separate a :Version  
class as defined in baetle, and have a repository state that is  
versioned. Each repository state links to every file in the  
repository at that state, and the state has a previous state and a  
next state, etc...
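
Roughly, and only as a sketch: ex:RepositoryState and ex:contains are  
hypothetical names invented for this illustration, and the baetle:  
namespace URI is a placeholder. Neither ontology defines these terms  
as far as I know.

```turtle
@prefix baetle: <http://example.org/baetle#> .   # placeholder namespace (assumption)
@prefix ex:     <http://example.org/repo/> .     # hypothetical terms and resources

# Each state of the repository is itself a versioned thing that
# links to every file version present at that state.
ex:state-41 a baetle:Version, ex:RepositoryState ;
    ex:contains ex:A-v3, ex:B-v1 .

# An atomic commit yields the next state. Only A changed here,
# so B-v1 is shared between the two states.
ex:state-42 a baetle:Version, ex:RepositoryState ;
    baetle:previous ex:state-41 ;
    ex:contains ex:A-v4, ex:B-v1 .
```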

Baetle Commits are not well thought out. I was not sure what the best  
way to model this was across repositories.  Perhaps you have thought  
more on this. rcs:Revision seems to better capture the abstraction.  
In CVS repositories it would take a little work to get all that  
information together.

3. Bug Ontology

There are a lot of different classes here, both in baetle and in bom.


As a matter of interest, did you forge your intuitions by working  
with bugzilla? That is what we have done, and your relations seem  
very similar to the ones we have developed. It would be useful to  
have some research comparing the different bug tracking systems [4].


One thing the baetle ontology gains from is using Tim Berners-Lee's  
workflow ontology. This helps structure the relation between an Issue  
(which is a wf:Task), the wf:State it is in, and the wf:Actions that  
have to be taken to resolve it.


Working with sioc, we get all their ontology for relating  
communities. An Issue is a task, and is also a forum where people  
discuss the task. Exactly how one should relate these is not yet  
clear to me.


Well that is as much as I have time for now. Your work is very  
interesting and compelling. I am not sure what one should do here.  
Should we use your ontology, should we continue to work on our own  
and take inspiration from yours, and evolve our own as the needs  
dictate, or should we work together?

My main aim in the short term is to have a presentation for JavaOne  
and Jazoon in Zurich where I can link NetBeans to such a bug tracking  
SPARQL endpoint. This is why the baetle ontology is still very sketchy.

We should definitely meet in Zurich in June (Jazoon is 25th-28th), if  
you are there. Your code analysis tools should be really interesting  
to the people at NetBeans (I still need to look at those).

Thanks again,



On 5 Apr 2007, at 06:45, Christoph Kiefer wrote:

> Dear Henry
> Your post is very interesting and exciting since it also touches our own
> research. Please have a look at our activities in that field, I guess
> they say more than I should write here:
> And also important our publications:
> 1/ "Analyzing Software with iSPARQL" (under review)
> 2/ "Mining Software Repositories with iSPARQL and a Software Evolution
> Ontology."
> 3/ "OptARQ: A SPARQL Optimization Approach based on Triple Pattern
> Selectivity Estimation."
> 4/ "Detecting Similar Java Classes Using Tree Algorithms."
> The Baetle data, if available, would of course be very interesting to
> test our iSPARQL and OptARQ systems.
> Best regards
> Christoph
> Henry Story wrote:
> >
> >
> > Hi,
> >
> > I am currently working on Baetle, the Bug and Enhancement Tracking
> > LanguagE over on google code [1].
> >
> > The project was started about a month ago, and advancing speedily. We
> > have moved from a simple UML outline of a diagram to a sketch of an
> > rdf ontology. But much more importantly we are starting to test this
> > with real data, to get people to play with the information, and in an
> > iterative Agile programming way, help us improve the ontology with
> > real use cases, thereby allowing us to develop the use cases
> > themselves...
> >
> > In the last couple of weeks we have extracted over 5 million
> > relations from the NetBeans bug database using a D2RQ mapping, and 1
> > million relations from the CVS repository [2]. We have put up a SPARQL
> > endpoint using Sesame 2.0beta2 inside of Sun, with the hope of
> > releasing the data as one large Ntriples file. Large organisations
> > being what they are, and as I am not a great organizational man, but
> > rather a coder and semantic web evangelist, this may take a little
> > more time to come out than it should. Be that as it may, it is great
> > fun to play with such a large database of facts.
> >
> > But the Semantic Web all by oneself is no fun. Having a SPARQL
> > endpoint for just the bugs in NetBeans is a great database
> > experiment, much easier to put together it is true because of the
> > clarity of rdf, but not yet quite a full fledged SemWeb experience.
> > Furthermore focusing on NetBeans is probably skewing our ontology
> > towards CVS repositories and Bugzilla like bug databases, so I am
> > calling on other open source software projects to join in and open up
> > their bug databases and version control repositories to a SPARQL
> > endpoint so that we can all sing together to the Baetles. :-)
> >
> > The work is not that much and I am more than willing to help get
> > things going. My idea is that the best candidates initially for this
> > would be Semantic Web Open Source software projects. The first two
> > that came to mind were Jena and Sesame, as they have a large set of
> > code, have been very active, and as this would give them data to help
> > test their own frameworks; so I am ccing them here. But the forum is
> > open to everyone of course.
> >
> > From my experience opening up NetBeans I now know that:
> >
> > - extracting bugs from a database with D2RQ is very easy.
> > - extracting commit messages and source files is easy with tools
> > such as StatCVS [3]
> > - linking bugs to source code was very easy on the NetBeans
> > project because the developers there stuck to a very simple
> > convention to annotate their commits with the bug numbers they were
> > fixing in an easily parseable way.
> > - linking source code to the binaries they are built into should
> > also be easy going. (I am just about to embark on this) [4]. This
> > type of relation could I believe be the best way to abstract away the
> > differences between version control systems, btw [4]
> >
> > With a few extra databases opened up we would be able to improve both
> > the ontology as well as the use cases quite a bit, as well as
> > becoming a real World Wide Web Semantic Project.
> >
> > Please don't hesitate to contact me, join the list, ask questions,
> > contribute SQL dumps of your bug database, or information about how
> > to extract information from your repositories, ... Join early and be
> > famous :-)
> >
> > Henry Story
> >
> > [1] < 
> baetle/>
> > [2] see "first sparql endpoint" thread on the mailing list
> >
> > <>
> > c2244b838e84c4fc
> > [3] <>
> > [4] "Does one need to tag Source Code"?
> >
> > <>
> > 0bea606c0f7c5626
> >
> > Home page: <>
> > Sun Blog: < 
> bblfish/>
> > Foaf name:
> > <>
> -- 
> Christoph Kiefer
> Department of Informatics, University of Zurich

Received on Thursday, 5 April 2007 11:15:10 UTC