- From: Henry Story <henry.story@bblfish.net>
- Date: Thu, 5 Apr 2007 12:00:54 +0200
- To: jena-dev@yahoogroups.com, Christoph Kiefer <kiefer@ifi.unizh.ch>
- Cc: baetle@googlegroups.com
Thanks a lot for the reply Christoph.

I wish I had noticed your work beforehand! Your EvoOntology [1] seems to cover most of what we are doing. Luckily we have not spent a lot of time yet working on baetle, and in any case it really helps to roll one's own, as far as getting an understanding of the issues goes.

So now we have two groups coming at the same issue from slightly different angles. This could help us hammer out issues that each of us might have overlooked individually. Perhaps that is what is needed to help forge a successful standard? In procedural software engineering it does, but I am not sure how this works in the relational world.

Here are some thoughts that come to mind as I am looking at BOM in more detail.

1. One namespace
================

BOM uses one namespace. With Baetle I have opted to use other well known ontologies from the start: foaf, sioc, doap, and others. Which is better? No idea.

+ Working with other ontologies has the advantage of increasing recognition by people working in the field. People know the foaf ontology well, so they can use their knowledge here as elsewhere.
- On the other hand it could make querying a SPARQL endpoint more verbose, as one has to declare all the extra namespaces.
- Working with multiple ontologies means one is somewhat less in control of the whole domain, and one is a little at the mercy of the interest the authors of the other ontology may put in your work.
+ Working with multiple ontologies distributes the workload.

I notice that you have nicely separated the different topics in your ontology, such as the revision control system and the bom ontologies. So at that level you do make use of namespaces.

2. Revision Control Systems
===========================

Your ontology pretty much covers the classes I have named in my UML diagram as DocumentVersion, SoftwarePackage and Version. baetle:Version is very similar to your rcs:Revision class.
The difference is that you think of revisions as being revisions of something, whereas I think of a Version as being closer to an interface that Documents, among other things, can have. This is probably influenced by my work on AtomOwl [3].

So for example you would have

    <http://eg.com/vc/file.java> a rcs:File;
        rcs:hasRevision <http://eg.com/vc/file.java?rev=1> .

The question here is, what is the revision going to point to? And what is the file? The obvious thing to do is to have it point to another rcs:File, as above, but you have made rcs:File, rcs:Revision and rcs:Release disjoint classes, I think. So would that force one to have a revision be a blank node?

    <http://eg.com/vc/file.java?rev=1> a rcs:File;
        rcs:hasRevision [ rcs:number 1;
                          rcs:isRevisionOf <http://eg.com/vc/file.java> ] .

This is creating more indirection than it is worth in my opinion. (But see the next section for illumination.)

Here is how baetle does it with a real NetBeans example.

    @prefix as: <http://www.netbeans.org/source/browse/apisupport/project/src/> .

    as:org/netbeans/modules/apisupport/project/queries/ClassPathProviderImpl.java?rev=1.10
        a foaf:Document, baetle:Version;
        baetle:id as:org/netbeans/modules/apisupport/project/queries/ClassPathProviderImpl.java;
        baetle:updated "2006-10-31T23:34:26+0000"^^xsd:dateTime;
        baetle:previous as:org/netbeans/modules/apisupport/project/queries/ClassPathProviderImpl.java?rev=1.9;
        .

Arguably the id should be an xsd:anyURI as it is in AtomOwl. That would remove the need to work out what type of thing it should point to. If it is to point to a resource, it seems best to me that it point to a full history of the file, but that is something I have not yet determined.

The idea of giving files ids is getting to be very popular. At Sun for example we have a Content Management system that adds such an id URI into every document, so that it can be tracked as it moves from one project to another, or through its history within the same project.
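
As an illustration of the baetle modelling in the NetBeans example above, here is a minimal sketch in Python of how version records that share one id could be put back into a history. It is not part of baetle: the dictionary keys simply mirror baetle:id, baetle:updated and baetle:previous, the function names are hypothetical, the eg.com URIs reuse the toy example, and only the timestamp of rev 1.10 comes from the example (the other two are made up).

```python
from datetime import datetime

FILE_ID = "http://eg.com/vc/file.java"  # plays the role of baetle:id

# Hypothetical version records, deliberately listed out of order.
# Only the rev 1.10 timestamp is taken from the NetBeans example.
VERSIONS = [
    {"uri": FILE_ID + "?rev=1.9", "id": FILE_ID,
     "updated": "2006-10-30T10:00:00+0000", "previous": FILE_ID + "?rev=1.8"},
    {"uri": FILE_ID + "?rev=1.10", "id": FILE_ID,
     "updated": "2006-10-31T23:34:26+0000", "previous": FILE_ID + "?rev=1.9"},
    {"uri": FILE_ID + "?rev=1.8", "id": FILE_ID,
     "updated": "2006-10-29T09:00:00+0000", "previous": None},
]


def parse_updated(stamp):
    """Parse an xsd:dateTime-like string such as 2006-10-31T23:34:26+0000."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")


def history(versions, file_id):
    """All versions carrying the given id, oldest first, ordered by 'updated'."""
    same_file = [v for v in versions if v["id"] == file_id]
    return sorted(same_file, key=lambda v: parse_updated(v["updated"]))


def latest(versions, file_id):
    """The most recently updated version of the file."""
    return history(versions, file_id)[-1]


def follow_previous(versions, start_uri):
    """Walk the 'previous' links from one version back to the oldest known one."""
    by_uri = {v["uri"]: v for v in versions}
    chain = []
    uri = start_uri
    while uri in by_uri and uri not in chain:  # stop at a gap or a cycle
        chain.append(uri)
        uri = by_uri[uri]["previous"]
    return chain


if __name__ == "__main__":
    print(latest(VERSIONS, FILE_ID)["uri"])
    print(follow_previous(VERSIONS, FILE_ID + "?rev=1.10"))
```

Ordering on the timestamp rather than on the revision label matters here, since a plain string sort would place rev 1.10 before rev 1.9.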

The nice thing about working this way is that a file can become a versioned file, if it is changed, without introducing a new indirection. Versioning a file is just adding new relations from it to other files.

Revision may be a better word than Version though.

I like the idea of having a latest version relation, though we can get the same thing indirectly, without introducing a temporally indexed relation, by ordering on the updated time stamp.

The name Release is also better chosen than SoftwarePackage. But again here you have made a Release exclusive of a File, whereas it seems to me that a Release just is a file, a special type of file perhaps, but nevertheless just a file...

Commits?
--------

There is another way of looking at your Revision class, and this would be to think of it as more like a commit. In Subversion commits are atomic, and so they affect the whole repository. Here indeed I would understand that Revisions are distinct from Files, since a Revision here can affect the whole state of the repository. I noticed this by seeing how your bom:Issue has a relation bom:isResolvedIn with a domain of Revision, which is similar to the relation between a Committing and an Issue in baetle.

Here again though I would think one could easily separate a :Version class as defined in baetle, and have a repository state that is versioned. Each repository state links to every file in the repository at that state, and the state has a previous state and a next state, etc...

Baetle Commits are not well thought out. I was not sure what the best way to model this was across repositories. Perhaps you have thought more on this. rcs:Revision seems to better capture the abstraction. In CVS repositories it would take a little work to get all that information together.

Bug Ontology
============

There are a lot of different classes here, both in baetle and in bom.

Bugzilla?
---------

As a matter of interest, did you forge your intuitions by working with Bugzilla?
That is what we have done, and your relations seem very similar to the ones we have developed. It would be useful to have some research comparing the different bug tracking systems [4].

WorkFlow
--------

One thing the baetle ontology gains from is its use of Tim Berners-Lee's workflow ontology. This helps structure the relation between an Issue (which is a wf:Task), the wf:State it is in, and the wf:Actions that have to be taken to resolve it.

Comments
--------

Working with sioc, we get all of their ontology for relating communities. An Issue is a task, and is also a forum where people discuss the task. Exactly how one should relate these is not yet clear to me.

Conclusion
==========

Well, that is as much as I have time for now. Your work is very interesting and compelling. I am not sure what one should do here. Should we use your ontology, should we continue to work on our own, take inspiration from yours and evolve our own as the needs dictate, or should we work together?

My main aim in the short term is to have a presentation for JavaOne and Jazoon in Zurich where I can link NetBeans to such a bug tracking SPARQL endpoint. This is why the baetle ontology is still very sketchy.

We should definitely meet in Zurich in June (Jazoon is 25th-28th), if you are there. Your code analysis tools should be really interesting to the people at NetBeans (I still need to look at those).

Thanks again,

Henry

[1] http://www.ifi.unizh.ch/ddis/evoont.html
[2] http://groups.google.com/group/baetle/browse_thread/thread/bea606c0f7c5626
[3] http://bblfish.net/work/atom-owl/2006-06-06/AtomOwl.html
[4] http://code.google.com/p/baetle/wiki/BugDatabases

On 5 Apr 2007, at 06:45, Christoph Kiefer wrote:

> Dear Henry
>
> Your post is very interesting and exciting since it also touches our own research.
> Please have a look at our activities in that field, I guess they say more than I should write here:
>
> http://www.ifi.unizh.ch/ddis/evoont.html
> http://www.ifi.unizh.ch/ddis/research/semweb/isparql/
>
> And also important our publications:
>
> 1/ "Analyzing Software with iSPARQL" (under review)
>
> 2/ "Mining Software Repositories with iSPARQL and a Software Evolution Ontology."
>
> 3/ "OptARQ: A SPARQL Optimization Approach based on Triple Pattern Selectivity Estimation."
>
> 4/ "Detecting Similar Java Classes Using Tree Algorithms."
>
> The Baetle data, if available, would of course be very interesting to test our iSPARQL and OptARQ systems.
>
> Best regards
> Christoph
>
> Henry Story wrote:
> >
> > Hi,
> >
> > I am currently working on Baetle, the Bug and Enhancement Tracking LanguagE over on google code [1].
> >
> > The project was started about a month ago, and advancing speedily. We have moved from a simple UML outline of a diagram to a sketch of an rdf ontology. But much more importantly we are starting to test this with real data, to get people to play with the information, and in an iterative Agile programming way, help us improve the ontology with real use cases, thereby allowing us to develop the use cases themselves...
> >
> > In the last couple of weeks we have extracted over 5 million relations from the NetBeans bug database using a D2RQ mapping, and 1 million relations from the CVS repository [2]. We have put up a SPARQL endpoint using Sesame 2.0beta2 inside of Sun, with the hope of releasing the data as one large Ntriples file. Large organisations being what they are, and as I am not a great organizational man, but rather a coder and semantic web evangelist, this may take a little more time to come out than it should. Be that as it may, it is great fun to play with such a large database of facts.
> >
> > But the Semantic Web all by oneself is no fun.
> > Having a SPARQL endpoint for just the bugs in NetBeans is a great database experiment, much easier to put together it is true because of the clarity of rdf, but not yet quite a full fledged SemWeb experience. Furthermore focusing on NetBeans is probably skewing our ontology towards CVS repositories and Bugzilla like bug databases, so I am calling on other open source software projects to join in and open up their bug databases and version control repositories to a SPARQL endpoint so that we can all sing together to the Baetles. :-)
> >
> > The work is not that much and I am more than willing to help get things going. My idea is that the best candidates initially for this would be Semantic Web Open Source software projects. The first two that came to mind were Jena and Sesame, as they have a large set of code, have been very active, and as this would give them data to help test their own frameworks; so I am ccing them here. But the forum is open to everyone of course.
> >
> > From my experience opening up NetBeans I now know that:
> >
> > - extracting bugs from a database with D2RQ is very easy.
> > - extracting commit messages and source files is easy with tools such as StatCVS [3]
> > - linking bugs to source code was very easy on the NetBeans project because the developers there stuck to a very simple convention to annotate their commits with the bug numbers they were fixing in an easily parseable way.
> > - linking source code to the binaries they are built into should also be easy going. (I am just about to embark on this) [4]. This type of relation could I believe be the best way to abstract away the differences between version control systems, btw [4]
> >
> > With a few extra databases opened up we would be able to improve both the ontology as well as the use cases quite a bit, as well as becoming a real World Wide Web Semantic Project.
> >
> > Please don't hesitate to contact me, join the list, ask questions, contribute SQL dumps of your bug database, or information about how to extract information from your repositories, ... Join early and be famous :-)
> >
> > Henry Story
> >
> > [1] http://code.google.com/p/baetle/
> > [2] see "first sparql endpoint" thread on the mailing list
> >     http://groups.google.com/group/baetle/browse_thread/thread/c2244b838e84c4fc
> > [3] http://statcvs.sf.net/
> > [4] "Does one need to tag Source Code"?
> >     http://groups.google.com/group/baetle/browse_thread/thread/0bea606c0f7c5626
> >
> > Home page: http://bblfish.net/
> > Sun Blog: http://blogs.sun.com/bblfish/
> > Foaf name: http://bblfish.net/people/henry/card#me
>
> --
> Christoph Kiefer
> Department of Informatics, University of Zurich
> http://www.ifi.unizh.ch/ddis/christophkiefer.html
>
Received on Thursday, 5 April 2007 11:15:10 UTC