RE: Introducing myself - SOA organised with RDF

Hi Frank,

My name is Brian McBride and I work in the Semantic Web group at HP Labs
in Bristol, UK. We have been working on Semantic Web technology since
around 2000, and I have a particular interest in its application to IT
systems inside enterprises, a class that includes government
organizations. I'm writing because we seem to share interests and
views.

[...]

> It was clear to me from the beginning that a SOA soon will 
> turn into another tower of babel, unless there's a clear 
> strategy to normalize the contents flowing on the service 
> bus, and to address the issues of versioning and development 
> in knowledge.

That is my view too - though I don't have a lot of evidence I can point
to in support of it.  This is a great opportunity for Semantic Web
technology.

> 
> Therefore I started a parallel activity to organise new 
> in-house development projects and the information they 
> produce, so that a canonical ontology could be developed for 
> the service bus. I found that RDF and to some extent OWL 
> seemed the most promising technologies to back this effort 
> up, for a number of reasons. First of all I found its simple 
> and powerful structure an ideal model to describe the 
> numerous modelling techniques we use - UML, BPMN, Rules, WSDL 
> and XSD generation - in a uniform manner, so that information 
> may be combined across the different techniques. 

Just so.

> 
> Second we are facing a challenge of controlling our 
> suppliers, rather than being controlled by them.

I'm wondering what you mean by control there. It is well known that if
a customer invests heavily in systems that depend on the particular
characteristics of system components, e.g. proprietary data formats or
APIs, this creates a barrier to changing suppliers. I was expecting you
to write that, because RDF is based on standards, it would be in
customers' interests to promote its use and so keep the flexibility to
change suppliers. But that's not what you wrote ...


> This 
> requires knowledge about the solutions. RDF also seems to be 
> an ideal model for describing the suppliers' source code and
> documentation, and combining it with our ontologies. The 
> combination will enable us to construct impact analysis that 
> will show how changes to our models and ontologies will have 
> an impact on the actual systems and source code. This is the 
> idea at least.

Ah, right. I think there are a number of existing solutions that do
this, though not using RDF, e.g. IBM's metadata server. Have you looked
at that? Is there something missing from that solution that RDF would
address?
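The impact analysis described above is, at heart, a reachability query over the graph: start from the changed ontology element and follow dependency arcs backwards to the services and source files that rely on it. Below is a minimal Python sketch under that reading. The `ex:` vocabulary and the choice of which predicates count as "relies on" are hypothetical, and this is not a description of any existing product.

```python
from collections import deque

# Hypothetical "ex:" vocabulary, invented for illustration only.
TRIPLES = {
    ("ex:OrderService", "ex:implements", "ex:OrderConcept"),
    ("ex:OrderService.java", "ex:partOf", "ex:OrderService"),
    ("ex:InvoiceService", "ex:dependsOn", "ex:OrderService"),
    ("ex:CustomerConcept", "ex:label", "Customer"),
}

# Predicates read as "subject relies on object" (an assumption of this sketch).
RELIES_ON = {"ex:implements", "ex:partOf", "ex:dependsOn"}


def impacted_by(changed, triples, relies_on=RELIES_ON):
    """Return every node that transitively relies on `changed` (breadth-first search)."""
    # Index the dependency arcs in reverse: object -> subjects pointing at it.
    dependents = {}
    for s, p, o in triples:
        if p in relies_on:
            dependents.setdefault(o, set()).add(s)
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for d in dependents.get(node, ()):
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return seen


print(sorted(impacted_by("ex:OrderConcept", TRIPLES)))
# ['ex:InvoiceService', 'ex:OrderService', 'ex:OrderService.java']
```

Because the graph is just triples, adding a new kind of dependency (say, between a BPMN process and a WSDL operation) only means adding its predicate to the set; the traversal itself does not change.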

> 
> So far we have built an information base that has something 
> like 50000 objects defined, or something of that size, 
> combining modelling from six actual projects into one large 
> information base of RDF/XML. To handle an information base of 
> this size, and to enable the information for the 
> organization, I decided to go along with the open source XML 
> database eXist.
> (If anybody has any practical experience of combining eXist 
> with RDF, I would be interested to know).

It's best to think of RDF in terms of its abstract syntax, i.e. a graph
of nodes, rather than its RDF/XML concrete syntax. There are a number of
systems around that will store significant numbers of RDF triples in a
relational store. We develop one, Jena (http://jena.sourceforge.net),
and there are others: Sesame, Mulgara, Redland, etc. I'd strongly
suggest you take a look at these, or, if you really feel an XML database
is the way to go, I'd like to understand why.
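To make the "graph of nodes" point concrete: at its simplest, a relational triple store is one three-column table, and a query is a triple pattern with wildcards. The sketch below uses Python's built-in `sqlite3`; it is a toy, not a description of how Jena, Sesame or the others lay out their tables. Terms are plain strings, so it does not distinguish URIs, literals and blank nodes, and the `ex:`, `uml:` and `wsdl:` names are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create a minimal triple table (a sketch, not any product's schema)."""
    conn = sqlite3.connect(path)
    # The primary key gives set semantics: a graph holds each triple at most once.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        "s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL, "
        "PRIMARY KEY (s, p, o))"
    )
    # Extra indexes so lookups by predicate or object avoid a full scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_po ON triples (p, o)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_os ON triples (o, s)")
    return conn


def add(conn, triples):
    """Insert triples, silently dropping duplicates; commits on success."""
    with conn:
        conn.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)


def match(conn, s=None, p=None, o=None):
    """Return the sorted triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    # Column names come from this fixed tuple, never from user input.
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT s, p, o FROM triples"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sorted(conn.execute(sql, params).fetchall())


conn = open_store()
add(conn, [("ex:Order", "rdf:type", "uml:Class")])          # from a UML model
add(conn, [("ex:Order", "wsdl:usedBy", "ex:PlaceOrder"),    # from a WSDL file
           ("ex:Order", "rdf:type", "uml:Class")])          # duplicate, ignored
print(match(conn, s="ex:Order"))
```

Merging data from two different models is just inserting both sets of triples: shared URIs join them, and nothing depends on how either source was serialized.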

> 
> With eXist I have built XQueries to list information of 
> specific interest, and others to enable browsing through the 
> RDF graph. I have also built an XQL-query to make forward 
> chaining of the graph. Performance seems to be an issue. If 
> anybody knows how to tune XQuery and eXist, I would be grateful. 

An issue with using XML is that the same RDF graph can be represented in
many different ways in RDF/XML. This would make your queries depend on
the particular way an RDF/XML document happened to represent a graph,
and that's just, well, wrong: you would be programming at an
inappropriate level of abstraction.
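As an illustration, the two RDF/XML documents below encode exactly the same two triples, yet an XQuery path written against one (for example, one that looks for `rdf:Description` elements) would miss the other. The sketch uses Python's standard `xml.etree.ElementTree` and a deliberately tiny hand-written reader for a small subset of RDF/XML. It is not a conforming parser, the `example.org` names are hypothetical, and literals and URIs are both plain strings for brevity; a real RDF/XML parser, such as the one in Jena, handles the full syntax.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/ns#"  # hypothetical namespace

# Verbose form: rdf:Description with an explicit rdf:type and a property element.
VERBOSE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <rdf:Description rdf:about="http://example.org/order1">
    <rdf:type rdf:resource="{EX}Order"/>
    <ex:status>open</ex:status>
  </rdf:Description>
</rdf:RDF>"""

# Compact form: a typed node element with a property attribute.
COMPACT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <ex:Order rdf:about="http://example.org/order1" ex:status="open"/>
</rdf:RDF>"""


def to_uri(qname):
    # ElementTree writes qualified names as "{namespace}local"; RDF concatenates them.
    namespace, local = qname[1:].split("}")
    return namespace + local


def triples(document):
    """Extract triples from a tiny RDF/XML subset (illustration only).

    Handles rdf:Description or typed node elements with rdf:about,
    property attributes, and property elements holding a plain literal
    or an rdf:resource reference. Blank nodes, nesting, rdf:parseType,
    xml:lang and datatypes are NOT handled.
    """
    result = set()
    for node in ET.fromstring(document):
        subject = node.get(f"{{{RDF}}}about")
        if node.tag != f"{{{RDF}}}Description":
            # A typed node element is shorthand for an rdf:type triple.
            result.add((subject, RDF + "type", to_uri(node.tag)))
        for attr, value in node.attrib.items():
            if not attr.startswith(f"{{{RDF}}}"):
                # A property attribute is shorthand for a literal-valued property.
                result.add((subject, to_uri(attr), value))
        for prop in node:
            obj = prop.get(f"{{{RDF}}}resource")
            result.add((subject, to_uri(prop.tag), obj if obj is not None else (prop.text or "")))
    return result


print(triples(VERBOSE) == triples(COMPACT))  # True
```

Two differently shaped XML trees, one graph: queries written against the triples are immune to the choice of serialization, while queries written against the XML are not.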

> 
> I have tried to use CWM, but it seems to crash when I use 
> large graphs. I have also made a simple gawk-script that can 
> actually both make forward-chaining and backward-chaining 
> very efficiently.

CWM is more generally used for its powerful rule capabilities on
relatively small datasets. Jena also has rules, but they only really
work on smallish in-memory graphs; at present they are too slow over
large datasets.
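Since both forward and backward chaining came up, here is a minimal in-memory sketch of each, using just two RDFS-style rules (subclass transitivity and type inheritance) over Python sets of string triples. It is an illustration only: it is not how CWM, Jena's rule engine or the gawk script mentioned above work, and the `ex:` names are hypothetical. The forward chainer materializes every inferred triple up front with a naive fixed-point loop that rescans the whole closure each round; the backward chainer explores only what a single query needs.

```python
# Terms are plain strings; prefixed names stand in for full URIs.
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def forward_chain(triples):
    """Apply two rules until no new triples appear (naive fixed point).

    rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rule 2: (x type A), (A subClassOf B)       => (x type B)
    """
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, TYPE, b) for x, p, a in closure if p == TYPE
                for a2, b in sub if a == a2}
        if new <= closure:  # nothing new: fixed point reached
            return closure
        closure |= new


def has_type(x, cls, triples, seen=None):
    """Backward chaining: prove (x type cls) from the goal down, without a closure."""
    if (x, TYPE, cls) in triples:
        return True
    seen = set() if seen is None else seen  # guards against subclass cycles
    # To prove (x type cls), find a subclass A of cls and try to prove (x type A).
    for a, p, b in triples:
        if p == SUBCLASS and b == cls and a not in seen:
            seen.add(a)
            if has_type(x, a, triples, seen):
                return True
    return False


data = {
    ("ex:OrderService", TYPE, "ex:WebService"),
    ("ex:WebService", SUBCLASS, "ex:Service"),
    ("ex:Service", SUBCLASS, "ex:Component"),
}
print(len(forward_chain(data)))                             # 6
print(has_type("ex:OrderService", "ex:Component", data))    # True
```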

> 
> But to cut the story short, I have a lot of activity going 
> with RDF, but I am very alone here in my organization, so I 
> hope to make new friends here with whom I can share experience.

I'd be very interested in talking with you; I'm happy to share our
experience with you and am hoping to learn more about your applications
and requirements to aid in our development efforts.

Brian

Received on Thursday, 16 August 2007 07:26:53 UTC