Re: [tangle] getting the semweb exactly wrong from Michael Nachbaur on 2006-01-04 (semantic-web@w3.org from January 2006)

From: Michael Nachbaur <mike@nachbaur.com>
Date: Wed, 4 Jan 2006 13:08:43 -0800
To: Kjetil Kjernsmo <kjetil@kjernsmo.net>
Cc: semantic-web@w3.org
Message-Id: <3F3B5C36-182D-45F9-849B-E4A8273A503B@nachbaur.com>
On Jan 4, 2006, at 12:10 PM, Kjetil Kjernsmo wrote:

>
> On Wednesday 04 January 2006 20:03, Tim Berners-Lee wrote:
>> One answer is: don't!  The SemWeb is about connecting the data to what
>>   it means.
>> Keep the data in the place where it works and runs fast.
>
> Actually, I wouldn't dismiss it so fast. On the border between Perl-land
> and Redland there has been much experimenting lately with using a
> Redland model and toolchain as the model of a Model-View-Controller,
> and from the reports I've heard this has been successful.
> Unfortunately, many of the details are under NDAs so I haven't got
> them, but I did some experiments with my own Redland-based SPARQL
> engine, and the results were "OKish": some SPARQL queries against the
> Redland model were not substantially slower than querying the RDBMS
> for the same using the underlying engine. Others weren't as good, but I
> didn't experiment enough to understand what the issues could be.

I suspect I'm one of the NDA-ridden examples Kjetil is speaking of.
And yes, it works really well.  This project has the advantage of
being built from scratch, but for the most part developing against an
RDF model is little different from developing against an RDBMS model.
The RDF model does buy us vastly more flexibility, though, enabling us
to design an application that would be far more complicated in
RDBMS-land.  This is actually a second-generation application that's
replacing a conventional RDBMS application, and already we can see the
benefits.

The previous system was littered with tons of support tables all  
trying to abstract or describe relationships between entities, all of  
which you get automatically in RDF.  And as requirements change (as  
they invariably do) it's easy to just add additional properties to  
the RDF model and create additional associations between sets of data.
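
To make the "just add a property" point concrete, here is a rough sketch in plain Python. It is a toy in-memory triple store, not Redland and not the system described above; the class name, the `ex:` identifiers, and the `mentors` relationship are all hypothetical and chosen only for illustration.

```python
# Toy in-memory triple store (an illustration only; every name is hypothetical).
class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("ex:alice", "ex:worksFor", "ex:acme")
store.add("ex:alice", "ex:name", "Alice")

# A new requirement arrives: people can now mentor each other.
# There is no support table to create and no schema to migrate;
# the relationship is just one more predicate.
store.add("ex:alice", "ex:mentors", "ex:bob")

alice_facts = store.match(subject="ex:alice")
mentored_by_alice = [o for _, _, o in store.match("ex:alice", "ex:mentors")]
```

In an RDBMS the same change would typically mean a new association table plus the code to join through it; here the existing query function already covers the new relationship.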

> I think modelling the data as RDF and using a Redland model for the
> model of the MVC is emerging as a very interesting architecture for web
> apps, at least for smaller or medium sized applications with a few
> million triples (which is the range I've worked in). This has the added
> benefit that the whole app is trivially semweb, and therefore that the
> model can be anywhere and anything, the model is the web. I have about
> two lines of code of an album app I will build this way... :-) I'm sure
> the Java world might feel the same about their tools.

I'd have to agree with Kjetil on this one.  There are advantages to
using an RDF model for your underlying datastore in instances where
normalizing your data into a flat set of RDBMS tables would be either
too cumbersome or too rigid, or where you don't know ahead of time
what you really need to do with your data when you set out on a project.

There is one classic instance I've recently run into that just simply
screams RDF.  I helped with a requirements analysis for a project in
which a professor of design theory (graphics and architectural design)
wants to model and document the relationships, influences and
emotions behind different aspects of design.  Without boring everyone
with details, the problem is largely one of tagging, identifying
weighted rules behind how those tags relate, and using images to
illustrate these concepts.

Basically I realized in talking with him that, since this is a  
research project, even the guy doing all this work doesn't know the  
requirements of the project since it's largely a discovery process.   
So by modelling everything in an RDF datastore, the project's data  
and the intended meaning of the data can both be structured the same  
way.
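
As a rough sketch of how tags, weighted relationships between tags, and illustrating images might all sit in one triple set, here is a toy in plain Python. Everything in it is an assumption made for illustration: the `img:`, `tag:`, `rel:` and `ex:` names, the weights, and the choice to give each weighted relationship its own node so that the weight is just another property. It is not the professor's actual data or design.

```python
# Toy data: every identifier and weight here is hypothetical.
# A weighted relation between two tags gets its own node ("rel:1"),
# so the weight can be attached as one more property.
triples = [
    ("img:1", "ex:tag", "tag:minimalism"),
    ("img:2", "ex:tag", "tag:calm"),
    ("rel:1", "ex:from", "tag:minimalism"),
    ("rel:1", "ex:to", "tag:calm"),
    ("rel:1", "ex:weight", 0.8),
    ("rel:2", "ex:from", "tag:minimalism"),
    ("rel:2", "ex:to", "tag:luxury"),
    ("rel:2", "ex:weight", 0.3),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects of triples with the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def related_tags(tag):
    """Return (other_tag, weight) pairs for relations leaving `tag`, strongest first."""
    pairs = []
    for rel in subjects("ex:from", tag):
        target = objects(rel, "ex:to")[0]
        weight = objects(rel, "ex:weight")[0]
        pairs.append((target, weight))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def images_illustrating(tag):
    """Images carrying the given tag."""
    return subjects("ex:tag", tag)


strongest = related_tags("tag:minimalism")
```

If the research later turns up a new kind of relationship, it becomes a new predicate on the relation nodes rather than a redesign of the structure.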

> However, it is clear that there exists relatively little experience
> around this and that if one goes with it, it requires a lot of
> experimentation. I would therefore not argue that any approach is to be
> preferred over the other, but I would certainly be interested in
> hearing from people who try. I think the main performance challenge
> will be to cache the results of common queries and, connected to that,
> validating the cache.
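
One simple way to picture the caching and cache-validation problem raised in the quoted paragraph is a version counter on the store: a cached result is only reused if the store has not changed since it was computed. The sketch below is plain Python with hypothetical names; it is not how Redland or any particular SPARQL engine does it, and invalidating everything on any write is a deliberately coarse assumption.

```python
# Toy query cache with version-based validation (all names hypothetical).
class CachedStore:
    def __init__(self):
        self.triples = set()
        self.version = 0      # bumped on every real change to the store
        self._cache = {}      # pattern -> (version when computed, result)
        self.misses = 0       # how often a query had to be recomputed

    def add(self, s, p, o):
        if (s, p, o) not in self.triples:
            self.triples.add((s, p, o))
            self.version += 1

    def query(self, s=None, p=None, o=None):
        """Pattern match with None as wildcard, served from cache when still valid."""
        key = (s, p, o)
        cached = self._cache.get(key)
        if cached is not None and cached[0] == self.version:
            return cached[1]
        self.misses += 1
        result = sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )
        self._cache[key] = (self.version, result)
        return result


store = CachedStore()
store.add("ex:photo1", "ex:inAlbum", "ex:holiday")
first = store.query(p="ex:inAlbum")
second = store.query(p="ex:inAlbum")      # served from the cache
store.add("ex:photo2", "ex:inAlbum", "ex:holiday")
third = store.query(p="ex:inAlbum")       # store changed, so recomputed
```

A finer-grained scheme would track which predicates or subjects a cached query depends on, at the cost of more bookkeeping.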

Yes, it is certainly experimental, but so far I think, within the
problem domains I've dealt with, that RDF models can certainly
work well.  But I think it's definitely a case of "Use The Tool That
Gets The Job Done".  If all you need is an RDBMS, then go to it, and  
good luck to you, is what I say.  If, however, what you need to do  
stretches the boundaries of what is possible in an RDBMS (or if what  
you'd build in that RDBMS would turn out to be crufty) then I'd give  
RDF a try and have your data modelled naturally, without  
normalization into tables.

--
Michael Nachbaur <mike@nachbaur.com>
Received on Wednesday, 4 January 2006 21:08:51 UTC