Re: FW: Failed to port datastore to RDF, will go Mongo

Hi William, Friedrich.

This is an excellent email. My replies are inline; I hope I can help.

On Wed, Nov 24, 2010 at 9:47 AM, William Waites <ww@styx.org> wrote:
> Friedrich, I'm forwarding your message to one of the W3 lists.
>
> Some of your questions could be easily answered (e.g. for euro in your
> context, you don't have a predicate for that, you have an Observation
> with units of a currency and you could take the currency from
> dbpedia, the predicate is "units").
>
> But I think your concerns are quite valid generally and your
> experience reflects that of most web site developers that encounter
> RDF.
>
> LOD list, Friedrich is a clueful developer, responsible for
> http://bund.offenerhaushalt.de/ amongst other things. What can we
> learn from this? How do we make this better?
>
> -w
>
>
> ----- Forwarded message from Friedrich Lindenberg <friedrich@pudo.org> -----
>
> From: Friedrich Lindenberg <friedrich@pudo.org>
> Date: Wed, 24 Nov 2010 11:56:20 +0100
> Message-Id: <A9089567-6107-4B43-B442-D09DCC0C353D@pudo.org>
> To: wdmmg-discuss <wdmmg-discuss@lists.okfn.org>
> Subject: [wdmmg-discuss] Failed to port datastore to RDF, will go Mongo
>
> (reposting to list):
>
> Hi all,
>
> As an action from OGDCamp, Rufus and I agreed that we should resume porting WDMMG to RDF in order to make the data model more flexible and to allow a merger between WDMMG, OffenerHaushalt and similar other projects.
>
> After a few days, I'm now over the whole idea of porting WDMMG to RDF. Having written a long technical pro/con email before (that I assume contained nothing you don't already know), I think the net effect of using RDF would be the following:
>
> * Lots of coolness, sucking up to linked data people.
> * Further research regarding knowledge representation.

I will quickly outline some points that I think are advantages from a
developer POV (once you tackle the problems you outline below, of
course):
* A highly expressive query language (SPARQL).
* Ease of creating workflows where data moves from one app to another.
And this is not just buzz. The self-contained nature of triples and
IDs means you can SPARQL SELECT on one side and SPARQL INSERT on the
other. I do this all the time, creating "data pipelines".
I admit it has taken some time to master, but I can perform "magic"
from my customer's point of view.
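A minimal sketch of one such pipeline step, assuming the source store returns `SELECT ?s ?p ?o` results in the W3C SPARQL JSON results format and the target accepts a SPARQL 1.1 `INSERT DATA` update (Virtuoso's update syntax at the time may have differed). The URIs and the function names `term_to_sparql` and `select_to_insert` are hypothetical; the point is only that self-contained triples can be copied row by row without any schema mapping:

```python
import json


def term_to_sparql(term):
    """Render one binding from the SPARQL JSON results format as a SPARQL term."""
    if term["type"] == "uri":
        return "<%s>" % term["value"]
    if term["type"] == "bnode":
        return "_:%s" % term["value"]
    # Any literal ("literal" or the older "typed-literal"): escape so it parses.
    value = (term["value"].replace("\\", "\\\\")
             .replace('"', '\\"').replace("\n", "\\n"))
    if "datatype" in term:
        return '"%s"^^<%s>' % (value, term["datatype"])
    if "xml:lang" in term:
        return '"%s"@%s' % (value, term["xml:lang"])
    return '"%s"' % value


def select_to_insert(results_json, graph):
    """Turn the rows of a `SELECT ?s ?p ?o` result into one INSERT DATA update."""
    rows = json.loads(results_json)["results"]["bindings"]
    triples = "\n".join(
        "    %s %s %s ." % tuple(term_to_sparql(row[v]) for v in ("s", "p", "o"))
        for row in rows
    )
    return "INSERT DATA {\n  GRAPH <%s> {\n%s\n  }\n}" % (graph, triples)


# Example: one row as the source store might return it (hypothetical URIs).
sample = json.dumps({
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [{
        "s": {"type": "uri", "value": "http://mydomain.com/id/obs1"},
        "p": {"type": "uri", "value": "http://mydomain.com/id/amount"},
        "o": {"type": "literal", "value": "1200",
              "datatype": "http://www.w3.org/2001/XMLSchema#decimal"},
    }]},
})
update = select_to_insert(sample, "http://mydomain.com/graph/copy")
print(update)
```

In a real pipeline the JSON would come from one endpoint and the resulting update string would be POSTed to another; both HTTP calls are left out here.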

>
> vs.
>
> * Unstable and outdated technological base. No triplestore I have seen so far seemed on par with MySQL 4.

* You definitely need to give Virtuoso a try. It is a mature SQL
database that grew into RDF. I strongly disagree with this point, as I
have personally built highly demanding projects for large companies
on Virtuoso's Quad Store. To give you a real-life case, the recent
Brazilian election portal by Globo.com
(http://g1.globo.com/especiais/eleicoes-2010/) has Virtuoso under the
hood, and as a highly important, mission-critical app at a major (4th)
media company, it is not a toy application.
I know of many others, but I participated in this one, so I can tell
you without fear of mistake that it is Virtuoso.

> * No freedom wrt to schema, instead modelling overhead. Spent 30 minutes trying to find a predicate for "Euro".

Yes!
This is a major problem and we as a community need to tackle it.
I am intrigued to see what ideas come up in this thread. Thanks for
bringing it up.

As an alternative, you can initially model everything using a simple
urn:foo:xxx or http://mydomain.com/id/xxx naming scheme (this is what I do)
and, as you move forward, refactor the model. Or not.

You can leave it as is and it will still be integrable (able to
live alongside other datasets in the same store).
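To make the "own IDs now, refactor later" approach concrete, here is a minimal sketch using plain Python tuples as triples. Everything in it is an assumption for illustration: `mint`, `refactor`, the base URI, and the shared-vocabulary URI are hypothetical, and terms are plain strings, so the toy ignores the URI-vs-literal distinction. It models the Euro case as an observation whose unit points at the DBpedia resource, as William suggests, using only home-grown predicates:

```python
# Hypothetical base URI, following the http://mydomain.com/id/xxx pattern.
BASE = "http://mydomain.com/id/"


def mint(local_name):
    """Mint an ID under our own base; no shared vocabulary needed yet."""
    return BASE + local_name


# An observation described only with home-grown predicates. Terms are plain
# strings here, so this toy ignores the URI-vs-literal distinction.
obs = mint("obs/2010-health")
triples = {
    (obs, mint("amount"), "1200000"),
    (obs, mint("unit"), "http://dbpedia.org/resource/Euro"),
}


def refactor(triples, mapping):
    """Replace any term found in `mapping`; every other term is left untouched."""
    return {tuple(mapping.get(term, term) for term in t) for t in triples}


# Later, swap a local predicate for one from a shared vocabulary (placeholder URI).
refactored = refactor(triples, {mint("unit"): "http://example.org/vocab#unit"})
```

Because each triple carries its own IDs, the unrefactored data can sit in the same store as other datasets; the mapping step is optional and can be done one predicate at a time.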

Deploying the "Linked" part of Linked Data (the dereferencing
protocols) later on is another game.
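One common way to play that game is the 303 See Other pattern from the W3C note "Cool URIs for the Semantic Web": an /id/ URI names a thing rather than a document, so the server redirects to a document about it, choosing RDF or HTML by the Accept header. A minimal sketch of that decision follows; the `/data/` and `/page/` paths and the `dereference` function are hypothetical, and the Accept matching is deliberately naive (substring match, q-values ignored):

```python
def dereference(path, accept):
    """Return (status, headers) for a GET on one of our /id/ URIs.

    /id/ URIs name things, not documents, so answer 303 See Other and point
    at a document describing the thing, picking RDF or HTML by Accept.
    """
    if not path.startswith("/id/"):
        return 404, {}
    local = path[len("/id/"):]
    if "text/turtle" in accept or "application/rdf+xml" in accept:
        target = "/data/" + local   # machine-readable description
    else:
        target = "/page/" + local   # human-readable description
    return 303, {"Location": target, "Vary": "Accept"}
```

Since this sits in front of whatever store holds the data, it can be added after the modelling work without touching the triples themselves.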

> * Scares off developers. Invested 2 days researching this, which is how long it took me to implement OHs backend the first time around. Project would need to be sustained through linked data grad students.
> * Less flexibility wrt to analytics, querying and aggregation. SPARQL not so hot.

Did you try Virtuoso? Seriously.
It provides out of the box common aggregates and is highly extensible.
You basically have a development platform at your disposal.
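As an illustration of the kind of aggregation Friedrich needs for spending tables, here is a small helper that builds a "total per group" query. It uses SPARQL 1.1 aggregate syntax (`SUM`, `GROUP BY`); Virtuoso offered aggregates as an extension at the time, so its exact syntax may have differed. The function name and all predicate and graph URIs are hypothetical:

```python
def total_by_group(amount_pred, group_pred, graph):
    """Build a SPARQL 1.1 aggregate query: sum of amounts per group value."""
    return (
        "SELECT ?group (SUM(?amount) AS ?total)\n"
        "FROM <%s>\n"
        "WHERE {\n"
        "  ?obs <%s> ?amount ;\n"
        "       <%s> ?group .\n"
        "}\n"
        "GROUP BY ?group\n"
        "ORDER BY DESC(?total)" % (graph, amount_pred, group_pred)
    )


query = total_by_group("http://mydomain.com/id/amount",
                       "http://mydomain.com/id/department",
                       "http://mydomain.com/graph/spending")
print(query)
```

Swapping `group_pred` (department, year, region) gives the usual spending breakdowns without changing any schema.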

> * Good chance of chewing up the UI, much harder to implement editing.

Definitely hard. This is something I hope will be alleviated once we
start getting more demos into the wild. But take note: the Active
Record + MVC pattern works. This is not as alien as it seems.

Also, like some of the major NoSQL offerings, SPARQL removes the need
for "joins". I find it terribly easy to create UIs over RDF, but I
have been doing it for a while already.
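A minimal sketch of what Active Record over triples can look like, assuming an in-memory set of (subject, predicate, object) tuples stands in for the store; against a real store, `save` would become a SPARQL DELETE/INSERT on the subject. The `Resource` class and the `ex:`/`dbpedia:` terms are hypothetical. One object wraps one subject and holds all its properties, which is what makes editing forms straightforward:

```python
class Resource:
    """Active-Record-style object for one subject in an in-memory triple set."""

    def __init__(self, store, subject):
        self.store = store          # shared set of (s, p, o) string tuples
        self.subject = subject
        self.props = {}             # predicate -> set of objects
        for s, p, o in store:
            if s == subject:
                self.props.setdefault(p, set()).add(o)

    def get(self, predicate):
        return sorted(self.props.get(predicate, ()))

    def put(self, predicate, *values):
        self.props[predicate] = set(values)

    def save(self):
        # Like DELETE-then-INSERT in SPARQL: drop this subject's old triples,
        # then write the current state back. Mutates the shared store in place.
        self.store.difference_update({t for t in self.store if t[0] == self.subject})
        self.store.update((self.subject, p, o)
                          for p, values in self.props.items() for o in values)


store = {("ex:obs1", "ex:amount", "1200"), ("ex:obs2", "ex:amount", "50")}
record = Resource(store, "ex:obs1")
record.put("ex:amount", "1300")
record.put("ex:unit", "dbpedia:Euro")
record.save()
print(sorted(store))
```

The controller and view only ever see `get`/`put`/`save`, so the MVC side looks much like it would over a SQL row.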

>
> I normally enjoy learning new stuff. This is just painful. Most of the above points are probably based on my ignorance, but it really shouldn't take a PhD to process some gov spending tables.
>
> I'll now start a mongo effort because I really think this should go schema-free + I want to get stuff moving. If you can hold off loading Uganda and Israel for a week that would of course be very cool, we could then try to evaluate how far this went. Progress will be at: http://bitbucket.org/pudo/wdmmg-core

My exec summary to you is this:
* Instead of Mongo, use Virtuoso with your own predicates. You will
get a lot of power, and your data will live natively as RDF. This
means it will be easily importable and meshable with other datasets,
initially.
* If UI is an issue, you can send your questions to public-lod and
lots of us will answer with patterns, strategies, etc.

Regards,
A

>
> Friedrich
>
>
>
> _______________________________________________
> wdmmg-discuss mailing list
> wdmmg-discuss@lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/wdmmg-discuss
>
> ----- End forwarded message -----
>
> --
> William Waites
> http://eris.okfn.org/ww/foaf#i
> 9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664
>
>



-- 
Aldo Bucchi
@aldonline
skype:aldo.bucchi
http://aldobucchi.com/

Received on Wednesday, 24 November 2010 13:34:04 UTC