Re: FW: Failed to port datastore to RDF, will go Mongo

From: Aldo Bucchi <aldo.bucchi@gmail.com> · Date: Wed, 24 Nov 2010 10:46:46 -0300

Sorry, I forgot to add something critical.

Ease of integration (moving triples) is just the beginning. Once you
get a handle on the power of ontologies and inference as "views", your
data starts becoming more and more useful.

But the first step is getting your data into RDF, and the return on
that investment is SPARQL and ease of integration.

I usually end up with several transformation pipelines and accessory
TTL (Turtle) files, which all get combined into one dataset. TTLs are
easily editable by hand and collaboratively versioned, while giving you
full expressivity.

TTL files alone are why some developers fall in love with Linked Data.
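
A rough sketch of the "combine everything into one dataset" step, in
plain Python with no RDF library. Triples are modelled as tuples and a
dataset as a set, so merging is just set union. The urn:foo:... names and
the merge/match helpers are made up for illustration; a real setup would
parse the TTL files with an RDF library and load them into a store.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def merge(*sources: Iterable[Triple]) -> Set[Triple]:
    """Union several triple sources into one dataset (duplicates collapse)."""
    dataset: Set[Triple] = set()
    for source in sources:
        dataset.update(source)
    return dataset


def match(dataset: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in dataset
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Hypothetical output of a transformation pipeline.
pipeline_triples = [
    ("urn:foo:entry/1", "urn:foo:amount", "1200"),
    ("urn:foo:entry/1", "urn:foo:department", "urn:foo:dept/health"),
]

# Hypothetical hand-edited annotations (what an accessory TTL file would hold).
hand_edited_triples = [
    ("urn:foo:dept/health", "urn:foo:label", "Health"),
    ("urn:foo:entry/1", "urn:foo:amount", "1200"),  # repeated statement
]

dataset = merge(pipeline_triples, hand_edited_triples)
labels = match(dataset, p="urn:foo:label")
```

Because every triple carries its own identifiers, the sources need no
shared table layout to be merged; a statement repeated across sources
simply collapses into one.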

On Wed, Nov 24, 2010 at 10:33 AM, Aldo Bucchi <aldo.bucchi@gmail.com> wrote:
> Hi William, Friedrich.
>
> This is an excellent email. My replies are inline. Hope I can help.
>
> On Wed, Nov 24, 2010 at 9:47 AM, William Waites <ww@styx.org> wrote:
>> Friedrich, I'm forwarding your message to one of the W3 lists.
>>
>> Some of your questions could be easily answered (e.g. for euro in your
>> context, you don't have a predicate for that, you have an Observation
>> with units of a currency and you could take the currency from
>> dbpedia, the predicate is "units").
>>
>> But I think your concerns are quite valid generally and your
>> experience reflects that of most web site developers that encounter
>> RDF.
>>
>> LOD list, Friedrich is a clueful developer, responsible for
>> http://bund.offenerhaushalt.de/ amongst other things. What can we
>> learn from this? How do we make this better?
>>
>> -w
>>
>>
>> ----- Forwarded message from Friedrich Lindenberg <friedrich@pudo.org> -----
>>
>> From: Friedrich Lindenberg <friedrich@pudo.org>
>> Date: Wed, 24 Nov 2010 11:56:20 +0100
>> Message-Id: <A9089567-6107-4B43-B442-D09DCC0C353D@pudo.org>
>> To: wdmmg-discuss <wdmmg-discuss@lists.okfn.org>
>> Subject: [wdmmg-discuss] Failed to port datastore to RDF, will go Mongo
>>
>> (reposting to list):
>>
>> Hi all,
>>
>> As an action from OGDCamp, Rufus and I agreed that we should resume porting WDMMG to RDF in order to make the data model more flexible and to allow a merger between WDMMG, OffenerHaushalt and similar other projects.
>>
>> After a few days, I'm now over the whole idea of porting WDMMG to RDF. Having written a long technical pro/con email before (that I assume contained nothing you don't already know), I think the net effect of using RDF would be the following:
>>
>> * Lots of coolness, sucking up to linked data people.
>> * Further research regarding knowledge representation.
>
> I will quickly outline some points that I think are advantages from a
> developer POV (once you tackle the problems you outline below, of
> course).
> * A highly expressive language (SPARQL)
> * Ease of creating workflows where data moves from one app to another.
> And this is not just buzz. The self-contained nature of triples and
> IDs makes it so that you can SPARQL select on one side and SPARQL
> insert on another. I do this all the time, creating "data pipelines".
> I admit it has taken some time to master, but I can perform "magic"
> from my customer's point of view.
>
>>
>> vs.
>>
>> * Unstable and outdated technological base. No triplestore I have seen so far seemed on par with MySQL 4.
>
> * You definitely need to give Virtuoso a try. It is a mature SQL
> database that grew into RDF. I strongly disagree with this point as I
> have personally created highly demanding projects for large companies
> using Virtuoso's Quad Store. To give you a real-life case, the recent
> Brazilian Election portal by Globo.com (
> http://g1.globo.com/especiais/eleicoes-2010/ ) has Virtuoso under the
> hood and, being a highly important, mission-critical app in a major
> (4th) media company, it is not a toy application.
> I know of many others, but I participated in this one, so I can tell
> you without fear of being mistaken that it is Virtuoso.
>
>> * No freedom wrt schema, instead modelling overhead. Spent 30 minutes trying to find a predicate for "Euro".
>
> Yes!
> This is a major problem and we as a community need to tackle it.
> I am intrigued to see what ideas come up in this thread. Thanks for
> bringing it up.
>
> As an alternative, you can initially model everything using a simple
> urn:foo:xxx or http://mydomain.com/id/xxx naming scheme (this is what
> I do) and refactor the model as you move forward. Or not.
>
> You can leave it as is and it will still be integrable (able to
> live alongside other datasets in the same store).
>
> Deploying the "Linked" part of Linked Data (the dereferencing
> protocols) later on is another game.
>
>> * Scares off developers. Invested 2 days researching this, which is how long it took me to implement OH's backend the first time around. Project would need to be sustained through linked data grad students.
>> * Less flexibility wrt analytics, querying and aggregation. SPARQL not so hot.
>
> Did you try Virtuoso? Seriously.
> It provides common aggregates out of the box and is highly extensible.
> You basically have a development platform at your disposal.
>
>> * Good chance of chewing up the UI, much harder to implement editing.
>
> Definitely hard. This is something I hope will be alleviated once we
> start getting more demos into the wild. But, take note: the Active
> Record + MVC pattern works. This is not as alien as it seems.
>
> Also, SPARQL removes the "joins", as some of the major NoSQL
> offerings do. I find it terribly easy to create UIs over RDF, but I
> have been doing it for a while already.
>
>>
>> I normally enjoy learning new stuff. This is just painful. Most of the above points are probably based on my ignorance, but it really shouldn't take a PhD to process some gov spending tables.
>>
>> I'll now start a mongo effort because I really think this should go schema-free + I want to get stuff moving. If you can hold off loading Uganda and Israel for a week that would of course be very cool, we could then try to evaluate how far this went. Progress will be at: http://bitbucket.org/pudo/wdmmg-core
>
> My exec summary to you is this:
> * Instead of mongo, use Virtuoso with your own predicates. You will
> get a lot of power and you will be able to make your data live
> natively as RDF. This means it will be easily importable and meshable
> with other datasets, initially.
> * If UI is an issue, you can post your questions to public-lod and
> lots of us will answer with patterns, strategies, etc.
>
> Regards,
> A
>
>>
>> Friedrich
>>
>>
>>
>> _______________________________________________
>> wdmmg-discuss mailing list
>> wdmmg-discuss@lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/wdmmg-discuss
>>
>> ----- End forwarded message -----
>>
>> --
>> William Waites
>> http://eris.okfn.org/ww/foaf#i
>> 9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664
>>
>>
>
>
>
> --
> Aldo Bucchi
> @aldonline
> skype:aldo.bucchi
> http://aldobucchi.com/
>

-- 
Aldo Bucchi
@aldonline
skype:aldo.bucchi
http://aldobucchi.com/