Re: Looking for pedagogically useful data sets from Kingsley Idehen on 2015-03-14 (public-lod@w3.org from March 2015)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sat, 14 Mar 2015 15:32:10 -0400
To: public-lod@w3.org
Message-ID: <55048CBA.6060303@openlinksw.com>
On 3/13/15 7:17 PM, Michael Brunnbauer wrote:
> Hello Kingsley,
>
> On Fri, Mar 13, 2015 at 06:06:00PM -0400, Kingsley Idehen wrote:
>> I hope you are not assuming that I meant: ACID and traditional RDBMS are for
>> losers?
> Interpreting what you say is not easy for me. So forgive me if I have read
> too much into your statement.
>
>> My fundamental point is simply about the fact that RDBMS doesn't mean SQL
>> RDBMS.
> OK. I have no problem with SPARQL - I like it. But I have a problem with more
> restricted query languages being used without very good reasons just because
> NOSQL or whatever is hip.

So do I. That's basically the underlying theme of my view, i.e., we are 
conceding reality to marketing, which is horrible.

> There are big data and small data use cases for
> both SQL and its alternatives. Most of the time, one of them is clearly the
> best design decision and I reject the notion that one technology solves all
> use cases.

Correct, that's why I continue to chant the "horses for courses" mantra. 
In the case of RDBMS technology, relations are handled in different 
ways by products that support various query languages. None of these 
query languages (e.g., SPARQL or SQL) is the only option for a relational 
database management system. Unfortunately, via effective marketing over 
the years, SQL RDBMS vendors have laid claim to the entire realm of 
relational database management, leaving the alternatives to redefine 
themselves as NoSQL, Graph-oriented, Document-oriented, etc., Database 
Management Systems.
>
>> Thus, ACID has nothing to do with being Relational, in regards to
>> Database document construction and/or management.
> I want to do this:
>
>   BEGIN TRANSACTION
>   <SPARQL query>
>   <SPARQL update>
>   ...
>   COMMIT TRANSACTION
>
> If you know a triple store that can do this, I will concede that ACID has
> nothing to do with being "relational".

What do you mean by "triple store"? Virtuoso (a hybrid relational 
RDBMS) is in use at many organizations that even use Distributed 
Transaction Monitors as part of their RDF data management solutions.

Why are you conflating Transaction Management with Data Organization?

An RDBMS could exist without ACID. Of course, such a system would present 
practical challenges for transactions and OLTP workloads, but that 
doesn't make it non-relational. It would simply be lacking in 
transaction management, i.e., the issues addressed by ACID.
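To make the separation concrete, here is a minimal sketch using Python's built-in `sqlite3` module (SQLite and SQL, not Virtuoso and SPARQL). It stores statements in a hypothetical `triples` table and wraps a read-then-update in the explicit BEGIN ... COMMIT pattern Michael describes. The table's relational layout is the same whether or not a transaction is open; ROLLBACK is purely a transaction-management concern. The `ex:` identifiers and the `open_store`, `relabel`, and `labels` helpers are illustrative assumptions.

```python
import sqlite3


def open_store():
    """Create an in-memory SQLite database holding a hypothetical triples table."""
    # isolation_level=None puts the sqlite3 module in autocommit mode, so it
    # does not open transactions implicitly; we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.execute("INSERT INTO triples VALUES ('ex:alice', 'rdfs:label', 'Alice')")
    return conn


def relabel(conn, subject, new_label, fail=False):
    """Query the current label, then replace it, inside one explicit transaction.

    If fail is True, an error is raised before COMMIT to show that ROLLBACK
    undoes both the DELETE and the INSERT.
    """
    conn.execute("BEGIN")
    try:
        # Query step: read the current label.
        row = conn.execute(
            "SELECT o FROM triples WHERE s = ? AND p = 'rdfs:label'", (subject,)
        ).fetchone()
        old_label = row[0] if row else None
        # Update step: replace the label.
        conn.execute("DELETE FROM triples WHERE s = ? AND p = 'rdfs:label'", (subject,))
        conn.execute("INSERT INTO triples VALUES (?, 'rdfs:label', ?)", (subject, new_label))
        if fail:
            raise RuntimeError("simulated failure before COMMIT")
        conn.execute("COMMIT")
        return old_label
    except Exception:
        conn.execute("ROLLBACK")
        raise


def labels(conn, subject):
    """Return all labels currently stored for a subject."""
    rows = conn.execute(
        "SELECT o FROM triples WHERE s = ? AND p = 'rdfs:label'", (subject,))
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = open_store()
    try:
        relabel(conn, "ex:alice", "Alicia", fail=True)
    except RuntimeError:
        pass
    print(labels(conn, "ex:alice"))  # the failed update was rolled back
    relabel(conn, "ex:alice", "Alicia")
    print(labels(conn, "ex:alice"))  # the committed update is visible
```

Dropping the BEGIN/COMMIT/ROLLBACK calls would leave the same table and the same queries; only the all-or-nothing guarantee would be lost.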


>
> If this problem is solved, there is still the "data shapes" problem. I grok
> that you want to allow exceptions from the data shape rules. So show me
>
> 1) how these exceptions can have value if they are not handled in the code
> 2) how your solution handles the RDF data shapes problem

The shapes problem boils down to moving validation of relation members 
into the DBMS engine layer, rather than leaving it solely to client code. 
This is best handled via rules, which is something we have been working 
on for some time now. We support both SPIN [2] and SHACL [1].
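As a rough illustration of validation living in the engine rather than in each client, here is a minimal Python sketch of a store that checks every inserted statement against declared property constraints. The `PropertyShape` class, its `datatype` and `max_count` fields, `ValidatingStore`, and the `ex:` identifiers are hypothetical simplifications loosely inspired by SHACL's `sh:datatype` and `sh:maxCount`. This is not SHACL, SPIN, or Virtuoso's rules engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyShape:
    """Hypothetical, heavily simplified stand-in for a SHACL property shape."""
    predicate: str
    datatype: type                   # required Python type of the object value
    max_count: Optional[int] = None  # max values per subject (None = unbounded)


class ShapeViolation(ValueError):
    """Raised when an insert would break a declared shape."""


class ValidatingStore:
    """Triple store that enforces shapes at insert time, inside the 'engine'."""

    def __init__(self, shapes):
        self.shapes = {shape.predicate: shape for shape in shapes}
        self.triples = set()

    def insert(self, s, p, o):
        if (s, p, o) in self.triples:
            return  # identical statement already present; nothing to check
        shape = self.shapes.get(p)
        if shape is not None:
            if not isinstance(o, shape.datatype):
                raise ShapeViolation(
                    f"{p} expects {shape.datatype.__name__}, got {type(o).__name__}")
            if shape.max_count is not None:
                existing = sum(1 for (s2, p2, _) in self.triples if s2 == s and p2 == p)
                if existing >= shape.max_count:
                    raise ShapeViolation(f"{s} already has {existing} value(s) for {p}")
        # Predicates with no declared shape are accepted unchecked.
        self.triples.add((s, p, o))
```

With `ValidatingStore([PropertyShape("ex:age", int, max_count=1)])`, inserting `("ex:alice", "ex:age", 30)` succeeds, while a second age for the same subject or a string value raises `ShapeViolation`, regardless of which client issued the write. Whether non-conforming data should instead be kept and flagged (the "exceptions" Michael asks about) is a policy choice this sketch does not address.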

>
> I am also interested in how your solution handles the join performance problem
> (from your company presentations, I gather that you have addressed this
> somehow).

We continue to deliver versions of our engine that include enhancements 
in this area. With each release, we get closer to eliminating the 
performance difference between our handling of Relational Tables and 
of RDF Statements.

If you provide a specific example, I can provide a much more specific 
response.

>
> Besides, what is a database document?

SQL:

What an RDBMS engine basically refers to as a "database page" [3]. 
Whether it is exposed to clients (via an identifier) depends on the 
implementation.

SPARQL:

What an RDBMS/Store refers to as a "named graph". It is identified by a 
Named Graph IRI in the FROM (when external) and/or FROM NAMED (when 
internal) clauses of a SPARQL query.

In both cases, you have relations (sets of tuples) associated with one 
or more database documents. Each relation is identified by a predicate.

In a SQL RDBMS, each Table Name identifies a relation predicate. In RDF, 
a relation is identified by the IRI used in the predicate slot of an RDF 
statement.
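The two views can be sketched in a few lines of Python. Under assumed `ex:` naming conventions (loosely in the spirit of the W3C Direct Mapping, but not a conforming implementation), the sketch turns rows of a hypothetical `person` table into statements, one per non-key column, and then groups statements from several named graphs into relations keyed by predicate IRI. In the table, the single name `person` identifies an n-ary relation; in the RDF view, each predicate IRI identifies a binary relation whose members may come from more than one database document (named graph). The helper names are illustrative.

```python
from collections import defaultdict


def rows_to_triples(table, key, rows):
    """Turn table rows (dicts) into (subject, predicate, object) statements.

    Hypothetical naming: subject ex:<table>/<key value>, predicate ex:<table>#<column>.
    """
    triples = []
    for row in rows:
        subject = f"ex:{table}/{row[key]}"
        for column, value in row.items():
            if column != key:
                triples.append((subject, f"ex:{table}#{column}", value))
    return triples


def relations_by_predicate(dataset):
    """Group statements from named graphs into relations keyed by predicate IRI.

    dataset maps a named-graph IRI to a list of (s, p, o) statements. Each
    result entry holds the relation's (s, o) tuples and the graphs
    ("database documents") those tuples came from.
    """
    relations = defaultdict(lambda: {"tuples": set(), "graphs": set()})
    for graph_iri, triples in dataset.items():
        for s, p, o in triples:
            relations[p]["tuples"].add((s, o))
            relations[p]["graphs"].add(graph_iri)
    return dict(relations)


if __name__ == "__main__":
    people = [{"id": 1, "name": "Alice", "age": 30}, {"id": 2, "name": "Bob", "age": 25}]
    dataset = {
        "ex:graph/hr": rows_to_triples("person", "id", people[:1]),
        "ex:graph/sales": rows_to_triples("person", "id", people[1:]),
    }
    for predicate, rel in sorted(relations_by_predicate(dataset).items()):
        print(predicate, sorted(rel["tuples"]), sorted(rel["graphs"]))
```

Here `ex:person#name` ends up as one relation whose members live in two different named graphs, which is the "one or more database documents" point above.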
>
>> A traditional RDBMS != SQL RDBMS either. That has never ever been the case.
> I think it has been the case for quite a while.

In marketing collateral and memes, perhaps. It has never ever been a technical fact.

> But if you go back long enough,
> that ceases to be true, yes.

Correct.

Links:

[1] http://w3c.github.io/data-shapes/shacl/ -- SHACL
[2] http://www.w3.org/Submission/spin-overview/ -- SPIN
[3] https://technet.microsoft.com/en-us/library/ms190969(v=sql.105).aspx -- SQL Server Database Documents and Relations (Tables)


>
> Regards,
>
> Michael Brunnbauer
>


-- 
Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
Received on Saturday, 14 March 2015 19:32:31 UTC