RE: Storing RDF in a relational database from Sebastian Samaruga on 2016-11-02 (semantic-web@w3.org from November 2016)

From: Sebastian Samaruga <ssamarug@gmail.com>
Date: Wed, 2 Nov 2016 15:16:13 -0300
To: "Li, Ai-jun" <Ai-jun.Li@morganstanley.com>
Cc: Nathan Rixham <nathan@webr3.org>, semantic-web@w3.org, Bernadette Hyland <bhyland@3roundstones.com>
Message-ID: <CAOLUXBuKVMzSFWkrBRFgbf-TRP0rcSpKydWuSX-XtfXUxbYjZA@mail.gmail.com>
Hi. I don't know if this is relevant to the question.

I've been working on a metadata extraction tool for RDF with (levels of)
metamodels from which aggregate knowledge for types, schema and
relationships inference, basically for BI and social applications purposes.

It could also be used as a source of metadata for RDBMS like publishing of
ontologies via an OData endpoint and the corresponding JDBC drivers (
http://www.odata.org/ecosystem/consumers/JDBC-Driver/) given the metadata
being aggregated is 'generating' schema from raw input.

The attached PDF is for the draft of the current (unimplemented) version.
Some of your tables remind me projects that I've worked in and that
inspired this current approach.

Best Regards,
Sebastián.

On Nov 2, 2016 1:38 PM, "Li, Ai-jun" <Ai-jun.Li@morganstanley.com> wrote:

> Hi All,
>
>
>
> Thanks for the interest. First, I must say that we didn't start by
> treating our data as graphs. Secondly, I got exposed to RDF very recently
> and still know very little about it. After being exposed to RDF, I realized
> that the way we are storing graph data handles provenance and reification
> very naturally. From what I understand, current RDF stores don't have a
> good way of handling these.
>
>
>
> It is easier to describe our approach by first analyzing and categorizing
> graph nodes and edges. If you look at any graph data in detail, you’ll find
> that not all nodes and edges are equal:
>
> ·         Some nodes are classification of things, e.g. *Hardware* in
> Personal Computer - Is_A - Hardware
>
> ·         Some nodes define types of things, e.g. *Personal Computer* in
> myPC - Is_A - Personal Computer
>
> ·         Some nodes define instances of a type, e.g. *myPC *is an
> instance of Personal Computer
>
> ·         Some nodes are properties/attributes of instances of things,
> e.g. *4* in myPC - Number_Of_CPU - 4
>
> ·         Some edges describe relationships, e.g. Aijun - * Owns* - myPC
>
> ·         Some edges describe properties, e.g. *Number_Of_CPU* in myPC -
> Number_Of_CPU - 4
>
> Internally, we use the ITIL term Configuration Item (CI) to represent
> thing. Corresponding to the above graph categorization, we have created the
> following main tables (not all tables are listed) :
>
> *Table*
>
> *Explanation*
>
> CI_Class_Category
>
> Metadata, stores the categorization nodes
>
> CI_Classes
>
> Metadata, stores the type nodes, has a foreign key to CI_Class_Category
>
> CI_Relationship_Templates
>
> Metadata, stores the allowed kinds of relationship edges (i.e. define what
> kind of relationships are allowed, a relationship has a CI class on the
> left and another on the right, and a relationship name)
>
> CI_Properties
>
> Metadata, stores the allowed/known property edges
>
> Data_Sources
>
> Metadata, stores info about data provider (provenance)
>
> CI_Instances
>
> *Main table*, stores instance nodes, contains: CI_Id, CI_Class_Id,
> Data_Source_Id, CI_Name, Create_Time, Last_Change_Time, etc.
>
> CI_Relationships
>
> *Main table*, stores the relationship edges, contains: Relationship_Id,
> Left_CI_Id, Right_CI_Id, Relationship_Type_Id, Data_Source_Id, Create_Time,
> etc.
>
> CI_Property_Values
>
> *Main table*, stores the property edges and values, contains: CI_Id,
> Property_Id, Property_Value, Data_Source_Id, timestamp
>
> CI_Relationship_Property_Values
>
> *Main table*, stores properties that describe relationships
> (reification), contains: Relationship_Id, Rel_Property_Id, Property_Value,
> Data_Source_Id and timestamp
>
>
>
> Today, we have over 200 CI classes (aka RDF types) and this number will
> keep on increasing. The number of database tables and their schemas,
> however, stayed the same since day one.
>
>
>
> Data consumption is a challenge. If I want to get a report of all the
> Personal Computers together with their properties, for example, I would
> need to join the CI_Instances and CI_Property_Values table many times (or
> do many sub-queries). Our solution is de-normalize the data to a reporting
> database. The reporting database contains a materialized table for each CI
> class. The schemas for the tables in the reporting database are created
> automatically based on CI class metadata. Data population and changes are
> triggered by changes in the normalized database.
>
>
>
> I realize that this may not be a very clear explanation of our approach.
> Hopefully you can some general ideas.
>
>
>
> Cheers,
>
> Aijun
>
>
>
>
>
>
>
> *From:* Nathan Rixham [mailto:nathan@webr3.org]
> *Sent:* Wednesday, November 02, 2016 11:24 AM
> *To:* Bernadette Hyland
> *Cc:* Li, Ai-jun (Enterprise Infrastructure); semantic-web@w3.org
> *Subject:* Re: Storing RDF in a relational database
>
>
>
> There are still many environments where custom / non pre-installed
> software is not available, environments where it would often be useful to
> have smaller graphs of 1m-1b triples. The vast majority of these provide
> RDBMS or Document-Object stores. Hence any proven approaches would be
> useful to have publicly available.
>
>
>
> On Wed, Nov 2, 2016 at 2:24 PM, Bernadette Hyland <
> bhyland@3roundstones.com> wrote:
>
> Hi Ai-jun,
>
> Not sure that storing RDF triples in a relational database is novel, at
> least not in 2016. And 300M isn’t a big number in the world of graph
> databases. For example, we’re working with a linked data repository,
> PubChem with 99B triples, and linking it to a subset of environmental
> linked open data. Point is, graph databases are a useful tool for specific
> jobs, just like RDBMS’s are great for other jobs.
>
>
>
> More importantly, getting triples out in a speedy manner, using a standard
> query language, and building a nice UI, is the part many people in the
> linked data community have spent 10+ years getting right.
>
>
>
> Just my 2 cents.
>
>
>
> Cheers,
>
>
>
> Bernadette Hyland
>
> CEO, 3 Round Stones, Inc.
>
>
>
>
>
>
>
> On Nov 2, 2016, at 04:11, Li, Ai-jun <Ai-jun.Li@morganstanley.com> wrote:
>
>
>
>
>
> I came across a very old request for comments for storing RDF data in
> relational database (http://infolab.stanford.edu/~melnik/rdf/db.html). I
> was unable to find any newer discussion on this. We had implemented a very
> innovative way of storing linked graph data in Sybase many years ago and
> the system is still being used today. The system is storing the equivalent
> of over 300 million triples and is scalable for much more. We’d be happy to
> share our approach if this is something the community is still interested
> in (will need to get the firm’s approval, obviously).
>
>
>
> Thanks,
>
> Ai-jun Li
>
> *Morgan Stanley | Enterprise Infrastructure    *1 New York Plaza, 16th
> Floor | New York, NY  10004
> Phone: +1 646 536-0765
> Ai-jun.Li@morganstanley.com
>
>
>
>
> ------------------------------
>
>
> NOTICE: Morgan Stanley is not acting as a municipal advisor and the
> opinions or views contained herein are not intended to be, and do not
> constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall
> Street Reform and Consumer Protection Act. If you have received this
> communication in error, please destroy all electronic and paper copies and
> notify the sender immediately. Mistransmission is not intended to waive
> confidentiality or privilege. Morgan Stanley reserves the right, to the
> extent permitted under applicable law, to monitor electronic
> communications. This message is subject to terms available at the following
> link: http://www.morganstanley.com/disclaimers  If you cannot access
> these links, please notify us by reply message and we will send the
> contents to you. By communicating with Morgan Stanley you consent to the
> foregoing and to the voice recording of conversations with personnel of
> Morgan Stanley.
>
>
>
>
>
>
> ------------------------------
>
> NOTICE: Morgan Stanley is not acting as a municipal advisor and the
> opinions or views contained herein are not intended to be, and do not
> constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall
> Street Reform and Consumer Protection Act. If you have received this
> communication in error, please destroy all electronic and paper copies and
> notify the sender immediately. Mistransmission is not intended to waive
> confidentiality or privilege. Morgan Stanley reserves the right, to the
> extent permitted under applicable law, to monitor electronic
> communications. This message is subject to terms available at the following
> link: http://www.morganstanley.com/disclaimers  If you cannot access
> these links, please notify us by reply message and we will send the
> contents to you. By communicating with Morgan Stanley you consent to the
> foregoing and to the voice recording of conversations with personnel of
> Morgan Stanley.
>
>
Attachments

application/pdf attachment: Draft.pdf
Received on Wednesday, 2 November 2016 18:43:24 UTC