- From: Martynas Jusevičius <martynas@graphity.org>
- Date: Wed, 2 Nov 2016 17:41:47 +0100
- To: "Li, Ai-jun" <Ai-jun.Li@morganstanley.com>
- Cc: Nathan Rixham <nathan@webr3.org>, Bernadette Hyland <bhyland@3roundstones.com>, "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <CAE35Vmwm2V5XALn4WRX3u_a1oTP6Pu+mg9aUSvPq7V0PVt-o9g@mail.gmail.com>
Aijun, another way to go about this would be to keep your relational structure and then use OBDA (ontology-based data access) to provide a virtual RDF/SPARQL layer. The most popular tool these days seems to be ontop: http://ontop.inf.unibz.it/ Performance may or may not be better :) On Wed, Nov 2, 2016 at 5:27 PM, Li, Ai-jun <Ai-jun.Li@morganstanley.com> wrote: > Hi All, > > > > Thanks for the interest. First, I must say that we didn't start by > treating our data as graphs. Secondly, I got exposed to RDF very recently > and still know very little about it. After being exposed to RDF, I realized > that the way we are storing graph data handles provenance and reification > very naturally. From what I understand, current RDF stores don't have a > good way of handling these. > > > > It is easier to describe our approach by first analyzing and categorizing > graph nodes and edges. If you look at any graph data in detail, you’ll find > that not all nodes and edges are equal: > > · Some nodes are classification of things, e.g. *Hardware* in > Personal Computer - Is_A - Hardware > > · Some nodes define types of things, e.g. *Personal Computer* in > myPC - Is_A - Personal Computer > > · Some nodes define instances of a type, e.g. *myPC *is an > instance of Personal Computer > > · Some nodes are properties/attributes of instances of things, > e.g. *4* in myPC - Number_Of_CPU - 4 > > · Some edges describe relationships, e.g. Aijun - * Owns* - myPC > > · Some edges describe properties, e.g. *Number_Of_CPU* in myPC - > Number_Of_CPU - 4 > > Internally, we use the ITIL term Configuration Item (CI) to represent > thing. Corresponding to the above graph categorization, we have created the > following main tables (not all tables are listed) : > > *Table* > > *Explanation* > > CI_Class_Category > > Metadata, stores the categorization nodes > > CI_Classes > > Metadata, stores the type nodes, has a foreign key to CI_Class_Category > > CI_Relationship_Templates > > Metadata, stores the allowed kinds of relationship edges (i.e. define what > kind of relationships are allowed, a relationship has a CI class on the > left and another on the right, and a relationship name) > > CI_Properties > > Metadata, stores the allowed/known property edges > > Data_Sources > > Metadata, stores info about data provider (provenance) > > CI_Instances > > *Main table*, stores instance nodes, contains: CI_Id, CI_Class_Id, > Data_Source_Id, CI_Name, Create_Time, Last_Change_Time, etc. > > CI_Relationships > > *Main table*, stores the relationship edges, contains: Relationship_Id, > Left_CI_Id, Right_CI_Id, Relationship_Type_Id, Data_Source_Id, Create_Time, > etc. > > CI_Property_Values > > *Main table*, stores the property edges and values, contains: CI_Id, > Property_Id, Property_Value, Data_Source_Id, timestamp > > CI_Relationship_Property_Values > > *Main table*, stores properties that describe relationships > (reification), contains: Relationship_Id, Rel_Property_Id, Property_Value, > Data_Source_Id and timestamp > > > > Today, we have over 200 CI classes (aka RDF types) and this number will > keep on increasing. The number of database tables and their schemas, > however, stayed the same since day one. > > > > Data consumption is a challenge. If I want to get a report of all the > Personal Computers together with their properties, for example, I would > need to join the CI_Instances and CI_Property_Values table many times (or > do many sub-queries). Our solution is de-normalize the data to a reporting > database. The reporting database contains a materialized table for each CI > class. The schemas for the tables in the reporting database are created > automatically based on CI class metadata. Data population and changes are > triggered by changes in the normalized database. > > > > I realize that this may not be a very clear explanation of our approach. > Hopefully you can some general ideas. > > > > Cheers, > > Aijun > > > > > > > > *From:* Nathan Rixham [mailto:nathan@webr3.org] > *Sent:* Wednesday, November 02, 2016 11:24 AM > *To:* Bernadette Hyland > *Cc:* Li, Ai-jun (Enterprise Infrastructure); semantic-web@w3.org > *Subject:* Re: Storing RDF in a relational database > > > > There are still many environments where custom / non pre-installed > software is not available, environments where it would often be useful to > have smaller graphs of 1m-1b triples. The vast majority of these provide > RDBMS or Document-Object stores. Hence any proven approaches would be > useful to have publicly available. > > > > On Wed, Nov 2, 2016 at 2:24 PM, Bernadette Hyland < > bhyland@3roundstones.com> wrote: > > Hi Ai-jun, > > Not sure that storing RDF triples in a relational database is novel, at > least not in 2016. And 300M isn’t a big number in the world of graph > databases. For example, we’re working with a linked data repository, > PubChem with 99B triples, and linking it to a subset of environmental > linked open data. Point is, graph databases are a useful tool for specific > jobs, just like RDBMS’s are great for other jobs. > > > > More importantly, getting triples out in a speedy manner, using a standard > query language, and building a nice UI, is the part many people in the > linked data community have spent 10+ years getting right. > > > > Just my 2 cents. > > > > Cheers, > > > > Bernadette Hyland > > CEO, 3 Round Stones, Inc. > > > > > > > > On Nov 2, 2016, at 04:11, Li, Ai-jun <Ai-jun.Li@morganstanley.com> wrote: > > > > > > I came across a very old request for comments for storing RDF data in > relational database (http://infolab.stanford.edu/~melnik/rdf/db.html). I > was unable to find any newer discussion on this. We had implemented a very > innovative way of storing linked graph data in Sybase many years ago and > the system is still being used today. The system is storing the equivalent > of over 300 million triples and is scalable for much more. We’d be happy to > share our approach if this is something the community is still interested > in (will need to get the firm’s approval, obviously). > > > > Thanks, > > Ai-jun Li > > *Morgan Stanley | Enterprise Infrastructure *1 New York Plaza, 16th > Floor | New York, NY 10004 > Phone: +1 646 536-0765 > Ai-jun.Li@morganstanley.com > > > > > ------------------------------ > > > NOTICE: Morgan Stanley is not acting as a municipal advisor and the > opinions or views contained herein are not intended to be, and do not > constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall > Street Reform and Consumer Protection Act. If you have received this > communication in error, please destroy all electronic and paper copies and > notify the sender immediately. Mistransmission is not intended to waive > confidentiality or privilege. Morgan Stanley reserves the right, to the > extent permitted under applicable law, to monitor electronic > communications. This message is subject to terms available at the following > link: http://www.morganstanley.com/disclaimers If you cannot access > these links, please notify us by reply message and we will send the > contents to you. By communicating with Morgan Stanley you consent to the > foregoing and to the voice recording of conversations with personnel of > Morgan Stanley. > > > > > > > ------------------------------ > > NOTICE: Morgan Stanley is not acting as a municipal advisor and the > opinions or views contained herein are not intended to be, and do not > constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall > Street Reform and Consumer Protection Act. If you have received this > communication in error, please destroy all electronic and paper copies and > notify the sender immediately. Mistransmission is not intended to waive > confidentiality or privilege. Morgan Stanley reserves the right, to the > extent permitted under applicable law, to monitor electronic > communications. This message is subject to terms available at the following > link: http://www.morganstanley.com/disclaimers If you cannot access > these links, please notify us by reply message and we will send the > contents to you. By communicating with Morgan Stanley you consent to the > foregoing and to the voice recording of conversations with personnel of > Morgan Stanley. > >
Received on Wednesday, 2 November 2016 16:42:22 UTC