- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Thu, 3 Nov 2016 10:45:02 -0400
- To: Simon Spero <sesuncedu@gmail.com>
- Cc: semantic-web@w3.org, Linked Data Community <public-lod@w3.org>
- Message-ID: <634d1b0b-586e-1e35-1e87-59cbbc07bd45@openlinksw.com>
On 11/2/16 11:59 PM, Simon Spero wrote:
>
> 1. Virtuoso is an SQL database designed to support SPARQL & RDF. If
> the underlying dataset has a schema that is mostly regular, using tables
> can be a big performance win over the straight triple store. Or it can
> be much worse. Also, cold vs. warm caches require care when
> benchmarking (this applies to just about every RDF store).
>
> 2. PubChem is a quite lovely dataset to work with when you only want
> some of it (especially for non-bio :)
>
> Simon
>
Simon,

+1 to that :)

Fundamentally, Virtuoso is a demonstration of what's possible with both SPARQL and SQL using a single high-performance RDBMS. It also leverages an understanding of the RDF language with regard to critical issues such as data security and privacy, using Attribute-based Access Controls.

I guess it's time to produce a few posts about how you can extend SQL (one standard) using SPARQL (another standard) for powerful data access and integration, without compromising security, privacy, etc.

[1] http://kidehen.blogspot.com/2015/07/conceptual-data-virtualization-across.html
[2] https://www.linkedin.com/pulse/dbpedia-201604-edition-kingsley-uyi-idehen
[3] https://www.linkedin.com/pulse/reasoning-inference-using-british-royal-family-part-idehen
    -- covers custom (rather than built-in) reasoning and inference, using SPARQL as a rules language (*note: this is part of the soon-to-be-released 8.0 Edition of Virtuoso).

Kingsley

>
> On Nov 2, 2016 11:23 PM, "Bernadette Hyland" <bhyland@3roundstones.com> wrote:
>
> Hi Andrew,
> I share this with the caveats that, first, every app has unique
> requirements and, second, we all have a tendency to use
> technologies with which we’re familiar.
>
> In our case, we focus on linked data modeling and app development
> using Callimachus Enterprise.[1] Our team has OpenRDF Sesame
> chops, so we often use that store.
> Callimachus (OSS or Enterprise) is fanatical about RDF/SPARQL 1.1
> compliance, and that is really the important part IMHO.
>
> Back to your question about slinging larger RDF bulk data:
> recently, we needed to work with a data download (PubChem RDF
> weighs in at a hefty 99B triples), with a download size of about
> 40GB. The PubChem data stewards recommend that the database needs
> 64GB RAM and 500GB disk.[2]
>
> We thought we might blow the gaskets on OpenRDF Sesame, so we
> opted for OpenLink Software's Virtuoso.[3] We installed Virtuoso
> on an AWS large instance to manipulate the 99B triples down to the
> more manageable dataset of around 6B triples (chemical synonyms
> and descriptors) that we needed. It worked very well.
>
> That said, if we need to scale up or our client has a preference
> for a specific triple store, we develop the UI layer using the
> Callimachus Web application server, which speaks to any SPARQL 1.1
> compliant triple store on the server, e.g., MarkLogic, Ontotext
> GraphDB, and others.
>
> We’ll typically prototype with OpenRDF Sesame, because we know it
> well, and then scale as required. FWIW, it took one developer less
> than a day to integrate with MarkLogic and GraphDB, because both of
> these database vendors are good about SPARQL 1.1 compliance.
>
> Note: We have no commercial relationship with any graph database
> company; in fact, we're database-agnostic.
>
> In summary, we use Callimachus Enterprise to create applications
> using HTML5/CSS3, building named queries using SPARQL 1.1. For
> some apps, we’ll split up the data onto multiple OpenRDF Sesame
> instances, as required. If the customer wants to use / pay for a
> license to another SPARQL 1.1 compliant persistent store, we’re
> all over it.
>
> Bottom line: If you go with vendors that make good on RDF/SPARQL
> 1.1 standards compliance, you can sling some pretty hefty RDF and
> build nice UIs on top quickly.
>
> Anyone doing Linked Data beyond the prototyping phase is using
> some combination of OSS + commercially licensed products for the
> Web server/UI and persistent store layers.
>
> Hope that helps.
>
> Cheers,
>
> Bernadette Hyland
> bhyland@3roundstones.com || Skype BernHyland
>
> [1] http://callimachusproject.org/
> [2] https://pubchem.ncbi.nlm.nih.gov/rdf/#table2
> [3] https://virtuoso.openlinksw.com/dataspace/doc/(NULL)/wiki/Main/
>
>> On Nov 3, 2016, at 00:38, Andrew Woods <awoods@duraspace.org> wrote:
>>
>> Hello Bernadette,
>> Would you be willing to share the name of the triplestore
>> implementation you are using to store 99B triples?
>> Thanks,
>> Andrew Woods
>>
>> On Wed, Nov 2, 2016 at 10:24 AM, Bernadette Hyland
>> <bhyland@3roundstones.com> wrote:
>>
>> Hi Ai-jun,
>> Not sure that storing RDF triples in a relational database is
>> novel, at least not in 2016. And 300M isn’t a big number in
>> the world of graph databases. For example, we’re working with
>> a linked data repository, PubChem, with 99B triples, and
>> linking it to a subset of environmental linked open data.
>> Point is, graph databases are a useful tool for specific
>> jobs, just like RDBMSs are great for other jobs.
>>
>> More importantly, getting triples out in a speedy manner,
>> using a standard query language, and building a nice UI is
>> the part many people in the linked data community have spent
>> 10+ years getting right.
>>
>> Just my 2 cents.
>>
>> Cheers,
>>
>> Bernadette Hyland
>> CEO, 3 Round Stones, Inc.
>>
>>> On Nov 2, 2016, at 04:11, Li, Ai-jun
>>> <Ai-jun.Li@morganstanley.com> wrote:
>>>
>>> I came across a very old request for comments on storing
>>> RDF data in a relational database
>>> (http://infolab.stanford.edu/~melnik/rdf/db.html). I was
>>> unable to find any newer discussion on this. We had
>>> implemented a very innovative way of storing linked graph
>>> data in Sybase many years ago, and the system is still being
>>> used today. The system is storing the equivalent of over 300
>>> million triples and is scalable to much more. We’d be happy
>>> to share our approach if this is something the community is
>>> still interested in (will need to get the firm’s approval,
>>> obviously).
>>>
>>> Thanks,
>>> Ai-jun Li
>>> Morgan Stanley | Enterprise Infrastructure
>>> 1 New York Plaza, 16th Floor | New York, NY 10004
>>> Phone: +1 646 536-0765
>>> Ai-jun.Li@morganstanley.com
>>>
>>> ------------------------------------------------------------------------
>>>
>>> NOTICE: Morgan Stanley is not acting as a municipal advisor
>>> and the opinions or views contained herein are not intended
>>> to be, and do not constitute, advice within the meaning of
>>> Section 975 of the Dodd-Frank Wall Street Reform and
>>> Consumer Protection Act. If you have received this
>>> communication in error, please destroy all electronic and
>>> paper copies and notify the sender immediately.
>>> Mistransmission is not intended to waive confidentiality or
>>> privilege. Morgan Stanley reserves the right, to the extent
>>> permitted under applicable law, to monitor electronic
>>> communications.
>>> This message is subject to terms available at the following
>>> link: http://www.morganstanley.com/disclaimers
>>> If you cannot access these links, please notify us by reply
>>> message and we will send the contents to you. By communicating
>>> with Morgan Stanley you consent to the foregoing and to the voice
>>> recording of conversations with personnel of Morgan Stanley.
>>
>>
>

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software (Home Page: http://www.openlinksw.com)

Weblogs (Blogs):
Legacy Blog: http://www.openlinksw.com/blog/~kidehen/
Blogspot Blog: http://kidehen.blogspot.com
Medium Blog: https://medium.com/@kidehen

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/dataspace/person/kidehen#this
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
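Simon Spero's first point in this thread is that regular data stored in ordinary tables can outperform a generic triple table, or can perform much worse. The sketch below illustrates that trade-off using Python's built-in sqlite3 module. It is hypothetical and does not show how Virtuoso or the Sybase system Ai-jun Li describes actually store data. The `ex:` predicates, the `compound` table, and the helper functions are illustrative assumptions. It compares a single `triples(s, p, o)` table, in the spirit of the Stanford RFC Ai-jun Li links to, with a per-class property table, and answers the same question against each.

```python
import sqlite3

# Hypothetical sample data: (subject, predicate, object) triples for two compounds.
# The ex: predicates are illustrative only, not real PubChem vocabulary terms.
TRIPLES = [
    ("ex:c1", "ex:synonym", "aspirin"),
    ("ex:c1", "ex:formula", "C9H8O4"),
    ("ex:c2", "ex:synonym", "caffeine"),
    ("ex:c2", "ex:formula", "C8H10N4O2"),
]


def build_triple_table(conn):
    """Layout 1: one generic three-column table holds every statement."""
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.execute("CREATE INDEX triples_pos ON triples (p, o, s)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", TRIPLES)


def build_property_table(conn):
    """Layout 2: one row per subject, one column per (regular) property."""
    conn.execute(
        "CREATE TABLE compound (id TEXT PRIMARY KEY, synonym TEXT, formula TEXT)"
    )
    rows = {}
    for s, p, o in TRIPLES:
        column = p.split(":", 1)[1]  # "ex:synonym" -> "synonym"
        # Assumes at most one value per property: a second synonym would
        # overwrite the first, which is where this layout starts to hurt.
        rows.setdefault(s, {})[column] = o
    conn.executemany(
        "INSERT INTO compound VALUES (?, ?, ?)",
        [(s, r.get("synonym"), r.get("formula")) for s, r in rows.items()],
    )


def synonym_for_formula_triples(conn, formula):
    # Each extra property in the pattern costs another self-join on `triples`.
    sql = """
        SELECT t2.o FROM triples AS t1
        JOIN triples AS t2 ON t2.s = t1.s AND t2.p = 'ex:synonym'
        WHERE t1.p = 'ex:formula' AND t1.o = ?
    """
    return [row[0] for row in conn.execute(sql, (formula,))]


def synonym_for_formula_table(conn, formula):
    # The same question against the property table needs no join at all.
    sql = "SELECT synonym FROM compound WHERE formula = ?"
    return [row[0] for row in conn.execute(sql, (formula,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_triple_table(conn)
    build_property_table(conn)
    print(synonym_for_formula_triples(conn, "C8H10N4O2"))  # ['caffeine']
    print(synonym_for_formula_table(conn, "C8H10N4O2"))    # ['caffeine']
```

Both layouts return the same answer. The property table does so without a self-join, which is where the potential win comes from when the schema is mostly regular. The cost appears once the data is irregular. Multi-valued properties (which this sketch silently collapses) or sparsely populated columns push you back toward the triple layout, and that is one way the outcome can be "much worse". As Simon also notes, any timing comparison between the two should control for cold vs. warm caches.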
Attachments
- application/pkcs7-signature attachment: S/MIME Cryptographic Signature
Received on Thursday, 3 November 2016 14:45:42 UTC