- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Thu, 3 Nov 2016 10:45:02 -0400
- To: Simon Spero <sesuncedu@gmail.com>
- Cc: semantic-web@w3.org, Linked Data Community <public-lod@w3.org>
- Message-ID: <634d1b0b-586e-1e35-1e87-59cbbc07bd45@openlinksw.com>
On 11/2/16 11:59 PM, Simon Spero wrote:
>
> 1. Virtuoso is an SQL database designed to support SPARQL & RDF. If
> the underlying dataset has a schema that is mostly regular, using tables
> can be a big performance win over a straight triple store. Or it can
> be much worse. Also, cold vs. warm caches require care when
> benchmarking (this applies to just about every RDF store).
>
> 2. PubChem is quite a lovely dataset to work with when you only want
> some of it (especially for non-bio :)
>
> Simon
>
Simon,
+1 to that :)
Fundamentally, Virtuoso is a demonstration of what's possible with both
SPARQL and SQL using a single high-performance RDBMS. It also applies
its understanding of RDF (as a language) to critical issues such as
data security and privacy, using attribute-based access controls.
I guess it's time to produce a few posts about how you can extend SQL
(one standard) using SPARQL (another standard) for powerful data
access and integration, without compromising security, privacy,
etc.
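To make Simon's first point concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. It is not Virtuoso and not how Virtuoso lays out its storage. It stores the same hypothetical compound data twice: once in a generic triples table and once in a per-class property table. The SPARQL pattern `{ ?c ex:name ?n . ?c ex:formula ?f }` becomes a self-join on the triples table but a plain single-table read on the property table, which is why a mostly regular schema can favor tables. The `ex:` identifiers, table names, and sample rows are illustrative assumptions.

```python
import sqlite3

# Hypothetical sample data: (subject, predicate, object) strings.
TRIPLES = [
    ("ex:aspirin", "ex:name", "Aspirin"),
    ("ex:aspirin", "ex:formula", "C9H8O4"),
    ("ex:caffeine", "ex:name", "Caffeine"),
    ("ex:caffeine", "ex:formula", "C8H10N4O2"),
]


def build_db():
    conn = sqlite3.connect(":memory:")
    # Generic triple table: one row per RDF statement.
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", TRIPLES)
    # Property table: one row per subject, one column per regular predicate.
    conn.execute(
        "CREATE TABLE compound (s TEXT PRIMARY KEY, name TEXT, formula TEXT)"
    )
    conn.execute(
        """INSERT INTO compound (s, name, formula)
           SELECT n.s, n.o, f.o
           FROM triples n JOIN triples f ON n.s = f.s
           WHERE n.p = 'ex:name' AND f.p = 'ex:formula'"""
    )
    return conn


def names_and_formulas_triples(conn):
    # { ?c ex:name ?n . ?c ex:formula ?f } as a self-join on the triple table.
    return conn.execute(
        """SELECT n.o, f.o
           FROM triples n JOIN triples f ON n.s = f.s
           WHERE n.p = 'ex:name' AND f.p = 'ex:formula'
           ORDER BY n.o"""
    ).fetchall()


def names_and_formulas_table(conn):
    # The same pattern against the property table: no join required.
    return conn.execute(
        "SELECT name, formula FROM compound ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(names_and_formulas_triples(conn))
    print(names_and_formulas_table(conn))
```

Both queries return the same rows; the difference is the extra join the triple layout needs, which grows with each additional predicate in the pattern. Whether that becomes a real speedup depends on the store, its indexes, and (as Simon notes) whether the caches are warm when you measure.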
[1] http://kidehen.blogspot.com/2015/07/conceptual-data-virtualization-across.html
[2] https://www.linkedin.com/pulse/dbpedia-201604-edition-kingsley-uyi-idehen
[3] https://www.linkedin.com/pulse/reasoning-inference-using-british-royal-family-part-idehen
-- covers custom (rather than built-in) reasoning and inference using
SPARQL as a rules language (note: this is part of the soon-to-be-released
8.0 edition of Virtuoso).
Kingsley
>
> On Nov 2, 2016 11:23 PM, "Bernadette Hyland" <bhyland@3roundstones.com> wrote:
>
> Hi Andrew,
> I share this with two caveats: first, every app has unique
> requirements; second, we all tend to use technologies with
> which we’re familiar.
>
> In our case, we focus on linked data modeling and app development
> using Callimachus Enterprise.[1] Our team has OpenRDF Sesame
> chops, so we often use that store. Callimachus (OSS or
> Enterprise) is fanatical about RDF/SPARQL 1.1 compliance, and
> that is really the important part IMHO.
>
> Back to your question about slinging larger bulk RDF data:
> recently, we needed to work with a data download (PubChem RDF
> weighs in at a hefty 99B triples) with a download size of about
> 40GB. The PubChem data stewards recommend that the database have
> 64GB of RAM and 500GB of disk.[2]
>
> We thought we might blow the gaskets on OpenRDF Sesame, so we
> opted for OpenLink Software's Virtuoso.[3] We installed Virtuoso
> on an AWS large instance to pare the 99B triples down to the
> more manageable dataset of around 6B triples (chemical synonyms
> and descriptors) that we needed. It worked very well.
>
> That said, if we need to scale up or our client has a preference
> for a specific triple store, we develop the UI layer using the
> Callimachus Web application server, which speaks to any SPARQL
> 1.1-compliant triple store on the server, e.g., MarkLogic,
> Ontotext GraphDB, and others.
>
> We’ll typically prototype with OpenRDF Sesame, because we know it
> well, and then scale as required. FWIW, it took one developer less
> than a day to integrate with MarkLogic and GraphDB, because both of
> these database vendors are good about SPARQL 1.1 compliance.
>
> Note: We have no commercial relationship with any graph database
> company; in fact, we're database-agnostic.
>
> In summary, we use Callimachus Enterprise to create applications
> using HTML5/CSS3, building named queries using SPARQL 1.1. For
> some apps, we’ll split up the data onto multiple OpenRDF Sesame
> instances, as required. If the customer wants to use (and pay for a
> license to) another SPARQL 1.1-compliant persistent store, we’re
> all over it.
>
> Bottom line: If you go with vendors that make good on RDF/SPARQL
> 1.1 standards compliance, you can sling some pretty hefty RDF and
> build nice UIs on top quickly.
>
> Anyone doing Linked Data beyond the prototyping phase is using
> some combination of OSS + commercially licensed products for the
> Web server/UI and persistent store layers.
>
> Hope that helps.
>
> Cheers,
>
> Bernadette Hyland
> bhyland@3roundstones.com || Skype BernHyland
>
> [1] http://callimachusproject.org/
>
> [2] https://pubchem.ncbi.nlm.nih.gov/rdf/#table2
>
> [3] https://virtuoso.openlinksw.com/dataspace/doc/(NULL)/wiki/Main/
>
>
>> On Nov 3, 2016, at 00:38, Andrew Woods <awoods@duraspace.org> wrote:
>>
>> Hello Bernadette,
>> Would you be willing to share the name of the triplestore
>> implementation you are using to store 99B triples?
>> Thanks,
>> Andrew Woods
>>
>> On Wed, Nov 2, 2016 at 10:24 AM, Bernadette Hyland
>> <bhyland@3roundstones.com> wrote:
>>
>> Hi Ai-jun,
>> I'm not sure that storing RDF triples in a relational database is
>> novel, at least not in 2016. And 300M triples isn’t a big number in
>> the world of graph databases. For example, we’re working with
>> a linked data repository, PubChem, with 99B triples, and
>> linking it to a subset of environmental linked open data.
>> Point is, graph databases are a useful tool for specific
>> jobs, just like RDBMSs are great for other jobs.
>>
>> More importantly, getting triples out quickly using a
>> standard query language and building a nice UI on top is
>> the part many people in the linked data community have spent
>> 10+ years getting right.
>>
>> Just my 2 cents.
>>
>> Cheers,
>>
>> Bernadette Hyland
>> CEO, 3 Round Stones, Inc.
>>
>>
>>
>>> On Nov 2, 2016, at 04:11, Li, Ai-jun
>>> <Ai-jun.Li@morganstanley.com> wrote:
>>>
>>>
>>> I came across a very old request for comments on storing
>>> RDF data in a relational database
>>> (http://infolab.stanford.edu/~melnik/rdf/db.html). I was
>>> unable to find any newer discussion of this. We
>>> implemented a very innovative way of storing linked graph
>>> data in Sybase many years ago, and the system is still in
>>> use today. It stores the equivalent of over 300 million
>>> triples and can scale to much more. We’d be happy to
>>> share our approach if the community is still interested
>>> (we will need to get the firm’s approval, obviously).
>>>
>>> Thanks,
>>> Ai-jun Li
>>> Morgan Stanley | Enterprise Infrastructure
>>> 1 New York Plaza, 16th Floor | New York, NY 10004
>>> Phone: +1 646 536-0765
>>> Ai-jun.Li@morganstanley.com
>>>
>>
>>
>
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software (Home Page: http://www.openlinksw.com)
Weblogs (Blogs):
Legacy Blog: http://www.openlinksw.com/blog/~kidehen/
Blogspot Blog: http://kidehen.blogspot.com
Medium Blog: https://medium.com/@kidehen
Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen
Web Identities (WebID):
Personal: http://kingsley.idehen.net/dataspace/person/kidehen#this
: http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
Attachments
- application/pkcs7-signature attachment: S/MIME Cryptographic Signature
Received on Thursday, 3 November 2016 14:45:33 UTC