- From: Steve Sarsfield <steve.sarsfield@cambridgesemantics.com>
- Date: Mon, 23 Sep 2019 13:30:23 -0400
- To: public-rdf-star@w3.org
- Message-ID: <CAL3k4tXyTJ=sHWg72XGnJQ1vo_OA3JmzLKgBab_6HqRw1A=U0Q@mail.gmail.com>
>>Could you expand a bit on this? What is it about reification that creates these performance issues?
>>Is it something that is inherent to the design of RDF Reification, or is it something about the way it is generally implemented?

Distributed RDF graph databases shard their data such that all of the triples associated with a given subject reside on the same node. In LPG terms, this means that "whole vertices" are stored on nodes. This is done to minimize network communication during typical query processing, because network interconnects are nearly two orders of magnitude slower than main memory, let alone cached memory.

The interconnect's lower throughput and higher latency, compared to local memory, are a primary driver in cost-based query planners: they aim for minimal traffic over the interconnect and choose traversal/join operations accordingly. Analytic queries that reference multiple properties of the same subject ("same-subject joins") are thus resolved locally, and only subject/object (or object/object) joins/traversals move significant data over the interconnect.

With reified data, these same-subject joins become network-intensive subject/object (and/or object/object) joins.

Reification also uses several triples to model the information carried by a single RDF* triple. This has both storage and performance implications, since extra JOINs are required to get at the information.

Thanks,
Steve
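The join argument above can be illustrated with a small sketch. It is not taken from any particular store: the helper names `reify` and `join_kind`, the `:name`/`:knows` predicates, and the blank node `_:s1` are all hypothetical, and the only assumption about the engine is the one stated above, namely that triples are sharded by subject so a join is node-local only when the shared variable is the subject of both patterns.

```python
def reify(triple, stmt_id):
    """Expand one triple into the four triples of standard RDF reification."""
    s, p, o = triple
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]


def join_kind(pattern_a, pattern_b, var):
    """Classify a join on `var` by the position it occupies in each pattern.

    Under subject-based sharding (the assumption of this sketch), only a
    "same-subject" join is guaranteed to stay on a single node.
    """
    def position(pattern):
        s, _, o = pattern
        if s == var:
            return "subject"
        if o == var:
            return "object"
        raise ValueError(f"{var} does not occur in {pattern}")

    pos_a, pos_b = position(pattern_a), position(pattern_b)
    if pos_a == pos_b == "subject":
        return "same-subject"
    if pos_a == pos_b:
        return "object/object"
    return "subject/object"


# Plain data: both patterns hang off ?p, so the join is node-local.
name = ("?p", ":name", "?n")
knows = ("?p", ":knows", "?q")
plain_join = join_kind(name, knows, "?p")

# Reified data: ?p is now the *object* of rdf:subject on a statement node,
# so the same question requires a subject/object join.
stmt_subject = ("?st", "rdf:subject", "?p")
reified_join = join_kind(name, stmt_subject, "?p")

# One asserted edge versus its reified form.
reified_triples = reify((":alice", ":knows", ":bob"), "_:s1")

print(plain_join, reified_join, len(reified_triples))
```

In this sketch the statement node's own triples still join same-subject with each other (they share `?st`), but reaching them from the person's other properties is where the cross-node traffic would come from.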
Received on Monday, 23 September 2019 17:30:58 UTC