Re: PG mode and SA mode from Steve sarsfield on 2019-09-23 (public-rdf-star@w3.org from September 2019)

From: Steve sarsfield <steve.sarsfield@cambridgesemantics.com>
Date: Mon, 23 Sep 2019 13:30:23 -0400
To: public-rdf-star@w3.org
Message-ID: <CAL3k4tXyTJ=sHWg72XGnJQ1vo_OA3JmzLKgBab_6HqRw1A=U0Q@mail.gmail.com>

>>Could you expand a bit on this? What is it about reification that creates
>>these performance issues?
>>Is it something that is inherent to the design of RDF Reification, or is
>>it something about the way it is generally implemented?

Distributed RDF graph databases shard their data such that all of the
triples associated with a given subject reside on the same cluster node. In
LPG terms, this means that each vertex is stored whole on a single node. This
is done to minimize network communication during typical query processing,
because network interconnects are nearly two orders of magnitude slower than
main memory, let alone cached memory. The network's low throughput and high
latency, compared to local memory, are a primary driver of cost-based query
planning: planners aim for minimal traffic over the interconnect and choose
the traversal/join operations that minimize it.
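A minimal sketch of subject-based sharding, assuming a hypothetical cluster of `NUM_NODES` machines and a hash partitioner keyed on the subject IRI. The names `node_for`, `shard_by_subject` and `local_same_subject_join` are illustrative, not any product's API. Because every triple about `:alice` hashes to the same node, a join over two of her properties can run on each node independently, with no interconnect traffic.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(subject: str, num_nodes: int = NUM_NODES) -> int:
    """Stable hash partitioner keyed on the subject (Python's hash() is salted per process)."""
    digest = hashlib.sha256(subject.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def shard_by_subject(triples):
    """Place every triple on the node chosen by its subject, so each vertex is stored whole."""
    shards = defaultdict(list)
    for s, p, o in triples:
        shards[node_for(s)].append((s, p, o))
    return shards


def local_same_subject_join(shards, p1, p2):
    """Evaluate ?s p1 ?x . ?s p2 ?y node by node; no triple is read from another node."""
    results = []
    for node_triples in shards.values():
        by_subject = defaultdict(lambda: defaultdict(list))
        for s, p, o in node_triples:
            by_subject[s][p].append(o)
        for s, props in by_subject.items():
            for x in props.get(p1, []):
                for y in props.get(p2, []):
                    results.append((s, x, y))
    return results


triples = [
    (":alice", ":name", '"Alice"'),
    (":alice", ":age", "42"),
    (":alice", ":knows", ":bob"),
    (":bob", ":name", '"Bob"'),
]
shards = shard_by_subject(triples)
print(local_same_subject_join(shards, ":name", ":age"))
```

The per-node loop is the point: because partitioning is by subject, this join never needs a triple from another node. A subject/object traversal, such as following `:knows` from `:alice` to `:bob`'s properties, would instead have to fetch data from `node_for(":bob")`.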

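Analytic queries that reference multiple properties of the same subject
("same-subject joins") are thus executed locally, and only subject/object (or
object/object) joins/traversals move significant data over the
interconnect. With reified data, these same-subject joins become
network-intensive subject/object (and/or object/object) joins: the reified
statement is its own resource, so its rdf:subject, rdf:predicate, rdf:object
and annotation triples are sharded by the statement node rather than by the
original subject. Reification also uses more triples to model the same
information as a single RDF* triple: one each for rdf:subject, rdf:predicate
and rdf:object, usually plus an rdf:type rdf:Statement triple. This has both
storage and performance implications, since extra JOINs are required to get
at the information.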
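The reification cost can be sketched the same way. This assumes standard RDF reification with a hypothetical statement node `_:st1` and an illustrative `:since` annotation; `reify` and `since_via_reification` are hypothetical helpers, and the partitioner is the same assumed subject hash. All five resulting triples share the subject `_:st1`, so they are placed on that node rather than on `:alice`'s, and recovering the annotation takes a four-pattern join on `?st` instead of reading one RDF* triple.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Stable hash partitioner keyed on the triple's subject."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def reify(s, p, o, stmt_id, annotations):
    """Standard RDF reification: the statement becomes its own resource."""
    triples = [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]
    # Annotations hang off the statement node, not off the original subject.
    triples += [(stmt_id, ap, ao) for ap, ao in annotations]
    return triples


def since_via_reification(triples, s, p, o):
    """Evaluate ?st rdf:subject s . ?st rdf:predicate p . ?st rdf:object o . ?st :since ?d

    Grouping by statement node is the join on ?st; this sketch assumes each
    statement has at most one value per predicate.
    """
    by_stmt = defaultdict(dict)
    for subj, pred, obj in triples:
        by_stmt[subj][pred] = obj
    return [
        props[":since"]
        for props in by_stmt.values()
        if props.get("rdf:subject") == s
        and props.get("rdf:predicate") == p
        and props.get("rdf:object") == o
        and ":since" in props
    ]


reified = reify(":alice", ":knows", ":bob", "_:st1", [(":since", "2010")])

# Every reified triple is keyed on "_:st1", so it lands on node_for("_:st1"),
# while :alice's own triples land on node_for(":alice"). With a uniform hash
# these coincide only by chance, so joining them generally crosses the network.
stmt_node = node_for("_:st1")
alice_node = node_for(":alice")

print(len(reified), "triples instead of one RDF* triple")
print(since_via_reification(reified, ":alice", ":knows", ":bob"))
```

Note that reification does not assert the base triple, so a store that also needs `:alice :knows :bob` as a fact holds it separately on `:alice`'s node.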

Thanks,

Steve

Received on Monday, 23 September 2019 17:30:58 UTC