Bandwidth efficiency

I'll mail something about latency tomorrow, but here are some thoughts about
bandwidth:

Some test data (http://example.com/foo#):

  :alice rdf:type   a:Person .
  :alice a:hasName  "Alice Foo" .
  :alice a:worksFor :deptA .
  :bob   rdf:type   a:Person .
  :bob   a:hasName  "Bob Bar" .
  :bob   a:worksFor :deptA .
  :carol rdf:type   a:Person .
  :carol a:hasName  "Carol Qux" .
  :carol a:worksFor :deptB .
  :deptA a:hasName  "Department A" .
  :deptB a:hasName  "Department B" .

Suppose we wish to know the URIs of everyone who works in "Department A".
We can ask something like:

  SELECT ?person
  FROM <http://example.com/foo#>
  WHERE (?person, a:worksFor, ?dept),
        (?dept, a:hasName, "Department A")

The minimum information that must be returned to satisfy this query is:

  http://example.com/foo#alice
  http://example.com/foo#bob
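
To make the expected answer concrete, here is a minimal Python sketch that
evaluates the two triple patterns over the test data by brute force. It is
only an illustration, not how 3store evaluates queries. It assumes that the
a: prefix expands to http://example.com/bar# (as the N-Triples output further
down suggests) and that the predicate is spelled a:worksFor throughout; the
function name people_in is made up for this example.

```python
# Sketch: brute-force evaluation of the two-pattern query over the test data.
FOO = "http://example.com/foo#"
BAR = "http://example.com/bar#"  # assumed expansion of the a: prefix
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# (subject, predicate, object); literals are kept as plain strings here.
TRIPLES = [
    (FOO + "alice", RDF_TYPE, BAR + "Person"),
    (FOO + "alice", BAR + "hasName", "Alice Foo"),
    (FOO + "alice", BAR + "worksFor", FOO + "deptA"),
    (FOO + "bob", RDF_TYPE, BAR + "Person"),
    (FOO + "bob", BAR + "hasName", "Bob Bar"),
    (FOO + "bob", BAR + "worksFor", FOO + "deptA"),
    (FOO + "carol", RDF_TYPE, BAR + "Person"),
    (FOO + "carol", BAR + "hasName", "Carol Qux"),
    (FOO + "carol", BAR + "worksFor", FOO + "deptB"),
    (FOO + "deptA", BAR + "hasName", "Department A"),
    (FOO + "deptB", BAR + "hasName", "Department B"),
]


def people_in(dept_name, triples=TRIPLES):
    """Return every ?person with (?person worksFor ?dept), (?dept hasName dept_name)."""
    # Second pattern first: which subjects carry the requested name?
    depts = {s for s, p, o in triples if p == BAR + "hasName" and o == dept_name}
    # First pattern: who works for one of those departments?
    return sorted(s for s, p, o in triples if p == BAR + "worksFor" and o in depts)


if __name__ == "__main__":
    for uri in people_in("Department A"):
        print(uri)
```

Run as a script, this prints the two URIs listed above, one per line.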

Different result encodings will have different levels of efficiency, e.g.
a subgraph result returned as NTriples would be:

  http://example.com/foo#alice http://example.com/bar#worksFor http://example.com/foo#deptA .
  http://example.com/foo#bob http://example.com/bar#worksFor http://example.com/foo#deptA .
  http://example.com/foo#deptA http://example.com/bar#hasName "Department A" .

That is roughly 4.5 times the size of the raw data needed to answer
the query. Equally, an overly verbose result format for the bindings can
hurt:

  <table>
    <row>
      <column name="person" type="uri">http://example.com/foo#alice</column>
    </row>
    <row>
      <column name="person" type="uri">http://example.com/foo#bob</column>
    </row>
  </table>

This is the XML format we use in 3store (designed to be easily dissected
by DOM methods). It's not good (about 3.2x the raw data in this case), but
it's easy to imagine things that are worse still.

Conversely, our tab-delimited ASCII format is only about 1.2x the size of the
raw data, but it's not very web friendly:

  ?person
  <http://example.com/foo#alice>
  <http://example.com/foo#bob>
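
As a rough check on the ratios quoted above, here is a minimal Python sketch
that builds each encoding as a string and compares its byte count with the
bare two-URI answer. The whitespace handling is an assumption (one newline
per line, no indentation), so the exact figures may differ slightly from the
ones quoted above, and the helper names are made up for this example.

```python
# Sketch: size of each result encoding relative to the bare answer.
# Assumption: one newline per line and no indentation in any format.
FOO = "http://example.com/foo#"
BAR = "http://example.com/bar#"
PERSONS = [FOO + "alice", FOO + "bob"]


def lines(*rows):
    """Join rows into one string, terminating every row with a newline."""
    return "".join(row + "\n" for row in rows)


def encodings():
    """Return the four serialisations of the query result, keyed by name."""
    raw = lines(*PERSONS)
    # Subgraph result, written the way the message shows it (no angle brackets).
    ntriples = lines(
        *(f"{p} {BAR}worksFor {FOO}deptA ." for p in PERSONS),
        f'{FOO}deptA {BAR}hasName "Department A" .',
    )
    xml_rows = []
    for p in PERSONS:
        xml_rows += ["<row>", f'<column name="person" type="uri">{p}</column>', "</row>"]
    xml = lines("<table>", *xml_rows, "</table>")
    tab = lines("?person", *(f"<{p}>" for p in PERSONS))
    return {"raw": raw, "ntriples": ntriples, "xml": xml, "tab": tab}


def overhead():
    """Return each encoding's byte size divided by the raw answer's byte size."""
    sizes = {name: len(text.encode("utf-8")) for name, text in encodings().items()}
    return {name: size / sizes["raw"] for name, size in sizes.items()}


if __name__ == "__main__":
    for name, ratio in overhead().items():
        print(f"{name:9s} {ratio:.2f}x")
```

Under these assumptions the ordering matches the discussion: the subgraph
result is the largest, the XML bindings come next, and the tab-delimited
format stays close to the raw answer.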

I have found bandwidth to be important when using servers in the US from
Europe and vice versa - often the bottleneck in getting results to the
client (using the XML format) is the bandwidth, not the processing time on
the server. It's also important for devices with limited bandwidth (e.g.
hand-helds).

- Steve

Received on Tuesday, 1 June 2004 14:05:22 UTC