RE: Bandwidth efficiency

> -----Original Message-----
> From: public-rdf-dawg-request@w3.org
> [mailto:public-rdf-dawg-request@w3.org]On Behalf Of Steve Harris
> Sent: Tuesday, June 01, 2004 11:05 AM
> To: DAWG public list
> Subject: Bandwidth efficiency
>
> I'll mail something about latency tomorrow, but here are some thoughts about
> bandwidth:
>
> Some test data (http://example.com/foo#):
>
>   :alice rdf:type   a:Person .
>   :alice a:hasName  "Alice Foo" .
>   :alice a:worksFor :deptA .
>   :bob   rdf:type   a:Person .
>   :bob   a:hasName  "Bob Bar" .
>   :bob   a:worksFor :deptA .
>   :carol rdf:type   a:Person .
>   :carol a:hasName  "Carol Qux" .
>   :carol a:worksFor :deptB .
>   :deptA a:hasName  "Department A" .
>   :deptB a:hasName  "Department B" .
>
> Suppose we wish to know the URIs of everyone who works in "Department A".
> We can ask something like:
>
>   SELECT ?person
>   FROM <http://example.com/foo#>
>   WHERE (?person, a:worksFor, ?dept),
>         (?dept, a:hasName, "Department A")
>
> The minimum returned information to satisfy this query is:
>
>   http://example.com/foo#alice
>   http://example.com/foo#bob
>
> Different result encodings will have different levels of efficiency, e.g.
> a subgraph result returned in NTriples would be:
>
>   http://example.com/foo#alice http://example.com/bar#worksFor http://example.com/foo#deptA .
>   http://example.com/foo#bob http://example.com/bar#worksFor http://example.com/foo#deptA .
>   http://example.com/foo#deptA http://example.com/bar#hasName "Department A" .

This slight digression is inspired by the topic at hand. What about
prepending some sort of preamble to the query result to provide
meta-information on the result set, as well as on the general environment and
(for example) user-selectable settings of interest? In this case it's
primarily to reduce bandwidth (borrowing the prefix-namespace mechanism
generally used on triples input to make the output both more compact and more
human-readable), but I'm also throwing in a few other hypothetical preamble
"infoitems" that connect to several of our other use case and requirement
items:

      dawg-ql:resultPreamble
      {
            dawg-ql:prefix  "ex"
            dawg-ql:namespace  "http://example.com/foo#"
            dawg-ql:resultFormat  dawg-ql:compactTriples
            dawg-ql:maxChunkSize  2048
            dawg-ql:numTriples  3
            dawg-ql:numInferredNodes  0
      }
      ex:alice ex:worksFor ex:deptA
      ex:bob ex:worksFor ex:deptA
      ex:deptA ex:hasName "Department A"

The particular preamble format here, whether by accident or design, looks a
lot like RDF but doesn't necessarily have to. I'm just trying out a concept.
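
To make the compaction idea a bit more concrete, here is a minimal Python
sketch of how a client might expand the compact triples back into full URIs
once it has read the prefix and namespace from the preamble. The function
names and the prefix table are my own assumptions for illustration only; none
of this is part of any agreed DAWG format, and parsing of the preamble itself
is left out.

```python
import re

# A term is either a double-quoted literal or a run of non-whitespace characters.
TERM = re.compile(r'"[^"]*"|\S+')


def expand_term(term, prefixes):
    """Expand a prefix:local term to a full URI; leave everything else alone."""
    if term.startswith('"'):
        # Quoted literals pass through unchanged.
        return term
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


def expand_compact_triples(lines, prefixes):
    """Expand every term on every line of a compact result body."""
    return [
        " ".join(expand_term(term, prefixes) for term in TERM.findall(line))
        for line in lines
    ]


# Hypothetical prefix table, as it might be read from the preamble above.
prefixes = {"ex": "http://example.com/foo#"}

compact = [
    "ex:alice ex:worksFor ex:deptA",
    "ex:bob ex:worksFor ex:deptA",
    'ex:deptA ex:hasName "Department A"',
]

for triple in expand_compact_triples(compact, prefixes):
    print(triple)
```

The saving comes from sending the namespace once in the preamble instead of
once per term; the client pays for it with a small expansion step like the one
sketched here.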

Howard


> Which is roughly 4.5 times more data than the raw data needed to answer
> the query. Equally, an overly verbose result format for the bindings can
> hurt:
>
>   <table>
>     <row>
>       <column name="person" type="uri">http://example.com/foo#alice</column>
>     </row>
>     <row>
>       <column name="person" type="uri">http://example.com/foo#bob</column>
>     </row>
>   </table>
>
> This is the XML format we use in 3store (designed to be easily dissected
> by DOM methods). It's not good (about 3.2x the raw data in this case), but
> it's easy to imagine things that are worse still.
>
> Conversely, our tab-delimited ASCII format is only 1.2x the size of the raw
> data, but it's not very web friendly:
>
>   ?person
>   <http://example.com/foo#alice>
>   <http://example.com/foo#bob>
>
> I have found bandwidth to be important when using servers in the US from
> Europe and vice versa - often the bottleneck in getting results to the
> client (using the XML format) is the bandwidth, not the processing time in
> the server. It's also important for devices with limited bandwidth (e.g.
> hand-held or whatever).
>
> - Steve
>
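
For anyone who wants to reproduce overhead ratios like the ones quoted above,
here is a rough Python sketch that compares the byte size of each encoding
with the raw answer. The helper name and the exact strings (whitespace and
line endings in particular) are assumptions, so the figures it prints will
only approximate the ones in the message.

```python
def overhead_ratio(encoded: str, raw: str) -> float:
    """Return the size of an encoding relative to the raw answer, in UTF-8 bytes."""
    return len(encoded.encode("utf-8")) / len(raw.encode("utf-8"))


# The minimum information needed to answer the query: two URIs.
RAW = "http://example.com/foo#alice\nhttp://example.com/foo#bob\n"

# The three result encodings from the quoted message, one entry each.
ENCODINGS = {
    "ntriples": (
        "http://example.com/foo#alice http://example.com/bar#worksFor "
        "http://example.com/foo#deptA .\n"
        "http://example.com/foo#bob http://example.com/bar#worksFor "
        "http://example.com/foo#deptA .\n"
        "http://example.com/foo#deptA http://example.com/bar#hasName "
        '"Department A" .\n'
    ),
    "xml": (
        "<table>\n"
        "  <row>\n"
        '    <column name="person" type="uri">http://example.com/foo#alice</column>\n'
        "  </row>\n"
        "  <row>\n"
        '    <column name="person" type="uri">http://example.com/foo#bob</column>\n'
        "  </row>\n"
        "</table>\n"
    ),
    "tab": "?person\n<http://example.com/foo#alice>\n<http://example.com/foo#bob>\n",
}

for name, body in ENCODINGS.items():
    print(f"{name}: {overhead_ratio(body, RAW):.1f}x the raw data")
```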

Received on Tuesday, 1 June 2004 20:01:07 UTC