RE: Bandwidth efficiency from Seaborne, Andy on 2004-06-02 (public-rdf-dawg@w3.org from April to June 2004)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Wed, 2 Jun 2004 14:05:26 +0100
To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>, DAWG public list <public-rdf-dawg@w3.org>
Message-ID: <E864E95CB35C1C46B72FEA0626A2E808031A9C71@0-mail-br1.hpl.hp.com>
-------- Original Message --------
> From: Steve Harris <>
> Date: 2 June 2004 12:26
> 
> On Tue, Jun 01, 2004 at 05:00:38 -0700, Howard Katz wrote:
> > This slight digression is inspired by the topic at hand. What about
> > the idea of prepending some sort of preamble to the query result to
> > provide meta information on the result set, as well as the general
> > environment and (for example) user-selectable settings of interest?
> > In this case it's primarily to reduce bandwidth (stealing the
> > prefix-namespace mechanism generally used on triples input to make
> > the output both more compact and human-readable), but I'm also
> > throwing in a few other hypothetical preamble "infoitems" that
> > connect into several of our other uc requirement items as well: 
> > 
> >       dawg-ql:resultPreamble
> >       {
> >             dawg-ql:prefix  "ex"
> >             dawg-ql:namespace  "http://example.com/foo#"
> >             dawg-ql:resultFormat  dawg-ql:compactTriples
> >             dawg-ql:maxChunkSize  2048
> >             dawg-ql:numTriples  3
> >             dawg-ql:numInferredNodes  0
> >       }
> >       ex:alice ex:worksFor ex:deptA
> >       ex:bob ex:worksFor ex:deptA
> >       ex:deptA ex:hasName "DepartmentA"
> > 
> > The particular preamble format here, whether by accident or design,
> > looks a lot like RDF but doesn't necessarily have to. I'm just trying
> > out a concept. 
> 
> Also a result format in N3 or RDF/XML could do this using their prefix
> mechanisms. There is a downside, however: if the server is expected to
> strip out the namespaces for the URIs then it must store the result set
> internally and post-process it to ensure that it's got all the namespaces
> before it starts to send the results (as the namespaces are declared at
> the top). This increases latency.

True - but it's not that bad: we store namespace prefixes with models and
could use those prefixes.  Works very well with predicates.  Going another
stage, allowing @prefix declarations inline means that the first occurrence
of a namespace can be used to declare its prefix.  The client and server
need to maintain a prefix map but that does not seem too bad.
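
A minimal sketch of the inline-declaration idea, in Python.  It assumes
triples arrive as tuples of URI strings (literals already quoted) and that
the server holds a namespace-to-prefix map; the names split_uri,
stream_triples and known_prefixes are made up for illustration and are not
part of any existing API.

```python
def split_uri(uri):
    """Split a URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def stream_triples(triples, known_prefixes):
    """Yield output lines, declaring each prefix just before its first use.

    known_prefixes maps namespace URI -> prefix (hypothetical: for example
    the prefixes stored alongside a model).  Terms whose namespace is not
    in the map are written as full <...> URIs.  Local names are not checked
    for characters that a real syntax would have to escape.
    """
    declared = set()  # namespaces already announced on this stream
    for triple in triples:
        terms = []
        for term in triple:
            if term.startswith('"'):  # literal: pass through unchanged
                terms.append(term)
                continue
            ns, local = split_uri(term)
            prefix = known_prefixes.get(ns)
            if prefix is None or not local:
                terms.append(f"<{term}>")
                continue
            if ns not in declared:
                declared.add(ns)
                yield f"@prefix {prefix}: <{ns}> ."
            terms.append(f"{prefix}:{local}")
        yield " ".join(terms) + " ."
```

Nothing is buffered: each triple is written as soon as it is seen, and the
only state carried along is the set of namespaces already declared.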

However, the killer is the RDF-ness - information for the first result could
be in the last triple.  We could have a restricted syntax that is parsable
as RDF in an existing syntax, but has further restrictions so that it could
be parsed in a streaming mode.

Example:

@prefix rs:     <http://jena.hpl.hp.com/2003/03/result-set#> .

<>  rs:size 4 ;
    rs:resultVariable "x" ; rs:resultVariable "y" .

@prefix ex: <http://example.com> .

<>
    rs:solution
        [ rs:binding [ rs:variable "x" ; rs:value  123 ] ;
          rs:binding [ rs:variable "y" ; rs:value ex:resource1 ]
        ] ;

    rs:solution
        [ rs:binding [ rs:variable "x" ; rs:value "2003-01-21" ] ;
          rs:binding [ rs:variable "y" ; rs:value ex:resource2 ]
        ] ;
... next 2 solutions ...

Not optimal, and further reduction is possible, but this is streamable
under the restriction that all the triples for solution 1 come before
solution 2.  Getting the time-to-first-result down is important.
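
A rough sketch of how a client might exploit that ordering restriction, in
Python.  It assumes triples are handed over one at a time as (subject,
predicate, object) tuples and, beyond what is stated above, that the
rs:solution triple opening a solution arrives before that solution's
bindings.  The function name stream_solutions is invented for the example.

```python
RS = "http://jena.hpl.hp.com/2003/03/result-set#"


def stream_solutions(triples):
    """Yield one {variable: value} dict per solution, in arrival order.

    A solution is released as soon as the next rs:solution triple is seen
    (or the input ends), so the first result does not wait for the whole
    result set.  Assumption: the triples of different solutions never
    interleave.
    """
    names = {}      # binding node -> variable name
    values = {}     # binding node -> bound value
    current = None  # bindings of the solution in progress
    for s, p, o in triples:
        if p == RS + "solution":
            if current is not None:
                yield current
            current, names, values = {}, {}, {}
        elif p == RS + "variable":
            names[s] = o
        elif p == RS + "value":
            values[s] = o
        # A binding is complete once both its variable and value are known.
        if current is not None and s in names and s in values:
            current[names.pop(s)] = values.pop(s)
    if current is not None:
        yield current
```

Header triples such as rs:size and rs:resultVariable are simply skipped
here; a fuller client could use them to size buffers or to release a
solution as soon as every result variable is bound.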

The WG could do something here.  The question is now whether it has
sufficient value over a custom-designed format (as one of several possible)
that is specially tuned for minimising latency and bandwidth.

[In terms of getting the most out of the fewest bytes, using a progressive
lossless compression scheme is in a similar vein - it finds duplicate
symbols in the stream and reduces them to short tokens.]
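
To illustrate the compression aside, here is a small Python sketch using
the standard zlib module.  Treating each result chunk as one flush unit is
an assumption made for the example, and the helper names compress_stream
and decompress_stream are made up.

```python
import zlib


def compress_stream(chunks):
    """Compress byte chunks one at a time, keeping each one decodable."""
    comp = zlib.compressobj()
    for chunk in chunks:
        # Z_SYNC_FLUSH pushes out everything compressed so far, so the
        # receiver can decode this chunk without waiting for later ones.
        yield comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
    yield comp.flush()  # finish the stream


def decompress_stream(packets):
    """Decode packets as they arrive, yielding the recovered bytes."""
    decomp = zlib.decompressobj()
    for packet in packets:
        data = decomp.decompress(packet)
        if data:
            yield data
```

Repeated namespace URIs in later chunks are encoded as back-references to
earlier ones, which is the same effect a prefix map gives, at the cost of a
few extra bytes per flush.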

	Andy

> 
> The situation where the namespaces are declared after the results is not
> much better as the client can't interpret them until it's parsed all the
> namespace decls anyway.

Yes - that case really is bad!

> 
> - Steve
Received on Wednesday, 2 June 2004 09:05:50 UTC