Re: Minimizing data volume from Leigh Dodds on 2013-09-09 (public-lod@w3.org from September 2013)

From: Leigh Dodds <leigh@ldodds.com>
Date: Mon, 9 Sep 2013 15:48:34 +0100
To: "Frans Knibbe | Geodan" <frans.knibbe@geodan.nl>
Cc: public-lod community <public-lod@w3.org>
Message-ID: <CAC_nr_qBi5qVmos0C6yofB19ftDs=bm1vMsKV+KbuCYj2Bg37g@mail.gmail.com>

Hi,

Before reaching for compression, you might also consider whether
you need to represent all of this information as RDF in the first
place.

For example, rather than including the large geometries as literals, why
not store them as separate documents and let clients fetch them
when needed, instead of retrieving them as part of a SPARQL query?
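As a rough illustration (an assumption, not something specified in this
thread), the split could look like the Python sketch below: the RDF keeps
only a link to the geometry, and the geometry text is served from its own
URL. The function split_geometry(), the example.org URIs and the
ex:geometryDocument property are all hypothetical placeholders, not an
established vocabulary.

```python
# Minimal sketch: keep a large geometry out of the RDF by publishing it as a
# separate document and linking to it. split_geometry(), the URIs and the
# ex:geometryDocument property are hypothetical placeholders.


def split_geometry(feature_uri: str, wkt: str, geometry_url: str) -> tuple[str, dict[str, str]]:
    """Return (turtle, documents): a small RDF description of the feature and
    a mapping from URL to the geometry text that should be served there."""
    turtle = "\n".join([
        "@prefix ex: <http://example.org/ns#> .",
        "",
        # Only the link goes into the RDF; the coordinates do not.
        f"<{feature_uri}> ex:geometryDocument <{geometry_url}> .",
    ])
    documents = {geometry_url: wkt}
    return turtle, documents


if __name__ == "__main__":
    # A synthetic 2000-point line standing in for a large geometry.
    wkt = "LINESTRING(" + ", ".join(f"{i} {i % 7}" for i in range(2000)) + ")"
    turtle, docs = split_geometry(
        "http://example.org/feature/42",
        wkt,
        "http://example.org/feature/42/geometry.wkt",
    )
    print(turtle)
    print(len(wkt), "characters moved out of the RDF")
```

The triple store then holds one short triple per feature, and the large
text is only transferred when a client dereferences the link.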

Geometries can be served using standard HTTP compression techniques
and will benefit from caching.
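A minimal sketch of the serving side, assuming a hypothetical
serve_geometry() handler rather than any particular web framework: the
document is gzip-compressed when the client sends "Accept-Encoding: gzip",
and a content-derived ETag lets clients revalidate cached copies with
If-None-Match and get a 304 Not Modified instead of the full geometry.

```python
# Minimal sketch of serving a geometry document with HTTP compression and
# cache validation. serve_geometry() and its request/response shapes are
# hypothetical; a real web server or framework usually handles this.
import gzip
import hashlib


def serve_geometry(body: str, request_headers: dict[str, str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, response_headers, payload).

    Simplifications: header names are assumed to be in canonical case,
    If-None-Match is assumed to carry a single tag, and gzip support is
    detected by a plain substring check (q-values are ignored)."""
    raw = body.encode("utf-8")
    use_gzip = "gzip" in request_headers.get("Accept-Encoding", "")

    # Strong ETags are representation-specific, so the gzip variant gets its own tag.
    digest = hashlib.sha256(raw).hexdigest()
    etag = f'"{digest}-gzip"' if use_gzip else f'"{digest}"'
    headers = {"ETag": etag, "Cache-Control": "max-age=86400", "Vary": "Accept-Encoding"}

    # Conditional GET: the client already holds this exact representation.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""

    if use_gzip:
        headers["Content-Encoding"] = "gzip"
        return 200, headers, gzip.compress(raw)
    return 200, headers, raw


if __name__ == "__main__":
    wkt = "LINESTRING(" + ", ".join(f"{i}.123456 {i % 50}.654321" for i in range(5000)) + ")"
    status, headers, payload = serve_geometry(wkt, {"Accept-Encoding": "gzip"})
    print(status, len(wkt), "->", len(payload), "bytes")
    # Revalidating with the ETag avoids resending the geometry.
    status2, _, _ = serve_geometry(wkt, {"Accept-Encoding": "gzip", "If-None-Match": headers["ETag"]})
    print(status2)  # 304
```

None of this is possible for a geometry embedded as a literal in a SPARQL
result, which is part of the argument for serving it separately.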

You can provide summary statistics in the RDF (including the size of
the geometry document and properties of the described area, e.g. its
centroid) to cover a few common requirements, so that clients fetch
only the geometries they need, when they need them.
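One way to derive such statistics, sketched in Python under some stated
assumptions: the geometry is a single simple polygon ring, coordinates are
treated as planar x/y (only an approximation for longitude/latitude data),
and the functions polygon_centroid() and summary_turtle(), the example.org
URIs and the ex: properties in the generated Turtle are hypothetical
placeholders.

```python
# Minimal sketch: compute a few summary statistics for a polygon so they can
# be published as small RDF literals while the full geometry stays in a
# separate document. The ex: properties are hypothetical placeholders, and
# coordinates are treated as planar (x, y).


def polygon_centroid(ring: list[tuple[float, float]]) -> tuple[float, float]:
    """Area-weighted centroid of a simple polygon (shoelace formula).
    The ring may or may not repeat its first point at the end."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # C = sum / (6 * A) with A = area2 / 2, i.e. sum / (3 * area2).
    return cx / (3 * area2), cy / (3 * area2)


def summary_turtle(feature_uri: str, geometry_url: str,
                   ring: list[tuple[float, float]], document: str) -> str:
    """Turtle with the document size, centroid and bounding box of a feature."""
    cx, cy = polygon_centroid(ring)
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    size = len(document.encode("utf-8"))  # bytes a client would download
    return "\n".join([
        "@prefix ex: <http://example.org/ns#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{feature_uri}> ex:geometryDocument <{geometry_url}> ;",
        f'    ex:geometrySizeBytes "{size}"^^xsd:integer ;',
        f'    ex:centroid "POINT({cx} {cy})" ;',
        f'    ex:bbox "{min(xs)} {min(ys)} {max(xs)} {max(ys)}" .',
    ])


if __name__ == "__main__":
    ring = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]
    wkt = "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"
    print(summary_turtle("http://example.org/feature/7",
                         "http://example.org/feature/7/geometry.wkt", ring, wkt))
```

A client can then decide from the size, centroid or bounding box alone
whether a feature is relevant before downloading its full geometry.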

This can greatly reduce the volume of data you have to store and
provides clients with more flexibility.

Cheers,

L.

On Mon, Sep 9, 2013 at 10:47 AM, Frans Knibbe | Geodan
<frans.knibbe@geodan.nl> wrote:
> Hello,
>
> In my line of work (geographical information) I often deal with high volume
> data. The high volume is caused by single facts having a big size. A single
> 2D or 3D geometry is often encoded as a single text string and can consist
> of thousands of numbers (coordinates). It is easy to see that this can cause
> performance issues with transferring and processing data. So I wonder about
> the state of the art in minimizing data volume in Linked Data. I know that
> careful publication of data will help a bit: multiple levels of detail could
> be published, coordinates could be limited to their significant digits
> (they almost never are), but it seems to me that some kind of compression is
> needed too. Is
> there something like a common approach to data compression at the moment?
> Something that is understood by both publishers and consumers of data?
>
> Regards,
> Frans
>
>

-- 
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: leigh@ldodds.com

Received on Monday, 9 September 2013 14:49:02 UTC