- From: Frans Knibbe | Geodan <frans.knibbe@geodan.nl>
- Date: Tue, 10 Sep 2013 21:09:26 +0200
- To: Andy Turner <A.G.D.Turner@leeds.ac.uk>, public-lod community <public-lod@w3.org>
- CC: Leigh Dodds <leigh@ldodds.com>
- Message-ID: <522F6E66.90409@geodan.nl>
On 10-9-2013 13:33, Andy Turner wrote:
>
> Hi,
>
> At least these two OGC standards might be worth having a look at in
> this context:
>
> http://www.opengeospatial.org/standards/geosparql
>
> http://www.opengeospatial.org/standards/tjs
>
> The latter is a Georeferenced Table Joining Service Implementation
> Standard. In the development of this a lot of thought went into
> different kinds of linking of geographical data. Sorry, but I know
> very little about the GeoSPARQL standard.
>
> The notion of keeping geometry data separate and providing metadata
> about geometries in standard forms is useful. For vector data, the
> number of points in the geometry is one of the key attributes an
> application might consider before pulling that geometry. (Also the
> size of its representation in bytes -- both compressed and
> uncompressed is useful info too -- thanks Leigh.)
>
> So, for vector data, the attributes for individual vectors (almost
> like features) can be kept separate from the spatial geometries, and
> some linkage code can be used to join the data together. Yes, there
> are advantages in terms of storage organisation for keeping attributes
> and geometries separate, but for many applications some attributes of
> the geometries are also wanted, so this geometrical metadata is
> important to think about. Computationally some of it can be hard to
> calculate, so once calculated it is perhaps worth storing in optional
> metadata.
>
> Individual points with a single attribute, where the point is defined
> with respect to axes in some geographical coordinate and projection
> system are simple geo-vectors. Lines built from multiple such points
> (and equations) are more detailed/complex, yet these can have simply
> attributed generalised point representations (the location of a
> smallest circle/sphere encompassing all the points in the line --
> perhaps with a measure of the radius of this). There are similar
> things for regional polygons in two and three dimensions.
>
> With lines and points, their geometries can be simplified in other
> ways which can result in other lines and polygons. Simplifying
> contiguous polygons to maintain topological relationships is not
> necessarily straightforward.
>
> The point I am trying to make with the above is that there are
> multiple different geometries, not a single geometry for a real world
> object that can be described/defined with RDF. Some of the more
> generalised forms of the spatial geometries can be calculated and
> stored as metadata in fixed number of field type table
> representations. Often so-called bounding boxes and bounding circles
> are used, as are line lengths, perimeters, surface areas, volumes,
> average distances, and ratios of these geometrical attributes. Based
> on the geometrical attributes, further attributes can be derived for
> other attributes (e.g. density).
>
> Consider something complex, like a city. This has multiple geometrical
> representations.
>
> Two more things:
>
> Geohashes (http://en.wikipedia.org/wiki/Geohash) which interleave
> coordinates represented by positions on axes using some predetermined
> axis order and prescription are useful in the context of linking data
> - they are string representations that, the more truncated they are,
> provide less precision for the location of a point, yet still start
> with the same string sequence.
>
> The other key dimension to think about in geographical relations is
> time. How time relates to all this is important, but this email is
> already long, so all I will state is that a city now could be very
> different to a city some years ago (in terms of spatial
> dimension/geometry), yet in some ways they are the same place. There
> are ways to derive (very) complex geometries of ephemeral events, you
> could consider one, like the Olympic games.
>
> HTH and sorry for the long post.
>

Hello Andy,

Thank you for the long post and for sharing your thoughts.

Yes, I agree that any real world object can have many different
geometries, depending on coordinate reference system, level of detail,
time, method of measurement and whatnot. But I don't think that is a
problem. Linked Data is very capable of sharing different perspectives
of a single real world phenomenon, and also of annotating those
different perspectives to help with correctly interpreting them.

The problem that I see is how to handle those cases where geometry
literals become unwieldy. The GeoSPARQL specification that you mention
provides a way of writing a geometry as a literal in RDF. There may be
several approaches to serializing a geometry, but ending up with a
series of coordinates is inescapable. And I am worried about the impact
of these series of coordinates becoming very long. That is why I also
like the idea of providing some extra data to enable a client to
distinguish between large and small geometries. The small ones could be
downloaded and processed right away, but the bigger ones might need
some extra care.

Thinking about this, I wonder if the idea of a general compression
function for literals has ever been considered for SPARQL. That would
enable a query like this (WHERE clause left out):

    SELECT ?name ?population (COMPRESS(?geometry) AS ?compressedGeometry)
    FROM <http://example.org/cities>

Such a function could be applied only to those literals whose size
exceeds a certain threshold. And it would be applicable to all kinds of
big literals.

About Geohash: Yes, it is a kind of compression for geometry, but as
far as I can tell it only applies to single points.

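Coming back to the compression function idea: below is a minimal Python
sketch of what such a function might do on the server side, under stated
assumptions. As far as I know nothing like COMPRESS exists in SPARQL; the
names compress_literal, decompress_literal and SIZE_THRESHOLD, the
1024-byte cut-off, and the choice of zlib plus Base64 (to keep the result
a plain string) are all made up for illustration.

```python
import base64
import zlib
from typing import Tuple

# Assumed cut-off in bytes; literals at or below it are returned unchanged.
SIZE_THRESHOLD = 1024


def compress_literal(value: str, threshold: int = SIZE_THRESHOLD) -> Tuple[str, bool]:
    """Return (text, was_compressed) for a literal such as a WKT string."""
    raw = value.encode("utf-8")
    if len(raw) <= threshold:
        return value, False
    # zlib-compress, then Base64-encode so the result is still plain text.
    packed = base64.b64encode(zlib.compress(raw, 9)).decode("ascii")
    return packed, True


def decompress_literal(value: str, was_compressed: bool) -> str:
    """Invert compress_literal on the client side."""
    if not was_compressed:
        return value
    return zlib.decompress(base64.b64decode(value)).decode("utf-8")


def make_wkt_linestring(n: int) -> str:
    """Build an artificial WKT line with n coordinate pairs (test data only)."""
    coords = ", ".join(
        f"{4.0 + i * 0.0001:.4f} {52.0 + i * 0.0001:.4f}" for i in range(n)
    )
    return f"LINESTRING ({coords})"


if __name__ == "__main__":
    big = make_wkt_linestring(2000)
    packed, flag = compress_literal(big)
    print(len(big), len(packed), flag)
```

A small literal like "POINT (4.9 52.37)" passes through untouched, so a
client only pays the decompression cost for the big geometries. The
client does need the flag (or a distinct datatype) to know which is which.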
Regards,
Frans

> Andy
> http://www.geog.leeds.ac.uk/people/a.turner/
>
> *From:* Frans Knibbe | Geodan [mailto:frans.knibbe@geodan.nl]
> *Sent:* 10 September 2013 11:11
> *To:* Leigh Dodds
> *Cc:* public-lod community
> *Subject:* Re: Minimizing data volume
>
> On 9-9-2013 16:48, Leigh Dodds wrote:
> > Hi,
> >
> > Before using compression you might also make a decision about whether
> > you need to represent all of this information as RDF in the first
> > place.
> >
> > For example, rather than include the large geometries as literals, why
> > not store them as separate documents and let clients fetch the
> > geometries when needed, rather than as part of a SPARQL query?
> >
> > Geometries can be served using standard HTTP compression techniques
> > and will benefit from caching.
> >
> > You can provide summary statistics (including size of the document,
> > and properties of the described area, e.g. centroids) in the RDF to
> > help address a few common requirements, allowing clients to only fetch
> > the geometries they need, as they need them.
> >
> > This can greatly reduce the volume of data you have to store and
> > provides clients with more flexibility.
> >
> > Cheers,
> >
> > L.
> >
> Yes, that is something to consider. Thanks for broadening my mind! I
> think such an approach may be suited for certain kinds of high volume
> data, like images or video. But I do have some doubts about its
> effectiveness for geographical data:
>
> 1) In geographical data sets geometries typically have different
> sizes. Some may be very big, others may be reasonably small. So where
> to draw the limit?
>
> 2) When using SPARQL and RDF it is already possible to provide summary
> statistics and leave it to the client to fetch the geometries if
> needed. However, it is not standard practice to provide summaries like
> centroid, bounding box or coordinate count for each geometry, but
> perhaps it should be.
>
> 3) On the surface, this approach seems to add complexity to data
> retrieval, for both clients and servers. Instead of one way of
> publishing and getting data, there will be two ways.
>
> 4) Having to fetch geometries one at a time, instead of processing
> them all from one data set, could complicate matters and also
> introduce some loss of performance. I can imagine this method working
> well for things like images, videos or files, because they are
> typically used one at a time. But in many cases geometries should be
> available all at once, to draw on a map for instance.
>
> 5) I think most geometries are stored as attribute data in relational
> databases. Preprocessing them to make them available as separate files
> can be done offline. But in other cases the geometries are transient,
> they could be generated by a function in a query. The method should
> work with performance gains in those cases too.
>
>
> Regards,
> Frans
>
> > On Mon, Sep 9, 2013 at 10:47 AM, Frans Knibbe | Geodan
> > <frans.knibbe@geodan.nl> wrote:
> > Hello,
> >
> > In my line of work (geographical information) I often deal with high volume
> > data. The high volume is caused by single facts having a big size. A single
> > 2D or 3D geometry is often encoded as a single text string and can consist
> > of thousands of numbers (coordinates). It is easy to see that this can cause
> > performance issues with transferring and processing data. So I wonder about
> > the state of the art in minimizing data volume in Linked Data. I know that
> > careful publication of data will help a bit: multiple levels of detail could
> > be published, coordinates could use significant digits (they almost never
> > do), but it seems to me that some kind of compression is needed too. Is
> > there something like a common approach to data compression at the moment?
> > Something that is understood by both publishers and consumers of data?
> >
> > Regards,
> > Frans
> >
>
> --
> --------------------------------------
> *Geodan*
> President Kennedylaan 1
> 1079 MB Amsterdam (NL)
>
> T +31 (0)20 - 5711 347
> E frans.knibbe@geodan.nl
> www.geodan.nl <http://www.geodan.nl> | disclaimer
> <http://www.geodan.nl/disclaimer>
> --------------------------------------
>

--
--------------------------------------
*Geodan*
President Kennedylaan 1
1079 MB Amsterdam (NL)

T +31 (0)20 - 5711 347
E frans.knibbe@geodan.nl
www.geodan.nl <http://www.geodan.nl> | disclaimer <http://www.geodan.nl/disclaimer>
--------------------------------------
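
P.S. On the summaries mentioned in the quoted discussion (coordinate
count, bounding box, size in bytes): here is a rough Python sketch of how
a publisher might derive them from a WKT literal before storing them
alongside the geometry. It is only an illustration under assumptions: the
function name summarize_wkt and the dictionary keys are invented, it only
handles plain 2D coordinates without exponents, and it reports the centre
of the bounding box rather than a true centroid.

```python
import re
from typing import Dict

# Matches one "x y" coordinate pair of plain decimal numbers (2D only).
_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def summarize_wkt(wkt: str) -> Dict[str, object]:
    """Derive cheap summary values a client could inspect before fetching."""
    pairs = [(float(x), float(y)) for x, y in _PAIR.findall(wkt)]
    if not pairs:
        raise ValueError("no 2D coordinate pairs found")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    return {
        "coordinate_count": len(pairs),
        "bbox": bbox,
        # Centre of the bounding box, not the geometric centroid.
        "bbox_center": ((bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2),
        "byte_size": len(wkt.encode("utf-8")),
    }


if __name__ == "__main__":
    square = "POLYGON ((4.0 52.0, 5.0 52.0, 5.0 53.0, 4.0 53.0, 4.0 52.0))"
    print(summarize_wkt(square))
```

These values are cheap to compute once at publication time, and a client
can use them to decide which geometries are worth downloading at all.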
Received on Tuesday, 10 September 2013 19:10:06 UTC