Re: Inclusion of non-geometric ways to describe location (e.g. address and geocode) in BP10? from Joshua Lieberman on 2017-03-11 (public-sdw-wg@w3.org from March 2017)

From: Joshua Lieberman <jlieberman@tumblingwalls.com>
Date: Sat, 11 Mar 2017 10:24:58 -0500
To: Jeremy Tandy <jeremy.tandy@gmail.com>
Cc: Bill Roberts <bill@swirrl.com>, Andrea Perego <andrea.perego@ec.europa.eu>, Linda van den Brink <l.vandenbrink@geonovum.nl>, SDW WG Public List <public-sdw-wg@w3.org>
Message-Id: <56BEB84F-364F-4E68-BC4E-41709709986C@tumblingwalls.com>
“Flat” is not simple per se. If it is used to represent naturally hierarchical or graph-structured information, it often becomes a mess. It is also not clear what this has to do with Elasticsearch, which is a full-text search engine. A preference for “flat” usually comes from using relational databases, or tools supporting relational structures, where anything other than uniform record lists becomes unwieldy. The Simple Features Profile has more to do with mitigating the flexibility of GML for client developers where that flexibility isn't really needed.

It’s fine, I think, to point out anywhere in the BP doc that there is always a trade-off between choice and interoperability, but data structures should reflect what is being modeled. As long as some tools are available to work with them, e.g. graph stores, structures that match the model will generally be less complicated to work with. It may also be an issue that more is being modeled than is needed for a particular application, e.g. single positions should not be the only locator considered, but are frequently good enough for many applications.
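
To make the point about flattening concrete, here is a minimal Python sketch. The `flatten` helper and the sample `properties` object (station, operator, sensors) are hypothetical illustrations, not taken from any dataset or tool mentioned in this thread. It collapses a nested GeoJSON-style "properties" object into dotted keys, roughly what one has to do to fit hierarchical content into a uniform record.

```python
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts and lists into a single level of dotted keys."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse, extending the key path.
            flat.update(flatten(value, prefix=f"{path}."))
        elif isinstance(value, list):
            # List: the position in the list becomes part of the key.
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    flat.update(flatten(item, prefix=f"{path}.{i}."))
                else:
                    flat[f"{path}.{i}"] = item
        else:
            flat[path] = value
    return flat


# Hypothetical "properties" object with naturally nested content.
properties = {
    "name": "Station A",
    "operator": {"name": "Example Agency", "country": "NL"},
    "sensors": [
        {"type": "NO2", "unit": "ug/m3"},
        {"type": "PM10", "unit": "ug/m3"},
    ],
}

flat_properties = flatten(properties)
for key, value in flat_properties.items():
    print(key, "=", value)
```

The single nested object flattens cleanly ("operator.name"), but the one-to-many relationship between a station and its sensors turns into positional keys such as "sensors.0.type" and "sensors.1.type". A feature with three sensors would then have a different set of columns from one with two, which is the kind of mess I mean.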

Interesting question about addresses, because they are frequently the only locators available in datasets, but they are not reliable spatial positions. Geocoding ends up being an art that frequently sends people to the wrong place, and an even less exact art for addresses in unfamiliar systems, e.g. found on the Web. This may be another case where there is a practice (locating data records with addresses) that isn’t really a best practice. Wherever possible, data producers should do the geocoding themselves and get it right, rather than leaving others to rely on various geocoding services with uncertain positional reliability and restrictive terms of use.
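
As a rough sketch of what producer-side geocoding could look like in published data, the Python below builds a GeoJSON Feature that keeps the address as a property but also carries a point supplied by the producer. The function name, the "address" property name, the address text and the coordinates are all made-up placeholders. The only thing taken from a specification is the RFC 7946 coordinate order (longitude, then latitude).

```python
import json


def feature_with_address(address: str, lon: float, lat: float) -> dict:
    """Build a GeoJSON Feature holding both an address and a producer-supplied point."""
    # Basic sanity check on WGS 84 degrees; this does not validate the geocoding itself.
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("expected longitude/latitude in degrees")
    return {
        "type": "Feature",
        # RFC 7946 position order is [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        # "address" is a hypothetical property name chosen for this sketch.
        "properties": {"address": address},
    }


# Placeholder address and coordinates, for illustration only.
feature = feature_with_address("1 Example Street, Exampletown", 5.3872, 52.1561)
geojson_text = json.dumps(feature, indent=2)
print(geojson_text)
```

A consumer of such a record can use the position directly and treat the address as a label, rather than passing the address through a geocoding service of its own.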

—Josh

Joshua Lieberman
Principal, Tumbling Walls
jlieberman*tumblingwalls.com
+1 617 431 6431

> On Mar 11, 2017, at 5:24 AM, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
> 
> Hah! Perhaps just identify that simple, flat structures are easier for users to work with, so only add complexity where you need it ... and reference GML Simple Features Profile as an example of how "complexity" can be managed?
> 
> On Sat, 11 Mar 2017 at 10:21 Bill Roberts <bill@swirrl.com> wrote:
> interesting - though I think that's going to be too detailed to get into in the BP - unless you want BP10 to be 20 pages long!
> 
> 
> 
> On 11 March 2017 at 10:05, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
> Hi Bill - just one more thing (again!) ...
> 
> I was talking to a colleague of mine earlier this week about how he's publishing spatial data on the Web, making use of GeoJSON, Elasticsearch, OpenLayers etc. All good "modern" webby stuff. One of the bits of advice he gave was:
> 
> "keep your data structures FLAT (avoid nesting/embedded objects; as per OGC GML Simple Features Profile) - this makes it easier for users to work with in existing tools (e.g. ElasticSearch)"
> 
> He refers to the structures in the GeoJSON [1] "properties" object (see 3.2 Feature Object [2]) and (I would assume) any "foreign members" [3]. This makes it easier to import the GeoJSON documents into Elasticsearch etc. (I think that's what he said)
> 
> The OGC's GML Simple Features Profile [4] defines three levels of compliance: SF-0, SF-1 and SF-2, which become progressively less restrictive from 0 to 2. Above 2 you're using everything that GML has; kitchen sink and all! I wonder if these notions of profiling for interoperability might be a useful inclusion in BP10? Section "2.1 Introduction" provides a good starting point (but then I suppose that's the point).
> 
> Jeremy
> 
> [1]: https://tools.ietf.org/html/rfc7946
> [2]: https://tools.ietf.org/html/rfc7946#section-3.2
> [3]: https://tools.ietf.org/html/rfc7946#section-6.1
> [4]: http://portal.opengeospatial.org/files/?artifact_id=42729
> On Sat, 11 Mar 2017 at 09:29 Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
> Thanks Bill.
> 
> On Sat, 11 Mar 2017 at 09:18 Bill Roberts <bill@swirrl.com> wrote:
> Hi Jeremy
> 
> Good idea - I think it would be good to include something about addresses and geocodes as a way of encoding location.  I'll try to incorporate something on that.
> 
> 
> 
> On 11 March 2017 at 09:08, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
> Hi Bill.
> 
> Given that Andrea is talking about _geometries_ in BP8, we seem to have a gap with regard to _other_ mechanisms to describe location; e.g. addresses and geocodes (postal codes etc., geohashes [1] and, I think worth mentioning explicitly, W3W [2]). 
> 
> In your discussion of “how to encode spatial data” I think it is worth calling these mechanisms out specifically, and referring to Andrea’s work on geometries in BP8.
> 
> Given Andrea's involvement with the ISA Programme Location Core Vocabulary [3] (which defines locn:Address), he may have some useful contributions here too.
> 
> Addresses are mentioned in the following use cases:
> 4.5 Harvesting of Local Search Content
> 4.9 Enabling publication, discovery and analysis of spatiotemporal data in the humanities
> 4.13 Publication of air quality data aggregations
> 
> Strangely, we don’t have any requirements that mention addresses.
> 
> I’m also reminded of the Discrete Global Grid System (DGGS) standard being prepared by OGC [4] which will … For example, HEALPix (“Hierarchical Equal Area isoLatitude Pixelization”) grids, an indexing system used for DGGS, are useful for EO data because each cell is uniquely identified and has equal area (at that level in the grid), so you don’t need to re-sample when comparing cell properties; the value of each cell is directly comparable. DGGS and HEALPix are (were?) referenced in the EO-QB work of our group.
> 
> That said, I don’t think the DGGS is formally approved as a standard, so it may only warrant a note - or no mention at all. I doubt it meets our criteria for “best practice in the wild”. It also looks a little complex from my quick scan of the OGC doc. 
> 
> There are also clearly a large number of other coding systems for geographical and administrative areas & places. I’ll try to cover referring to these types of things in BP14 concerning linking.
> 
> Given the short amount of time available before our intended “freeze” (on Wed 15-Mar) of the BP doc for next WD release, I’d be content to push these changes into the work plan for the next sprint.
> 
> Jeremy
> 
> 
> [1]: https://en.wikipedia.org/wiki/Geohash
> [2]: http://what3words.com
> [3]: https://www.w3.org/ns/locn#
> [4]: public draft: OGC #15-104r3 https://portal.opengeospatial.org/files/66643
> 
> 
> 
>
Received on Saturday, 11 March 2017 15:25:57 UTC