- From: Thomas Passin <tpassin@tompassin.net>
- Date: Tue, 4 Dec 2018 20:58:16 -0500
- Cc: semantic-web@w3.org
When I think of modeling addresses, after reading some of these posts and links (and not having had to do this for a living), I would say the simplest model would be this, which seems pretty close to what Joshua said:

An address
    can be represented by one or more representations;
    denotes a (physical) location  # maybe one or more?
    may have one or more textual aliases.

A representation  #
    has one specific syntactical form
    may have a grammar specification  # e.g. B-N

A "representation" of an address is one of the many textual forms that one finds in the wild. You need some non-rdf processing to relate each type of representation you need to handle to its location.

It might turn out that each type of address could be expressed as a grammar (in B-N or some other notation) or at least by some syntax rules. If so, that notation type could be included as a property of the address instance.

For some other time: fuzzy addresses like "on 5th Ave. between 72nd and 73rd streets".

TomP

On 12/4/2018 8:08 PM, Joshua Shinavier wrote:
> Just to add another data point to the "addresses are hard" thread, at Uber we have also invested quite some time into standardizing vocabulary around addresses. Prior to standardization, there were many dozens of address types in use within the company (and still are), most of which are of the basic street/city/state/country/zip kind, similar to schema.org <http://schema.org>'s PostalAddress. After a great deal of discussion, we opted not to support such a format as a standard. Most of the reasons for this boil down to items on the page Thomas linked. Instead, we distinguish between structured addresses (a bag of components which validate against any of a number of black-box address schemas) and addresses for display. Google makes a similar distinction in its Places API. Address validation, formatting, normalization, etc. are API concerns that go well beyond the vocabulary itself, requiring significant background knowledge.
> I would not be optimistic about finding canonical identifiers for addresses, though geocoded lat/lon is probably the next best thing.
>
> Josh
>
>
> On Tue, Dec 4, 2018 at 4:16 PM Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
>
> Hi Hugh,
>
> On 04/12/2018 22:48, Hugh Glaser wrote:
> > Thanks Dave.
> > Yes, I agree with all the detail.
> >
> > My interpretation is that you are confirming what I was saying - that the general case is a nightmare.
>
> On that we are agreed :)
>
> > This is a problem of trying for a standard for the addresses - not only is it fiendishly complicated, but no standard will ever satisfy all the reasons you might want to identify something, such as an address.
>
> I agree, which is why I was negative about trying to capture it centrally.
>
> > On the other hand, SW people *are* representing addresses all the time, using sufficient specificity for their purposes.
> > And others will be doing the same thing to the same level.
>
> Sure, *representing* addresses is just fine. It's *identifying* addresses that's hard.
>
> > And businesses in the UK find that the number/postcode pair is pretty much all they need to deliver almost all online purchases.
>
> If you are only dealing with consumers, not other businesses, and mostly focus on houses in urban areas, and don't care about secondary addresses (saons - like flat number, unit number, floor etc), and if you only care about delivery (so there's a human at the other end interpreting the address) and if we can agree to differ on the semantics of "almost all" then that's possibly true.
>
> However, many businesses, even under those constraints, solve it by getting a human (the one placing an order) to do the matching. You use number/postcode to constrain and order the search on your (very expensive) master address list and get the user to pick the right one from the result list. *Then* you have an identifier.
>
> > It seems to me that you are concerned with the "global" solution -
>
> No, simply pointing out that matching real world entities is hard for domain specific reasons and no amount of RDF/OWL makes much difference to that.
>
> Actually, all I was really doing was sharing painfully gathered experience that in the UK, postcode + number is far from a nearly unique key for all addresses. Trust me on this. I've sacrificed a large part of the last three months to learning this lesson in great detail :(
>
> > I want to worry about a more local problem, and what small steps can be taken to help people in common cases, so that SW & LD are more useful for developers.
>
> I've lost track of how this thread about thing equality relates to the goal of making SW/LD/RDF easier. Which is why I opened with "I don't want to get embroiled in the main thread(s)" and just commented on the nature of addresses.
>
> [While URIs can be off putting I don't think they are *that* much of a problem for developers. Even where they are a barrier it's the choice of namespace that's the challenge ("you mean we have to host a DNS domain and maintain it?"). In my experience most developers are very happy with the notion that some domains have "natural" composite keys that you can use to identify things and some domains you have to do work to create some (often human) process to manage your reference identifiers and then use those as keys. Once you have your keys, one way or another, then creating identifiers by combining some sort of namespace with an encoding/hash of the composite keys is bread and butter stuff, even outside of SW/LD.]
>
> Dave
>
> >
> > Or are you saying that because specifying addresses as well as you would like is so hard, we shouldn't bother trying to do something simpler and useful for many purposes?
> >
> > It is about URIs, and they aren't in the noise - they are the things that people currently generate for themselves, and get little or no help with that generation, or linking up.
> >
> >> On 4 Dec 2018, at 11:24, Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
> >>
> >> I don't want to get embroiled in the main thread(s) but, just in case anyone is *really* dealing with UK addresses rather than using them as rhetorical examples, then ...
> >>
> >> On 03/12/2018 23:37, Anthony Moretti wrote:
> >>> I see your point Hugh, especially in your case because for UK addresses consisting of only house number and postcode structural equality is sufficient for address equality. Decentralized will work very well in that case.
> >>
> >> Sadly that's a long way from being true. UK addresses within a postcode may be identified by house name, house name + number, business name (with no house name or number at all), any of those plus a secondary address etc etc. Even when there's a house "number" sometimes it's actually a number range not a single number and there's considerable ambiguity on how those ranges are expressed and what the "definitive" range for a given property really is.
> >>
> >> Identity of UK addresses is simply not something you can express in OWL or any logic close to it. You need an address reconciliation algorithm to map your address to a maintained identifier set such as a UPRN or UDPRN. The reconciliation process will have error rates that you will need to manage and recover from, there's no closed, guaranteed algorithm.
> >>
> >> Once you have the UPRN or UDPRN or whatever you can create URIs or some inverse functional property as you wish. Except that even then the official identifier schemes like that aren't perfect and have ... oddities ... in them that can still mess you up.
> >>
> >> Generating unique keys for resources based on hashing a few properties is all very well in simple cases but, at least in my experience, real world problems are nothing like that simple and clean. You need serious effort to create and maintain identifier schemes and to reconcile source data against those schemes. Details like URIs or bNodes seem to me rather down in the noise.
> >>
> >> Dave
> >>
> >>> On Mon, Dec 3, 2018 at 3:07 PM Nathan Rixham <nathan@webr3.org> wrote:
> >>> Hugh, do you mean something like bnode.id <http://bnode.id> = sha256(serialise(bnode))
> >>> On Mon, 3 Dec 2018, 22:58 Hugh Glaser <hugh@glasers.org> wrote:
> >>> This is not directly about blank nodes, but is a reply to a message in the thread.
> >>> I’m certainly agreeing that we should work towards common understanding of Thing equality.
> >>> And addresses are a great place to start.
> >>> In order for equality to be defined, I think that means you first need an idea of what an unambiguous address looks like.
> >>> Having an oracle that defines what an unambiguous Thing looks like is one organisational structure, and it would be great if schema.org <http://schema.org> could lead the way.
> >>> It particularly helps people who just want an off the shelf solution, especially if they have no knowledge of the Thing domain.
> >>> However I (and perhaps David Booth) am after something more anarchic, that can function in a decentralised way (if I dare to use that term! :-) )
> >>> For example, I might decide that I think that House Number and PostCode is enough.
> >>> (UK people will know that this is a commonly-used way of choosing an address, although it may well not be satisfactory for some purposes, I’m sure.)
> >>> That may well be sufficient for me to interwork with datasets from Companies House, the Land Registry and a bunch of other UK-based organisations, plus many other datasets.
> >>> Having a simple standard way to create keys for such things facilitates that, without any standardisation process and all that entails in weaknesses and strengths of trying to get agreement on what an unambiguous address might look like on a world scale for all purposes.
> >>> Just generating a URI, without needing to make any service calls (having found where they are and chosen the one you want and compromised on it, etc.) or anything seems to me a way of making all the interlinking so much more accessible for us all.
> >>> It is even future proof:- using such a URI means that if it is about something new (UK postcodes change all the time :-(, and there are more dead ones than live ones), the oracle doesn’t tell me anything it didn’t have until I ask again.
> >>> In a key-generating world, my new shiny key will slowly align with all the other key URIs as they get created.
> >>> So yeah, all strength to anyone who wants to take on the central roles, but not at the expense of killing the anarchic solution, please.
> >>> Cheers
> >>> > On 3 Dec 2018, at 22:10, Anthony Moretti <anthony.moretti@gmail.com> wrote:
> >>> >
> >>> > Cheers for agreeing William. On the topic of incomplete blank nodes Henry I'd give them another type, the partial address example you give I'd give the type AddressComponent, or something to that effect.
> >>> > I could be wrong, but it's not a valid Address if it's a blank node and no other information in the graph completes it.
> >>> >
> >>> > Anthony
> >>> >
> >>> > On Mon, Dec 3, 2018 at 1:56 PM William Waites <wwaites@tardis.ed.ac.uk> wrote:
> >>> > > standards like schema:PostalAddress should possibly define relevant operations like equality checking too.
> >>> >
> >>> > Exactly.
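
A rough Python sketch of the model outlined at the top of this message, together with the "namespace plus hash of a composite key" idea from the quoted thread. Everything named here is an assumption made for illustration: the class and field names, the use of lat/lon for a location, a regular expression standing in for a grammar, and the example.org namespace. As noted in the thread, number + postcode is not a unique key for UK addresses, so the key function only shows the mechanics.

```python
import hashlib
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Location:
    """The physical place an address denotes (lat/lon is an assumption)."""
    lat: float
    lon: float


@dataclass(frozen=True)
class Representation:
    """One textual form of an address, as found in the wild."""
    text: str
    # Optional syntax rule for this form; a regex stands in for a grammar.
    grammar: Optional[str] = None

    def conforms(self) -> bool:
        """True when no grammar is given or the whole text matches it."""
        if self.grammar is None:
            return True
        return re.fullmatch(self.grammar, self.text) is not None


@dataclass
class Address:
    """An address: its representations, the location it denotes, aliases."""
    representations: List[Representation]
    location: Optional[Location] = None
    aliases: List[str] = field(default_factory=list)


def composite_key_uri(namespace: str, house_number: str, postcode: str) -> str:
    """Hypothetical key: namespace + SHA-256 of normalised number|postcode."""
    number = house_number.strip().lower()
    code = postcode.replace(" ", "").upper()
    digest = hashlib.sha256(f"{number}|{code}".encode("utf-8")).hexdigest()
    return f"{namespace}{digest}"


# Illustrative pattern for a "number, postcode" form; not a full UK grammar.
NUMBER_POSTCODE = r"\d+[A-Za-z]?, [A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}"

rep = Representation("10, SW1A 2AA", NUMBER_POSTCODE)
addr = Address(representations=[rep], aliases=["Number 10"])
key = composite_key_uri("http://example.org/addr/", "10", "SW1A 2AA")
```

Relating a representation to its location still needs the non-RDF processing mentioned above; the sketch only records the result of that step in `Address.location`.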
Received on Wednesday, 5 December 2018 01:58:45 UTC