Re: Addresses have no easy identity was Re: Blank Nodes Re: Toward easier RDF: a proposal from Thomas Passin on 2018-12-05 (semantic-web@w3.org from December 2018)

From: Thomas Passin <tpassin@tompassin.net>
Date: Tue, 4 Dec 2018 20:58:16 -0500
Cc: semantic-web@w3.org
Message-ID: <aee837f4-dc96-9477-1ccd-83380154012e@tompassin.net>
When I think of modeling addresses, after reading some of these posts 
and links (and not having had to do this for a living), I would say the 
simplest model would be this, which seems pretty close to what Joshua said:

An address
    can be represented by one or more representations;
    denotes a (physical) location # maybe one or more?
    may have one or more textual aliases.

A representation # has one specific syntactical form
    may have a grammar specification # e.g. BNF.

A "representation" of an address is one of the many textual forms that
one finds in the wild.  You need some non-RDF processing to relate each
type of representation you need to handle to its location.  It might
turn out that each type of address could be expressed as a grammar (in
BNF or some other notation) or at least by some syntax rules.  If so,
that notation type could be included as a property of the address instance.
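A minimal Python sketch of this model, purely illustrative. The class names, the use of a regular expression as a stand-in for a BNF grammar, the lat/lon `Location`, and the `NUMBER_STREET_POSTCODE` pattern (a simplified UK-style form) are all assumptions, not part of the proposal. Recognising which representation a string uses is only the first step; relating it to a location is the non-RDF processing described above.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Representation:
    """One specific syntactic form in which addresses are written."""
    name: str
    # Stand-in for a grammar specification (e.g. BNF): here just a regex.
    grammar: Optional[re.Pattern] = None

    def recognizes(self, text: str) -> bool:
        """True if this representation's grammar accepts the whole text."""
        return self.grammar is not None and self.grammar.fullmatch(text) is not None


@dataclass
class Location:
    """A physical location; lat/lon is just one assumed way to denote it."""
    lat: float
    lon: float


@dataclass
class Address:
    representations: list[Representation] = field(default_factory=list)
    locations: list[Location] = field(default_factory=list)  # "maybe one or more?"
    aliases: list[str] = field(default_factory=list)

    def matching_representations(self, text: str) -> list[str]:
        """Names of the representations whose grammar accepts the text."""
        return [r.name for r in self.representations if r.recognizes(text)]


# Hypothetical representation: "<number> <street>, <UK-style postcode>".
NUMBER_STREET_POSTCODE = Representation(
    "number-street-postcode",
    re.compile(r"\d+ [A-Za-z ]+, [A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}"),
)
```

A fuzzy address such as "on 5th Ave. between 72nd and 73rd streets" simply matches no representation in this sketch.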

For some other time: fuzzy addresses like "on 5th Ave. between 72nd and 
73rd streets".

TomP

On 12/4/2018 8:08 PM, Joshua Shinavier wrote:
> Just to add another data point to the "addresses are hard" thread, at 
> Uber we have also invested quite some time into standardizing vocabulary 
> around addresses. Prior to standardization, there were many dozens of 
> address types in use within the company (and still are), most of which 
> are of the basic street/city/state/country/zip kind, similar to 
> schema.org's PostalAddress. After a great deal of
> discussion, we opted not to support such a format as a standard. Most of 
> the reasons for this boil down to items on the page Thomas linked. 
> Instead, we distinguish between structured addresses (a bag of 
> components which validate against any of a number of black-box address 
> schemas) and addresses for display. Google makes a similar distinction 
> in its Places API. Address validation, formatting, normalization, etc.
> are API concerns that go well beyond the vocabulary itself, requiring 
> significant background knowledge. I would not be optimistic about 
> finding canonical identifiers for addresses, though geocoded lat/lon is 
> probably the next best thing.
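One way to picture the structured/display split in code. This is a sketch under assumptions, not Uber's or Google's actual design: each "black-box" schema is modelled as a predicate over a bag of components, and the component names and the two toy schemas are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# A "black-box" schema is modelled here as a predicate over the component bag.
Schema = Callable[[Mapping[str, str]], bool]


def us_street_schema(c: Mapping[str, str]) -> bool:
    """Hypothetical US-style schema: street, city, state and zip must be present."""
    return {"street", "city", "state", "zip"}.issubset(c)


def uk_postcode_schema(c: Mapping[str, str]) -> bool:
    """Hypothetical UK-style schema: a postcode plus a building number or name."""
    return "postcode" in c and ("building_number" in c or "building_name" in c)


@dataclass
class StructuredAddress:
    """A bag of components with no fixed field list."""
    components: dict[str, str]

    def valid_against(self, schemas: Mapping[str, Schema]) -> list[str]:
        """Names of every schema that accepts this component bag."""
        return [name for name, accepts in schemas.items() if accepts(self.components)]


@dataclass
class DisplayAddress:
    """Free text for showing to people; not meant to be parsed back."""
    text: str


SCHEMAS: dict[str, Schema] = {"us": us_street_schema, "uk": uk_postcode_schema}
```

Keeping display text in a separate type makes it harder to treat a formatted string as if it were a validated structure.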
> 
> Josh
> 
> 
> On Tue, Dec 4, 2018 at 4:16 PM Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
> 
>     Hi Hugh,
> 
>     On 04/12/2018 22:48, Hugh Glaser wrote:
>      > Thanks Dave.
>      > Yes, I agree with all the detail.
>      >
>      > My interpretation is that you are confirming what I was saying -
>     that the general case is a nightmare.
> 
>     On that we are agreed :)
> 
>      > This is a problem of trying for a standard for the addresses -
>     not only is it fiendishly complicated, but no standard will ever
>     satisfy all the reasons you might want to identify something, such
>     as an address.
>      > I agree, which is why I was negative about trying to capture it
>     centrally.
>      > On the other hand, SW people *are* representing addresses all the
>     time, using sufficient specificity for their purposes.
>      > And others will be doing the same thing to the same level.
> 
>     Sure, *representing* addresses is just fine. It's *identifying*
>     addresses that's hard.
> 
>      > And businesses in the UK find that the number/postcode pair is
>     pretty much all they need to deliver almost all online purchases.
> 
>     If you are only dealing with consumers, not other businesses, and
>     mostly focus on houses in urban areas, and don't care about secondary
>     addresses (SAONs, like flat number, unit number, floor, etc.), and if
>     you only care about delivery (so there's a human at the other end
>     interpreting the address), and if we can agree to differ on the
>     semantics of "almost all", then that's possibly true.
> 
>     However, many businesses, even under those constraints, solve it by
>     getting a human (the one placing an order) to do the matching. You use
>     number/postcode to constrain and order the search on your (very
>     expensive) master address list and get the user to pick the right one
>     from the result list. *Then* you have an identifier.
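A sketch of that constrain, order, and pick flow, assuming a small in-memory list in place of a licensed master address file. `MasterRecord`, its `uprn` field, and the ordering rule (exact number matches first) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterRecord:
    uprn: str      # stand-in for the master list's identifier
    number: str    # may be "", "12a", or a range such as "10-12"
    name: str      # house, flat or business name, possibly ""
    postcode: str


def normalise_postcode(pc: str) -> str:
    return pc.replace(" ", "").upper()


def candidates(master: list[MasterRecord], number: str, postcode: str) -> list[MasterRecord]:
    """Constrain by postcode, then order so records whose number matches come first.

    Nothing is decided automatically: the caller shows this list to the user.
    """
    pc = normalise_postcode(postcode)
    in_postcode = [r for r in master if normalise_postcode(r.postcode) == pc]
    # False sorts before True, so exact number matches come first.
    return sorted(in_postcode, key=lambda r: (r.number != number, r.number, r.name))


def confirm(choices: list[MasterRecord], picked_index: int) -> str:
    """Only the user's pick turns a candidate into an identifier."""
    return choices[picked_index].uprn
```

`candidates` never returns an answer on its own; the identifier only exists once `confirm` is driven by a human choice.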
> 
>      > It seems to me that you are concerned with the "global" solution -
> 
>     No, simply pointing out that matching real world entities is hard for
>     domain specific reasons and no amount of RDF/OWL makes much difference
>     to that.
> 
>     Actually, all I was really doing was sharing painfully gathered
>     experience that in the UK, postcode + number is far from a nearly
>     unique key for all addresses. Trust me on this. I've sacrificed a
>     large part of the last three months to learning this lesson in great
>     detail :(
> 
>      > I want to worry about a more local problem, and what small steps
>     can be taken to help people in common cases, so that SW & LD are
>     more useful for developers.
> 
>     I've lost track of how this thread about thing equality relates to the
>     goal of making SW/LD/RDF easier. Which is why I opened with "I don't
>     want to get embroiled in the main thread(s)" and just commented on the
>     nature of addresses.
> 
>     [While URIs can be off-putting, I don't think they are *that* much of a
>     problem for developers. Even where they are a barrier, it's the choice
>     of namespace that's the challenge ("you mean we have to host a DNS
>     domain and maintain it?"). In my experience most developers are very
>     happy with the notion that some domains have "natural" composite keys
>     that you can use to identify things, and in other domains you have to
>     do work to create some (often human) process to manage your reference
>     identifiers and then use those as keys. Once you have your keys, one
>     way or another, creating identifiers by combining some sort of
>     namespace with an encoding/hash of the composite keys is
>     bread-and-butter stuff, even outside of SW/LD.]
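The "namespace plus encoding/hash of the composite keys" step might look like this. The namespace `http://example.org/id/address/`, the use of house number + postcode as the key (Hugh's example further down the thread), the normalisation rules and the truncated SHA-256 are all assumptions for illustration; as the paragraph says, getting trustworthy keys is the hard part, not this step.

```python
import hashlib
from urllib.parse import quote

NAMESPACE = "http://example.org/id/address/"  # hypothetical namespace


def normalise(value: str) -> str:
    """Collapse whitespace and case so trivially different spellings agree."""
    return " ".join(value.split()).upper()


def key_uri(*key_parts: str, hashed: bool = True) -> str:
    """Mint a URI from a composite key.

    hashed=True:  namespace + first 16 hex chars of SHA-256 (opaque, fixed length)
    hashed=False: namespace + percent-encoded, human-readable key parts
    """
    parts = [normalise(p) for p in key_parts]
    # normalise() splits on whitespace, which in Python includes "\x1f",
    # so this separator cannot occur inside a part; ("1", "23") != ("12", "3").
    joined = "\x1f".join(parts)
    if hashed:
        return NAMESPACE + hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]
    return NAMESPACE + "/".join(quote(p, safe="") for p in parts)
```

Two publishers who pick the same keys and normalisation mint the same URI without coordinating; whether those keys really identify a single address is a separate, domain-specific question.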
> 
>     Dave
> 
> 
>      > Or are you saying that because specifying addresses as well as
>     you would like is so hard, we shouldn't bother trying to do
>     something simpler and useful for many purposes?
> 
>      > It is about URIs, and they aren't in the noise - they are the
>     things that people currently generate for themselves, and get little
>     or no help with that generation, or linking up.
>      >
>      >> On 4 Dec 2018, at 11:24, Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
>      >>
>      >> I don't want to get embroiled in the main thread(s) but, just in
>     case anyone is *really* dealing with UK addresses rather than using
>     them as rhetorical examples, then ...
>      >>
>      >> On 03/12/2018 23:37, Anthony Moretti wrote:
>      >>> I see your point, Hugh, especially in your case, because for UK
>      >>> addresses consisting of only house number and postcode, structural
>      >>> equality is sufficient for address equality. Decentralized will work
>      >>> very well in that case.
>      >>
>      >> Sadly that's a long way from being true. UK addresses within a
>      >> postcode may be identified by house name, house name + number,
>      >> business name (with no house name or number at all), any of those
>      >> plus a secondary address, etc. etc. Even when there's a house
>      >> "number", sometimes it's actually a number range, not a single
>      >> number, and there's considerable ambiguity on how those ranges are
>      >> expressed and what the "definitive" range for a given property
>      >> really is.
>      >>
>      >> Identity of UK addresses is simply not something you can express
>      >> in OWL or any logic close to it. You need an address reconciliation
>      >> algorithm to map your address to a maintained identifier set such
>      >> as a UPRN or UDPRN. The reconciliation process will have error rates
>      >> that you will need to manage and recover from; there's no closed,
>      >> guaranteed algorithm.
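To illustrate why reconciliation is a matching process with error rates rather than a logical identity rule, here is a deliberately naive matcher. The similarity measure (`difflib.SequenceMatcher`), the 0.85 threshold and the "return None for human review" policy are assumptions for illustration, not a real UPRN/UDPRN matching algorithm.

```python
from difflib import SequenceMatcher
from typing import Optional


def _norm(s: str) -> str:
    """Upper-case and collapse whitespace before comparing."""
    return " ".join(s.upper().split())


def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a stand-in for a real matcher."""
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio()


def reconcile(
    address: str,
    reference: dict[str, str],  # identifier -> reference address text
    threshold: float = 0.85,    # hypothetical cut-off, to be tuned on labelled data
) -> tuple[Optional[str], float]:
    """Return (best identifier, score), or (None, score) if nothing clears the threshold.

    None means "route to a human": below the threshold the matcher refuses to
    guess, trading missed matches for fewer false ones.
    """
    if not reference:
        return None, 0.0
    best_id, best_score = max(
        ((ident, similarity(address, text)) for ident, text in reference.items()),
        key=lambda pair: pair[1],
    )
    return (best_id if best_score >= threshold else None), best_score
```

Any threshold yields some false matches and some misses; measuring those rates and handling the misses is the "manage and recover" work, and no choice of threshold makes it disappear.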
>      >>
>      >> Once you have the UPRN or UDPRN or whatever, you can create URIs
>      >> or some inverse functional property as you wish. Except that even
>      >> then the official identifier schemes like that aren't perfect and
>      >> have ... oddities ... in them that can still mess you up.
>      >>
>      >> Generating unique keys for resources based on hashing a few
>      >> properties is all very well in simple cases but, at least in my
>      >> experience, real world problems are nothing like that simple and
>      >> clean. You need serious effort to create and maintain identifier
>      >> schemes and to reconcile source data against those schemes. Details
>      >> like URIs or bNodes seem to me rather down in the noise.
>      >>
>      >> Dave
>      >>
>      >>> On Mon, Dec 3, 2018 at 3:07 PM Nathan Rixham <nathan@webr3.org> wrote:
>      >>>     Hugh, do you mean something like bnode.id = sha256(serialise(bnode))?
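Nathan's one-liner presupposes a deterministic serialisation. A minimal sketch, assuming the blank node is described only by literal-valued predicate/object pairs (no nested blank nodes, which is where real RDF canonicalisation gets hard) and using an ad hoc N-Triples-like serialisation rather than a standard canonicalisation algorithm; the `ex:` predicate names are hypothetical.

```python
import hashlib


def serialise(bnode: dict[str, list[str]]) -> str:
    """Deterministic text for a blank node's outgoing predicate/object pairs.

    Sorting the lines makes the result independent of triple order.
    Objects are treated as plain string literals (an assumption).
    """
    def lit(s: str) -> str:
        # Escape backslashes and double quotes, as N-Triples string literals do.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = sorted(f"{p} {lit(o)} ." for p, objs in bnode.items() for o in objs)
    return "\n".join(lines)


def bnode_id(bnode: dict[str, list[str]]) -> str:
    """Content-derived identifier: SHA-256 of the deterministic serialisation."""
    return hashlib.sha256(serialise(bnode).encode("utf-8")).hexdigest()
```

The hash only equates nodes that serialise identically, so "SW1A 2AA" and "SW1A2AA" get different IDs unless values are normalised first. Graphs with nested or interlinked blank nodes need a proper canonicalisation algorithm, such as the W3C RDF Dataset Canonicalization (RDFC-1.0).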
>      >>>     On Mon, 3 Dec 2018, 22:58 Hugh Glaser <hugh@glasers.org> wrote:
>      >>>         This is not directly about blank nodes, but is a reply to a message in the thread.
>      >>>         I’m certainly agreeing that we should work towards a common understanding of Thing equality.
>      >>>         And addresses are a great place to start.
>      >>>         In order for equality to be defined, I think you first need an idea of what an unambiguous address looks like.
>      >>>         Having an oracle that defines what an unambiguous Thing looks like is one organisational structure, and it would be great if schema.org could lead the way.
>      >>>         It particularly helps people who just want an off-the-shelf solution, especially if they have no knowledge of the Thing domain.
>      >>>         However, I (and perhaps David Booth) am after something more anarchic, that can function in a decentralised way (if I dare to use that term! :-) )
>      >>>         For example, I might decide that house number and postcode are enough.
>      >>>         (UK people will know that this is a commonly-used way of choosing an address, although I’m sure it may well not be satisfactory for some purposes.)
>      >>>         That may well be sufficient for me to interwork with datasets from Companies House, the Land Registry and a bunch of other UK-based organisations, plus many other datasets.
>      >>>         Having a simple standard way to create keys for such things facilitates that, without a standardisation process (with all the weaknesses and strengths that entails) to get agreement on what an unambiguous address might look like on a world scale for all purposes.
>      >>>         Just generating a URI, without needing to make any service calls (having found where they are, chosen the one you want, compromised on it, etc.) or anything else, seems to me a way of making all the interlinking so much more accessible for us all.
>      >>>         It is even future-proof: if a URI is about something new (UK postcodes change all the time :-(, and there are more dead ones than live ones), the oracle doesn’t tell me anything it didn’t have until I ask again.
>      >>>         In a key-generating world, by contrast, my new shiny key will slowly align with all the other key URIs as they get created.
>      >>>         So yeah, all strength to anyone who wants to take on the central roles, but not at the expense of killing the anarchic solution, please.
>      >>>         Cheers
>      >>>          > On 3 Dec 2018, at 22:10, Anthony Moretti <anthony.moretti@gmail.com> wrote:
>      >>>          >
>      >>>          > Cheers for agreeing, William. On the topic of incomplete blank
>      >>>          > nodes, Henry, I'd give them another type: for the partial address
>      >>>          > example you give, I'd use the type AddressComponent, or something
>      >>>          > to that effect. I could be wrong, but it's not a valid Address if
>      >>>          > it's a blank node and no other information in the graph completes it.
>      >>>          >
>      >>>          > Anthony
>      >>>          >
>      >>>          > On Mon, Dec 3, 2018 at 1:56 PM William Waites <wwaites@tardis.ed.ac.uk> wrote:
>      >>>          > > standards like schema:PostalAddress should possibly define
>      >>>          > > relevant operations like equality checking too.
>      >>>          >
>      >>>          > Exactly.
>      >>>          >
>      >>>          >
>      >>
>      >
>
Received on Wednesday, 5 December 2018 01:58:45 UTC