Re: Addresses have no easy identity was Re: Blank Nodes Re: Toward easier RDF: a proposal

Some work has been done on a general way to express address data on the
web. As part of the EU Core Vocabularies
<https://ec.europa.eu/isa2/solutions/core-vocabularies_en>, the Location
Core Vocabulary <https://www.w3.org/ns/locn> (locn for short) was developed
and published. It has a related community group, the Locations and
Addresses Community Group <https://www.w3.org/community/locadd/> (locadd
for short). Modelling addresses is hard, so further discussion and
contributions are welcome.
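
To give a rough idea of what a locn address looks like as data, here is a
minimal Python sketch that writes one out as N-Triples. The example.org
URI and the address values are made up, and the choice of properties
(fullAddress, thoroughfare, locatorDesignator, postName, postCode) is only
one plausible way of filling in the vocabulary, not a recommended profile.

```python
# Sketch only: the subject URI and all address values are invented.
LOCN = "http://www.w3.org/ns/locn#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: str) -> str:
    """Quote a plain string as an N-Triples literal (minimal escaping)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def address_triples(subject: str, fields: dict[str, str]) -> list[str]:
    """Return N-Triples lines typing `subject` as locn:Address with the given fields."""
    lines = [f"<{subject}> <{RDF_TYPE}> <{LOCN}Address> ."]
    for prop, value in fields.items():
        lines.append(f"<{subject}> <{LOCN}{prop}> {literal(value)} .")
    return lines


triples = address_triples(
    "http://example.org/address/1",  # hypothetical URI
    {
        "fullAddress": "10 Example Street, Exampleton EX1 2AB",
        "thoroughfare": "Example Street",
        "locatorDesignator": "10",
        "postName": "Exampleton",
        "postCode": "EX1 2AB",
    },
)
print("\n".join(triples))
```

Note that this only represents an address; it says nothing about whether
two such descriptions identify the same address.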

Regards,
Frans

On Wed 5 Dec 2018 at 03:02, Thomas Passin <tpassin@tompassin.net> wrote:

> When I think of modeling addresses, after reading some of these posts
> and links (and not having had to do this for a living), I would say the
> simplest model would be this, which seems pretty close to what Joshua said:
>
> An address
>     can be represented by one or more representations;
>     denotes a (physical) location # maybe one or more?
>     may have one or more textual aliases.
>
> A representation # has one specific syntactical form
>     may have a grammar specification # e.g. B-N (Backus-Naur form).
>
> A "representation" of an address is one of the many textual forms that
> one finds in the wild.  You need some non-rdf processing to relate each
> type of representation you need to handle to its location.  It might
> turn out that each type of address could be expressed as a grammar (in
> B-N or some other notation) or at least by some syntax rules.  If so,
> that notation type could be included as a property of the address instance.
>
> For some other time: fuzzy addresses like "on 5th Ave. between 72nd and
> 73rd streets".
>
> TomP
>
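The model sketched above could be written down roughly as follows in
Python. This is only an illustration: the class and field names are my
own, the location is assumed to be a lat/lon pair, and the street name and
coordinates are invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Location:
    """The physical place an address denotes (assumed here to be lat/lon)."""
    latitude: float
    longitude: float


@dataclass
class Representation:
    """One specific textual form of an address, as found in the wild."""
    text: str
    grammar: Optional[str] = None  # name of a grammar/syntax spec, if one exists


@dataclass
class Address:
    location: Location
    representations: list[Representation] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)


# Two textual forms of the same (made-up) address share one Address object.
addr = Address(
    location=Location(51.5, -0.12),
    representations=[
        Representation("10 Example Street, Exampleton EX1 2AB", grammar="uk-postal"),
        Representation("10 EXAMPLE ST EXAMPLETON"),
    ],
    aliases=["head office"],
)
print(len(addr.representations), "representations for one address")
```

The non-RDF processing that maps each representation to its location is
the hard part, and is not shown here.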
> On 12/4/2018 8:08 PM, Joshua Shinavier wrote:
> > Just to add another data point to the "addresses are hard" thread, at
> > Uber we have also invested quite some time into standardizing vocabulary
> > around addresses. Prior to standardization, there were many dozens of
> > address types in use within the company (and still are), most of which
> > are of the basic street/city/state/country/zip kind, similar to
> > schema.org's PostalAddress. After a great deal of
> > discussion, we opted not to support such a format as a standard. Most of
> > the reasons for this boil down to items on the page Thomas linked.
> > Instead, we distinguish between structured addresses (a bag of
> > components which validate against any of a number of black-box address
> > schemas) and addresses for display. Google makes a similar distinction
> > in its Places API. Address validation, formatting, normalization, etc.
> > are API concerns that go well beyond the vocabulary itself, requiring
> > significant background knowledge. I would not be optimistic about
> > finding canonical identifiers for addresses, though geocoded lat/lon is
> > probably the next best thing.
> >
> > Josh
> >
> >
> > On Tue, Dec 4, 2018 at 4:16 PM Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
> >
> >     Hi Hugh,
> >
> >     On 04/12/2018 22:48, Hugh Glaser wrote:
> >      > Thanks Dave.
> >      > Yes, I agree with all the detail.
> >      >
> >      > My interpretation is that you are confirming what I was saying -
> >      > that the general case is a nightmare.
> >
> >     On that we are agreed :)
> >
> >      > This is a problem of trying for a standard for the addresses -
> >      > not only is it fiendishly complicated, but no standard will ever
> >      > satisfy all the reasons you might want to identify something, such
> >      > as an address.
> >      > I agree, which is why I was negative about trying to capture it
> >      > centrally.
> >      > On the other hand, SW people *are* representing addresses all the
> >      > time, using sufficient specificity for their purposes.
> >      > And others will be doing the same thing to the same level.
> >
> >     Sure, *representing* addresses is just fine. It's *identifying*
> >     addresses that's hard.
> >
> >      > And businesses in the UK find that the number/postcode pair is
> >      > pretty much all they need to deliver almost all online purchases.
> >
> >     If you are only dealing with consumers, not other businesses, and mostly
> >     focus on houses in urban areas, and don't care about secondary addresses
> >     (SAONs - like flat number, unit number, floor etc), and if you only care
> >     about delivery (so there's a human at the other end interpreting the
> >     address), and if we can agree to differ on the semantics of "almost all",
> >     then that's possibly true.
> >
> >     However, many businesses, even under those constraints, solve it by
> >     getting a human (the one placing an order) to do the matching. You use
> >     number/postcode to constrain and order the search on your (very
> >     expensive) master address list and get the user to pick the right one
> >     from the result list. *Then* you have an identifier.
> >
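The "constrain, order, let the human pick" approach described above might
look something like this in Python. It is a sketch under simple
assumptions: the master list is a small in-memory list, the identifiers
and addresses are made up, and ranking is just "labels containing the
house number as a whole token come first".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterRecord:
    identifier: str  # made-up stand-in for an identifier from a maintained list
    label: str
    postcode: str


def normalise_postcode(postcode: str) -> str:
    """Drop whitespace and upper-case, so 'ex1 2ab' matches 'EX1 2AB'."""
    return "".join(postcode.split()).upper()


def candidates(master: list[MasterRecord], number: str, postcode: str) -> list[MasterRecord]:
    """Constrain by postcode, then order so labels containing the number come first."""
    wanted = normalise_postcode(postcode)
    in_postcode = [r for r in master if normalise_postcode(r.postcode) == wanted]

    def rank(record: MasterRecord) -> tuple[int, str]:
        tokens = record.label.replace(",", " ").split()
        return (0 if number in tokens else 1, record.label)

    return sorted(in_postcode, key=rank)


master = [
    MasterRecord("id-003", "Rose Cottage, Example Road", "EX1 2AB"),
    MasterRecord("id-002", "Flat 2, 12 Example Road", "EX1 2AB"),
    MasterRecord("id-001", "Flat 1, 12 Example Road", "EX1 2AB"),
    MasterRecord("id-004", "12 Other Street", "ZZ9 9ZZ"),
]

options = candidates(master, "12", "ex1 2ab")
for position, record in enumerate(options, start=1):
    print(position, record.label)

# The person placing the order picks one entry; only then is there an identifier.
chosen = options[1]
print("chosen identifier:", chosen.identifier)
```

Even in this toy list, number plus postcode matches two flats and misses
a named house, which is why the final choice is left to the user.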
> >      > It seems to me that you are concerned with the "global" solution -
> >
> >     No, simply pointing out that matching real world entities is hard for
> >     domain specific reasons and no amount of RDF/OWL makes much difference
> >     to that.
> >
> >     Actually, all I was really doing was sharing painfully gathered
> >     experience that in the UK, postcode + number is far from a nearly unique
> >     key for all addresses. Trust me on this. I've sacrificed a large part of
> >     the last three months to learning this lesson in great detail :(
> >
> >      > I want to worry about a more local problem, and what small steps
> >      > can be taken to help people in common cases, so that SW & LD are
> >      > more useful for developers.
> >
> >     I've lost track of how this thread about thing equality relates to the
> >     goal of making SW/LD/RDF easier. Which is why I opened with "I don't
> >     want to get embroiled in the main thread(s)" and just commented on the
> >     nature of addresses.
> >
> >     [While URIs can be off-putting I don't think they are *that* much of a
> >     problem for developers. Even where they are a barrier it's the choice of
> >     namespace that's the challenge ("you mean we have to host a DNS domain
> >     and maintain it?"). In my experience most developers are very happy with
> >     the notion that some domains have "natural" composite keys that you can
> >     use to identify things and some domains you have to do work to create
> >     some (often human) process to manage your reference identifiers and then
> >     use those as keys. Once you have your keys, one way or another, then
> >     creating identifiers by combining some sort of namespace with an
> >     encoding/hash of the composite keys is bread and butter stuff, even
> >     outside of SW/LD.]
> >
> >     Dave
> >
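The "namespace plus hash of the composite key" step mentioned above can be
sketched in a few lines of Python. The namespace URI, the normalisation
rules and the choice of SHA-256 are assumptions made for this
illustration only.

```python
import hashlib

NAMESPACE = "https://example.org/id/address/"  # hypothetical namespace


def normalise(part: str) -> str:
    """Collapse runs of whitespace and upper-case one key component."""
    return " ".join(part.split()).upper()


def key_uri(*parts: str) -> str:
    """Build a URI from a composite key by hashing its normalised parts."""
    # A separator that cannot occur in the parts keeps ("1", "2X") and ("12", "X") apart.
    canonical = "\x1f".join(normalise(p) for p in parts)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return NAMESPACE + digest


a = key_uri("12", "ex1  2ab")
b = key_uri(" 12 ", "EX1 2AB")
print(a == b)
```

Two parties who agree on the namespace, the normalisation and the key
components will mint the same URI independently. That only helps where
the chosen components really are a key for the things being identified.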
> >
> >      > Or are you saying that because specifying addresses as well as
> >      > you would like is so hard, we shouldn't bother trying to do
> >      > something simpler and useful for many purposes?
> >
> >      > It is about URIs, and they aren't in the noise - they are the
> >      > things that people currently generate for themselves, and get little
> >      > or no help with that generation, or linking up.
> >      >
> >      >> On 4 Dec 2018, at 11:24, Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
> >      >>
> >      >> I don't want to get embroiled in the main thread(s) but, just in
> >      >> case anyone is *really* dealing with UK addresses rather than using
> >      >> them as rhetorical examples, then ...
> >      >>
> >      >> On 03/12/2018 23:37, Anthony Moretti wrote:
> >      >>> I see your point Hugh, especially in your case because for UK
> >      >>> addresses consisting of only house number and postcode structural
> >      >>> equality is sufficient for address equality. Decentralized will work
> >      >>> very well in that case.
> >      >>
> >      >> Sadly that's a long way from being true. UK addresses within a
> >      >> postcode may be identified by house name, house name + number,
> >      >> business name (with no house name or number at all), any of those
> >      >> plus a secondary address etc etc. Even when there's a house "number"
> >      >> sometimes it's actually a number range not a single number and
> >      >> there's considerable ambiguity on how those ranges are expressed and
> >      >> what the "definitive" range for a given property really is.
> >      >>
> >      >> Identity of UK addresses is simply not something you can express
> >      >> in OWL or any logic close to it. You need an address reconciliation
> >      >> algorithm to map your address to a maintained identifier set such
> >      >> as a UPRN or UDPRN. The reconciliation process will have error rates
> >      >> that you will need to manage and recover from; there's no closed,
> >      >> guaranteed algorithm.
> >      >>
> >      >> Once you have the UPRN or UDPRN or whatever you can create URIs
> >      >> or some inverse functional property as you wish. Except that even
> >      >> then the official identifier schemes like that aren't perfect and
> >      >> have ... oddities ... in them that can still mess you up.
> >      >>
> >      >> Generating unique keys for resources based on hashing a few
> >      >> properties is all very well in simple cases but, at least in my
> >      >> experience, real world problems are nothing like that simple and
> >      >> clean. You need serious effort to create and maintain identifier
> >      >> schemes and to reconcile source data against those schemes. Details
> >      >> like URIs or bNodes seem to me rather down in the noise.
> >      >>
> >      >> Dave
> >      >>
> >      >>> On Mon, Dec 3, 2018 at 3:07 PM Nathan Rixham <nathan@webr3.org> wrote:
> >      >>>     Hugh, do you mean something like bnode.id =
> >      >>>     sha256(serialise(bnode))
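Read literally, the one-liner above might be fleshed out like this in
Python. This is a sketch under strong assumptions: the blank node is flat
(no nested blank nodes), each property has a single literal value, and
serialise() is my own ad hoc ordering, not a standard canonicalisation.
The property names and values are invented.

```python
import hashlib


def serialise(properties: dict[str, str]) -> str:
    """Serialise predicate/value pairs in sorted order, one per line."""
    # Sorting makes the result independent of the order the triples were asserted in.
    return "\n".join(f"{pred} {value!r}" for pred, value in sorted(properties.items()))


def bnode_id(properties: dict[str, str]) -> str:
    """Derive an identifier for a flat blank node from its properties."""
    return hashlib.sha256(serialise(properties).encode("utf-8")).hexdigest()


first = bnode_id({"houseNumber": "12", "postCode": "EX1 2AB"})
second = bnode_id({"postCode": "EX1 2AB", "houseNumber": "12"})
print(first == second)
```

Anything with nested blank nodes or multi-valued properties would need a
proper graph canonicalisation step before hashing.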
> >      >>>     On Mon, 3 Dec 2018, 22:58 Hugh Glaser <hugh@glasers.org> wrote:
> >      >>>         This is not directly about blank nodes, but is a reply to a
> >      >>>         message in the thread.
> >      >>>         I’m certainly agreeing that we should work towards common
> >      >>>         understanding of Thing equality.
> >      >>>         And addresses are a great place to start.
> >      >>>         In order for equality to be defined, I think that means you
> >      >>>         first need an idea of what an unambiguous address looks like.
> >      >>>         Having an oracle that defines what an unambiguous Thing looks
> >      >>>         like is one organisational structure, and it would be great if
> >      >>>         schema.org could lead the way.
> >      >>>         It particularly helps people who just want an off the shelf
> >      >>>         solution, especially if they have no knowledge of the Thing
> >      >>>         domain.
> >      >>>         However I (and perhaps David Booth) am after something more
> >      >>>         anarchic, that can function in a decentralised way (if I dare
> >      >>>         to use that term! :-) )
> >      >>>         For example, I might decide that I think that House Number and
> >      >>>         PostCode is enough.
> >      >>>         (UK people will know that this is a commonly-used way of
> >      >>>         choosing an address, although it may well not be satisfactory
> >      >>>         for some purposes, I’m sure.)
> >      >>>         That may well be sufficient for me to interwork with datasets
> >      >>>         from Companies House, the Land Registry and a bunch of other
> >      >>>         UK-based organisations, plus many other datasets.
> >      >>>         Having a simple standard way to create keys for such things
> >      >>>         facilitates that, without any standardisation process and all
> >      >>>         that entails in weaknesses and strengths of trying to get
> >      >>>         agreement on what an unambiguous address might look like on a
> >      >>>         world scale for all purposes.
> >      >>>         Just generating a URI, without needing to make any service
> >      >>>         calls (having found where they are and chosen the one you want
> >      >>>         and compromised on it, etc.) or anything seems to me a way of
> >      >>>         making all the interlinking so much more accessible for us all.
> >      >>>         It is even future proof:- using such a URI means that if it is
> >      >>>         about something new (UK postcodes change all the time :-(, and
> >      >>>         there are more dead ones than live ones), the oracle doesn’t
> >      >>>         tell me anything it didn’t have until I ask again.
> >      >>>         In a key-generating world, my new shiny key will slowly align
> >      >>>         with all the other key URIs as they get created.
> >      >>>         So yeah, all strength to anyone who wants to take on the
> >      >>>         central roles, but not at the expense of killing the anarchic
> >      >>>         solution, please.
> >      >>>         Cheers
> >      >>>          > On 3 Dec 2018, at 22:10, Anthony Moretti <anthony.moretti@gmail.com> wrote:
> >      >>>          >
> >      >>>          > Cheers for agreeing William. On the topic of incomplete
> >      >>>          > blank nodes, Henry, I'd give them another type: the partial
> >      >>>          > address example you give I'd give the type
> >      >>>          > AddressComponent, or something to that effect. I could be
> >      >>>          > wrong, but it's not a valid Address if it's a blank node
> >      >>>          > and no other information in the graph completes it.
> >      >>>          >
> >      >>>          > Anthony
> >      >>>          >
> >      >>>          > On Mon, Dec 3, 2018 at 1:56 PM William Waites <wwaites@tardis.ed.ac.uk> wrote:
> >      >>>          > > standards like schema:PostalAddress should possibly define
> >      >>>          > > relevant operations like equality checking too.
> >      >>>          >
> >      >>>          > Exactly.
> >      >>>          >
> >      >>>          >
> >      >>
> >      >
> >
>
>
>

Received on Thursday, 6 December 2018 14:02:10 UTC