- From: Thomas Passin <tpassin@tompassin.net>
- Date: Tue, 4 Dec 2018 20:58:16 -0500
- Cc: semantic-web@w3.org
When I think of modeling addresses, after reading some of these posts
and links (and not having had to do this for a living), I would say the
simplest model would be this, which seems pretty close to what Joshua said:
An address
    can be represented by one or more representations;
    denotes a (physical) location     # maybe one or more?
    may have one or more textual aliases.
A representation                      # has one specific syntactical form
    may have a grammar specification  # e.g. B-N.
A "representation" of an address is one of the many textual forms that
one finds in the wild. You need some non-RDF processing to relate each
type of representation you need to handle to its location. It might
turn out that each type of address could be expressed as a grammar (in
B-N or some other notation) or at least by some syntax rules. If so,
that notation type could be included as a property of the address instance.
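
As a rough illustration only, the model above could be sketched in Python
along these lines. The class and field names, the lat/lon form of the
location, the "street-form" grammar label and the example values are all
assumptions made for this sketch, not part of any existing vocabulary:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Location:
    """The physical location an address denotes (lat/lon is an assumption)."""
    lat: float
    lon: float


@dataclass
class Representation:
    """One textual form of an address, in one specific syntactic form."""
    text: str
    # Hypothetical label for the grammar or syntax rules this form follows.
    grammar: Optional[str] = None


@dataclass
class Address:
    """An address: one or more representations, a location, optional aliases."""
    representations: List[Representation]
    location: Location
    aliases: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The model requires at least one representation per address.
        if not self.representations:
            raise ValueError("an address needs at least one representation")


# Made-up example data, for illustration only.
home = Address(
    representations=[
        Representation("12 Example Street, Exampletown", grammar="street-form"),
        Representation("Example St. 12, Exampletown"),
    ],
    location=Location(lat=0.0, lon=0.0),
    aliases=["Tom's house"],
)
```

Relating each representation's text to its location would still need the
non-RDF processing mentioned above; this sketch only holds the result.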
For some other time: fuzzy addresses like "on 5th Ave. between 72nd and
73rd streets".
TomP
On 12/4/2018 8:08 PM, Joshua Shinavier wrote:
> Just to add another data point to the "addresses are hard" thread, at
> Uber we have also invested quite some time into standardizing vocabulary
> around addresses. Prior to standardization, there were many dozens of
> address types in use within the company (and still are), most of which
> are of the basic street/city/state/country/zip kind, similar to
> schema.org's PostalAddress. After a great deal of
> discussion, we opted not to support such a format as a standard. Most of
> the reasons for this boil down to items on the page Thomas linked.
> Instead, we distinguish between structured addresses (a bag of
> components which validate against any of a number of black-box address
> schemas) and addresses for display. Google makes a similar distinction
> in its Places API. Address validation, formatting, normalization, etc.
> are API concerns that go well beyond the vocabulary itself, requiring
> significant background knowledge. I would not be optimistic about
> finding canonical identifiers for addresses, though geocoded lat/lon is
> probably the next best thing.
>
> Josh
>
>
> On Tue, Dec 4, 2018 at 4:16 PM Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
>
> Hi Hugh,
>
> On 04/12/2018 22:48, Hugh Glaser wrote:
> > Thanks Dave.
> > Yes, I agree with all the detail.
> >
> > My interpretation is that you are confirming what I was saying -
> > that the general case is a nightmare.
>
> On that we are agreed :)
>
> > This is a problem of trying for a standard for the addresses - not
> > only is it fiendishly complicated, but no standard will ever satisfy
> > all the reasons you might want to identify something, such as an
> > address.
> > I agree, which is why I was negative about trying to capture it
> > centrally.
> > On the other hand, SW people *are* representing addresses all the
> > time, using sufficient specificity for their purposes.
> > And others will be doing the same thing to the same level.
>
> Sure, *representing* addresses is just fine. It's *identifying*
> addresses that's hard.
>
> > And businesses in the UK find that the number/postcode pair is pretty
> > much all they need to deliver almost all online purchases.
>
> If you are only dealing with consumers, not other businesses, and
> mostly focus on houses in urban areas, and don't care about secondary
> addresses (SAONs - like flat number, unit number, floor etc), and if
> you only care about delivery (so there's a human at the other end
> interpreting the address) and if we can agree to differ on the
> semantics of "almost all" then that's possibly true.
>
> However, many businesses, even under those constraints, solve it by
> getting a human (the one placing an order) to do the matching. You use
> number/postcode to constrain and order the search on your (very
> expensive) master address list and get the user to pick the right one
> from the result list. *Then* you have an identifier.
>
> > It seems to me that you are concerned with the "global" solution -
>
> No, simply pointing out that matching real world entities is hard for
> domain specific reasons and no amount of RDF/OWL makes much difference
> to that.
>
> Actually, all I was really doing was sharing painfully gathered
> experience that in the UK, postcode + number is far from a nearly
> unique key for all addresses. Trust me on this. I've sacrificed a
> large part of the last three months to learning this lesson in great
> detail :(
>
> > I want to worry about a more local problem, and what small steps
> > can be taken to help people in common cases, so that SW & LD are more
> > useful for developers.
>
> I've lost track of how this thread about thing equality relates to the
> goal of making SW/LD/RDF easier. Which is why I opened with "I don't
> want to get embroiled in the main thread(s)" and just commented on the
> nature of addresses.
>
> [While URIs can be off-putting I don't think they are *that* much of a
> problem for developers. Even where they are a barrier it's the choice
> of namespace that's the challenge ("you mean we have to host a DNS
> domain and maintain it?"). In my experience most developers are very
> happy with the notion that some domains have "natural" composite keys
> that you can use to identify things and some domains you have to do
> work to create some (often human) process to manage your reference
> identifiers and then use those as keys. Once you have your keys, one
> way or another, then creating identifiers by combining some sort of
> namespace with an encoding/hash of the composite keys is bread and
> butter stuff, even outside of SW/LD.]
>
> Dave
>
>
> > Or are you saying that because specifying addresses as well as you
> > would like is so hard, we shouldn't bother trying to do something
> > simpler and useful for many purposes?
>
> > It is about URIs, and they aren't in the noise - they are the things
> > that people currently generate for themselves, and get little or no
> > help with that generation, or linking up.
> >
> >> On 4 Dec 2018, at 11:24, Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
> >>
> >> I don't want to get embroiled in the main thread(s) but, just in case
> >> anyone is *really* dealing with UK addresses rather than using them
> >> as rhetorical examples, then ...
> >>
> >> On 03/12/2018 23:37, Anthony Moretti wrote:
> >>> I see your point Hugh, especially in your case because for UK
> >>> addresses consisting of only house number and postcode structural
> >>> equality is sufficient for address equality. Decentralized will work
> >>> very well in that case.
> >>
> >> Sadly that's a long way from being true. UK addresses within a
> >> postcode may be identified by house name, house name + number,
> >> business name (with no house name or number at all), any of those
> >> plus a secondary address etc etc. Even when there's a house "number"
> >> sometimes it's actually a number range not a single number and
> >> there's considerable ambiguity on how those ranges are expressed and
> >> what the "definitive" range for a given property really is.
> >>
> >> Identity of UK addresses is simply not something you can express in
> >> OWL or any logic close to it. You need an address reconciliation
> >> algorithm to map your address to a maintained identifier set such as
> >> a UPRN or UDPRN. The reconciliation process will have error rates
> >> that you will need to manage and recover from, there's no closed,
> >> guaranteed algorithm.
> >>
> >> Once you have the UPRN or UDPRN or whatever you can create URIs or
> >> some inverse functional property as you wish. Except that even then
> >> the official identifier schemes like that aren't perfect and have
> >> ... oddities ... in them that can still mess you up.
> >>
> >> Generating unique keys for resources based on hashing a few
> >> properties is all very well in simple cases but, at least in my
> >> experience, real world problems are nothing like that simple or
> >> clean. You need serious effort to create and maintain identifier
> >> schemes and to reconcile source data against those schemes. Details
> >> like URIs or bNodes seem to me rather down in the noise.
> >>
> >> Dave
> >>
> >>> On Mon, Dec 3, 2018 at 3:07 PM Nathan Rixham <nathan@webr3.org> wrote:
> >>> Hugh, do you mean something like
> >>> bnode.id = sha256(serialise(bnode))
> >>> On Mon, 3 Dec 2018, 22:58 Hugh Glaser <hugh@glasers.org> wrote:
> >>> This is not directly about blank nodes, but is a reply to a
> >>> message in the thread.
> >>> I’m certainly agreeing that we should work towards common
> >>> understanding of Thing equality.
> >>> And addresses are a great place to start.
> >>> In order for equality to be defined, I think that means you
> >>> first need an idea of what an unambiguous address looks like.
> >>> Having an oracle that defines what an unambiguous Thing looks
> >>> like is one organisational structure, and it would be great if
> >>> schema.org could lead the way.
> >>> It particularly helps people who just want an off the shelf
> >>> solution, especially if they have no knowledge of the Thing
> >>> domain.
> >>> However I (and perhaps David Booth) am after something more
> >>> anarchic, that can function in a decentralised way (if I dare to
> >>> use that term! :-) )
> >>> For example, I might decide that I think that House Number and
> >>> PostCode is enough.
> >>> (UK people will know that this is a commonly-used way of
> >>> choosing an address, although it may well not be satisfactory
> >>> for some purposes, I’m sure.)
> >>> That may well be sufficient for me to interwork with datasets
> >>> from Companies House, the Land Registry and a bunch of other
> >>> UK-based organisations, plus many other datasets.
> >>> Having a simple standard way to create keys for such things
> >>> facilitates that, without any standardisation process and all
> >>> that entails in weaknesses and strengths of trying to get
> >>> agreement on what an unambiguous address might look like on a
> >>> world scale for all purposes.
> >>> Just generating a URI, without needing to make any service calls
> >>> (having found where they are and chosen the one you want and
> >>> compromised on it, etc.) or anything seems to me a way of making
> >>> all the interlinking so much more accessible for us all.
> >>> It is even future proof:- using such a URI means that if it is
> >>> about something new (UK postcodes change all the time :-(, and
> >>> there are more dead ones than live ones), the oracle doesn’t
> >>> tell me anything it didn’t have until I ask again.
> >>> In a key-generating world, my new shiny key will slowly align
> >>> with all the other key URIs as they get created.
> >>> So yeah, all strength to anyone who wants to take on the central
> >>> roles, but not at the expense of killing the anarchic solution,
> >>> please.
> >>> Cheers
> >>> > On 3 Dec 2018, at 22:10, Anthony Moretti <anthony.moretti@gmail.com> wrote:
> >>> >
> >>> > Cheers for agreeing William. On the topic of incomplete blank
> >>> > nodes Henry I'd give them another type, the partial address
> >>> > example you give I'd give the type AddressComponent, or
> >>> > something to that effect. I could be wrong, but it's not a valid
> >>> > Address if it's a blank node and no other information in the
> >>> > graph completes it.
> >>> >
> >>> > Anthony
> >>> >
> >>> > On Mon, Dec 3, 2018 at 1:56 PM William Waites <wwaites@tardis.ed.ac.uk> wrote:
> >>> > > standards like schema:PostalAddress should possibly define
> >>> > > relevant operations like equality checking too.
> >>> >
> >>> > Exactly.
> >>> >
> >>> >
> >>
> >
>
Received on Wednesday, 5 December 2018 01:58:45 UTC