- From: Frans Knibbe <frans.knibbe@geodan.nl>
- Date: Thu, 6 Dec 2018 15:01:30 +0100
- To: semantic-web@w3.org
- Message-ID: <CAFVDz43CMVZ1+LAz=Nuz8_xppbNoxuU5AfOJqpUC-Wdrg_kbUg@mail.gmail.com>
There has been some work done on a general way to express address data on the web. As a part of the set of EU core vocabularies <https://ec.europa.eu/isa2/solutions/core-vocabularies_en>, the Location Core Vocabulary <https://www.w3.org/ns/locn> (locn for short) was developed and published. It has a related community group, the Locations and addresses community group <https://www.w3.org/community/locadd/> (locadd for short).

Modelling addresses is hard, so further discussions and contributions are welcome.

Regards,
Frans

On Wed, 5 Dec 2018 at 03:02, Thomas Passin <tpassin@tompassin.net> wrote:

> When I think of modeling addresses, after reading some of these posts
> and links (and not having had to do this for a living), I would say the
> simplest model would be this, which seems pretty close to what Joshua said:
>
> An address
>    can be represented by one or more representations;
>    denotes a (physical) location  # maybe one or more?
>    may have one or more textual aliases.
>
> A representation  # has one specific syntactical form
>    may have a grammar specification  # e.g. B-N.
>
> A "representation" of an address is one of the many textual forms that
> one finds in the wild. You need some non-RDF processing to relate each
> type of representation you need to handle to its location. It might
> turn out that each type of address could be expressed as a grammar (in
> B-N or some other notation) or at least by some syntax rules. If so,
> that notation type could be included as a property of the address instance.
>
> For some other time: fuzzy addresses like "on 5th Ave. between 72nd and
> 73rd streets".
>
> TomP
>
> On 12/4/2018 8:08 PM, Joshua Shinavier wrote:
> > Just to add another data point to the "addresses are hard" thread, at
> > Uber we have also invested quite some time into standardizing vocabulary
> > around addresses.
> > Prior to standardization, there were many dozens of
> > address types in use within the company (and still are), most of which
> > are of the basic street/city/state/country/zip kind, similar to
> > schema.org's PostalAddress. After a great deal of
> > discussion, we opted not to support such a format as a standard. Most of
> > the reasons for this boil down to items on the page Thomas linked.
> > Instead, we distinguish between structured addresses (a bag of
> > components which validate against any of a number of black-box address
> > schemas) and addresses for display. Google makes a similar distinction
> > in its Places API. Address validation, formatting, normalization, etc.
> > are API concerns that go well beyond the vocabulary itself, requiring
> > significant background knowledge. I would not be optimistic about
> > finding canonical identifiers for addresses, though geocoded lat/lon is
> > probably the next best thing.
> >
> > Josh
> >
> > On Tue, Dec 4, 2018 at 4:16 PM Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
> >
> > Hi Hugh,
> >
> > On 04/12/2018 22:48, Hugh Glaser wrote:
> > > Thanks Dave.
> > > Yes, I agree with all the detail.
> > >
> > > My interpretation is that you are confirming what I was saying -
> > > that the general case is a nightmare.
> >
> > On that we are agreed :)
> >
> > > This is a problem of trying for a standard for the addresses -
> > > not only is it fiendishly complicated, but no standard will ever
> > > satisfy all the reasons you might want to identify something, such
> > > as an address.
> > > I agree, which is why I was negative about trying to capture it
> > > centrally.
> > > On the other hand, SW people *are* representing addresses all the
> > > time, using sufficient specificity for their purposes.
> > > And others will be doing the same thing to the same level.
> >
> > Sure, *representing* addresses is just fine. It's *identifying*
> > addresses that's hard.
> >
> > > And businesses in the UK find that the number/postcode pair is
> > > pretty much all they need to deliver almost all online purchases.
> >
> > If you are only dealing with consumers, not other businesses, and mostly
> > focus on houses in urban areas, and don't care about secondary addresses
> > (saons - like flat number, unit number, floor etc), and if you only care
> > about delivery (so there's a human at the other end interpreting the
> > address) and if we can agree to differ on the semantics of "almost all"
> > then that's possibly true.
> >
> > However, many businesses, even under those constraints, solve it by
> > getting a human (the one placing an order) to do the matching. You use
> > number/postcode to constrain and order the search on your (very
> > expensive) master address list and get the user to pick the right one
> > from the result list. *Then* you have an identifier.
> >
> > > It seems to me that you are concerned with the "global" solution -
> >
> > No, simply pointing out that matching real world entities is hard for
> > domain specific reasons and no amount of RDF/OWL makes much difference
> > to that.
> >
> > Actually, all I was really doing was sharing painfully gathered
> > experience that in the UK, postcode + number is far from a nearly unique
> > key for all addresses. Trust me on this. I've sacrificed a large part of
> > the last three months to learning this lesson in great detail :(
> >
> > > I want to worry about a more local problem, and what small steps
> > > can be taken to help people in common cases, so that SW & LD are
> > > more useful for developers.
> >
> > I've lost track of how this thread about thing equality relates to the
> > goal of making SW/LD/RDF easier. Which is why I opened with "I don't
> > want to get embroiled in the main thread(s)" and just commented on the
> > nature of addresses.
> >
> > [While URIs can be off putting I don't think they are *that* much of a
> > problem for developers. Even where they are a barrier it's the choice of
> > namespace that's the challenge ("you mean we have to host a DNS domain
> > and maintain it?"). In my experience most developers are very happy with
> > the notion that some domains have "natural" composite keys that you can
> > use to identify things and some domains you have to do work to create
> > some (often human) process to manage your reference identifiers and then
> > use those as keys. Once you have your keys, one way or another, then
> > creating identifiers by combining some sort of namespace with an
> > encoding/hash of the composite keys is bread and butter stuff, even
> > outside of SW/LD.]
> >
> > Dave
> >
> > > Or are you saying that because specifying addresses as well as
> > > you would like is so hard, we shouldn't bother trying to do
> > > something simpler and useful for many purposes?
> > >
> > > It is about URIs, and they aren't in the noise - they are the
> > > things that people currently generate for themselves, and get little
> > > or no help with that generation, or linking up.
> > >
> > >> On 4 Dec 2018, at 11:24, Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
> > >>
> > >> I don't want to get embroiled in the main thread(s) but, just in
> > >> case anyone is *really* dealing with UK addresses rather than using
> > >> them as rhetorical examples, then ...
> > >>
> > >> On 03/12/2018 23:37, Anthony Moretti wrote:
> > >>> I see your point Hugh, especially in your case because for UK
> > >>> addresses consisting of only house number and postcode structural
> > >>> equality is sufficient for address equality. Decentralized will work
> > >>> very well in that case.
> > >>
> > >> Sadly that's a long way from being true.
> > >> UK addresses within a postcode may be identified by house name,
> > >> house name + number, business name (with no house name or number at
> > >> all), any of those plus a secondary address etc etc. Even when
> > >> there's a house "number" sometimes it's actually a number range not a
> > >> single number and there's considerable ambiguity on how those ranges
> > >> are expressed and what the "definitive" range for a given property
> > >> really is.
> > >>
> > >> Identity of UK addresses is simply not something you can express
> > >> in OWL or any logic close to it. You need an address reconciliation
> > >> algorithm to map your address to a maintained identifier set such
> > >> as a UPRN or UDPRN. The reconciliation process will have error rates
> > >> that you will need to manage and recover from; there's no closed,
> > >> guaranteed algorithm.
> > >>
> > >> Once you have the UPRN or UDPRN or whatever you can create URIs
> > >> or some inverse functional property as you wish. Except that even
> > >> then the official identifier schemes like that aren't perfect and
> > >> have ... oddities ... in them that can still mess you up.
> > >>
> > >> Generating unique keys for resources based on hashing a few
> > >> properties is all very well in simple cases but, at least in my
> > >> experience, real world problems are nothing like that simple and clean.
> > >> You need serious effort to create and maintain identifier schemes
> > >> and to reconcile source data against those schemes. Details like
> > >> URIs or bNodes seem to me rather down in the noise.
> > >>
> > >> Dave
> > >>
> > >>> On Mon, Dec 3, 2018 at 3:07 PM Nathan Rixham <nathan@webr3.org> wrote:
> > >>> Hugh, do you mean something like
> > >>> bnode.id = sha256(serialise(bnode))
> > >>>
> > >>> On Mon, 3 Dec 2018, 22:58 Hugh Glaser <hugh@glasers.org> wrote:
> > >>> This is not directly about blank nodes, but is a reply to a
> > >>> message in the thread.
> > >>> I’m certainly agreeing that we should work towards common
> > >>> understanding of Thing equality.
> > >>> And addresses are a great place to start.
> > >>> In order for equality to be defined, I think that means you
> > >>> first need an idea of what an unambiguous address looks like.
> > >>> Having an oracle that defines what an unambiguous Thing looks
> > >>> like is one organisational structure, and it would be great if
> > >>> schema.org could lead the way.
> > >>> It particularly helps people who just want an off the shelf
> > >>> solution, especially if they have no knowledge of the Thing domain.
> > >>> However I (and perhaps David Booth) am after something more
> > >>> anarchic, that can function in a decentralised way (if I dare to
> > >>> use that term! :-) )
> > >>> For example, I might decide that I think that House Number and
> > >>> PostCode is enough.
> > >>> (UK people will know that this is a commonly-used way of
> > >>> choosing an address, although it may well not be satisfactory
> > >>> for some purposes, I’m sure.)
> > >>> That may well be sufficient for me to interwork with datasets
> > >>> from Companies House, the Land Registry and a bunch of other
> > >>> UK-based organisations, plus many other datasets.
> > >>> Having a simple standard way to create keys for such things
> > >>> facilitates that, without any standardisation process and all
> > >>> that entails in weaknesses and strengths of trying to get
> > >>> agreement on what an unambiguous address might look like on a
> > >>> world scale for all purposes.
> > >>> Just generating a URI, without needing to make any service calls
> > >>> (having found where they are and chosen the one you want and
> > >>> compromised on it, etc.) or anything seems to me a way of making
> > >>> all the interlinking so much more accessible for us all.
> > >>> It is even future proof:- using such a URI means that if it is
> > >>> about something new (UK postcodes change all the time :-(, and
> > >>> there are more dead ones than live ones), the oracle doesn’t
> > >>> tell me anything it didn’t have until I ask again.
> > >>> In a key-generating world, my new shiny key will slowly align
> > >>> with all the other key URIs as they get created.
> > >>> So yeah, all strength to anyone who wants to take on the central
> > >>> roles, but not at the expense of killing the anarchic solution,
> > >>> please.
> > >>> Cheers
> > >>>
> > >>> > On 3 Dec 2018, at 22:10, Anthony Moretti <anthony.moretti@gmail.com> wrote:
> > >>> >
> > >>> > Cheers for agreeing William. On the topic of incomplete blank
> > >>> > nodes Henry I'd give them another type, the partial address
> > >>> > example you give I'd give the type AddressComponent, or
> > >>> > something to that effect. I could be wrong, but it's not a valid
> > >>> > Address if it's a blank node and no other information in the
> > >>> > graph completes it.
> > >>> >
> > >>> > Anthony
> > >>> >
> > >>> > On Mon, Dec 3, 2018 at 1:56 PM William Waites <wwaites@tardis.ed.ac.uk> wrote:
> > >>> > > standards like schema:PostalAddress should possibly define
> > >>> > > relevant operations like equality checking too.
> > >>> >
> > >>> > Exactly.
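
For readers who want to see the key-generating idea from the thread in concrete form (a namespace combined with a hash of a composite key, in the spirit of sha256(serialise(bnode))), here is a minimal Python sketch. The namespace URL, the function names and the normalisation rule are assumptions made up for this illustration and were not proposed by anyone in the thread. As noted in the thread, house number plus postcode is far from a unique key for UK addresses, so this shows only the mechanics of minting a key URI, not a way to establish address identity.

```python
import hashlib

# Hypothetical namespace for illustration; use a domain you control.
NAMESPACE = "http://example.org/address/"


def normalise(value: str) -> str:
    """Uppercase and remove all whitespace so trivial spelling variants agree."""
    return "".join(value.split()).upper()


def address_key_uri(house_number: str, postcode: str) -> str:
    """Mint a URI from the namespace plus a SHA-256 hash of the composite key."""
    composite = "|".join([normalise(house_number), normalise(postcode)])
    digest = hashlib.sha256(composite.encode("utf-8")).hexdigest()
    return NAMESPACE + digest


if __name__ == "__main__":
    a = address_key_uri("10", "SW1A 2AA")
    b = address_key_uri(" 10 ", "sw1a2aa")
    # Two independently generated keys for the same number/postcode pair coincide.
    print(a == b)
```

Two parties who agree on the normalisation rule and the namespace arrive at the same URI without any service call. Anything the composite key cannot distinguish, such as flats behind one house number, collapses onto a single URI.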
Received on Thursday, 6 December 2018 14:02:10 UTC