- From: Dave Reynolds <dave.e.reynolds@gmail.com>
- Date: Wed, 5 Dec 2018 00:13:18 +0000
- To: Hugh Glaser <hugh@glasers.org>
- Cc: semantic-web@w3.org
Hi Hugh,

On 04/12/2018 22:48, Hugh Glaser wrote:
> Thanks Dave.
> Yes, I agree with all the detail.
>
> My interpretation is that you are confirming what I was saying - that the general case is a nightmare.

On that we are agreed :)

> This is a problem of trying for a standard for the addresses - not only is it fiendishly complicated, but no standard will ever satisfy all the reasons you might want to identify something, such as an address.
> I agree, which is why I was negative about trying to capture it centrally.
> On the other hand, SW people *are* representing addresses all the time, using sufficient specificity for their purposes.
> And others will be doing the same thing to the same level.

Sure, *representing* addresses is just fine. It's *identifying* addresses that's hard.

> And businesses in the UK find that the number/postcode pair is pretty much all they need to deliver almost all online purchases.

If you are only dealing with consumers, not other businesses, and mostly focus on houses in urban areas, and don't care about secondary addresses (saons - like flat number, unit number, floor etc), and if you only care about delivery (so there's a human at the other end interpreting the address) and if we can agree to differ on the semantics of "almost all" then that's possibly true.

However, many businesses, even under those constraints, solve it by getting a human (the one placing an order) to do the matching. You use number/postcode to constrain and order the search on your (very expensive) master address list and get the user to pick the right one from the result list. *Then* you have an identifier.

> It seems to me that you are concerned with the "global" solution -

No, simply pointing out that matching real world entities is hard for domain specific reasons and no amount of RDF/OWL makes much difference to that.
Actually, all I was really doing was sharing painfully gathered experience that in the UK, postcode + number is far from a nearly unique key for all addresses. Trust me on this. I've sacrificed a large part of the last three months to learning this lesson in great detail :(

> I want to worry about a more local problem, and what small steps can be taken to help people in common cases, so that SW & LD are more useful for developers.

I've lost track of how this thread about thing equality relates to the goal of making SW/LD/RDF easier. Which is why I opened with "I don't want to get embroiled in the main thread(s)" and just commented on the nature of addresses.

[While URIs can be off putting I don't think they are *that* much of a problem for developers. Even where they are a barrier it's the choice of namespace that's the challenge ("you mean we have to host a DNS domain and maintain it?"). In my experience most developers are very happy with the notion that some domains have "natural" composite keys that you can use to identify things and some domains you have to do work to create some (often human) process to manage your reference identifiers and then use those as keys. Once you have your keys, one way or another, then creating identifiers by combining some sort of namespace with an encoding/hash of the composite keys is bread and butter stuff, even outside of SW/LD.]

Dave

> Or are you saying that because specifying addresses as well as you would like is so hard, we shouldn't bother trying to do something simpler and useful for many purposes?
> It is about URIs, and they aren't in the noise - they are the things that people currently generate for themselves, and get little or no help with that generation, or linking up.
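The "namespace plus hash of the composite key" step mentioned in the bracketed paragraph above could be sketched roughly as follows. This is only an illustration, not anything proposed in the thread: the namespace `http://example.org/id/address/`, the function names and the normalisation rule are all assumptions, and it only helps once you already have a trustworthy key, which is the hard part for UK addresses.

```python
import hashlib
import json

# Hypothetical namespace; in practice this would be a domain you control.
NAMESPACE = "http://example.org/id/address/"


def normalise(part: str) -> str:
    """Upper-case and remove all whitespace, so 'sw1a 2aa' and 'SW1A2AA' agree."""
    return "".join(part.split()).upper()


def key_uri(*parts: str, namespace: str = NAMESPACE) -> str:
    """Build a URI from a namespace plus a SHA-256 hash of the composite key."""
    # Serialise the parts as a JSON list so that ("1", "0X") and ("10", "X")
    # cannot collapse into the same string before hashing.
    canonical = json.dumps([normalise(p) for p in parts])
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return namespace + digest
```

Two parties who agree on the key parts, the normalisation and the namespace will generate the same URI without any service call. Two parties who disagree on any of those will not, and nothing in the hash can tell you whether the key was unique in the first place.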
>
>> On 4 Dec 2018, at 11:24, Dave Reynolds <dave.e.reynolds@gmail.com> wrote:
>>
>> I don't want to get embroiled in the main thread(s) but, just in case anyone is *really* dealing with UK addresses rather than using them as rhetorical examples, then ...
>>
>> On 03/12/2018 23:37, Anthony Moretti wrote:
>>> I see your point Hugh, especially in your case because for UK addresses consisting of only house number and postcode structural equality is sufficient for address equality. Decentralized will work very well in that case.
>>
>> Sadly that's a long way from being true. UK addresses within a postcode may be identified by house name, house name + number, business name (with no house name or number at all), any of those plus a secondary address etc etc. Even when there's a house "number" sometimes it's actually a number range not a single number and there's considerable ambiguity on how those ranges are expressed and what the "definitive" range for a given property really is.
>>
>> Identity of UK addresses is simply not something you can express in OWL or any logic close to it. You need an address reconciliation algorithm to map your address to a maintained identifier set such as a UPRN or UDPRN. The reconciliation process will have error rates that you will need to manage and recover from, there's no closed, guaranteed algorithm.
>>
>> Once you have the UPRN or UDPRN or whatever you can create URIs or some inverse functional property as you wish. Except that even then the official identifier schemes like that aren't perfect and have ... oddities ... in them that can still mess you up.
>>
>> Generating unique keys for resources based on hashing a few properties is all very well in simple cases but, at least in my experience, real world problems are nothing like that simple and clean. You need serious effort to create and maintain identifier schemes and to reconcile source data against those schemes. Details like URIs or bNodes seem to me rather down in the noise.
>>
>> Dave
>>
>>> On Mon, Dec 3, 2018 at 3:07 PM Nathan Rixham <nathan@webr3.org> wrote:
>>> Hugh, do you mean something like bnode.id = sha256(serialise(bnode))
>>> On Mon, 3 Dec 2018, 22:58 Hugh Glaser <hugh@glasers.org> wrote:
>>> This is not directly about blank nodes, but is a reply to a message in the thread.
>>> I’m certainly agreeing that we should work towards common understanding of Thing equality.
>>> And addresses are a great place to start.
>>> In order for equality to be defined, I think that means you first need an idea of what an unambiguous address looks like.
>>> Having an oracle that defines what an unambiguous Thing looks like is one organisational structure, and it would be great if schema.org could lead the way.
>>> It particularly helps people who just want an off the shelf solution, especially if they have no knowledge of the Thing domain.
>>> However I (and perhaps David Booth) am after something more anarchic, that can function in a decentralised way (if I dare to use that term! :-) )
>>> For example, I might decide that I think that House Number and PostCode is enough.
>>> (UK people will know that this is a commonly-used way of choosing an address, although it may well not be satisfactory for some purposes, I’m sure.)
>>> That may well be sufficient for me to interwork with datasets from Companies House, the Land Registry and a bunch of other UK-based organisations, plus many other datasets.
>>> Having a simple standard way to create keys for such things facilitates that, without any standardisation process and all that entails in weaknesses and strengths of trying to get agreement on what an unambiguous address might look like on a world scale for all purposes.
>>> Just generating a URI, without needing to make any service calls (having found where they are and chosen the one you want and compromised on it, etc.) or anything seems to me a way of making all the interlinking so much more accessible for us all.
>>> It is even future proof:- using such a URI means that if it is about something new (UK postcodes change all the time :-(, and there are more dead ones than live ones), the oracle doesn’t tell me anything it didn’t have until I ask again.
>>> In a key-generating world, my new shiny key will slowly align with all the other key URIs as they get created.
>>> So yeah, all strength to anyone who wants to take on the central roles, but not at the expense of killing the anarchic solution, please.
>>> Cheers
>>> > On 3 Dec 2018, at 22:10, Anthony Moretti <anthony.moretti@gmail.com> wrote:
>>> >
>>> > Cheers for agreeing William. On the topic of incomplete blank nodes Henry I'd give them another type, the partial address example you give I'd give the type AddressComponent, or something to that effect. I could be wrong, but it's not a valid Address if it's a blank node and no other information in the graph completes it.
>>> >
>>> > Anthony
>>> >
>>> > On Mon, Dec 3, 2018 at 1:56 PM William Waites <wwaites@tardis.ed.ac.uk> wrote:
>>> > > standards like schema:PostalAddress should possibly define relevant
>>> > > operations like equality checking too.
>>> >
>>> > Exactly.
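
For the candidate-list approach described in the reply at the top of this message (use number/postcode to constrain and order a search over a master address list, then let the person placing the order pick), a rough sketch might look like the following. Everything here is assumed for illustration: the `AddressRecord` shape, the `candidates` function, the crude ranking rule and the sample rows (which are made up and use a placeholder postcode and placeholder identifiers, not real UPRNs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRecord:
    uprn: str      # identifier from the maintained master list (placeholder values)
    postcode: str
    label: str     # display form of the address shown to the user


def _squash(text: str) -> str:
    """Upper-case and remove whitespace so postcodes compare consistently."""
    return "".join(text.split()).upper()


def candidates(records, postcode: str, number: str):
    """Return records in the postcode, those mentioning the number listed first."""
    target = _squash(postcode)
    in_postcode = [r for r in records if _squash(r.postcode) == target]

    def rank(record: AddressRecord) -> int:
        # 0 if the number appears as a whole token in the label, else 1.
        tokens = record.label.upper().replace(",", " ").split()
        return 0 if number.upper() in tokens else 1

    # sorted() is stable, so master-list order is kept within each rank.
    return sorted(in_postcode, key=rank)


# Made-up sample rows, for illustration only.
SAMPLE = [
    AddressRecord("ID-0001", "ZZ1 1ZZ", "Flat 1, 10 Example Street"),
    AddressRecord("ID-0002", "ZZ1 1ZZ", "Flat 2, 10 Example Street"),
    AddressRecord("ID-0003", "ZZ1 1ZZ", "Rose Cottage, Example Street"),
    AddressRecord("ID-0004", "ZZ1 1ZZ", "10-12 Example Street"),
    AddressRecord("ID-0005", "ZZ9 9ZZ", "10 Other Road"),
]
```

Calling `candidates(SAMPLE, "zz11zz", "10")` returns four records, with the two flats at number 10 first. Number plus postcode narrows the list but does not pick a single address, so the identifier only exists once the user has chosen a row.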
Received on Wednesday, 5 December 2018 00:13:44 UTC