Re: Civic Address for V2 from Charles McCathieNevile on 2009-03-04 (public-geolocation@w3.org from March 2009)

From: Charles McCathieNevile <chaals@opera.com>
Date: Wed, 04 Mar 2009 09:42:52 +0100
To: "Ian Hickson" <ian@hixie.ch>
Cc: "public-geolocation@w3.org" <public-geolocation@w3.org>
Message-ID: <op.up9ehqswwxe0ny@widsith.local>

On Tue, 03 Mar 2009 23:23:47 +0100, Ian Hickson <ian@hixie.ch> wrote:

> On Tue, 3 Mar 2009, Richard Barnes wrote:
>>
>> It's not really the number of fields that's important, right?  If you
>> don't care about the semantics of the fields, then you can just use one
>> field where everything's smashed together.
...
>> you may as well just use a single field.
>
> That might not be a bad idea, actually. What's the use case for having  
> the information in multiple fields rather than just a multiline field?
...
> Are there use cases that a one-field answer wouldn't solve?

Being able to take two encoded addresses and determine if they are the  
same place (or in the same country). I don't know how important that is -  
depends on whether you will have real-world data with civic address as the  
only useful location, but I would be surprised if that didn't occur.
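
As a rough sketch of that comparison (the field names here are made up for
illustration and are not taken from any draft of the API), split fields let
you answer "same country?" and "same place?" with plain equality checks:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CivicAddress:
    # Hypothetical field names, chosen only for this illustration.
    country: str
    locality: Optional[str] = None
    street: Optional[str] = None
    building: Optional[str] = None


def _norm(value: Optional[str]) -> Optional[str]:
    # Ignore case and surrounding whitespace; real data would need far more.
    return value.strip().casefold() if value else None


def same_country(a: CivicAddress, b: CivicAddress) -> bool:
    return _norm(a.country) == _norm(b.country)


def same_place(a: CivicAddress, b: CivicAddress) -> bool:
    fields = ("country", "locality", "street", "building")
    return all(_norm(getattr(a, f)) == _norm(getattr(b, f)) for f in fields)


# Made-up sample data.
home = CivicAddress("AU", "Melbourne", "Example St", "12")
also_home = CivicAddress("au", " melbourne ", "Example St", "12")
elsewhere = CivicAddress("AU", "Sydney", "Example St", "12")
```

With a single multiline field, the same two questions would first need the
string to be parsed back into parts, which is exactly where regional
knowledge becomes necessary.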

The semantics also let you determine things that are important to the use
case. A room in a big building like the Pentagon, or a shop at Chadstone
shopping centre, is related to another place in the same building in a way
that two rooms at Microsoft in Redmond, or two shops in Malvern Rd, aren't.
The diversity of addressing conventions (even for *the same studio
apartment*) means that collapsing the semantics very quickly reduces the
ability to do intelligent matching of places in any region where you lack
both a huge amount of knowledge and the ability to force data collection
to fit carefully defined patterns. Where people are asked detailed
questions, they tend to give detailed answers, but where the question is
"what is your address?", you will get much more variability in the data
from any group of people. Normalising this latter dataset to provide a
useful "local" application suddenly incurs a substantial processing
requirement.
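
To illustrate the "same building" point, here is a minimal sketch, assuming
an ordered list of address levels that I have invented for the example (it
is not from any spec). It reports the most specific level at which two
addresses still agree:

```python
from typing import Dict, Optional

# Hypothetical ordering, from least to most specific.
LEVELS = ("country", "region", "locality", "street", "building", "unit")


def deepest_shared_level(a: Dict[str, str], b: Dict[str, str]) -> Optional[str]:
    """Return the most specific level on which a and b agree, or None."""
    shared = None
    for level in LEVELS:
        va, vb = a.get(level), b.get(level)
        # Stop at the first level that is missing or differs.
        if va is None or vb is None or va.casefold() != vb.casefold():
            break
        shared = level
    return shared


# Made-up sample data: two shops in one centre, two shops on one street.
base = {"country": "AU", "region": "VIC", "locality": "Exampleville",
        "street": "Example Rd"}
shop_a = {**base, "building": "Example Centre", "unit": "Shop 12"}
shop_b = {**base, "building": "Example Centre", "unit": "Shop 40"}
shop_c = {**base, "building": "101"}
shop_d = {**base, "building": "245"}
```

Here shop_a and shop_b agree down to "building", while shop_c and shop_d
agree only down to "street". A one-field address gives you no such levels to
walk through unless you parse it first.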

I doubt that this is of concern to Google (who collect a lot of knowledge  
and can probably afford the processing as a negligible marginal cost) but  
it may be of concern to a small business which wants to produce  
applications using proximity of civic addresses as a metric.

There are places where that metric is useful, like parts of India or
Australia, and other places where the distance between /adjacent/ civic
addresses is better measured in travel time by car or train - like other
parts of Australia.

So this comes back to the use cases and requirements. If being able to  
compare addresses and make useful inferences matters, then it is important  
to split out the semantics, with the level of detail determining how far  
down the semantic split should go. Otherwise, you are right that it is not  
really important.

cheers

Chaals (normally a pure lurker)

-- 
Charles McCathieNevile  Opera Software, Standards Group
     je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com

Received on Wednesday, 4 March 2009 08:43:38 UTC