Re: [whatwg] Supporting more address levels in autocomplete

On Mon, Mar 3, 2014 at 2:18 PM, Ian Hickson <ian@hixie.ch> wrote:

> On Mon, 3 Mar 2014, Evan Stade wrote:
> >
> > I'm still confused. The site author has entered bad markup. Is your
> > concern that site authors will be unable to write good markup?
>
> Some will write good markup, I'm sure.
>
> Our job as language designers is to maximise the number of authors doing a
> good job, and minimise the number of authors who make unintentional
> mistakes.
>
>
> > > There's no point us allowing address-level881. It will never be
> > > useful.
> >
> > Is there a point in disallowing it?
>
> Yeah. It simplifies the language, means there's less to test so it
> simplifies testing, it simplifies authoring, it reduces tutorial
> complexity, it makes answering questions like "how many should I include"
> easy to answer, and so on.
>
>
> > Ultimately it doesn't matter too much, but I would think it's a goal to
> > avoid spec churn.
>
> Adding features isn't such a big deal, especially when they're in response
> to changing political conditions.
>
>
> > If we're going to set some limit, let's say 4.
>
> Ok.
>
>
> > > Well if for some reason you want to exclude non-US customers, sure.
> > > But suppose you do want to include all customers, but you're a
> > > mom-and-pop store who is just going to put what you put in the form
> > > onto the envelope, and who doesn't know the intricacies of each
> > > country's postal standards.
> > >
> > > How many fields should you list?
> >
> > In this case, address-levelN doesn't help you. In order to be able to
> > write an address onto an envelope, you want an address blob, not
> > tokenized bits. This address blob was proposed further up the thread,
> > and I think it's a good idea, but distinct from the current topic, which
> > is how to get tokenized bits for places like China.
> >
> > Of course, tokenized bits can be used to create an address blob, but it
> > requires some sophistication to do so.
>
> If you take the fields from the spec today and those proposed in this
> thread, and concatenate them one-to-a-line in the following order:
>
>    "organization"
>    "address-line1"
>    "address-line2"
>    "address-line3"
>    "address-level4"
>    "address-level3"
>    "address-level2"
>    "address-level1"
>    "country-name"
>    "postal-code"
>
> ...the mail is going to get where you want it to get, right?
>
> So for the mom-and-pop store, this seems like it would be sufficient.
>
> Even if they render it as:
>
>    "organization"
>    "address-line1"
>    "address-line2"
>    "address-line3"
>    "address-level4", "address-level3" "postal-code"
>    "address-level2" "address-level1"
>    "country-name"
>
> ...so that it's optimised for the US, it would still work everywhere,
> you'd just have some slightly annoyed postal staff in some countries.
>
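Hixie's one-token-per-line ordering is simple enough to sketch directly. This is a minimal TypeScript sketch. It assumes the submitted values have already been collected into a plain object keyed by autocomplete token name. `AddressValues`, `ENVELOPE_ORDER`, and `toEnvelopeLines` are hypothetical names, and address-level3/address-level4 are the tokens proposed in this thread, not ones the spec defines today.

```typescript
// Form values keyed by autocomplete token name (hypothetical shape).
type AddressValues = Partial<Record<string, string>>;

// One token per line, in the order listed above. address-level3 and
// address-level4 are proposed in this thread and are not in the spec yet.
const ENVELOPE_ORDER: readonly string[] = [
  "organization",
  "address-line1",
  "address-line2",
  "address-line3",
  "address-level4",
  "address-level3",
  "address-level2",
  "address-level1",
  "country-name",
  "postal-code",
];

// Trims each value, drops empty ones, and joins the rest one per line.
function toEnvelopeLines(values: AddressValues): string {
  return ENVELOPE_ORDER
    .map((token) => (values[token] ?? "").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}
```

Given only address-line1, address-level2, address-level1, country-name, and postal-code, it emits those five values on five lines in that order. As Evan notes in reply, being readable by a person is not the same as being acceptable to every country's sorting machinery.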

Or some automatic mail-sorting machines which reject your mail completely.
I think you have more confidence in the post office than I do. But again,
this is something to be addressed separately with a "display-address" token
(or whatever you want to call it).


>
> So I don't think it's right to say that address-level* doesn't help you
> in the mom-and-pop store case. It does.
>

In some sense, you could put all the components of the address in a
completely random order and if you're lucky, they'll be unambiguous enough
for some human to figure it out. I don't think we should recommend people
do that though.


>
>
> > I don't think you can just write a stack of inputs that accepts input
> > for any country. The country determines:
> >
> > a) what fields make sense
> > b) what fields are required
> > c) the order of fields
> >
> > You could ignore (a) and settle for a crappy UI that shows all fields
> > that make sense anywhere in the world, but you'd still be left with
> > solving (b) and (c).
>
> (b) is an easy-to-solve problem: you don't make any of them required, and
> if the customer entered insufficient fields, they're not getting their
> package, and will have to be contacted out-of-band.
>

I don't think the additional load this would place on customer service, the
number of missing packages, etc., would be considered an "easy" problem, or
even an improvement over whatever they currently have in place.


>
> Can you elaborate on (c)?
>

US looks like:

Recipient
Organization
Street address
City, State ZIP

Japan looks like:

〒 ZIP
State City
Street address
Organization
Recipient

Using the US format for a Japanese address would be fantastically bad.
Users in Japan would probably just enter their address in the order they
think of as "forwards" but your website thinks of as "backwards", meaning
you get the exact problem you pointed out earlier.
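The two layouts above show that the country decides the line order, and not just which fields exist. Here is a hedged sketch of what that means for a site: keep one template per country and render the same tokenized values through it. The `TEMPLATES` table, `formatAddress`, and the mapping of Recipient to name, City to address-level2, State to address-level1, and ZIP to postal-code are assumptions for illustration. A real site would need a template for every country it ships to.

```typescript
// A template line is a sequence of parts: a field pulled from the form,
// or literal text such as punctuation.
type Part = { token: string } | { text: string };
type Template = Part[][];

const field = (token: string): Part => ({ token });
const literal = (value: string): Part => ({ text: value });

// Hypothetical templates transcribed from the two layouts above, assuming
// City = address-level2, State = address-level1, ZIP = postal-code.
const TEMPLATES: Record<string, Template> = {
  US: [
    [field("name")],
    [field("organization")],
    [field("street-address")],
    [field("address-level2"), literal(", "), field("address-level1"), literal(" "), field("postal-code")],
  ],
  JP: [
    [literal("〒 "), field("postal-code")],
    [field("address-level1"), literal(" "), field("address-level2")],
    [field("street-address")],
    [field("organization")],
    [field("name")],
  ],
};

// Renders the address line by line for the given country. A line whose
// fields are all empty is dropped. This sketch does not tidy up separators
// when only some of the fields on a line are filled in.
function formatAddress(country: string, values: Partial<Record<string, string>>): string[] {
  const template: Template | undefined = TEMPLATES[country];
  if (template === undefined) {
    throw new Error(`No address template for country "${country}"`);
  }
  const lines: string[] = [];
  for (const parts of template) {
    const hasValue = parts.some((p) => "token" in p && (values[p.token] ?? "") !== "");
    if (!hasValue) {
      continue;
    }
    lines.push(parts.map((p) => ("token" in p ? (values[p.token] ?? "") : p.text)).join(""));
  }
  return lines;
}
```

Rendering the same values through `US` and `JP` produces the two different line orders above. An unknown country code throws instead of silently falling back to the US layout.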


>
> If this is something that's required to make use of these autofill
> fields, then we should explain to authors what they need to do.
>
>
> > > Alternatively, if "region" is always the last address-level* value, then
> > > we could just do the mapping backwards:
> > >
> > >    address-line1
> > >    address-line2
> > >    address-line3
> > >    address-levelN
> > >    ...
> > >    address-level3
> > >    address-level2 = locality
> > >    address-level1 = region
> >
> > This isn't backwards, this is what we're proposing.
>
> Then why would UAE be missing address-level1? I'm confused.
>
> The reason I say this is backwards is that it is the reverse of the
> "address-line*" fields. This could be confusing.
>

It's only reversed if you're talking about US display order. It's not
reversed in the sense that address-line2 only makes sense if
address-line1 has a value. It's the same: address-level3 only makes sense
if address-level2 has a value.
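The point that numbered levels chain the same way numbered lines do can be written as a check a site might run before submission. This is a sketch that assumes a cap of four levels (the limit agreed earlier in the thread) and takes the rule "address-levelN only makes sense if address-level(N-1) has a value" at face value. `findLevelGaps` is a hypothetical helper, not part of any spec.

```typescript
// Returns every token that has a value while some lower-numbered sibling
// is empty, e.g. address-level3 filled in but address-level2 left blank.
function findLevelGaps(
  values: Partial<Record<string, string>>,
  prefix: "address-line" | "address-level",
  maxLevel: number,
): string[] {
  const problems: string[] = [];
  let sawEmpty = false;
  for (let n = 1; n <= maxLevel; n++) {
    const token = `${prefix}${n}`;
    const filled = (values[token] ?? "").trim() !== "";
    if (!filled) {
      sawEmpty = true;
    } else if (sawEmpty) {
      problems.push(token);
    }
  }
  return problems;
}
```

For example, filling address-level1 and address-level3 while leaving address-level2 blank reports `address-level3`. Filling only address-line1 and address-line2 reports nothing.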


>
> One question is whether the current "locality", which is defined as
> "City, town, village, post town, or other locality within which the
> relevant street address is found", should map to 4 or 2. If it maps to 2,
> we'll probably have to change the way we define this to be more generic.
>

Yes, that's why I'm suggesting we stop trying to create tokens which are
defined by real-world counterparts. If you ask my mom, Beijing is a city,
but technically it's a province-level city, so it's currently a "region"
rather than a "locality" despite the description of locality. So it's
easier to just say "address-level1", which winds up being either a province
or a province-level city, or a state in the US, or whatever the equivalent
is elsewhere in the world.


>
>
> > > But maybe we can do better, and just have dedicated names. What
> > > countries need more than two, today? How many do they each need? What
> > > are they? If we had hard data here it might be easier to design a
> > > better solution; do you happen to have that data?
> >
> > At least Korea, China, and Thailand need the third level. I think China
> > will need a 4th soon. Here's a rundown for Chinese administrative levels:
> > http://en.wikipedia.org/wiki/Administrative_divisions_of_China
> >
> > The three that make it onto the envelope currently are:
> > "Provincial level"
> > "Prefectural level"
> > "County level"
> >
> > You can click through on the Wikipedia link for explanations of the
> > various forms these levels take.
> >
> > I don't think dedicated names are advisable given the wide variety of
> > names for each address level (even within a single country, much less
> > across all countries). For example, "region" is already super generic
> > and unhelpful.
>
> Being generic is kind of the point, since as you point out, different
> countries have different levels.
>

It's generic in the wrong way. As far as the English language is concerned,
a city is a region, a state is a region, a county is a region, etc.
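The argument here is that the token should carry only its position, and the real-world meaning of each level should be supplied per country. Below is a minimal sketch of that split. It assumes a site keeps a small label table for the countries it serves. The US labels and the `levelLabel`/`levelCount` helpers are hypothetical, and the CN entries simply reuse the three Chinese levels named earlier in the thread.

```typescript
// Hypothetical, illustrative labels for each address level, keyed by
// country code. Index 0 describes address-level1, index 1 address-level2,
// and so on. The CN entries are the three levels named earlier.
const LEVEL_LABELS: Partial<Record<string, readonly string[]>> = {
  US: ["State", "City"],
  CN: ["Provincial level", "Prefectural level", "County level"],
};

// Number of address levels this table knows about for a country.
function levelCount(country: string): number {
  return LEVEL_LABELS[country]?.length ?? 0;
}

// Label to show next to the address-levelN input, or undefined when the
// country does not use that level.
function levelLabel(country: string, level: number): string | undefined {
  const labels = LEVEL_LABELS[country];
  if (labels === undefined || level < 1 || level > labels.length) {
    return undefined;
  }
  return labels[level - 1];
}
```

The same `address-level2` input would be labeled "City" for a US address and "Prefectural level" for a Chinese one. The token name itself never has to say which.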


>
>
> > Is there a name for these fields that you think would be less confusing
> > to the authors?
>
> It sounds like we could have country-name, region, locality, province, but
> I agree that at the end of the day it's just confusing to have four words
> that are so vague that you can't tell what order they go in.
>
> Still, having 1,2,3,4,3,2,1 is kinda weird.
>
> Here are some dumb ideas. We could extend "address-line", as follows:
>
>    "address-line1" |
>    "address-line2" |- "street-address"
>    "address-line3" |
>    "address-line5"
>    "address-line6"
>    "address-line7" / "locality"
>    "address-line8" / "region"
>    "address-line9" / "country-name"
>
> This leaves one unused number in the middle (4), in case we need to add to
> the street address side or the locality side.
>
> Or we could do:
>
>    "address-line1" |
>    "address-line2" |- "street-address"
>    "address-line3" |
>    "subsublocality"
>    "sublocality"
>    "locality"
>    "region"
>    "country-name"
>
> ...or, similar, but extending region instead of locality:
>
>    "address-line1" |
>    "address-line2" |- "street-address"
>    "address-line3" |
>    "locality"
>    "subsubregion"
>    "subregion"
>    "region"
>    "country-name"
>
> We could make "region" into a multi-line field like "street-address":
>
>    "address-line1" |
>    "address-line2" |- "street-address"
>    "address-line3" |
>    "locality"
>    "region-line1" |
>    "region-line2" |- "region"
>    "region-line3" |
>    "country-name"
>

The problem with this one is that address-lineN is defined as the "Nth line
of the street address", whereas the different levels of region are not
necessarily split onto different lines (depending on the country, of
course). So it's a false analogy to think of these different indexed things
as the same. In the US case, region-line1 would be "Los Angeles, CA", but in
reality almost all sites want this tokenized into "Los Angeles" and "CA".


>
> Or alternatively:
>
>    "address-line1" |
>    "address-line2" |- "street-address"
>    "address-line3" |
>    "region-level5"
>    "region-level4"
>    "region-level3" / "locality"
>    "region-level2" / "region"
>    "region-level1" / "country-name"
>
> Compared to those, the main proposal here doesn't seem that much better
> necessarily:
>
>    "address-line1" |
>    "address-line2" |- "street-address"
>    "address-line3" |
>    "address-level4"
>    "address-level3"
>    "address-level2" / "locality"
>    "address-level1" / "region"
>    "country-name"
>
> I dunno. Anyone else want to try to pick a colour for this bikeshed?


Again, you really can't just put a stack of input fields and have it make
sense anywhere. If you are presenting a UI to enter addresses, there's no
way you can escape actually knowing how addresses are formatted around the
world. (Well, there's requestAutocomplete.) So looking at the naming scheme
in the context of wanting to just stack up a bunch of <input>s is not
informative.

I'm not married to the address-levelN name. [something-that-makes-sense]N
is fine. The reason we proposed address-levelN is that region-levelN
implies that all political regions are captured, when they aren't. There's
no field for US county because county is never part of an address. So it's
only for regions that actually make it onto an envelope.
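If the numbered names were adopted, sites already using "region" and "locality" would need to map them onto the new tokens. This sketch assumes the correspondence from the main proposal in this thread: locality as address-level2 and region as address-level1. The thread leaves open whether locality should instead be level 4. `LEGACY_TO_LEVEL`, `normalizeToken`, and `normalizeValues` are hypothetical names.

```typescript
// Assumed mapping from today's tokens to the proposed ones, following the
// main proposal (locality = address-level2, region = address-level1).
const LEGACY_TO_LEVEL: Partial<Record<string, string>> = {
  region: "address-level1",
  locality: "address-level2",
};

// Rewrites a single token name; anything not in the table is left alone.
function normalizeToken(token: string): string {
  return LEGACY_TO_LEVEL[token] ?? token;
}

// Rekeys a record of form values so it only uses the numbered tokens.
// If an old and a new name for the same level are both present, the key
// visited last overwrites the earlier one.
function normalizeValues(values: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const token of Object.keys(values)) {
    out[normalizeToken(token)] = values[token];
  }
  return out;
}
```

A real migration would probably want to detect that overwrite case rather than let it pass silently.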


>
>
> > > > > Are we going to have a list in the spec giving how many levels
> > > > > should be given for each country?
> > > >
> > > > No. That is up to the site's ability to handle the data. For
> > > > example, if I'm soliciting *just* US addresses, I wouldn't know what
> > > > to do with address-level3, hence I won't ask for it.
> > >
> > > Ok. What do you do if you're soliciting addresses from any country?
> >
> > I put all the fields my database or payments backend or w/e can handle.
> > If there's no column for address-level4 in my database, I don't put a
> > field for address-level4 in my webpage.
> >
> > Then I hide them all and invoke requestAutocomplete. Or I write
> > complicated JS to manipulate my markup to show the user what they expect
> > to see based on which country they're entering info for (hide the fields
> > that don't make sense, mark "required" for the ones that are necessary,
> > etc.)
>
> requestAutocomplete() is a proprietary Chrome thing right now, so we
> shouldn't be recommending that people use it. (I'd love for other browsers
> to pick it up, since I agree that it makes things like this WAY better.
> But that's academic until they do.)
>
> Similarly, I think requiring "complicated JS" is too high a barrier for
> many authors, at least if we don't give explicit advice as to what this JS
> should do.
>
> Hence the question, what should authors do if they're soliciting addresses
> from any country, if we don't tell them what this "complicated JS" is to
> do?
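Here is one possible answer to that question, sketched under loud assumptions. Keep the country-specific knowledge in data. For the selected country, compute which of the site's supported fields to show, which to require, and in what order, and leave the DOM code to apply the result (for example, through the `hidden` and `required` attributes). `RULES`, `fieldsFor`, and `FieldSpec` are hypothetical. The two entries are illustrative and do not accurately state either postal service's requirements. The DOM wiring is omitted.

```typescript
// Hypothetical per-country rules. The two entries are illustrative only and
// are not an accurate statement of what either postal service requires.
interface CountryRule {
  order: readonly string[];    // autocomplete tokens, in display order
  required: readonly string[]; // tokens that must be filled in
}

const RULES: Partial<Record<string, CountryRule>> = {
  US: {
    order: ["street-address", "address-level2", "address-level1", "postal-code"],
    required: ["street-address", "address-level2", "address-level1", "postal-code"],
  },
  JP: {
    order: ["postal-code", "address-level1", "address-level2", "street-address"],
    required: ["postal-code", "address-level1", "street-address"],
  },
};

interface FieldSpec {
  token: string;
  hidden: boolean;
  required: boolean;
  position: number; // display position; -1 when the field is hidden
}

// For each field the site can store (e.g. one per database column), decide
// whether to show it, whether to require it, and where it goes.
function fieldsFor(country: string, supported: readonly string[]): FieldSpec[] {
  const rule = RULES[country];
  if (rule === undefined) {
    // Unknown country: show everything in the site's own order and require
    // nothing, as suggested for (b) earlier in the thread.
    return supported.map((token, i) => ({ token, hidden: false, required: false, position: i }));
  }
  return supported.map((token) => {
    const position = rule.order.indexOf(token);
    return {
      token,
      hidden: position === -1,
      required: position !== -1 && rule.required.indexOf(token) !== -1,
      position,
    };
  });
}
```

Countries missing from the table fall back to showing every supported field with none required. That is the trade-off argued over under (b): fewer rejected forms, but more addresses that need follow-up.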
>

Received on Monday, 3 March 2014 22:53:49 UTC