[whatwg] A mechanism to improve form autofill

[This was a long thread. I have omitted comments that repeated earlier 
points and those that made good points that were addressed in detail by 
earlier answers by other people on the thread. If you had a specific 
comment that did not receive a reply from me but for which you would like 
a specific reply, please don't hesitate to mail the list again asking for 
clarification.]

On Thu, 15 Dec 2011, Ilya Sherman wrote:
>
> Current autofill products rely on contextual clues to determine the type 
> of data that should be filled into form elements. Examples of these 
> contextual clues include the name of the input element, the text 
> surrounding it, and placeholder text.
> 
> We have discussed the shortcomings of these ad hoc approaches with 
> developers of several autofill products, and all have been interested in 
> a solution that would let website authors classify their form fields 
> themselves. While current methods of field classification work in 
> general, for many cases they are unreliable or ambiguous due to the many 
> variations and conventions used by web developers when creating their 
> forms:
> 
>   + Ambiguity: Fields named "name" can mean a variety of things, 
> including given name, surname, full name, username, or others. Similar 
> confusion can occur among other fields, such as email address and street 
> address.
> 
>   + Internationalization: Recognizing field names and context clues for 
> all the world’s languages is impractical, time-intensive, and 
> error-prone (as good context clues in one language may mean something 
> else in another language)
> 
>   + Unrelated Naming: Due to backend requirements (such as a framework 
> that a developer is working within), developers may be constrained in 
> what they can name their fields. As such, the name of a field may be 
> unrelated from the data it contains.
> 
> We believe that website authors have strong incentive to facilitate 
> autofill on their forms to help convert users in purchase and 
> registration flows. Additionally, this assists users by streamlining 
> their experience.

This does seem to be a valid problem.


On Thu, 26 Jan 2012, Ilya Sherman wrote:
> On Thu, 26 Jan 2012, John Tamplin wrote:
> >
> > One question -- how does this fit with, say, a <select> element for 
> > country that shows localized country names?  Can it be "autocompleted" 
> > as well?  If so, does it match on the localized names, the value 
> > (which might be an ISO 2 or 3-digit code, or something unique to the 
> > app), or what?
> 
> I would expect that this varies across autofill agents.  For Chrome, we 
> handle this case pretty well by leveraging the ICU library [ 
> http://site.icu-project.org/ ].

<select>s are an interesting problem, because they limit the valid inputs. 
So do features like maxlength="", pattern="", etc. I suppose the solution 
is to allow the user agent to have multiple possible autofill values for 
each autofill field, or to just allow the user agent to make suggestions 
based on any information it may have... Actually we pretty much have to do 
that anyway. There aren't really any UA-testable conformance criteria for 
this feature; it just provides information for the UA to make more 
intelligent editors possible.
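
As a rough illustration of the "multiple possible autofill values" idea for 
a <select>, here is a minimal Python sketch. The function name pick_option, 
the (value, label) representation of options, and the case-insensitive 
comparison are all assumptions made for this example; nothing here is 
taken from any real autofill agent.

```python
def pick_option(options, candidates):
    """Return the value of the first <select> option matching a candidate.

    options:    list of (value, label) pairs, as the page author wrote them
    candidates: strings the user agent knows for this field, in preference
                order (hypothetical data, e.g. several spellings of a country)
    """
    for candidate in candidates:
        wanted = candidate.casefold()
        for value, label in options:
            # Assumption: compare against both the submitted value and the
            # visible label, ignoring case.
            if wanted in (value.casefold(), label.casefold()):
                return value
    return None  # nothing fits, so the agent should offer no suggestion


options = [("US", "United States"), ("DE", "Deutschland")]
print(pick_option(options, ["Germany", "Deutschland", "DE"]))
```

If none of the candidates matches an option, the sketch returns None 
rather than guessing, which fits the idea that the feature only provides 
hints to the user agent.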


On Sun, 22 Jan 2012, Mounir Lamouri wrote:
> 
> Looking at the list of types you are proposing, I was wondering if we 
> couldn't solve this another way. We could create two new input types: 
> 'contact' (or person, or anything better) and 'address'. 'address' could 
> even be part of 'contact' given that is a contact information. There is 
> currently some work being done to access contact information.
> 
> Currently, the way we handle names and addresses is probably suboptimal 
> and websites have to re-invent a lot of things. A simple field that 
> would ask the user to give those information might be way better for 
> both authors and users. For example, on a mobile phone, I would be able 
> to pick up one of my contact address to send him a gift instead of 
> typing it.

I have no objection to us introducing new input types, but that seems 
orthogonal to the issue of introducing a way to label a field as being a 
certain piece of data.

That is, "address" is a type; "shipping address" is a field. A fax 
number, mobile number, home number, work number: they're all "tel", but 
they're each different fields.

Anyway, if we want to experiment with higher-level types like "address" 
then I recommend interested vendors implement such ideas in their dev 
versions as e.g. type="x-address". I think that's somewhat orthogonal to 
what we have in the realm of autofill -- existing pages and database 
schemas are based on certain fields, which is presumably what we should 
expose. But similarly, we shouldn't preclude this being added in the 
future. This suggests that maybe we should have a field name "address" 
that encompasses all the various address-related fields into one. Might be 
best to leave that out until we have such an input type though.


> Other input types like 'email' and 'tel' should be able to solve the 
> phone, fax and email autocompletetypes I believe. Authors should stop 
> using multiple fields for telephone and use <input type='tel'> instead. 

I don't think it makes sense to have different input control types for 
work phone numbers vs home phone numbers. They're both phone numbers.


> Also, I do not understand why we have credit cards types. Is anyone 
> willing to have his credit cards information saved locally?

Sure, why not?


> Is any website not using autocomplete=off as soon as credit cards are 
> involved?

Those that do that are incredibly annoying.


On Thu, 26 Jan 2012, Ilya Sherman wrote:
> 
> Extending the existing input 'type' attribute is an interesting idea, 
> thanks for raising it.  Looking through the existing input type values, 
> it seems they are primarily chosen so as to enable user agents to render 
> and format the input data in type-appropriate ways.  However, the 
> existing types do not try to nail down the field's exact data type 
> beyond the needs of this use case -- for example, <input type='tel'> 
> currently covers both phone and fax fields.  In contrast, for many 
> autocomplete/autofill agents, the distinction between phone and fax 
> fields is important.

Indeed.


> One possibility -- also suggested by Kornel Lesiński on a separate 
> thread -- would be to simply permit all of the attribute values from the 
> 'autocompletetype' proposal as values for the 'type' attribute.  This 
> avoids introducing a new attribute, but co-opts the 'type' attribute.  
> My guess is that people would object to co-opting the 'type' attribute 
> in this way, but perhaps I am wrong...

Yeah I don't see that as a viable option really. They're orthogonal 
concerns.


On Wed, 15 Feb 2012, Nathan Ziarek wrote:
>
> ...as I'm in the middle of a project implementing schema.org markup, is 
> there any consideration of using those properties as the tokens within 
> the autocomplete attribute?
> 
> While not perfectly compatible, it would lessen the burden on 
> developers—"Is it given-name or givenName for form fields?"—and 
> schema.org's notion of scoping might prove a better solution than the 
> section- nomenclature. After all, form elements already have a grouping 
> element in <fieldset>.

On Wed, 15 Feb 2012, Ilya Sherman wrote:
> 
> The current set of tokens was chosen to match the hCard [ 
> http://microformats.org/wiki/hcard ] naming as much as possible. 
> Unfortunately, it's not possible to match all of the major naming 
> schemes without bloating the token space with lots of redundant tokens.

Reusing vocabularies is good where possible (so long as it doesn't 
interfere with solving the problems), but yeah, we can't really match more 
than one at a time.

Other vocabularies of relevance, in case anyone wants to study them: OASIS 
xAL, vCard, Apple AddressBook, ECML, Microsoft's Windows APIs.


On Fri, 20 Jan 2012, Ilya Sherman wrote:
> 
> One suggestion, brought up by Simon Pieters, is to remove the 
> discouraged field types (e.g. 'phone-local-prefix').  As I mentioned on 
> the other thread, the tradeoff with these is supporting existing 
> websites vs. trimming the official list of tokens and encouraging best 
> practices.  Both lines of reasoning seem fairly sensible to me, and I'd 
> love to get some insights from folks on which approach is more likely to 
> work better for establishing a good specification.

Yeah, it seems better to address existing Web pages.

In a conversation on #chromium, Ilya linked me to this directory:

   https://src.chromium.org/viewvc/chrome/trunk/src/chrome/test/data/autofill/heuristics/input/

This is a list of sample forms derived from a big study of real-world 
sites, which is very useful.


On Wed, 25 Jan 2012, Kornel Lesiński wrote:
> 
> Should <input type="text" autocompletetype="email"> behave just like 
> <input type="email">? Similar ambiguity exists for <input type=text 
> autocompletetype=phone-full> and <input type=tel>.
> 
> Why not fold autocompletetype types into the existing type attribute (or 
> autocomplete attribute)? Type could be redefined as space-separated 
> list, so <input type="cc-full-name name-full section-billing"> could 
> work just like autocompletetype. It would be backwards compatible with 
> HTML5 types and fall back to text for new types or lists.

On Thu, 26 Jan 2012, Kornel Lesiński wrote:
>
> But even if single-mixed-login-field autocomplete was desired, then 
> perhaps a mixed type would work too:
> 
> <input type="username email">
>
> How about merging autocompletetype with autocomplete then?
> 
> It looks sensible to me:
> 
> <input autocomplete=off> <input autocomplete=email>

Yeah, using autocomplete="" in this way makes a lot of sense I think.


Studying the forms in the listing cited above, it seems that fields fall 
into these categories:

Separate forms all found in the same <form>, e.g. for pages that contain 
multiple products each with their own set of fields, only one product of 
which is shown at a time. At a high level, the user agent should treat each 
of these as a separate <form> for autofill purposes.

Each of these can have information for different people or facets of 
people:
 - shipping information
 - billing information
 - generic user information (e.g. when it's not a shipping order form)

Each of these sections can then have subinformation:
 - name (and its subfields, such as "honorific-prefix", "nickname", etc)
 - "organisation" name, the user's "organisation-title"
 - physical address (and its subfields, such as "city", "state", etc)
 - contact information category, e.g. "home", "work", "cell", "fax"
    - each of which has subinformation such as "email", "tel" (and their 
      subfields, such as "country-code")
 - credit card details (and subfields such as "name", "exp" etc)
 - personal information (such as "bday", "url", "photo")

So we could define the autocomplete="" field's value as follows:

   "on", "off", or:
   [section] [subsection] [generic-field | [contact-type] contact-field]

...where

   section       = high-level section name; author-defined string starting
                   with the prefix "section-"
   subsection    = "shipping" or "billing"
   generic-field = one of: name, honorific-prefix, given-name, 
                           additional-name, family-name, honorific-suffix,
                           nickname, organisation-title, organisation,
                           street-address, address-line1, address-line2,
                           address-line3, locality, region, country, 
                           postal-code, cc-name, cc-given-name, 
                           cc-additional-name, cc-family-name, cc-number, 
                           cc-exp, cc-exp-month, cc-exp-year, cc-csc, 
                           language, bday, bday-day, bday-month, 
                           bday-year, sex, url, photo
   contact-type  = "home", "work", "cell", or "fax"
   contact-field = one of: email, tel, tel-country-code, tel-national,
                           tel-area-code, tel-local, tel-local-prefix, 
                           tel-local-suffix, tel-extension, impp

...with some conformance rules, so that each section/subsection and 
section/subsection/contact-type group has:

 - either up to one "name" or up to one of each of "honorific-prefix",
   "given-name", "additional-name", "family-name", "honorific-suffix"

 - up to one "organisation-title"
 - up to one "organisation"

 - either one "street-address", or one "address-line1"
 - up to one "address-line2", but only if there is an "address-line1"
 - up to one "address-line3", but only if there is an "address-line2"

 - up to one of each of "locality", "region", "country", "postal-code"

 - either up to one "cc-name" or up to one of each of "cc-given-name", 
   "cc-additional-name", "cc-family-name"

 - up to one "cc-number"

 - either up to one "cc-exp" or up to one each of "cc-exp-month" and 
   "cc-exp-year"

 - up to one "cc-csc"

 - up to one "language"

 - either up to one "bday" or up to one each of "bday-day", "bday-month", 
   and "bday-year"

 - up to one "sex"
 - up to one "url"
 - up to one "photo"

 - up to one "email"

 - either up to one "tel" or up to one each of "tel-country-code" and 
   "tel-national"

 - if there is no "tel" and no "tel-national", up to one each of: 
   "tel-area-code" and "tel-local"

 - if there is no "tel", no "tel-national", and no "tel-local": up to one 
   each of "tel-local-prefix" and "tel-local-suffix"

 - up to one "tel-extension"

 - up to one "impp"
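
To make the proposed syntax and a few of the rules above concrete, here is 
a minimal Python sketch. The function names parse_autocomplete and 
group_problems are hypothetical, as is the decision to reject an empty 
value and to compare tokens case-sensitively; the proposal does not address 
either point. Only a subset of the conformance rules is checked.

```python
SUBSECTIONS = {"shipping", "billing"}
CONTACT_TYPES = {"home", "work", "cell", "fax"}
GENERIC_FIELDS = {
    "name", "honorific-prefix", "given-name", "additional-name", "family-name",
    "honorific-suffix", "nickname", "organisation-title", "organisation",
    "street-address", "address-line1", "address-line2", "address-line3",
    "locality", "region", "country", "postal-code", "cc-name", "cc-given-name",
    "cc-additional-name", "cc-family-name", "cc-number", "cc-exp",
    "cc-exp-month", "cc-exp-year", "cc-csc", "language", "bday", "bday-day",
    "bday-month", "bday-year", "sex", "url", "photo",
}
CONTACT_FIELDS = {
    "email", "tel", "tel-country-code", "tel-national", "tel-area-code",
    "tel-local", "tel-local-prefix", "tel-local-suffix", "tel-extension", "impp",
}


def parse_autocomplete(value):
    """Parse: "on", "off", or
    [section] [subsection] [generic-field | [contact-type] contact-field]"""
    tokens = value.split()
    if tokens in (["on"], ["off"]):
        return {"mode": tokens[0]}
    if not tokens:
        raise ValueError("empty value")  # assumption: at least one token
    parsed = {"mode": "field", "section": None, "subsection": None,
              "contact_type": None, "field": None}
    rest = list(tokens)
    if rest and rest[0].startswith("section-"):
        parsed["section"] = rest.pop(0)
    if rest and rest[0] in SUBSECTIONS:
        parsed["subsection"] = rest.pop(0)
    if rest and rest[0] in CONTACT_TYPES:
        parsed["contact_type"] = rest.pop(0)
        # A contact type is only meaningful in front of a contact field.
        if not rest or rest[0] not in CONTACT_FIELDS:
            raise ValueError("contact type must be followed by a contact field")
        parsed["field"] = rest.pop(0)
    elif rest and rest[0] in GENERIC_FIELDS | CONTACT_FIELDS:
        parsed["field"] = rest.pop(0)
    if rest:
        raise ValueError("unexpected token: " + rest[0])
    return parsed


def group_problems(fields):
    """Check a subset of the per-group rules for one group's field names."""
    problems = []
    for f in sorted(set(fields)):
        if fields.count(f) > 1:
            problems.append("more than one " + f)
    name_parts = {"honorific-prefix", "given-name", "additional-name",
                  "family-name", "honorific-suffix"}
    if "name" in fields and name_parts & set(fields):
        problems.append("name combined with its subfields")
    if "street-address" in fields and "address-line1" in fields:
        problems.append("street-address combined with address-line1")
    if "address-line2" in fields and "address-line1" not in fields:
        problems.append("address-line2 without address-line1")
    if "address-line3" in fields and "address-line2" not in fields:
        problems.append("address-line3 without address-line2")
    return problems


print(parse_autocomplete("section-gift shipping home tel"))
print(group_problems(["address-line2"]))
```

The parser consumes tokens strictly left to right, so a value such as 
"name shipping" is rejected because the subsection must come before the 
field.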


The UA conformance criteria would be pretty minimal: for each input 
control with an autocomplete value that matches the above long forms, try 
to determine a value that matches the description of that value (the spec 
would have prose and a table describing what the values mean), and 
optionally offer to set the control to that value. The values must pass 
all the form control validation stuff, so e.g. if a control has 
maxlength=1 and autocomplete="shipping additional-name" then the only 
sensible value to offer is the middle initial of the person to which the 
user is intending to ship the product.
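
The validation requirement can be sketched roughly as follows, in Python. 
The helper name usable_candidates and the sample names are hypothetical. 
The sketch is also a simplification: it uses Python's len() and re module, 
whereas a real user agent would apply HTML's own maxlength counting and 
JavaScript regular expression semantics for pattern="".

```python
import re


def usable_candidates(candidates, maxlength=None, pattern=None):
    """Return the candidate values that satisfy the control's constraints,
    preserving the order in which the user agent ranked them."""
    usable = []
    for value in candidates:
        if maxlength is not None and len(value) > maxlength:
            continue  # too long for the control
        if pattern is not None and re.fullmatch(pattern, value) is None:
            continue  # whole value must match, as with pattern=""
        usable.append(value)
    return usable


# A control with maxlength=1 and autocomplete="shipping additional-name":
# only the middle initial is worth offering.
print(usable_candidates(["Quincy", "Q"], maxlength=1))
```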

The documentation in the spec would recommend particular input types for 
each field, and discourage the use of the decomposed forms, but there 
would not be any conformance criteria there.

Are there any common fields missing from the list above? Any categories 
other than "billing" and "shipping" that should be listed? Anything other 
than "work", "home", "cell", and "fax"? Should it be "work-fax" and "home-fax"? 
Should we instead have the fax-* fields to parallel the "tel-*" fields, so 
you can say you have a cell fax and so you can't say you have a fax e-mail 
or fax Jabber? Does it make sense to have home and cell e-mail accounts 
separately specifiable? Should we disallow addresses and contact details 
without the "shipping"/"billing" labels?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 23 July 2012 23:42:11 UTC