Automatic form entry: a survey

I'm working on a more comprehensive version of my proposal for
automatic form fill-in. In an effort to better understand the vagaries of
field names in forms I undertook a simple survey. I somewhat randomly
selected 51 forms from three Alta Vista searches (for "form" in the URL,
for "register" in the title, and for "fill out" in the body text). I apologize if
the results are annoyingly long, but I think they're illuminating and worth
posting (and not as annoyingly long as quoting an entire digest ;-).

I grouped the fields by class (name, city, state, etc.), and after each
class I included the count and percentage. I also indicated what
percentage of the field names were unique. I collapsed duplicate field
names to a single line with the frequency in parentheses.
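
The arithmetic behind each heading can be sketched in a few lines of
Python. This is only an illustration: it assumes "unique" means the number
of distinct field names divided by the number of forms that had a field of
that class (which matches the figures in the listing below), and the
function name summarize and the dictionary layout are made up for the
example.

```python
TOTAL_FORMS = 51  # size of the survey sample

def summarize(name_counts, total_forms=TOTAL_FORMS):
    """Return (count, percent of forms, percent unique) for one field class.

    name_counts maps each field name to the number of forms that used it.
    Assumption: uniqueness = distinct names / forms having this class.
    """
    count = sum(name_counts.values())
    pct_forms = round(100 * count / total_forms)
    pct_unique = round(100 * len(name_counts) / count)
    return count, pct_forms, pct_unique

# The "First Name" class as it appears in the listing
first_name = {
    "a firstname": 1,
    "first name": 1,
    "firstname": 3,
    "fname": 1,
    "FNAME": 1,
    "name": 2,
}

print(summarize(first_name))  # (9, 18, 67)
```

Six distinct names across nine forms gives the 67% shown for First Name.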

Almost all forms (96%) asked for a name, and 90% asked for email.
There's a surprisingly high uniqueness of field names, from 40% to
100%. Roughly speaking, about every other form you visit will use a name
for a given class of field that none of the other forms you've visited
used (uniqueness would be lower with a larger sample, but the point is
still valid). The variety of names for a field as straightforward as
email is truly amazing.

This is all quite unscientific but it shows that trying to identify fields
without a consistent identification scheme is a daunting task. And it at
least provides some ballpark numbers for determining which fields are
common enough to be included in a "global" set of field identifiers.
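
To make the difficulty concrete, here is a rough Python sketch of the
obvious approach: normalize each field name and look it up in a table of
names seen before. The (name, class) pairs are taken from the listing
below; the functions normalize, build_index, and guess_class are
hypothetical helpers, not part of any existing browser or proposal.

```python
from collections import defaultdict

# A few (field name, class) pairs taken from the survey listing
OBSERVED = [
    ("name", "Name"), ("name", "First Name"), ("name", "Company"),
    ("username", "Name"), ("username", "Email"),
    ("username", "Username/Login"),
    ("field", "Name"), ("field", "Email"),
    ("email", "Email"), ("EMAIL_ADDRESS", "Email"),
    ("email-address", "Email"),
    ("city", "City"), ("city", "State"), ("city", "Zip/Postal code"),
]

def normalize(field_name):
    """Lowercase the name and drop everything except letters and digits."""
    return "".join(ch for ch in field_name.lower() if ch.isalnum())

def build_index(pairs):
    """Map each normalized name to the set of classes it was used for."""
    index = defaultdict(set)
    for field_name, field_class in pairs:
        index[normalize(field_name)].add(field_class)
    return index

def guess_class(field_name, index):
    """Return the class only when the name is unambiguous, else None."""
    classes = index.get(normalize(field_name), set())
    return next(iter(classes)) if len(classes) == 1 else None

index = build_index(OBSERVED)
print(guess_class("Email_Address", index))  # Email
print(guess_class("name", index))           # None (three possible classes)
```

Normalizing helps with case and punctuation variants, but names such as
"name", "username", "field", and "city" were each used for more than one
class of field, so a lookup on the name alone can't settle them.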

Name: 40 (78%), 43% unique
  010.name
  comments_name
  custname
  customer-name
  feed_name (2)
  field
  from_name
  Full Name
  invoice_customer_name
  name (16)
  OName
  realname (5)
  rem_user
  sales-name
  thename
  username
  your_name (3)

First Name: 9 (18%), 67% unique
  a firstname
  first name
  firstname (3)
  fname
  FNAME
  name (2)

Middle Name/Middle Initial: 4 (8%), 100% unique
  initials
  mi
  midname
  MNAME

Last Name: 9 (18%), 67% unique
  b lastname
  last name
  lastname (4)
  LNAME
  lname
  surname

Email: 46 (90%), 54% unique
  170.email_address
  comments_email
  curremail
  customer_email
  E-mail
  e-mail address
  email (16)
  email-address
  emailadd
  emailadd1
  emailAddr (2)
  email_addr
  EMAIL_ADDRESS
  email_address (2)
  feed_email (2)
  field
  From
  from_address
  internet
  l email
  mail
  PipeToReplyTo
  theemail
  username (4)
  your_email

Title: 7 (14%), 100% unique
(also used as title of book)
  c title
  current
  job_name
  occupation
  pos
  title
  your_title

Company: 17 (33%), 59% unique
  address
  company (7)
  company_name
  compname
  d company
  institution_name
  name (2)
  ORGANIZATION
  ORGNAME
  thecompany

Address (1 address field): 21 (41%), 52% unique
  090.mailing_address
  Addr (2)
  address (9)
  Address/P.O. Box
  card_address
  comments2
  e address
  geographicLoc
  newnames
  street (2)
  street_address

Address1 (2 address fields): 11 (22%), 82% unique
  addr1
  address (3)
  address1
  Adress (Line 1)
  cust-address
  streetaddress1
  street_1
  street_address
  theaddress1

Address2 (2 address fields): 11 (22%), 91% unique
  addr2
  Address
  address2 (2)
  Adress (Line 2)
  cust-suite
  Street
  streetaddress2
  street_2
  street_address_2
  theaddress2

City: 30 (59%), 43% unique
  100.mailing_city
  Address
  card_city
  city (18)
  city1
  city_name
  city_state_zip
  Cty
  cust-city
  f city
  thecity
  town
  www_city

State: 28 (55%), 50% unique
  110.mailing_state
  Address
  card_county
  city
  city_state_zip
  cust-state
  g state
  region
  St
  state (15)
  State/Prov
  State/Province
  state_name
  thestate

Zip/Postal code: 28 (55%), 54% unique
  130.mailing_zip
  Address
  card_postcode
  city
  city_state_zip
  cust-zip
  h zip
  postal_code
  thezip
  zip (14)
  zip-5
  Zip/Postal Code
  Zip/Postal_Code
  ZIP1
  zip_code

Country: 22 (43%), 41% unique
  120.mailing_country
  card_country
  CNTRY
  country (14)
  country_name
  i country
  nation
  OCont
  thecountry

Phone (if 1 phone field): 24 (47%), 46% unique
  150.current_phone
  comments_phone
  cust-tel
  j phone
  OPhone
  OTPhone
  Ph
  phone (12)
  phone_number (3)
  telephone
  WKPHPRE+WKPHFIRST+WKPHLAST (3 fields)

Phones (if 2 phone fields): 5 (10%), 100% unique
  dayphone/evephone
  homephone/workphone
  Office_Phone/Home_Phone
  phone/phone2
  phone_day/phone_eve

Fax: 15 (29%), 40% unique
  160.fax
  fax (10)
  FAXPRE+FAXFIRST+FAXLAST (3 fields)
  fax_number
  k fax
  thefax

Username/Login: 3 (6%)
  login (2)
  username

Password: 4 (8%)
  password
  password/password2
  password1/password2
  rem_word

Payment/Credit Card: 10 (20%)
  (many complex options; I can do a more detailed summary if there's any
interest)

Computer Platform: 4 (8%)
  (many complex options)

Other fields not highly specific to a form:
  Nickname, Department, Courtesy Title (Mr./Ms./Dr./...), Web Page Name,
Web Page URL, Social Security Number, Age, Date of Birth, Sex,
Religious Preference, Ethnic Background, Country of Citizenship, US
Citizen, Year of Graduation, Language

Security Note: I am not suggesting that the above information be
automatically submitted to every entity that asks for it. I'm merely
reporting my findings.

A few other notes:
- "Atoms" of information are frequently combined (a single field for first
name and last name or for city, state, and zip), which indicates a
possible need for a fill-in mechanism that can combine them.
- Context wouldn't help much in identifying fields. I never imagined so
many ways to label a field requesting a name! Not to mention foreign
languages.
- I saw a surprising number of VALUE="" entries, which aside from being
quite bizarre might cause problems for browsers that don't fill in fields
if a value is specified.
- I discovered that there's a "Comment on ..." form that's been copied to
hundreds of Web sites, complete with odd "inPUT" spelling.
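
The first note above, about combining "atoms", could work along the lines
of this Python sketch. Everything in it is an assumption made for
illustration: the identifiers (full_name, city_state_zip, and so on), the
sample profile values, and the value_for function are hypothetical and
not part of any agreed set of field identifiers.

```python
# Atomic values a user might store once (sample data, not from the survey)
PROFILE = {
    "first_name": "Pat",
    "last_name": "Example",
    "city": "Seattle",
    "state": "WA",
    "zip": "98109",
}

# Hypothetical composite identifiers and how to build them from atoms
COMPOSITES = {
    "full_name": "{first_name} {last_name}",
    "city_state_zip": "{city}, {state} {zip}",
}

def value_for(identifier, profile=PROFILE):
    """Return an atom directly, or assemble a composite from its atoms."""
    if identifier in profile:
        return profile[identifier]
    template = COMPOSITES.get(identifier)
    if template is None:
        return None  # unknown identifier: leave the field blank
    return template.format(**profile)

print(value_for("city_state_zip"))  # Seattle, WA 98109
print(value_for("full_name"))       # Pat Example
```

Storing only the atoms and assembling combined values on demand means a
form can ask for either the pieces or the whole without the user entering
anything twice.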

I reiterate that this is an ad hoc survey, so please don't bother to
complain about my math or my methods.   

__________________________________________________________________
Jim Taylor <mailto:jhtaylor@videodiscovery.com>
Director of Information Technology
Videodiscovery, Inc. - Multimedia Education for Science and Math
Seattle, WA, 206-285-5400, <http://www.videodiscovery.com/vdyweb>

Received on Wednesday, 28 February 1996 15:11:45 UTC