Re: abbr and acronym from Christoph Päper on 2007-03-26 (www-html@w3.org from March 2007)

From: Christoph Päper <christoph.paeper@crissov.de>
Date: Mon, 26 Mar 2007 21:36:48 +0200
To: www-html@w3.org
Message-Id: <8EFF0A6E-F43B-4E53-A32F-5EA228D74539@crissov.de>
Leandro Guimarães Faria Corcete DUTRA:
> On Sat, 2007-03-24 at 17:46 +0000, Patrick H. Lauke wrote:
>> All acronyms are just a subset of abbreviations, so there's no  
>> need for an extra element.

Your assertion is correct, but the conclusion does not necessarily follow.

When read aloud, abbreviations can be expanded, spelled out or
pronounced as a word, and sometimes these possibilities are mixed
(as in 'MPEG').

> Yes, but acronyms are pronounced as words, while all-uppercase
> non-acronym abbreviations are spelled out.

Pronounceability is not always considered mandatory for acronyms, and
it is not always unambiguous either. Sometimes all abbreviations that
are usually not expanded upon reading aloud are called acronyms. The
terms acronym and initialism are often used to draw the distinction,
but different users assign them opposite meanings.

> So with different acronym and abbr elements a speech user agent
> will know what to do, roughly; obviously hybrids (e.g. JPEG) will
> need special treatment.

There is a plethora of different kinds of abbreviations. Many of them  
do not even have an established distinct linguistic term yet. See  
<http://en.wikipedia.org/wiki/Acronym_and_initialism#Examples> for a  
few examples to begin with.

> For example, NATO and HTTP.  The first is an acronym, the second  
> only an abbreviation.  Am I getting anything wrong here?

That depends on whom you ask. For me both are initialisms and at
least 'NATO' is an acronym too, whereas 'HTTP' might be called an
/alphabetism/. Nevertheless, I use the HTML element |acronym| for
both, and |abbr| for "expanding" abbreviations. This distinction is
also the basis for my personal style guide regarding periods, because
mixing 'Prof.' with 'Dr', 'Ph.D.' with 'MD', and 'U.S.' with 'USA'
and 'UK' is just ridiculous; the exceptions are unit symbols, metric
ones at least.
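
As a rough sketch of that convention (my personal one, not anything
the HTML specifications prescribe), the choice of element comes down
to one question: is the abbreviation expanded when read aloud? The
function name mark_up and its parameters are hypothetical, chosen
only for illustration.

```python
from html import escape


def mark_up(short: str, expansion: str, expanded_when_read: bool) -> str:
    """Wrap an abbreviation according to the personal convention above.

    Assumption: |abbr| marks abbreviations that are expanded when read
    aloud ('Prof.' is read as 'Professor'); |acronym| marks everything
    else, whether pronounced as a word ('NATO') or spelled ('HTTP').
    """
    element = "abbr" if expanded_when_read else "acronym"
    title = escape(expansion, quote=True)  # safe inside the attribute
    return f'<{element} title="{title}">{escape(short)}</{element}>'


print(mark_up("Prof.", "Professor", expanded_when_read=True))
print(mark_up("NATO", "North Atlantic Treaty Organization", expanded_when_read=False))
print(mark_up("HTTP", "Hypertext Transfer Protocol", expanded_when_read=False))
```

Note that the author has to supply the flag: nothing in the letters
themselves tells 'NATO' from 'HTTP' or 'Prof.' from 'Dr'.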

I mark up neither elisions (incl. syncopes, "isn't", and aphaereses,
"'Tis"), i.e. "apostrophe abbreviations", nor blends (aka.
portmanteaux, 'smog') and apocopations ('advertisement' -> 'advert'
-> 'ad'); the latter two take no periods.

NB: I like neither French-style abbreviating (first and last
letter[s], 'Mrs.') nor Latin-style plurals (letter repetition, 'p.' ->
'pp.'), but I am stuck with those when I want to use them. I can and
will avoid British-style period omission, though, because
abbreviations should be marked by more than an unusual letter
combination, i.e. by either capital letters or periods, preferably
not both. One period often suffices ('btw.', not 'b.t.w.' or 'BTW').

A work-in-progress approach to categorise abbreviations:

  - expanded ('abbr.') ...
    - ... from foreign ('i.e.')
    - ... from scientific code ('H2O')
    - ... from a symbol ('µm/s')
    - ... with inherent plural ('USA')

  - pronounced ...
    - ... as a word ('SCSI')
      - that already exists ('USA PATRIOT Act')
      - that is not recognised to be an acronym anymore
        ('spam', actually a blend; "/anacronym/")
    - ... as a word /or/ names of letters ('URI')
    - ... as a combination of names of letters and a word ('MPEG')
    - ... as the names of letters ('CSS')
      - phonetically
      - that sound like words or syllables ('XML')
        - including digits ('B2B')
        - pseudo-acronyms ('ICQ')
      - for disambiguation ('US')
      - with a numerical shortcut ('IEEE')
        - incorporated into abbreviation
          - digits for number of left-out letters ('I18n')
          - digits for repetition of letters ('W3C')
          - digits representing certain codes ('9-11')
    - recursive ('GNU') or multidimensional ('GTK+')
    - without expansion [anymore] ('DVD')

  - formed of ...
    - only initial letters ('laser', 'MP')
      - all initial letters ('scuba', 'POTUS')
    - initial and non-initial letters ('radar')
      - uppercased ('ID')
    - uppercase and lowercase letters, the latter for ...
      - non-initial letters ('PhD')
      - words lowercase in titlecase ('RfC')
    - letters by sounds for words ('ICU') or syllables ('XML')
      - pseudo-acronyms ('ICQ')
    - digits (/numeronym/)
      - multiplying letters ('W3C')
      - counting omitted letters ('I18n')
      - representing sounds ('B2B')
      - representing codes ('9-11')
    - codes or symbols ('Y2K', 'H2O')
    - syllables or morphemes ('Interpol', 'hifi')
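
To illustrate two of the numeronym kinds listed above, here is a
minimal sketch; the function names count_numeronym and
expand_repetition are made up for this example, and the rules are
deliberately simplistic.

```python
import re


def count_numeronym(word: str) -> str:
    """Digits counting omitted letters: keep first and last letter."""
    if len(word) < 3:
        return word  # nothing to omit
    return f"{word[0]}{len(word) - 2}{word[-1]}"


def expand_repetition(numeronym: str) -> str:
    """Digits multiplying letters: repeat the letter before the number."""
    return re.sub(
        r"([A-Za-z])(\d+)",
        lambda m: m.group(1) * int(m.group(2)),
        numeronym,
    )


print(count_numeronym("Internationalization"))  # -> I18n
print(expand_repetition("W3C"))                 # -> WWWC
```

Applying expand_repetition to 'I18n' would yield eighteen copies of
'I' followed by 'n', which shows that the role of the digits cannot
be read off the form alone; it has to be known per abbreviation.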

And this was only for the English language and the Latin script.
Received on Monday, 26 March 2007 19:45:41 UTC