abbreviations in canonical HTML (thoughts & concrete suggestions)

[composition date: 3/31/2007]

since joining the HTML WG, i've been perusing the mail archive 
of, as well as receiving posts from, public-html@w3.org.

i read with great interest the brainstorming thread on 
abbreviation markup in canonical HTML which begins at:

[http://lists.w3.org/Archives/Public/public-html/2007JanMar/0119.html]

this is a topic in which i have a deep interest, and one of the 
issues which i have been prodding the WAI PF (Protocols & 
Formats) WG to address for the better part of a year. so, by 
way of:

(1) adding to the previous discussion, and

(2) introducing myself, as dan suggested, to the WG by 
sharing one of the ideas that i am interested in working 
on,

what follows are the main discussion points i have raised 
within the Protocols & Formats (PF) working group, 
particularly through the PF mailing list.

NOTE: whilst NOT endorsed in any way by the PF WG as a body, 
there was general agreement that these were the most urgent 
abbreviation issues which need to be addressed in canonical 
HTML slash XHTML.

the issues i attempt to address below affect a wide 
variety of users, and have myriad implications, as has been 
repeatedly noted, for internationalization as well as 
accessibility, not to mention general usability.

the main point in any discussion about abbreviation markup 
is that it provides a level of granularity that makes 
documents more accessible to ALL users.  the user may choose 
to ignore abbreviations, or to expand them automatically or 
on demand; but no matter how abbreviation markup is 
ultimately rendered client-side, certain attributes not 
only enhance human understanding but also enable machines 
to differentiate between types of abbreviations and their 
individual characteristics, regardless of rendering or 
implementation.  moreover, a more robust for/id association 
mechanism would allow authors to reuse expansions by 
pointing to an initial expansion -- or, preferably, to a 
site-wide expansions resource, such as:

<link type="application/rdf+xml" href="expansions.rdf" />

this would make expansions far easier for authors to 
implement, especially if they can do it once and forget 
about it until prompted by their ATAG-compliant authoring 
tool [note 1] to associate text contained in an 
abbreviation element with the site's global expansions 
list.  that ease of use should ultimately lead to wider 
use, as has been the case with LINKed stylesheets.

---- Begin Proposals -----
Canonical HTML/XHTML Needs Initialism Elements

+ POINT 1: abbreviations are abbreviations are abbreviations:

<abbr title="Street">St.</abbr>
versus
<abbr title="Saint">St.</abbr>

is the classic example in english, as is Dr., the abbreviation for 
both Doctor and Drive.

another obvious example is the french abbreviation for mademoiselle,
which, to my ears, sounds like "mwlee" when voiced by a screen reader
that doesn't support switching natural languages on the fly, or, more
often, when the markup lacks the lang attribute that would trigger
such switching:

<abbr lang="fr" title="Mademoiselle">Mlle.</abbr>

+ CONCLUSION 1: abbreviations are therefore needed in canonical 
HTML/XHTML


+ POINT 2: initialisms are initialisms are initialisms:

there is a screaming need for an IABBR element, which would subsume
the ACRONYM element of HTML 4.x and XHTML 1.x.

no matter the rules governing the natural-language expression of
an initialism, initialisms can be sub-categorized by the following
values of a REQUIRED "type" attribute (those with a wider knowledge
of non-western-european languages, please feel free to add to the list):

type="acronym"
type="initialism"
type="camelcase-abbr"
type="alpha-numeric"

(i'm not sure we need "alpha-numeric", but will discuss that at a 
later point in this draft)

IABBR would also require an "expressed-as" attribute, for example:

expressed-as="characters" (originally, expressed-as="letters")
expressed-as="word"
expressed-as="phrase"
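to make the intent of "expressed-as" concrete, here is a minimal python sketch of how a speech renderer might act on each proposed value. the function name spoken_form is hypothetical, and the mapping (spell out for "characters", voice the text for "word", voice the title for "phrase") is one possible reading of the proposal, not a rule it states; real pronunciation would remain the user agent's business.

```python
from typing import Literal

# hypothetical type covering the expressed-as values proposed above
ExpressedAs = Literal["characters", "word", "phrase"]


def spoken_form(text: str, title: str, expressed_as: ExpressedAs) -> str:
    """Return the string a speech renderer might voice for an IABBR (sketch only)."""
    if expressed_as == "characters":
        # spell the initialism out, one character at a time: "H T M L"
        return " ".join(ch for ch in text if ch.isalnum())
    if expressed_as == "word":
        # voice the abbreviated text as a single word: "SONAR"
        return text
    if expressed_as == "phrase":
        # voice the full expansion carried by the title attribute
        return title
    raise ValueError(f"unknown expressed-as value: {expressed_as!r}")


print(spoken_form("HTML", "HyperText Markup Language", "characters"))  # H T M L
print(spoken_form("SONAR", "SOund Navigation And Ranging", "word"))     # SONAR
print(spoken_form("WWW", "World Wide Web", "phrase"))                   # World Wide Web
```

note that "characters" keeps digits, so W3C would come out as "W 3 C", which bears on the alpha-numeric question raised below.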


+ IABBR EXAMPLES:

in its most rudimentary form, IABBR would thus result in code such as:

<IABBR type="acronym" expressed-as="word"
title="Visually Impaired Computer Users' Group"
>VICUG</IABBR>

or

<IABBR type="camelcase-abbr" expressed-as="word"
title="SOund Navigation And Ranging">SONAR</IABBR>

or

<IABBR type="camelcase-abbr" title="HyperText Markup Language"
expressed-as="characters">HTML</IABBR>

or

<IABBR type="initialism" expressed-as="characters"
title="National Association for the Advancement of Colored People"
>NAACP</IABBR>

i suppose that W3C would fall under the "camelcase-abbr" typology,
but am unsure: is there a need for an "alpha-numeric" type, or does
changing the "expressed-as" value "letters" to "characters" cover such
alpha-numeric initialisms as those illustrated by the following examples:

<IABBR type="alpha-numeric" expressed-as="characters"
title="World Wide Web Consortium">W3C</IABBR>

or

<IABBR type="alpha-numeric" expressed-as="characters"
title="The Minnesota Mining and Manufacturing Company"
>3M</IABBR>

but on the other hand, i'm not so sure about such antiquated 
initialisms as WWW: would one want that expressed as letters, or 
as reflective of the title, World Wide Web?  does this necessitate 
another value for the "expressed-as" attribute, namely, "phrase"?

<IABBR type="initialism" expressed-as="phrase"
title="World Wide Web">WWW</IABBR>

(open question: is "phrase" a synonym for "title", which is what one
wants expressed in a case such as WWW, as discussed above? if so,
why not just use the value "title" instead of "phrase" when coding
the "expressed-as" attribute?)

so, in summation, there would be an IABBR element covering all
known permutations of what we have, until now, marked up with the
ACRONYM element; IABBR would carry three REQUIRED attributes,
"type", "expressed-as", and "title", to semantically distinguish
the type of initialism being expanded, notated, pronounced, or
displayed.
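since all three attributes would be REQUIRED, a validator or authoring tool could check for them mechanically. the following python sketch, built on the standard library's html.parser, flags IABBR elements that lack "type", "expressed-as", or "title", or that use a value outside the lists proposed above; IABBR itself is only a proposal, and the names IabbrChecker and check are hypothetical.

```python
from html.parser import HTMLParser

# values proposed in this draft; the lists are explicitly open to additions
IABBR_TYPES = {"acronym", "initialism", "camelcase-abbr", "alpha-numeric"}
EXPRESSED_AS = {"characters", "word", "phrase"}
REQUIRED = ("type", "expressed-as", "title")


class IabbrChecker(HTMLParser):
    """Collect problems with IABBR elements found in a markup fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so IABBR arrives as "iabbr"
        if tag != "iabbr":
            return
        attrs = dict(attrs)
        for name in REQUIRED:
            if not attrs.get(name):
                self.problems.append(f"missing required attribute {name!r}")
        if attrs.get("type") and attrs["type"] not in IABBR_TYPES:
            self.problems.append(f"unknown type {attrs['type']!r}")
        if attrs.get("expressed-as") and attrs["expressed-as"] not in EXPRESSED_AS:
            self.problems.append(f"unknown expressed-as {attrs['expressed-as']!r}")


def check(markup: str) -> list[str]:
    """Return a list of problems found in the IABBR elements of markup."""
    checker = IabbrChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


good = '<IABBR type="initialism" expressed-as="phrase" title="World Wide Web">WWW</IABBR>'
bad = '<IABBR type="acronym" title="SOund Navigation And Ranging">SONAR</IABBR>'
print(check(good))  # []
print(check(bad))   # ["missing required attribute 'expressed-as'"]
```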


+ OPEN QUESTIONS

1. originally "expressed" was "pronounced", but there was discussion,
off-line and on the 2 august 2006 telecon, of adding qname or another
analogous, workable solution so as to provide REAL, robust
pronunciation guidance WITHIN the IABBR element; it is expected that
i, janina, lisa, dave pawson and others will take the lead in
contributing to this as-yet-undeveloped aspect of the IABBR element.

2. is there a need for type="camelcase" AND type="camelcase-abbr"?
is SONAR a contraction of words that comprise a new single word
formed of a camelcased phrase, or merely an abbreviation for
"SOund Navigation And Ranging"?

+ OPEN ISSUES:

1. building more robust for/id associations for abbreviation 
elements

no matter what form abbreviation and/or initialism elements take 
in canonical HTML/XHTML, single or multiple abbreviation markup 
needs a strong and elastic "for" slash "id" binding mechanism for 
reusability's (and the author's sanity's) sake.

the simplest means of strengthening the ABBR element is to use 
the for/id model to associate repeated instances of an ABBR, by 
marking the first instance with the explicit expansion, using the 
title attribute, as well as a unique identifier, provided by the 
id attribute.  subsequent repetitions of an ABBR thus defined 
would allow an author or authoring tool to use the for attribute 
to point at the initial expansion for that ABBR, as in the 
following example:


<p>
<ABBR id="a1" title="Doctor">Dr.</ABBR> Suess
wrote children's books.  He lived on Suess
<ABBR id="a2" title="Street">St.</ABBR>, which
had been renamed in his honor; its previous name
being <ABBR for="a1">Dr.</ABBR> Doolittle <ABBR
id="a3" title="Drive">Dr.</ABBR>
</p>

<p>
Suess <ABBR for="a2">St.</ABBR> should not be
confused with Suess <ABBR for="a3">Dr.</ABBR>,
formerly <ABBR id="a4" title="Saint">St.</ABBR>
Patrick's <ABBR id="a5" title="Place">Pl.</ABBR>,
which is the site of <ABBR for="a4">St.</ABBR>
Harold's Methodist Church, whose pastor is the
<ABBR title="Reverend" id="a6">Rev.</ABBR>
<ABBR for="a1">Dr.</ABBR> Paul Bunyon, author
of <CITE>This Pilgrim's Progress</CITE>.

<!-- OK, you get the point;
    by the way, Saint Harold was Saint Patrick's younger brother -->
</p>
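a user agent or authoring tool resolving the proposed binding needs little more than a table from id to title. the python sketch below uses the standard library's html.parser to pair each ABBR's text with its expansion, taking the title directly where one is given and otherwise following the for attribute back to the identified instance. note that for on ABBR is the proposal under discussion, not existing HTML, and the names AbbrExpander and expand are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class AbbrExpander(HTMLParser):
    """Resolve the proposed for/id binding between ABBR elements."""

    def __init__(self) -> None:
        super().__init__()
        self.expansions: dict[str, str] = {}  # id -> expansion (title)
        self.pairs: list[tuple[str, Optional[str]]] = []  # (text, expansion) in order
        self._attrs: Optional[dict] = None  # attributes of the open ABBR, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "abbr":
            self._attrs = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "abbr" or self._attrs is None:
            return
        title = self._attrs.get("title")
        if title is None and self._attrs.get("for"):
            # repeated instance: borrow the expansion of the ABBR it points at
            title = self.expansions.get(self._attrs["for"])
        if title is not None and self._attrs.get("id"):
            # defining instance: remember its expansion under its id
            self.expansions[self._attrs["id"]] = title
        self.pairs.append(("".join(self._text).strip(), title))
        self._attrs = None


def expand(markup: str) -> list[tuple[str, Optional[str]]]:
    parser = AbbrExpander()
    parser.feed(markup)
    parser.close()
    return parser.pairs


sample = """<ABBR id="a1" title="Doctor">Dr.</ABBR> Suess lived on Suess
<ABBR id="a2" title="Street">St.</ABBR>, formerly <ABBR for="a1">Dr.</ABBR>
Doolittle <ABBR id="a3" title="Drive">Dr.</ABBR>"""

for text, expansion in expand(sample):
    print(text, "->", expansion)
```

the three instances of "Dr." come out as Doctor, Doctor, and Drive: the text alone could never tell them apart, but the for/id binding does.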


a similar for/id binding should be part of the IABBR
element, also, so as to make sense of an article whose
topic sentence is:

The ADA has released an ADA-compliance recommendation
for dentists and their patients with AIDS; a recommendation
that grew out of the work of the AIDS' sub-committee on
safety.

in which the first instance of ADA stands for "The American Dental
Association" and the second for "The Americans with Disabilities Act",
whilst the first instance of AIDS expands to "Acquired
Immunodeficiency Syndrome" (or, if you prefer, "Acquired
immune deficiency syndrome") and the second represents the
"Association of Independent Dental Surgeons".
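why the binding must go through an id rather than through the abbreviated text itself can be shown in a few lines of python. the ids below (ada1, ada2, aids1, aids2) are hypothetical values an author or tool might assign to the two senses of each homonymic initialism in the sentence above.

```python
# (id, abbreviated text, expansion) for each sense used in the example sentence;
# the ids are hypothetical
definitions = [
    ("ada1", "ADA", "American Dental Association"),
    ("ada2", "ADA", "Americans with Disabilities Act"),
    ("aids1", "AIDS", "Acquired Immunodeficiency Syndrome"),
    ("aids2", "AIDS", "Association of Independent Dental Surgeons"),
]

# keyed by the abbreviated text, the second sense silently overwrites the first
by_text = {text: title for _id, text, title in definitions}

# keyed by id, every sense survives, and each occurrence can point at one of them
by_id = {id_: title for id_, _text, title in definitions}

# the order in which the sentence uses them, as for="..." references
occurrences = ["ada1", "ada2", "aids1", "aids2"]
print([by_id[ref] for ref in occurrences])
print(by_text["ADA"])  # Americans with Disabilities Act: the dental sense is lost
```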


through a robust and elastic definition of the for/id mechanism 
to bind abbreviated text to its gloss, an expansion associated 
with a particular abbreviation can not only be reused, but can 
also clarify and differentiate homonymic (identically spelt or 
pronounced) abbreviations.  it would also provide a site-wide 
means of associating unique abbreviations with their expansions, 
building upon the example of using LINK to point to an RDF 
assertion document containing explicit bindings between 
expansions and the abbreviations for which they stand.  an 
author could thus define an abbreviation once and reuse it, via 
the for attribute, to apply its expansion site-wide.  and since 
the assumption seems to be that the ideal model is to provide 
authors with a way of constructing semantically sensible markup 
to contain their content, this would translate into a simple 
interface in an authoring tool: every time ABBR is invoked for 
a string of text, the author could be prompted either to reuse 
a previously defined expansion or to provide a unique expansion, 
which would then be appended to the site-wide expansion 
resource.
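the authoring-tool workflow described above (offer previously defined expansions for a string, otherwise append a new one to the site-wide resource) can be sketched in python. to keep the sketch self-contained it stores the registry as JSON rather than the RDF document the proposal suggests; the class name ExpansionRegistry, the file name expansions.json, and the generated ids (x1, x2, ...) are all hypothetical, and the prompting interface itself is left out.

```python
import json
import tempfile
from pathlib import Path


class ExpansionRegistry:
    """A site-wide list of expansions (JSON stands in for RDF in this sketch)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # each entry maps an id to {"text": abbreviated text, "title": expansion}
        self.entries: dict[str, dict[str, str]] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def candidates(self, text: str) -> dict[str, str]:
        """Expansions already defined for this text, to offer for reuse (id -> title)."""
        return {id_: e["title"] for id_, e in self.entries.items() if e["text"] == text}

    def add(self, text: str, title: str) -> str:
        """Append a new expansion, save the registry, and return its id."""
        new_id = f"x{len(self.entries) + 1}"  # ids are never deleted, so no collisions
        self.entries[new_id] = {"text": text, "title": title}
        self.path.write_text(json.dumps(self.entries, indent=2), encoding="utf-8")
        return new_id


with tempfile.TemporaryDirectory() as tmp:
    registry = ExpansionRegistry(Path(tmp) / "expansions.json")
    registry.add("St.", "Street")
    registry.add("St.", "Saint")
    # reload from disk, as a later editing session would
    offered = ExpansionRegistry(registry.path).candidates("St.")
print(offered)  # {'x1': 'Street', 'x2': 'Saint'}
```

when the author marks up "St." again, the tool would offer x1 and x2; choosing one yields for="x1" or for="x2", and typing a new expansion calls add.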

gregory.

Notes:
[note 1] for more about the Authoring Tool Accessibility Guidelines, 
consult:
  * ATAG 1.0 http://www.w3.org/TR/ATAG10
  * ATAG 2.0 (Working Draft) http://www.w3.org/TR/ATAG20

---------------------------------------------------------------------
A conclusion is simply the place where someone got tired of thinking.
                                                      -- Arthur Bloch
---------------------------------------------------------------------
Gregory J. Rosmaita - Gregory.Rosmaita@gmail.com
       Camera Obscura: http://www.hicom.net/~oedipus/
---------------------------------------------------------------------

Received on Monday, 2 April 2007 16:07:53 UTC