Re: Words vs. context (was Re: "ACRONYM")

Rob (wlkngowl@unix.asb.com)
Tue, 29 Jul 1997 19:57:19 -0500


Message-Id: <199707300010.UAA19521@unix.asb.com>
From: "Rob" <wlkngowl@unix.asb.com>
To: wahlen@ph-cip.Uni-Koeln.DE (Holger Wahlen)
Date: Tue, 29 Jul 1997 19:57:19 -0500
Subject: Re: Words vs. context (was Re: "ACRONYM")
CC: www-html@w3.org


I have many comments. This is a little long, so here is a summary of
my thoughts and issues:
* what is the use of ACRONYM, ABBREV, and PERSON?
   (when and why use them...)
* what about a more general element for noting proper names instead
   of PERSON or AU?
* how is the TITLE attribute (not element) to be handled?
* how to handle multiple instances of ACRONYM, ABBREV, etc.?
* how to implement a dictionary

On Tue, 29 Jul 1997 wahlen@ph-cip.Uni-Koeln.DE (Holger Wahlen) wrote:

> It has been argued afterwards that this is something
> presentational and hence more suitable for CSS, so that it
> would be more appropriate to deal with this in the way
> "CLASS=spellout" instead - okay. The question I'd like to
> [..]

Note that an advantage of elements like ABBREV, ACRONYM, and PERSON is
that they are useful for search engines and indexing, not just for
speech synthesis or for providing footnote-like information.

A document that contains

  <ACRONYM TITLE="Hypertext Markup
    Language">HTML</ACRONYM>

or

  <ABBREV TITLE="International">Intl.</ABBREV>

will match in searches using the expanded text even if only the 
acronym or abbreviation is shown in the presentation of the document.
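Here is a minimal Python sketch of how an indexer might pick up those expansions, using the standard library's html.parser. The class and function names are illustrative only; real search engines will differ:

```python
from html.parser import HTMLParser


class ExpansionIndexer(HTMLParser):
    """Collect visible words plus TITLE expansions of ACRONYM/ABBREV."""

    # HTMLParser reports tag and attribute names in lowercase.
    EXPANDABLE = {"acronym", "abbrev"}

    def __init__(self):
        super().__init__()
        self.terms = []

    def handle_starttag(self, tag, attrs):
        if tag in self.EXPANDABLE:
            title = dict(attrs).get("title")
            if title:
                # Index the expansion even though it is never displayed.
                self.terms.extend(title.lower().split())

    def handle_data(self, data):
        # Ordinary visible text, e.g. the acronym itself.
        self.terms.extend(data.lower().split())


def index_terms(html_text):
    parser = ExpansionIndexer()
    parser.feed(html_text)
    parser.close()
    return parser.terms
```

Feeding it the ACRONYM example yields both "hypertext", "markup", "language" and "html" as index terms, so a search for either form finds the document.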

They are also useful as signals to indexers that crunch documents 
into word stems, so that

  <PERSON>Fred Baily</PERSON>

isn't stored in an index as, or confused with, "Fred Bail" (ideally,
anyway). It also allows one to search specifically for people, not
just for words that happen to be proper names. (A search engine can
give extra weight to matches inside <PERSON> elements when told to
look for people.)
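The stemming problem can be sketched the same way. Below, crude_stem is a deliberately naive, hypothetical suffix stripper (not a real stemming algorithm such as Porter's) that happens to turn "Baily" into "bail". Text inside <PERSON> is kept whole and unstemmed in a separate list that a people search could weight more heavily:

```python
from html.parser import HTMLParser


def crude_stem(word):
    # Toy suffix stripper, for illustration only.
    for suffix in ("ing", "ed", "s", "y"):
        if len(word) > len(suffix) + 2 and word.endswith(suffix):
            return word[: -len(suffix)]
    return word


class PersonAwareIndexer(HTMLParser):
    """Stem ordinary words; keep <PERSON> contents whole and unstemmed."""

    def __init__(self):
        super().__init__()
        self.words = []       # stemmed ordinary words
        self.people = []      # proper names found inside <PERSON>
        self._in_person = False
        self._name_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "person":
            self._in_person = True
            self._name_parts = []

    def handle_endtag(self, tag):
        if tag == "person" and self._in_person:
            self._in_person = False
            self.people.append(" ".join(self._name_parts))

    def handle_data(self, data):
        if self._in_person:
            self._name_parts.extend(data.split())
        else:
            self.words.extend(crude_stem(w.lower()) for w in data.split())
```

Feeding it "Fred Baily wrote <PERSON>Fred Baily</PERSON>" indexes the unmarked name as "fred" and "bail", while the marked-up occurrence survives as "Fred Baily".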

The problem with PERSON and AU (for author) is that they are limited 
in scope. A more general element for noting proper names (people, 
places, institutions, organizations, even organizational or social 
titles and positions) with subtypes would be a better idea.

An issue is how browsers treat TITLEd elements. Presumably they're
highlighted (though this can be optional and set using CSS), and
whenever the pointer/cursor is over a titled element, a pop-up or
advisory window comes up. (Of course, if the major browsers do not
support this feature, a majority of authors won't use it.)

Should speech synthesizers (always) read the TITLE rather than the
contents, though? It could get wordy to always render "HTML" as 
"Hypertext Markup Language", especially when in common speech one 
says "HTML".

Another issue is how to handle multiple instances of abbreviations, 
acronyms, foreign terms, proper names, etc. using dictionaries (that 
is, special site- or document-specific dictionaries; we'll assume 
common acronyms like "radar" need no special markup).

The LANG attribute helps with (some) foreign terms and proper names
when they are rendered into speech. But how do we tell the browser
whether an acronym is spelled out letter by letter or pronounced as a
word? Although this is arguably a CSS issue, perhaps a SPELLOUT
attribute makes sense after all. It tells UAs as well as indexers
that the "word" is actually read as initials when pronounced. (This
applies not just to acronyms but also to call letters, by the way:
proper names that are pronounced by reading out the letters, such as
the names of broadcast stations or ships, or license plates.)
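One way a speech UA might combine the TITLE question above with such an attribute is sketched below in Python. SPELLOUT is only the attribute proposed here, not part of any HTML specification, and the expand flag stands in for a hypothetical reader preference:

```python
def speech_text(content, title=None, spellout=False, expand=False):
    """Return the string a speech synthesizer would be handed.

    content  -- the element's contents, e.g. "HTML"
    title    -- the TITLE attribute, if any
    spellout -- hypothetical SPELLOUT attribute: read as initials
    expand   -- hypothetical reader preference: speak TITLE instead
    """
    if expand and title:
        # The listener asked for expansions, so read the TITLE.
        return title
    if spellout:
        # Each non-space character is spoken separately: "HTML" -> "H T M L".
        return " ".join(ch for ch in content if not ch.isspace())
    # Default: read the contents as an ordinary word.
    return content
```

By default this keeps "HTML" short in speech and only reads "Hypertext Markup Language" when the listener asks for it, which avoids the wordiness problem.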

So then another issue is to define a site- or collection-wide 
"dictionary" that gives translations of uncommon/technical acronyms 
and abbreviations or call-letters and pronunciation schemes.

I think we need something simple that can be placed in the HEAD
section of an HTML document or in a separate file (referenced with
the LINK element) and that is friendly towards older browsers.
Dictionary entries need to do at least the following:

* Note whether the name is spelled out, read as a word, or read
   using a pronunciation guide
* An optional LANG identifier 
* A "translation" of an acronym or abbreviation
* A type indicator as to whether it is an acronym/abbreviation,
   term, keyword, or proper name (person/place/thing)
* A way to differentiate between overlapping acronyms or
   abbreviations (i.e., is "St." = "Street" or "Saint"?)
* An optional HREF for more information or a glossary entry
   (an advantage is that one need not create hyperwocky by
   putting in a link every time a proper name or keyword occurs)
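As a rough illustration, those requirements map onto a record like the following Python sketch. The field names, the Kind values, and keying entries by (form, sense) to separate "St." = "Street" from "St." = "Saint" are all assumptions, not a proposed syntax:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    ACRONYM = "acronym"
    ABBREVIATION = "abbreviation"
    TERM = "term"
    KEYWORD = "keyword"
    PROPER_NAME = "proper-name"


@dataclass(frozen=True)
class Entry:
    form: str                            # text as it appears, e.g. "St."
    kind: Kind                           # type indicator
    expansion: Optional[str] = None      # "translation", e.g. "Street"
    sense: Optional[str] = None          # disambiguates overlapping forms
    spellout: bool = False               # read as initials?
    pronunciation: Optional[str] = None  # optional pronunciation guide
    lang: Optional[str] = None           # optional LANG identifier
    href: Optional[str] = None           # more info or glossary entry


class Dictionary:
    def __init__(self):
        self._entries = {}

    def add(self, entry):
        # The same form may appear once per sense.
        self._entries[(entry.form, entry.sense)] = entry

    def lookup(self, form, sense=None):
        return self._entries.get((form, sense))

    def senses(self, form):
        # All entries sharing a form, e.g. both readings of "St."
        return [e for (f, _), e in self._entries.items() if f == form]
```

A document would then need some way (again hypothetical) to say which sense an occurrence of "St." means; without one, a UA could only list the candidates returned by senses().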

Presumably, proper names, call letters, and acronyms won't be 
stemmed by indexers.  They'll be read "as is" or using the 
pronunciation guide.  Only abbreviations would be "translated" unless 
otherwise indicated.

A separate menu in a browser could bring the reader to the
dictionary (or, with a speech-based reader, one could ask "what is
...?") to see definitions of abbreviations/acronyms or check a
glossary entry.

An issue: how to deal with abbreviations/acronyms/names that 
reference each other. "New York City" would reference information 
about it, and "NYC" would reference "New York City".

Perhaps not allowing forward references would prevent circular 
definitions.
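A minimal Python sketch of that rule, assuming each entry names at most one earlier entry it refers to (None meaning the entry carries the information itself). Because every reference must point to an entry defined earlier, and duplicate names are rejected, following references always moves backwards through the list and must stop, so no cycle can form:

```python
def build_dictionary(entries):
    """entries: ordered (name, target) pairs; target is None or an earlier name."""
    defined = {}
    for name, target in entries:
        if name in defined:
            # Redefinition could re-point an old name at a newer one
            # and reintroduce a cycle, so it is rejected.
            raise ValueError(f"duplicate entry {name!r}")
        if target is not None and target not in defined:
            # Forward (or self) reference: not allowed.
            raise ValueError(f"{name!r} refers forward to {target!r}")
        defined[name] = target
    return defined


def resolve(defined, name):
    """Return the chain of names followed from name to the final entry."""
    chain = [name]
    while defined[name] is not None:
        name = defined[name]
        chain.append(name)
    return chain
```

With "New York City" listed first and "NYC" referring to it, resolve() follows "NYC" to "New York City"; listing them in the opposite order is rejected as a forward reference.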


Rob
 
---
Robert Rothenburg Walking-Owl (wlkngowl@unix.asb.com)
Se habla PGP.
http://www.asb.com/usr/wlkngowl