Re: abbr and acronym from Jukka K. Korpela on 2007-04-02 (www-html@w3.org from April 2007)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Mon, 2 Apr 2007 08:50:46 +0300 (EEST)
To: www-html@w3.org
Message-ID: <Pine.GSO.4.64.0704020826001.29843@mustatilhi.cs.tut.fi>
On Mon, 2 Apr 2007, Nicholas Shanks wrote:

> Two comments [on the read="..." attribute]
>
> I use <ssml:phoneme> elements in personal documents, which are manipulated 
> with XSLT before being sent to a speech engine, one option is to use 
> something compatible with that.

People have expressed their concern about the interpretation of the 
attribute value. Defining it as IPA notation would be theoretically 
promising, but for most cases, that would be overkill. Usually it is 
sufficient to specify the pronunciation in the ordinary spelling of the 
language in which the content of the element is written. So for example, 
if we wanted to 
specify that "I" is to be read as a Roman numeral in some context, we 
could write <span read="the first">I</span> in an English document and
<span read="der erste">I</span> in a German document. This, by the way, 
might help automatic translation as well: it could know or guess that when 
translating from English, such an "I" is to be kept as is and not 
interpreted as a personal pronoun.
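
To make the idea concrete, here is a minimal sketch of how a tool might 
prepare text for a speech engine from such markup. Note that read="..." is 
only the attribute proposed in this thread, not part of any HTML 
specification, and the names SpokenText and spoken() are made up for this 
illustration. The sketch uses Python's standard html.parser module and 
simply speaks the attribute value in place of the element's content.

```python
from html.parser import HTMLParser

# Elements without an end tag; they never enclose content.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class SpokenText(HTMLParser):
    """Collect text to be spoken, preferring the hypothetical read="..."
    attribute over the content of the element that carries it."""

    def __init__(self):
        super().__init__()
        self.parts = []   # pieces of text to be spoken, in document order
        self.stack = []   # one bool per open element: does it have read="..."?
        self.muted = 0    # > 0 while inside an element whose content is replaced

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        read = dict(attrs).get("read")
        if read is not None:
            if self.muted == 0:
                self.parts.append(read)   # speak the attribute value instead
            self.muted += 1
        self.stack.append(read is not None)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Assumes well-nested markup; a real tool would need error recovery.
        if self.stack and self.stack.pop():
            self.muted -= 1

    def handle_data(self, data):
        if self.muted == 0:
            self.parts.append(data)


def spoken(html):
    """Return the text a speech engine would receive for the given markup."""
    parser = SpokenText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

With this sketch, spoken('Henry <span read="the first">I</span> reigned.') 
would hand the speech engine the string "Henry the first reigned.", and 
the German variant would work the same way with read="der erste".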

The vast majority of authors do not know IPA, or know it only passively 
(they can follow pronunciation instructions in IPA notation but cannot 
write them). Besides, it wouldn't be a bad idea for a graphic browser to 
give users optional access to read="..." attributes, e.g. the way Firefox 
lets you right-click on anything and select Properties, to see the 
language of the element (as declared in markup), its advisory title if 
present, its destination if it's a link, etc. In such usage, plain 
language is more useful to most people than IPA.

As a policy issue, according to what I have understood from Unicode list 
discussions, many experts think that IPA should not have a special status 
among phonetic writing systems. There are other systems in use, even 
though IPA is the most common in linguistics.

Thus, the pronunciation should be specified the same way as you would do 
in normal text. For example, if you would like to specify the 
pronunciation of some foreign word, you would try to write it according 
to the rules of the document's language. This indicates the 
pronunciation very coarsely, but often in a useful way.

If an _additional_ attribute is defined for the purpose of giving 
pronunciation instructions, it might use IPA by definition, or it might be 
defined as using _some_ phonetic notation, to be defined separately 
(though this admittedly seems to result in some attribute spaghetti).

> Also "read" has the problem of not knowing what tense it is (could be 
> homophonous with reed or red). May I suggest "pronounce" as an alternative 
> attribute name?

Generally, verbs should be avoided in attribute names, since in a logical
markup language, attributes are supposed to indicate properties or 
relationships, not actions. We have the precedent align="...", but it
has been condemned to deprecation. In HTML 4.01 there are also accept="...",
defer="...", and others.

In this case, the word "read" would really reflect the _meaning_ of the 
attribute: how the content is read, or is to be read. Admittedly people 
could read it two ways when pronouncing it, but I don't think that's a 
serious problem. Most people would read it as if it were an imperative, 
which doesn't sound like descriptive markup, but so what? And pronouncing 
attribute names isn't the main use of those names, and it's not part of 
the attribute's meaning. When writing about them in HTML, you could always 
express your preference by using <code read="reed">read</code> or <code 
read="red">read</code>. :-)

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Monday, 2 April 2007 05:50:56 UTC