Re: [whatwg] Feedback on <dfn>, <abbr>, and other elements related to cross-references

From: Smylers <Smylers@stripey.com>
Date: Mon, 21 Apr 2008 17:20:31 +0100
To: whatwg@lists.whatwg.org, public-html@w3.org
Message-ID: <20080421162031.GD20100@stripey.com>

Jens Meiert writes:

> > The point of <abbr> is to expand the acronym, not to just mark up
> > what is an acronym or abbreviation.
> 
> Doesn't this claim that the general information that some text is an
> abbreviation (w/o an expanded form) is basically useless?

Well, it's very close to being useless: if browsers don't do anything
with some mark-up, there's no point in having it (and indeed no
incentive for authors to provide it).

The point of annotating an abbreviation with its expansion is not to
mark up the abbreviation _per se_; it's to provide browsers with what
the expansion is, so that they can display it.
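
As a minimal sketch of what that looks like in practice (the sentence is
made up for illustration; the title attribute is what carries the
expansion, and browsers typically surface it as a tooltip):

```html
<!-- The expansion lives in the title attribute; the element content
     stays as the abbreviation itself. -->
<p>The <abbr title="International Space Station">ISS</abbr> has been
crewed continuously for years.</p>
```

Without the title attribute the browser has nothing extra to display.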

Sure, all instances of just using abbreviations _could_ be marked up.
Equally we could mark up verbs, proper nouns, words that score over 30
in Scrabble, palindromes, words that can be written upside-down on
calculators, words defined in the Oxford English Dictionary ...

There's almost no limit to how text could be marked up to have _some_
use in a particular niche.  But that isn't what HTML 5 is going to cater
for.

> And is "<abbr>ISS</abbr>" not more useful since less ambiguous than
> "ISS" (same abbreviation) and "ISS" (German imperative for "to eat" in
> capitals)

Yes, that is potentially ambiguous.  But it's the same in books,
newspapers, and so on, where it turns out not to be much of a problem.
Human beings tend to be pretty good at working things out from context.

For example in an article which has previously mentioned the
International Space Station (and possibly also put "ISS" in brackets
after it) readers are going to recognize further uses of "ISS".  Parts
of speech also provide a clue ("iss" being an imperative only makes
sense in certain places in a sentence), as does its being in all-caps --
yes, any word _can_ be written in upper-case, but it's unusual to find
one in the middle of a sentence; humans are used to it being an
indicator of an abbreviation.

Further, distinguishing abbreviations from upper-case-words is far from
the only ambiguity in writing:

* Words are quite capable of being ambiguous on their own, without any
  abbreviations in the vicinity.  For example "entrance" can be the
  place where one enters a building, or the action of putting somebody
  in a trance.

* The same abbreviation is often used for different terms (though often
  in quite distinct fields).  Marking something up as being an
  abbreviation without giving the expansion wouldn't be any use here.

Why should HTML 5 bother to solve the very narrow case of disambiguating
words from abbreviations, but not solve it more generally to include the
other cases?

> and be it just for AT,

(See, you just used "AT" there!  That _could_ be the English word "at"
written in capitals.  It _could_ be a reference to automatic
transmission.  But readers of this list successfully work out what you
were referring to; in practice it isn't ambiguous.)

What in practice would you expect AT to do with this knowledge?
Remember that most abbreviations that aren't being tagged with
expansions won't be marked up, so AT is going to have to deal sensibly
with that case anyway.

> pronunciation

Human languages already have many quirks of pronunciation.  Speaking
browsers have to cope with these heuristically, without help from the
mark-up indicating how to pronounce, say, "entrance".  (As does speaking
software that reads out, say, e-mails or word processor documents --
text which doesn't have any underlying mark-up.)

Also note that an ordinary word such as 'iss' likely shouldn't be in
capitals in the HTML source anyway.  If the capitals are wanted for
emphasis then it should be written <em>iss</em>, with CSS being used to
remove the italics and up-case the text.
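
A rough sketch of that approach, assuming a hypothetical class name
"loud" (not anything defined by HTML) and an invented German sentence:

```html
<style>
  /* Hypothetical class: cancel the default italics of <em> and
     render the text in capitals instead. */
  em.loud {
    font-style: normal;
    text-transform: uppercase;
  }
</style>

<!-- The source keeps the ordinary lower-case word; the capitals are
     purely presentational. -->
<p lang="de">Nun <em class="loud">iss</em> endlich!</p>
```

That way the source text stays an ordinary word, and a speaking browser
has no capitals to mistake for an abbreviation.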

Are mis-pronounced abbreviations really a significant proportion of
mis-pronounced words by speaking browsers?

> and a scent of semantics?

And, what would the point of such a scent be?  Why would it be more
useful than the scent provided by tagging all verbs with <verb>?

Smylers
Received on Monday, 21 April 2008 16:21:00 UTC
