Re: [whatwg] Feedback on <dfn>, <abbr>, and other elements related to cross-references

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 23 Apr 2008 01:42:26 +0000 (UTC)
To: public-html@w3.org, whatwg List <whatwg@whatwg.org>
Message-ID: <Pine.LNX.4.62.0804230047180.25764@hixie.dreamhostps.com>

Summary: I've made the title="" attribute on <abbr> optional again.

On Mon, 21 Apr 2008, Jens Meiert wrote:
> >
> > The point of <abbr> is to expand the acronym, not to just mark up what 
> > is an acronym or abbreviation.
> 
> Doesn't this claim that the general information that some text is an 
> abbreviation (w/o an expanded form) is basically useless?

Pretty much.


> And is "<abbr>ISS</abbr>" not more useful since less ambiguous than 
> "ISS" (same abbreviation) and "ISS" (German imperative for "to eat" in 
> capitals), and be it just for AT, pronunciation and a scent of 
> semantics?

I'm not at all convinced that ATs/pronunciation are a valid use case here 
since they have to handle it even when it's not marked up (the common 
case). And I have no idea what a "scent of semantics" is for. :-)


> And why do we need to change what HTML 4 left "open" anyway in the first 
> place; I'm still not convinced that "indicates" really /needs/ to be 
> replaced by "expands":
> 
>   ABBR: Indicates an abbreviated form (e.g., WWW, HTTP, URI, Mass., 
>   etc.). [1]
> 
> [1] http://www.w3.org/TR/html4/struct/text.html#edef-ABBR

HTML4 isn't really a big influence on HTML5, to be honest.


On Mon, 21 Apr 2008, Smylers wrote:
>
> Why should HTML 5 bother to solve the very narrow case of disambiguating 
> words from abbreviations, but not solve it more generally to include the 
> other cases?

Indeed.


On Mon, 21 Apr 2008, Patrick H. Lauke wrote:
> 
> Assistive technology is certainly a valid use case here.

Why? It doesn't seem to be the case to me that people using ATs are any 
less able to work out what an abbreviation is than anyone else.


> > Yes, that is potentially ambiguous.  But it's the same in books, 
> > newspapers, and so on, where it turns out not to be much of a problem.
> 
> But books etc don't have any other way of providing 
> disambiguation/structure. Under that reasoning, you could argue that 
> there's no need for heading elements etc, as simply having text bigger 
> works fine in print, so all we need is a font sizing markup option.

That's a non-sequitur -- there's _no_ difference with abbreviations. There 
_is_ a difference with headings, so we need to mark those up (be it via 
<h1>, or be it via <font size="+1"> -- the former is better for various 
reasons, but that's not really relevant to the argument).


> > What in practice would you expect AT to do with this knowledge? 
> > Remember that most abbreviations that aren't being tagged with 
> > expansions won't be marked up, so AT is going to have to deal sensibly 
> > with that case anyway.
> 
> So you'd prefer hit and miss heuristics over unambiguous interpretation?

Why are abbreviations any more important here than the many other cases of 
ambiguity that Smylers raised?


On Mon, 21 Apr 2008, Philip Taylor wrote:
>
> <abbr> does not allow unambiguous interpretation, since it will be 
> misused and abused, so heuristics are necessary to give a decent user 
> experience even with those elements. (Any other feature will also be 
> misused and abused, so that is unavoidable.)
> 
> http://philip.html5.org/data/abbr-acronym.txt shows some existing uses 
> of <abbr> and <acronym>:
> 
> <abbr> is used a lot for dates ('<abbr 
> title="2007-10-05T13:33:09-0400">Friday, October 5, 2007</abbr>'), so it 
> has to be treated as equivalent to non-marked-up text in at least those 
> cases.
> 
> <acronym> is used a lot for things that aren't acronyms ('<acronym 
> title="Forum Home">MUSICFANTALK</acronym>', '<acronym title="Greg Powell 
> and Mike Donovan are fictional characters [snip lots of text]">Powell 
> and Donovan</acronym>'), where people just want the styling and tooltip 
> effects, so it also has to be treated as equivalent to non-marked-up 
> text in those cases. (Currently HTML5 requires <acronym> to be replaced 
> with <abbr>, so these problems would apply to <abbr> in the future.)
> 
> The markup elements can be used as additional input information to guide 
> the heuristics, which may (or may not) be much better than without that 
> information, but there will always be some ambiguity that 
> implementations will have to cope with.

Indeed.


On Mon, 21 Apr 2008, Nicholas Shanks wrote:
>
> We need to go through this more methodically before making a decision. I 
> hope the following aids matters.

More methodically than

   http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-April/014470.html

...? I'm not sure exactly what you have in mind! :-)


> First, let's think about who will use abbreviations and why they need 
> them, second, think about where the information could come from.
> 
> Situations where expansions of abbreviations are needed:
> 1) People unfamiliar with the topic being discussed. This can happen if 
> you click a link to an anchor half-way down a page and miss the 
> introduction, or you are reading about a topic new to you. It should not 
> be required that the user screw around looking for the acronym with a 
> dotted underline. This would be terrible for users of non-visual UAs or 
> UAs that don't differentiate abbrs from normal text.

Abbreviations are no more special here than any term of art.


> 2) Documents that exist as both a single page, and as multiple pages 
> (like large web specifications). Should the expansion occur once per 
> file? That would require additionally marking up every abbr at its 
> first occurrence on a page when splitting the single-page version.

Again, I don't see why this would be any different than any random term of 
art.


> 3) Documents that use the same acronym to mean different things in 
> different contexts/sections.
>
> For example, take that both <abbr title="United States of 
> America">USA</abbr> and <abbr title="United Space Alliance">USA</abbr> 
> previously occurred in the document, and you *don't* want, as an author, 
> for every future use of either term to be expanded by default (so will 
> not provide titles for all occurrences). I then jump into the middle of 
> a page from somewhere else and see "The USA's fleet of Space Shuttles 
> are refurbished by USA, LLC." and wonder what's going on!
>
> There's no way to tell which is which without heuristical analysis of 
> the language, so the UA can't auto-expand based on a single previous 
> occurrence, which I think is the behaviour you were expecting by 
> disallowing abbrs without titles and removing the referencing.

I didn't expect any auto-expanding at all. Ever, even with <abbr> present 
with a title="" attribute. Why would one want that? That would be really 
annoying. We have acronyms and abbreviations for a reason -- to make 
things shorter! :-)


> 4) Documents where the acronym and an identically spelled word appear. 
> For example your current system would *require ambiguity* in the 
> admittedly somewhat unlikely newspaper headline:
>
> <h1><abbr title="British American Racing">BAR</abbr> RAISE THE BAR IN 
> FORMULA ONE</h1>
>
> Is the second BAR an acronym, which is prohibited from being marked up, 
> or not? No way to tell without heuristical analysis of the language. 
> Certainly not something most UAs will be doing, even for English. What 
> hope would Nahuatl have?
>
> At least with <abbr>BAR</abbr> you can tell that it *is* an 
> abbreviation, and can go look for the reference. Telling when a word 
> that's not marked up is or isn't an acronym (so deciding if the UA 
> should provide an expansion) is much harder.

It's quite obvious that the "BAR" in "RAISE THE BAR" is not an acronym.


> Ideally users need to have on-demand expansion of any abbreviation they 
> come across, in any situation, regardless of how competent the HTML 
> author was.

Sure. Similarly, on-demand definition of any term, searching the Web for 
that term, looking it up in the user's e-mail archives, etc. This is a 
mostly solved problem that doesn't require the UA to know anything about 
the document, really -- just the ability for the user to select text and 
apply commands to the selection.


> Erroneous expansion of non-abbreviations that match a previously defined 
> abbreviation is I think the hardest thing to avoid.

Why would the user request expansion of non-acronyms?


> Where should these expansions come from? The following fallback list seems
> reasonable:
> 1)	Inline with @title, the way it's currently done.
> 2)	By referencing, either automatically by the UA or explicitly marked
> up, an expanded occurrence of the acronym.
> 3)	Glossary file in <link> tag, which the UA can apply if unambiguous or
> could be referenced by markup. Not currently a feature of any UA.
> 4)	User's application- or system-wide lexicon file, containing terms in
> that user's field of occupation. On the Mac OS this is located under VoiceOver
> Utility→Speech→Pronunciation.
> 5)	Lexicon of the synthesiser, if one is being used.

Indeed.

> You are prohibiting (2) from being used, with the following consequences:

Why does prohibiting <abbr>...</abbr> without title="" prevent UAs from 
searching previous <abbr> elements?


> a) Documents will either mark up every acronym with an <abbr title=…>
> tag -- user agents that expand these by default (primarily aural as I
> understand it) will appear very verbose -- or,

User agents that expand abbreviations by default are poor, IMHO.


> b) Documents will only mark up the first occurrence. UAs that do not 
> process subsequent occurrences of the abbreviation (currently all of 
> them), will suffer from lack of definitions.

I don't follow this. Why would documents only mark up the first one?


> c) In documents with the same abbreviation occurring for two different 
> expansions, UAs will have no means of disambiguating without linguistic 
> processing.

Why wouldn't the UA just provide both expansions?
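
To make "provide both expansions" concrete, here is a minimal sketch of 
what a UA could do, assuming it simply collects the title="" of every 
<abbr> in the document. The names AbbrCollector and expansions_for are 
hypothetical, not part of any UA or of the spec, and a real UA would work 
on its DOM rather than re-parsing markup.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AbbrCollector(HTMLParser):
    """Hypothetical helper: gathers title="" expansions of <abbr> elements."""

    def __init__(self):
        super().__init__()
        # abbreviation text -> distinct expansions, in document order
        self.expansions = defaultdict(list)
        # one [title, text_parts] entry per currently open <abbr>
        self._open = []

    def handle_starttag(self, tag, attrs):
        if tag == "abbr":
            self._open.append([dict(attrs).get("title"), []])

    def handle_data(self, data):
        # Text belongs to every <abbr> that is open at this point.
        for entry in self._open:
            entry[1].append(data)

    def handle_endtag(self, tag):
        if tag == "abbr" and self._open:
            title, parts = self._open.pop()
            text = " ".join("".join(parts).split())
            title = " ".join((title or "").split())
            # <abbr> without a usable title contributes no expansion.
            if title and text and title not in self.expansions[text]:
                self.expansions[text].append(title)


def expansions_for(html, abbreviation):
    """Return every expansion the document gives for `abbreviation`."""
    collector = AbbrCollector()
    collector.feed(html)
    collector.close()
    return list(collector.expansions.get(abbreviation, []))
```

For the USA example above, expansions_for(document, "USA") would return 
both titles, and the user picks whichever fits; no linguistic processing 
is involved.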


> Using <a> to achieve referencing is a very bad idea, as it will pepper 
> documents with little blue underlined words and will end up far more 
> distracting than useful to users. Designers will also hate it, so it 
> will end up not being used at all.

That doesn't really seem to follow my experience with the Web. In 
particular, the HTML5 spec has links all over the place for 
cross-references, and it works great.


> Lastly, by disallowing the title attribute to be omitted you make things 
> unnecessarily difficult for currently valid HTML4 to migrate to valid 
> HTML5.

The idea is to help authors who forgot to annotate their abbreviations 
with expansions. By making omitting the expansion non-conforming, we catch 
all these cases.


On Mon, 21 Apr 2008, Philip Taylor wrote:
> 
> Out of 130K pages from dmoz.org, I see 592 using <abbr> elements, and
> 36 of those using it at least once with no title attribute. If anyone
> cares enough, they could look through the list to see how many are
> bogus and how many are expecting something useful and what they seem
> to be expecting.
> 
> Those 36 pages which used <abbr> with no title a couple of months ago:
> 
> http://bundesrecht.juris.de/gsgv_9
> http://linuxdidattica.org/
> http://markcronan.livejournal.com/33814.html
> http://observer.guardian.co.uk/politics/story/0,6903,449920,00.html
> http://outer-court.com/goodies/index.htm
> http://spazioinwind.libero.it/saf/
> http://tubewhore.livejournal.com/
> http://www.artofeurope.com/wong/
> http://www.beepworld.de/members10/princessa18/
> http://www.cs.tut.fi/~jkorpela/latinaohje.html
> http://www.danscamera.com/
> http://www.fwbosheffield.org/
> http://www.gnu.org/
> http://www.jokan.de/technik-c2.html
> http://www.mozilla.org/directory/
> http://www.mozilla.org/projects/mathml/
> http://www.offaly.ie/offalyhome/visitoffaly/Attractions/Family/bog+train.htm
> http://www.rekordbog.dk/
> http://www.seobythesea.com/
> http://www.travelphp.com/
> http://www.treseta.fi/
> http://www.voyager.prima.de/cpp/books1.html
> http://www.w3.org/TR/XMLHttpRequest/
> (plus 5 more on guardian.co.uk, and 8 more on beepworld.de)

Thanks for this.

At least one case is a clear error:

   <p><strong>Telecom Info</strong><br />
     <abbr><span class="abbr" title="Telephone">Phone</span></abbr> [...]
     <abbr><span class="abbr" title=""></span></abbr>

The others I looked at were all using <abbr> in a pointless way.
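
For anyone who wants to repeat this kind of check on their own pages, a 
rough sketch follows. It is not the tool Philip used for his survey; the 
names AbbrTitleAudit and audit_abbr are made up for illustration, and 
treating an empty or whitespace-only title="" as missing is my assumption.

```python
from html.parser import HTMLParser


class AbbrTitleAudit(HTMLParser):
    """Hypothetical audit: counts <abbr> elements and those lacking a title."""

    def __init__(self):
        super().__init__()
        self.total = 0      # all <abbr> start tags seen
        self.untitled = 0   # those with no title, or an empty one (assumption)

    def handle_starttag(self, tag, attrs):
        if tag != "abbr":
            return
        self.total += 1
        title = dict(attrs).get("title")
        if not (title and title.strip()):
            self.untitled += 1


def audit_abbr(html):
    """Return (total <abbr> count, count without a usable title)."""
    audit = AbbrTitleAudit()
    audit.feed(html)
    audit.close()
    return audit.total, audit.untitled
```

Note that the erroneous markup quoted above counts as two untitled <abbr> 
elements here, since the title="" attributes sit on the inner <span>s 
rather than on the <abbr>s themselves.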


On Tue, 22 Apr 2008, Christoph Päper wrote:
> 
> I forgot so far to mention my dearest English abbreviation, actually it 
> is a (NIST-recommended) unit symbol and thus without the abbrev dot: 
> 'in' for inch. Unit symbols and abbreviated function names (e.g. 'sin') 
> also may need markup (and styling) to keep them upright inside italic 
> mathematic text (not every italic math is a |var|).

Interesting point.


On Mon, 21 Apr 2008, Simon Pieters wrote:
> On Mon, 21 Apr 2008 00:26:46 +0200, Ian Hickson <ian@hixie.ch> wrote:
> 
> > I've also made the title="" attribute on <abbr> required, [...]
> 
> > > There are legitimate reasons to not fill up the title attribute of 
> > > <abbr>. Or should <abbr> be disallowed in these situations?
> > 
> > I've disallowed it.
> 
> What's the point? There's no harm in titleless <abbr>s.

The idea was to help authors who meant to provide expansions, since now 
the validator will inform them when they forget.


> All you're achieving here is annoying authors who use titleless <abbr>, 
> maybe as a styling hook to achieve small-caps (e.g. 
> http://www.autisticcuckoo.net/archive.php?id=2007/06/13/samurai-attack 
> uses "<abbr>WCAG</abbr>").

That's a good point. Ok, I've allowed title="" to be omitted again.


On Tue, 22 Apr 2008, Christoph Päper wrote:
> > 
> > HTML5 had a complex mechanism for cross-references using <dfn>, 
> > <abbr>, <i>, and so forth. I've removed it. It really didn't add much 
> > compared to <a href=""> other than a whole lot of complexity, and 
> > there was very little demand for it really.
> 
> It was kinda cool, though.

Yeah.


> > Not sure which second sentence you mean, but for the record, you don't 
> > have to mark up all abbreviations. If they're known, don't bother.
> 
> If I want to reduce word-spacing for multi-dot abbreviations or change 
> font-size or font-variant for acronyms, I'll have to mark up all of 
> them. By the way, inside a run of text styled with "text-transform: 
> uppercase" acronyms should get dots, which is impossible with current 
> CSS, e.g. "IRAN THREATENS US" vs. "IRAN THREATENS U.S." from "<h1>Iran 
> threatens <abbr class="acro">US</abbr></h1>".

Fair enough. (Please let the CSSWG know about the styling issue, BTW.)


> > The point of <abbr> is to expand the acronym, not to just mark up what 
> > is an acronym or abbreviation.
> 
> That's what you have made it. One could even argue whether the (sole) 
> point of providing a |title| for an |abbr| is expansion.

At the moment it is; I would recommend using <span title=""> around the 
abbreviation if you wish to annotate it further.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 23 April 2008 01:48:07 GMT