
Re: <wbr> rendering

From: Jukka K. Korpela <jukka.k.korpela@kolumbus.fi>
Date: Sun, 08 Mar 2015 16:42:36 +0200
Message-ID: <54FC5FDC.6010402@kolumbus.fi>
To: public-html@w3.org
2015-03-08, 12:38, Andrea Rendine wrote:
> Something like a couple years ago, I filed a bug on the WHATWG HTML 
> Living Standard project, complaining about how a tag was meant to be 
> rendered. I received a completely unsatisfying answer, which was 
> basically meant to tell me to wait. Enough waiting now. 

WHATWG and W3C are two different entities, though in cooperation. If you 
want to have WHATWG HTML changed, you need to talk to WHATWG, I think. I 
cannot comment on the content of your proposal, as you don’t quote or 
cite it.

> In the spec, <wbr> is meant to be a line break opportunity.
>

That’s a good informal description, but rather unsatisfactory as a 
definition. I think <wbr> should be defined so that its effect is 
equivalent to the presence of a ZERO WIDTH SPACE U+200B character. This 
would still require a description of the effect of this character; it 
should not mean just that a browser may break a line. Rather, the 
conditions for doing so should be specified as part of the layout 
algorithm. But this would be rather straightforward.

(In practice, support for <wbr> predates support for U+200B by many years. 
As a logical description, however, it is natural to base the definition 
of <wbr> on a standardized character designed for similar use.)
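
To illustrate what "specified as part of the layout algorithm" could mean, 
here is a minimal sketch in Python of a greedy line breaker that treats 
U+200B as the only permitted break point. The function name wrap and the 
character-count notion of width are assumptions made for illustration; 
real layout measures glyph widths and also breaks at spaces.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, the character <wbr> would be equivalent to


def wrap(text: str, width: int) -> list[str]:
    """Greedy wrap that may break a line only where a ZWSP occurs.

    Width is counted in characters (an assumption for this sketch).
    A chunk longer than the width simply overflows, as an unbreakable
    string would.
    """
    lines: list[str] = []
    current = ""
    for chunk in text.split(ZWSP):
        # Take the break only when the next chunk would not fit.
        if current and len(current) + len(chunk) > width:
            lines.append(current)
            current = chunk
        else:
            current += chunk
    if current:
        lines.append(current)
    return lines


print(wrap("input/" + ZWSP + "output", 8))
print(wrap("input/" + ZWSP + "output", 20))
```

With a width of 8 the string is split after the slash; with a width of 20 
it stays on one line and the ZWSP leaves no visible trace.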

> The two examples provided are quite different (a prose with a curious 
> streamlike formatting, and a code fragment), but this underlines how 
> different use cases can be.
>
Both examples are meaningful, but rather special. Somewhat more normal 
examples would be

input/<wbr>output
choo-<wbr>choo


> Now, when reading the prose I thought to myself that this can be a 
> convenient way to insert hyphenated word breaks, as most languages 
> break words with a - mark.
>

Such a convention is far less universal than you might think. In any 
case, hyphenation is very different from a conditional line break. When 
Netscape invented <wbr>, they chose a very poor name, since <wbr> isn’t 
really Word BReak except when used for text in a language that does not 
use hyphens to indicate line breaks. It’s more like an allowed break in a 
string.

> <wbr> would remove the need for a &shy; soft hyphen.
>

String break opportunities and word hyphenation are very different 
things. The SOFT HYPHEN U+00AD character is meant to be used to 
interfere with automatic hyphenation (usually to deal with cases that it 
would hyphenate incorrectly or would fail to hyphenate at all) or, in 
some cases, to specify hyphenation points in a situation where automatic 
hyphenation cannot be used.
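
The difference can be sketched in a few lines of Python. The function 
render is hypothetical and handles a single marker only; it shows that a 
break taken at U+200B adds nothing, while a break taken at U+00AD makes a 
hyphen appear at the end of the line.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: break opportunity, nothing displayed
SHY = "\u00ad"   # SOFT HYPHEN: hyphenation point, hyphen shown only at a break


def render(text: str, broken: bool) -> list[str]:
    """Render text containing at most one ZWSP or SHY.

    `broken` says whether the line break at the marker is taken.
    """
    for mark, visible in ((ZWSP, ""), (SHY, "-")):
        if mark in text:
            head, tail = text.split(mark, 1)
            return [head + visible, tail] if broken else [head + tail]
    return [text]


print(render("input/" + ZWSP + "output", broken=True))  # ['input/', 'output']
print(render("hyphen" + SHY + "ation", broken=True))    # ['hyphen-', 'ation']
print(render("hyphen" + SHY + "ation", broken=False))   # ['hyphenation']
```
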
>
> An alternative to good ol' U+173 Soft Hyphen is needed because such 
> character is copypasted along with the text itself, and can be fetched 
> by data-mining tools, while it isn't strictly part of content. <wbr>, 
> on the other side, is markup and wouldn't appear where only text is to 
> be fetched.
>

The soft hyphen has been with us for many years, about 20 years in HTML 
I would say, though support for it has been surprisingly limited. Any 
software that reads HTML data needs to cope with it. Copypaste has many 
similar problems already. Or, rather, it is not a problem with copypaste 
but with using the pasted text in a particular environment. It is perfectly 
OK for plain text to contain soft hyphens. If you paste text in an 
environment where e.g. soft hyphens or other invisible characters might 
cause problems, you just need to deal with it. Even if soft hyphens were 
strictly forbidden now, you would still have zillions of them in 
existing data.
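
"Dealing with it" is usually trivial. As a sketch, assuming the receiving 
environment wants neither soft hyphens nor zero width spaces, a Python 
program could filter them out like this (the name strip_invisible_breaks 
is made up for the example):

```python
# Map the code points to None so that str.translate deletes them.
INVISIBLE_BREAKS = {
    0x00AD: None,  # SOFT HYPHEN
    0x200B: None,  # ZERO WIDTH SPACE
}


def strip_invisible_breaks(text: str) -> str:
    """Remove soft hyphens and zero width spaces from pasted text."""
    return text.translate(INVISIBLE_BREAKS)


pasted = "hy\u00adphen\u00adation of input/\u200boutput"
print(strip_invisible_breaks(pasted))  # hyphenation of input/output
```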

> Now, according to the idea of the spec, an author could style such 
> element so that its rendering fits to the purpose. For instance, <wbr> 
> inside <code> could be rendered as zero width space, while <wbr> in 
> paragraphs could become a soft hyphen.
>

I cannot see where you got that idea, but in any case, <wbr> is not 
meant to become a soft hyphen.

> Now, in the last 2 years i saw no improvements in the implementation, 
> by UAs, of the "content" property applied to real (non pseudo) elements.
>
There are no specifications on such things either, and there has not 
been work on the issue for many years, as far as I can see.


> What is important now is that <wbr> doesn't work on its own. 

It does its own job well.


> It breaks words, right, 

No, it just declares a permitted line breaking point in a string.

> but any empty span element would do the trick. 

No, empty spans do not do such things; foo<span></span>bar acts as 
foobar as far as layout is concerned.
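
A rough way to see this is to flatten markup into the character stream 
that layout would work on. The sketch below uses Python's standard 
html.parser; the class name and the mapping of <wbr> to U+200B are 
assumptions for illustration, not how any browser is implemented.

```python
from html.parser import HTMLParser

ZWSP = "\u200b"  # ZERO WIDTH SPACE


class BreakOpportunityText(HTMLParser):
    """Collect text, turning <wbr> into ZWSP and ignoring other tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only <wbr> contributes a break opportunity; <span> adds nothing.
        if tag == "wbr":
            self.parts.append(ZWSP)

    def handle_data(self, data):
        self.parts.append(data)


def flatten(html: str) -> str:
    parser = BreakOpportunityText()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(flatten("foo<span></span>bar") == "foobar")        # True
print(flatten("foo<wbr>bar") == "foo" + ZWSP + "bar")    # True
```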

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/
Received on Sunday, 8 March 2015 14:43:06 UTC
