[whatwg] Phrasing semantics feedback omnibus from Ian Hickson on 2008-12-17 (public-whatwg-archive@w3.org from December 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 17 Dec 2008 22:02:57 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0812170248580.30225@hixie.dreamhostps.com>
On Fri, 19 Sep 2008, Ozob the Great wrote:
>
> I am concerned about the existence of HTML5's <var>. This was brought to 
> my attention during a technical debate on Wikipedia which amounted to: 
> Where is use of <var> appropriate? The problem is that while <var> can 
> be used to distinguish variables from non-variables, there are many 
> other mathematical constructs which cannot properly be called variables. 
> If variables are going to be distinguished by the markup, then these 
> other constructs ought to be distinguished by the markup. But they can't be 
> put inside <var> because they're not variables, and furthermore, they 
> can't always be said to be constants, functions, spaces or any short list 
> of allowable objects; the number of different types of objects occurring 
> in mathematics is tremendous, and specifying all the allowable objects 
> in HTML markup is undesirable.
>
> [...]
> 
> Content MathML gets around this by defining approximately 120 different 
> content elements. HTML 5 neither needs nor wants 120 different 
> mathematical content elements. The only solution I can see to this is to 
> deprecate <var>: Authors who wish to provide content markup should use 
> MathML, because it is designed for such things.

I agree that MathML is how you should mark up this kind of thing. However, 
<var> still has other uses, e.g. the spec uses it to good effect in a 
number of places. It is not just for mathematics. There is an example in 
the <var> element's section to this effect.
 

On Fri, 19 Sep 2008, Tab Atkins Jr. wrote:
> 
> <var> *can* be used to mark up variables in a mathematical expression.  
> Its primary use, though, is to mark up variables in things like, say, 
> computer code, because these are often styled differently than the rest 
> of text. Frex, a <code> block may be generally just white-space:pre, but 
> the vars will be bold as well.
> 
> In simple math expressions (the kind that can be expressed in vanilla 
> html5), there is usually also a special convention for marking up 
> variables.  Oftentimes they are simply italicized.
> 
> If you are wanting to mark up complex mathematical text with explicit 
> semantics, one should indeed use MathML.  <var> is not meant to replace 
> it; it's meant to provide a simple bit of semantics for a relatively 
> common use-case.

Right.


On Sat, 20 Sep 2008, Ozob the Great wrote:
> Benjamin Hawkes-Lewis wrote:
> >
> > There's already ongoing work to allow MathML and SVG vocabularies to 
> > be expressed in the text/html serialization of HTML5.
> 
> Then <var> steps on MathML's toes: It duplicates functionality.

On Sat, 20 Sep 2008, Edward Z. Yang wrote:
> 
> Not necessarily; a program variable should certainly not be marked up 
> with MathML.

On Sat, 20 Sep 2008, Ozob the Great wrote:
> 
> Conceded. I believe that in a mathematical context I am still correct.

Right, in a mathematics context, e.g. in an expression, if one needs to 
mark up anything beyond just what the variables are, then one should use 
MathML. For anything simpler, or for non-mathematical contexts, <var> is 
still useful.


On Sun, 21 Sep 2008, Henri Sivonen wrote:
> 
> The use cases for <var> probably aren't strong enough to warrant its 
> addition to HTML at this stage if it hadn't been in HTML already--you 
> might as well use <i>. However, given that <var> has already been in 
> HTML for a long time, it probably isn't harmful enough to make it 
> non-conforming. Actually, its main harm is the opportunity cost of the 
> debates about when it's appropriate to use it. :-/

Pretty much! :-)


On Sun, 21 Sep 2008, Ozob the Great wrote:
> 
> Let me make a specific and concrete proposal. 4.6.14 should be changed 
> to read as follows:
> 
> The var element represents a variable. This could be an actual variable 
> in a programming context, or it could be a term used as a placeholder in 
> prose. Use of var in a mathematical context is deprecated in favor of 
> MathML content markup.
> 
> The example following would be left the same.

I disagree with this, because it still makes sense to use <var> for 
variables even in simple mathematical contexts (e.g. the spec does so in 
several places without needing to drop down to MathML).

However, I agree that we should mention MathML there, so I have done so.


On Wed, 8 Oct 2008, Nils Dagsson Moskopp wrote:
> 
> I was wondering what markup one could use for tag clouds and similar 
> ways to convey different importance of marked up content. Current 
> popular markup involves nested <em> elements, which strikes me as kinda 
> complicated and, in extreme cases, hard to read.
> 
> I was thinking along the lines of a @weight attribute whose value would 
> range from 0 (least importance) to 1 (maximum importance) due to 
> normalisation (scaling according to power law). This could also be used in 
> normal text to confer the amount of stress put into phrases - say, for a 
> voice client.
> 
> I'm aware that this is probably a stupid idea, but, as I said, nested 
> <em>s (and custom class names even more so) strike me as particularly 
> difficult.

On Wed, 8 Oct 2008, Tab Atkins Jr. wrote:
> 
> It wouldn't auto-style, but you could use a @data-tag-weight attribute. 
> Javascript can then come around and set sizes explicitly.
> 
> The problem with the proposal as it is is that it ignores the very real 
> difficulty in figuring out just *how* to scale the tag sizes based on 
> the weight.  Even with this weight attribute, you'd need a chunk of 
> newly-crafted CSS to control that decently as well.
> 
> On the other hand, a simple javascript library could very easily handle 
> the styling for you.  It would be a fun weekend project to get something 
> decent running.
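
As a rough illustration of the JavaScript approach Tab describes (not 
anything from the spec), a script along these lines could read the 
weights and set sizes. The attribute name data-tag-weight comes from his 
message; the function names, the 0.7em to 1.5em range, and the linear 
scaling are assumptions made for this sketch.

```javascript
// Hypothetical helper: maps a normalised weight in [0, 1] to a font size in em.
// The default range and the linear scaling are assumptions, not spec behaviour.
function weightToFontSize(weight, minEm = 0.7, maxEm = 1.5) {
  const w = Number(weight);
  // Treat malformed weights as the least important value; clamp the rest.
  const clamped = Number.isFinite(w) ? Math.min(1, Math.max(0, w)) : 0;
  const size = minEm + clamped * (maxEm - minEm);
  return size.toFixed(2) + "em";
}

// Sets an explicit font size on every element that carries data-tag-weight.
function styleTagCloud(root) {
  for (const el of root.querySelectorAll("[data-tag-weight]")) {
    el.style.fontSize = weightToFontSize(el.getAttribute("data-tag-weight"));
  }
}

// Only touch the DOM when one is available (e.g. in a browser).
if (typeof document !== "undefined") {
  styleTagCloud(document);
}
```

This still leaves open the question Tab raises of how sizes should really 
scale with weight; the linear mapping is just the simplest choice.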

This is an area that has had some research done:

   http://24ways.org/2006/marking-up-a-tag-cloud
   http://microformats.org/wiki/tagcloud-brainstorming

I don't really see a compelling answer yet, and frankly tag clouds as a 
whole aren't really so important that we need dedicated markup. So I 
haven't put anything in the spec about it yet.

Generally I would recommend something like the 24ways proposal:

   <ol class="tag-cloud"> <!-- alphabetical -->
    <li class="tag-cloud-4"><span>4 times</span> <a href="/t/bar">bar</a>
    <li class="tag-cloud-2"><span>2 times</span> <a href="/t/baz">baz</a>
    <li class="tag-cloud-5"><span>5 times</span> <a href="/t/foo">foo</a>
   </ol>

...with:

   @media screen, print, handheld, tv { 
     /* should be ignored by speech browsers */
     .tag-cloud > li > span { display: none; }
     .tag-cloud > li { display: inline; }
     .tag-cloud-1 { font-size: 0.7em; }
     .tag-cloud-2 { font-size: 0.9em; }
     .tag-cloud-3 { font-size: 1.1em; }
     .tag-cloud-4 { font-size: 1.3em; }
     .tag-cloud-5 { font-size: 1.5em; }
   }

Or, alternatively:

   <ol class="tag-cloud"> <!-- alphabetical -->
    <li><a title="4 times" href="/t/bar">bar</a>
    <li><a title="2 times" href="/t/baz">baz</a>
    <li><strong><a title="5 times" href="/t/foo">foo</a></strong>
   </ol>

...with:

   @media screen, print, handheld, tv { 
     /* should be ignored by speech browsers */
     .tag-cloud > li { display: inline; }
     .tag-cloud [title^="1 "] { font-size: 0.7em; }
     .tag-cloud [title^="2 "] { font-size: 0.9em; }
     .tag-cloud [title^="3 "] { font-size: 1.1em; }
     .tag-cloud [title^="4 "] { font-size: 1.3em; }
     .tag-cloud [title^="5 "] { font-size: 1.5em; }
   }

Or some combination thereof.

(The CSS would actually need to be more hacky until such time as screen 
readers grow up to be real speech browsers instead of following the screen 
media, but that's another problem.)
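
For what it's worth, markup like the first variant would normally be 
generated rather than written by hand. The following is only a sketch of 
how that might look: the function names, the { name, count } input shape, 
and the linear min-to-max bucketing are all assumptions, and the 
bucketing will not necessarily assign the same class numbers as the 
hand-written example above.

```javascript
// Hypothetical helper: buckets a usage count into one of `steps` classes,
// tag-cloud-1 (least used) to tag-cloud-<steps> (most used), by linear scaling.
function tagCloudClass(count, min, max, steps = 5) {
  if (max === min) return "tag-cloud-1";
  const ratio = (count - min) / (max - min);
  const bucket = 1 + Math.round(ratio * (steps - 1));
  return "tag-cloud-" + bucket;
}

// Builds the list markup for tags given as { name, count } objects.
// Names are assumed safe to embed as-is; real code would need to escape them.
function renderTagCloud(tags) {
  const counts = tags.map((t) => t.count);
  const min = Math.min(...counts);
  const max = Math.max(...counts);
  const items = [...tags]
    .sort((a, b) => a.name.localeCompare(b.name)) // alphabetical order
    .map((t) =>
      ` <li class="${tagCloudClass(t.count, min, max)}">` +
      `<span>${t.count} times</span> <a href="/t/${t.name}">${t.name}</a>`);
  return ['<ol class="tag-cloud">', ...items, "</ol>"].join("\n");
}
```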

If people think all this is worth mentioning in the spec, I can add it. 
Let me know.


On Wed, 5 Nov 2008, Pentasis wrote:
> 
> What the spec currently does is:
> 1) *exactly* defining some elements
> 2) Giving examples for *some* constructions which are not defined exactly
> 3) not talk about other things.
> 
> Like I have said before, the problem is that nobody can think up all 
> possible types of semantic/content/context in existence (let alone those 
> who are thought up).

Sure... but why is this a problem?


> The only solution would be by creating a type of classification method. 
> Partly this is already there (block, inline, text, structure, etc.). 
> But this should be done one more level down.
> 
> For example:
> 
> abbr, dfn, cite are all the same "type" of "word". Why not remove them 
> and replace them with something like <reference> (just an example!).
>
> Then use the class attribute to define the actual role 
> (class="abbreviation" etc.). In other words, implement the microformats 
> on a much wider base as part of the standard.
>
> These classifications can then be requested and implemented outside of 
> the spec in an open forum. A process which should be fast and more 
> structured than it is currently.

I don't understand what problem this solves.


> The same applies for em and strong. Not sure how these should be 
> classified, but I think they are used to change a word or group of words 
> in context. So perhaps a <contxt> tag? sup and sub would also fall in 
> this category (technically speaking sup and sub are style elements).
>
> pre is also a style element. Why not simply use <p> and style it 
> preformatted if needed depending on the role it has?

What problem does this solve?


> <var> is the best example I think. Why <var> but not <function> 
> <operator> <operand> etc. etc. etc.?

We have all of those too, they're in MathML, which is now part of the 
text/html language.


> And if code gets this attention why not language? (<verb>, <noun> etc. 
> etc.) If we do it like that it would never work.

Well, it's not clear that there is much need for marking up verbs and 
nouns, etc. People discussing grammar are a niche case that is handled by 
<span> pretty well.


On Wed, 5 Nov 2008, Pentasis wrote:
> 
> Strictly speaking, does it matter for the DOM or parser or whatever, if a tag
> is named and used like: <abbr title="description">someword</abbr> or like
> this:
> <reference class="abbreviation" title="description">someword</reference>.
> I don't see how that would make things technically different?

The former is simpler for authors, and, as an added bonus, already works 
today. Those are the major differences.

Also, why stop at <reference class="abbreviation">? Why not go one step 
further? For example, we could have:

   <phrasing class="reference" subclass="abbreviation">...

...or even:

   <text class="phrasing" subclass="reference" subsubclass="abbreviation">...

It isn't clear to me where we should stop. To determine where we should 
stop, we need a clear description of the problem we are trying to address; 
I don't really understand the problem here.


> Another example (just a thought, don't take it seriously) What if we 
> eliminate headers altogether and specify that the title attribute of a 
> section is the header.

Again, I don't understand _why_ we would do this. What problem would this 
solve? Authors today seem to understand and use headers reasonably well, 
all things considered.


On Sat, 8 Nov 2008, Eduard Pascual wrote:
>
> And finally, on the assistive technologies' and search engines' side, 
> this kind of element would make it possible to describe the contents of webpages 
> far better, which would be a clear benefit.

How so? Why and how would this help ATs and search engines? I haven't 
heard any requests from either ATs or search engines asking for this kind 
of feature.


On Thu, 6 Nov 2008, Eduard Pascual wrote:
> 
> [...] is the backwards compatibility topic being dealt with appropriately? 
> For example, why keep <var> (and others), but drop <big>? Why not keep 
> <font> as well? It is part of the HTML legacy, after all, and a quite 
> large part if you look at the markup of currently existing documents 

Each was considered. <var>, as noted above, is useful for denoting 
placeholders or talking about variables. <big> seems to be redundant with 
more precise elements like <h1> and <strong>. <font> is not as powerful as 
CSS and is a considerable source of accessibility problems.


> (I'd bet that it's among the three most used elements in the current 
> web, sharing the podium with <p> and <a>, but can't say for sure).

<p> and <font> aren't even in the top 10. :-)

   http://code.google.com/webstats/2005-12/elements.html


> I think following HTML4's and XHTML1's approach and having
> Transitional and Strict flavors wouldn't be a bad idea

What problem does it solve? It was originally intended to help people 
transition from HTML3.2 to HTML4 Strict, but it ended up not being 
successful at this.


> Initially, HTML was entirely structural: no presentation, and no 
> semantics. Just paragraphs, headings, anchors, and few other things.

This isn't actually the case:

   http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html

Notice <hp1>. :-) And <b> and <i> were introduced at the same time as 
<strong> and <em>, with HTML2.


> With HTML3.2, there was an attempt to make HTML presentational, and it
> soundly failed.

I don't know if I'd say that it "failed". It was a disaster, for sure, but 
mostly because of its success, not its failure.


> It was acknowledged as a mistake, and HTML4 (plus CSS) put a good deal of 
> work on fixing it: presentational stuff went out (more precisely, 
> "deprecated"), and presentation was delegated to a separate language 
> (CSS). HTML only left @class for hooking to external information, and 
> @style for when embedding was more appropriate.

That's not quite right (e.g. HTML4 has <table border> even in strict), but 
mostly, yes. HTML5 continues in this direction, going even further (e.g. 
we don't have the presentational attributes on tables, and <i> and <b> are 
defined in logical terms, not only based on visual presentation).


> Then, to make sure no one was left out, a Strict flavor of the language 
> was published, keeping it "pure", and a Transitional one, keeping all 
> the deprecated stuff on it to ease transition, and to enable 
> document-level backwards compatibility. I hope we all agree this was a 
> good solution and that it worked; but if somebody doesn't, please let me 
> know.

I'm not sure exactly how you define "worked" here. Most pages don't use 
HTML4 Strict, they use HTML4 Transitional or have no DOCTYPE at all. 
Presentational idioms are very widely used on the Web.


> So, if it worked, why not reuse that approach? Why do we need to go 
> through the same mistakes again?

We are in fact using an even more aggressive form of the approach you 
describe.


> Now, Pentasis' initial posts were showing up a fact: semantic markup 
> doesn't do enough to properly describe the semantics of webpages.

It's not clear to me that this has been convincingly shown.


> Can't we simply apply an equivalent solution to the one we used for an 
> equivalent problem ten years ago?

It's not clear to me that the problems are equivalent, since I don't 
understand the current problem.


> <nav> is the only facility in the spec right now to describe 
> "navigation" semantics; but it also implies a "section" structure: hence 
> there is no means to express "navigation" semantics for something that 
> isn't structurally a "section" (for example, headings of the recent 
> changes to a site in the site's main page, linked to the relevant 
> sections, are quite "navigation" stuff, but they are definitely not 
> sections).

Well, all links are "navigation", that's shown by the <a> element. The 
<nav> element is specifically for navigation within a set of pages. I 
don't see this as a big problem (or really, a problem at all).


> Similarly, there is no way to mark something as "tangentially related" 
> without making it a "section" (with the <aside> element).

So?


> And, for example, what about something that's both "navigation" and 
> "tangentially related" (regardless of whether it is a section or not)? 
> For example, a list of "see also" stuff on a documentation page: you 
> would be forced to mark it up as <<a "navigation" section inside an 
> "aside" section>> or as <<an "aside" section inside a "navigation" 
> section>>: neither reflects the real structure of the page; but 
> they are the only ways to represent both semantics.

Actually, I wouldn't use either. "See also: foo, bar" is not necessarily 
an aside and not necessarily site navigation in the sense of <nav>, it can 
be part of the main body text.

One way to think of <nav> is "would you want an accessibility tool to skip 
these links by default?". One way to think of <aside> is "would you want 
this to be moved to a sidebar?".


> Now, to something more specific, we'd need:
> 1) Some (external to HTML) way to describe semantics. (And no, I don't 
> think RDF, in its current form, is a solution for this; but maybe the 
> solution could be based on or inspired by RDF.) That should be to 
> semantics what CSS is to presentation. And we don't really need to care 
> about browsers quickly implementing it, or about legacy browsers that 
> don't implement it, because currently browsers don't care at all about 
> semantics (at least, not beyond displaying @title values and for default 
> rendering, and rendering can be dealt with through CSS anyway).

Isn't the HTML5 spec itself this?


> 2) A way to hook these external semantics to arbitrary elements of a 
> page: we already got @class for this :D

This seems like a really bad idea -- why would we want to have elements 
that can change meaning on the fly?


> 3) A way to add inline semantics when needed. I guess a "semantics" 
> attribute would be the most straight-forward approach. About the format 
> it uses, we should care about it once we have solved 1).

We already have that, e.g. <em> is an inline way of saying "emphasis". 
Moving it to an attribute is just syntactic juggling.


> If we got that, then we could:
>
> 1) Get rid of all the "wannabe semantic" elements that didn't really 
> work well enough, sending them to the 
> deprecated/transitional/supported-for-backwards-compatibility-only 
> limbo.

What elements are these? I thought we'd already obsoleted the useless 
ones.


> 2) Get rid of all the *new* "wannabe semantic" elements that wouldn't be 
> really serving any purpose (ie: un-bloat the content model)

Why is this a goal?


> 3) Have the simplest and cleanest markup, the most accurate presentation 
> mechanisms, and the richest semantic descriptions of the last 10 (or 
> even more) years, all in one package.

I disagree that what you describe would be simpler or cleaner than what we 
have now. I don't understand how you would get anything more "accurate" or 
"rich" than now in a way that regular authors could understand and use.


> Anyway, I think it's also worth pointing out the issue with headings: 
> currently, the spec recommends using <h1> for all levels of headings, 
> but that would mess the hell up on current browsers. Hasn't anybody 
> noticed that?

That's why we also allow using the other <hx> headers.


On Wed, 5 Nov 2008, Pentasis wrote:

> [...] the current spec (and the old ones before it) limit, even imprison 
> linguistic, semantic and typographic evolution on the web by defining 
> strict elements/tags or by providing no element/tag at all. Thus 
> creating boundaries that create an inflexible environment for these 
> implementations. HTML should provide for an open base in which these 
> things can evolve freely and naturally.

Why? What problem does the current "imprisoning" cause? Can you give an 
example of how a user is hurt by the current design?


On Fri, 14 Nov 2008, Nils Dagsson Moskopp wrote:
> >
> > The small element represents small print [...]
> >
> > The b element represents a span of text to be stylistically offset 
> > from the normal prose without conveying any extra importance [...]
> 
> Both definitions seem rather presentational (contrasting, for example, 
> the new semantic definition for the <i> element) and could also be 
> realized by use of <span> elements.

Consider a speech browser. Does it make sense to convey small print in a 
speech context? (Yes, consider radio ads for pharmaceuticals. They speak 
faster for the small print.) Does it make sense to represent a span of 
text stylistically offset from the normal prose without conveying 
importance in a speech browser? (Yes, e.g. there could be a "bing" sound 
after each word in a <b>, indicating that it is a keyword. I can't think 
of an example on radio currently, though.)

Media independence is what we're going for here. <font>, for example, 
isn't media-independent.


On Mon, 24 Nov 2008, Lachlan Hunt wrote:
>
> I have added an entry to the FAQ detailing the rationale for including 
> these elements, and have previously written an article about the issue 
> too.
> 
> http://wiki.whatwg.org/wiki/FAQ#Why_are_some_presentational_elements_like_.3Cb.3E.2C_.3Ci.3E_and_.3Csmall.3E_still_included.3F
> http://lachy.id.au/log/2007/05/b-and-i
> http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-January/009060.html

Thanks!


On Fri, 14 Nov 2008, Pentasis wrote:
> 
> 1) Just because it makes sense to a human (it doesn't to me), does not 
> mean it makes sense to a machine.

HTML is ultimately meant for human consumption, not machine consumption. 
Humans write it (sometimes with the help of a machine), humans read it 
(almost always with the help of a machine). We don't need it to make sense 
to a machine, we just need the machine to do what we tell it to so that it 
makes sense to us.


> 2) When using <small> on different text-nodes throughout the document, one
> would expect all these text-nodes to be semantically the same. But they are
> not (unless all of them are copyright notices).

How is this different to, say, <strong>?


> 3) <small> is a styling element, it has zero semantic meaning, so it does not
> belong inside HTML.

In HTML5 it has semantic meaning, based on how these elements were used 
in practice.


> 4) <b>Siemens</b> also does not tell me anything about the semantics. Is 
> it used as a name, a brand a foreign word ? etc. I cannot get that 
> information from looking at the <b> element.

So? Why do you need that information?


On Fri, 14 Nov 2008, Pentasis wrote:
> 
> First: Computers are binary instruments. Conveying *something* is not 
> very logical seen from a computer's point of view. It is not useful to 
> *me* to provide a class to the <i> or any other element, it is useful 
> to the computer, as humans may indeed come to some sort of conclusion 
> based on style or strangely used semantics, computers cannot, they 
> (still) need a more literal means of semantics.

I don't understand.


> Second: Suppose I want to collect all copyright notices from 1000 
> websites (don't ask me why, I just want to), how am I to do this when 
> they are marked up in <small>s? I will definitely end up with a lot of 
> text that has nothing to do with copyrights (and probably miss a lot of 
> copyright notices as they are marked up differently). Whereas if they 
> were marked up in (for example) <span class="copyright"> I could retrieve 
> it all based on the class-name.

If you want to pick up copyright notices, look for the word "copyright" or 
the (c) character. Why would you use <small>?
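
A text-based search of that kind can be very simple. The sketch below is 
a heuristic only, working on plain text split into lines; the function 
name and the choice to also match the ASCII fallback "(c)" are 
assumptions, and it will of course have false positives (any line that 
merely mentions the word "copyright").

```javascript
// Heuristic only: matches the word "copyright", the © character, or "(c)".
const COPYRIGHT_PATTERN = /copyright|©|\(c\)/i;

// Hypothetical helper: returns the trimmed lines of plain text that look
// like copyright notices according to the pattern above.
function findCopyrightNotices(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => COPYRIGHT_PATTERN.test(line));
}
```

Nothing here depends on which element the notice was marked up with.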


On Fri, 14 Nov 2008, Pentasis wrote:
> 
> Not yet maybe, but we could at least try to keep options open for the 
> future.

This doesn't scale -- there is an unbounded set of features that aren't 
in HTML5 currently. We can't add them all. We are focusing on only adding 
those features that we can justify today, as that seems like the most 
sensible cut-off point given that we need a cut-off point.


> But another example (based on "siemens") wouldn't it be nice if I could 
> tell Google I am looking for a person named "Siemens" so it would ignore 
> the "brand"-name?

That would only work if authors reliably marked up author names vs brand 
names. Given the difficulty we have getting authors to reliably mark up 
headers vs paragraphs, or table headers vs table cells, I really don't 
see any reason to be optimistic about authors' abilities to mark brand 
names separate from people names.

Also, Google already handles the case you mention. Search for "Siemens". 
Then search for "George Siemens" or "Fred Siemens".


On Mon, 17 Nov 2008, Smylers wrote:
> 
> That works fine with <small>.  User-agents which can't literally render 
> smaller fonts can choose alternative mechanisms for denoting lower 
> importance to users.

Note that <small> doesn't indicate lack of importance (indeed if anything 
it is often _more_ important); it only indicates that the author doesn't 
really want the reader to read it (it's "small print"). Sideeffectsmayincl
udevomittingordeathalwaysconsultadoctorbeforetakingourproduct.


On Mon, 24 Nov 2008, Asbjørn Ulsberg wrote:
> >
> > That works fine with <small>.
> 
> No, it doesn't, and you explain why yourself here:
> 
> > User-agents which can't literally render smaller fonts can choose 
> > alternative mechanisms for denoting lower importance to users.
> 
> If the point isn't to literally render smaller fonts, you shouldn't 
> indicate that you want the fonts rendered smaller either. What you want 
> is to semantically indicate that the text wrapped inside the element is 
> of less significance than the surrounding text, e.g. a negative 'strong' 
> or 'em'. Just as 'b' isn't equal to 'strong', 'small' isn't equal to 
> what we're trying to express here.
> 
> What we need is a new element that can capture this semantic.

Why? <small> already works, and is already widely used for this semantic.


> > However, you can only notice this if the words have been distinguished 
> > in some way.  With <b>, all user-agents can choose to convey to users 
> > that those words are special.
> 
> They are only special for sighted users, browsing the page with a rather 
> advanced user agent. They are not special to blind users or to users of 
> text-based user agents like Lynx. If you want to express semantics, then 
> use a semantic element.

<b> now _is_ a semantic element. Lynx already uses a different colour for 
it, for example. What problem do we solve by inventing a new element to do 
exactly what <b> does today?


> Expressing semantics through presentation only is done in print because 
> of the limitations in the printing system. If the print was for a blind 
> person, printed with braille, one could imagine (had it been supported) 
> that letters with a higher weight could be physically warmer than 
> others, or with a more jagged edge so they could stand out.

Right, and we can get that with <b>. No need for a new element.


On Mon, 24 Nov 2008, Asbjørn Ulsberg wrote:
> On Mon, 24 Nov 2008 17:19:44 +0100, Smylers <Smylers at stripey.com> wrote:
> > 
> > I don't see how that explains why <small> is an inappropriate tag to 
> > use for things which an author wishes to be less noticeable.
> 
> I was thinking mostly about the tag's current usage on the web, which is 
> a crazy mix between the HTML4 and HTML5 definition of the element. HTML4 
> defines it as purely presentational, HTML5 as mostly semantic. In that 
> context, I believe <small> is inappropriate.

The same could be said for <p>, <table>, <blockquote>, <ol>, etc. People 
abuse HTML elements widely.


On Tue, 25 Nov 2008, Calogero Alex Baldacchino wrote:
>
> Of course that's possible, but, as you noticed too, only by redefining 
> the <small> semantics, and is not a best choice per se. That's both 
> because the original semantics for the <small> tag was targeted to 
> styling and nothing else (the html 4 document type definitions declared 
> it as a member of the fontstyle entity, while, for instance, <strong> 
> and <em> were parts of the phrase entity), and because the term 'small', 
> at first glance, suggests the idea of a typographical function, 
> regardless any other related concept which might be specific for the 
> English (or whatever else) culture, but might not be as well immediate 
> for non-English developers all around the world. As a consequence, since 
> any average developer could just rely on the old semantics, being he 
> intuitively confident with it, the semantics redefinition could find a 
> first counter-indication: let's think on a word written with alternate 
> <b> and <small> letters, or just to a paragraph first letter evidenced 
> by a <b>, obviously the application of the new semantics here would be 
> nontrivial (i.e. an assistive software for blind users would be fooled by 
> this and give unpredictable results). Despite the previous use case 
> would be a misuse of the <b> and <small> markup, yet it would be 
> possible, meaning not prohibited, and so creating a new element with a 
> proper semantic could be a better choice.

Could you give a concrete example? In all the examples I can think of, 
there is no problem that I can see. For example this:

   <p><b>H</b>ello!</p>

...would be fine in an AT, even if the AT went "bing" as it was saying the 
first part of the word.


> However, I think that a solution, at least partial, can be found for the 
> rendering concern (and I'd push for this being done anyway, since there 
> are several new elements defined for HTML 5).

Which rendering concern?


> Most user agents are capable of interpreting a dtd to some extent

Actually other than the validator, user agents ignore the DTD altogether.


> Let's come to the non-typographical interpretation a today u.a. may be 
> capable of, as in your example about lynx. This can be a very good 
> reason to deem <small> a very good choice. But, are we sure that *every* 
> existing user agent can do that? If the answer is yes, we can stop here: 
> <small> is a perfect choice. Better: <small> is all we need, so let's 
> stop bothering each other about this matter. But if the answer is no, we 
> have to face a number of user agents needing an update to understand the 
> new semantics for the <small> tag, and so, if the new semantics can be 
> assumed as *surely* reliable only with new/updated u.a.'s (that is, with 
> those ones fully compatible with html 5 specifications), that's somehow 
> like to be starting from scratch, and consequently there is space for a 
> new, more appropriate element.

All browsers handling <small> is better than some browsers handling 
<small>, certainly, but some browsers handling <small> is better than no 
browsers handling a new element. So I don't really agree with your 
reasoning here.


> Apart from considering that <b> isn't a good choice in such a case 
> (<strong> or <em> are far better, since they were born with the proper 
> semantics), [...]

Neither <strong> nor <em> is appropriate for marking up a brand name, 
IMHO.


On Tue, 25 Nov 2008, Calogero Alex Baldacchino wrote:
> 
> I'll start with an example. A while ago I played around with Opera 
> Voice. It seemed to be capable of interpreting visual style sheets and 
> specifically font styles, so that bold or italics text (so constraint in 
> the style sheet, not the markup) were spoken differently from 'normal' 
> text, but a paragraph first letter differing from the rest of the word 
> (which is a non-rare typographical choice), as far as I remember, caused 
> the whole word to be skipped. This suggests me that if we really want a 
> 'cross-presentation' semantics, we have to keep as far as we can from 
> anything having a *main* typographical semantics (as <small> and <b> 
> have from their birth).

I don't think this browser bug is a good guide for language design.


> I think that very likely both <b> and <small> will carry on their old 
> semantics, so being more prone to misuse with respect to their new one, 
> since very likely a lot of developers are, and will remain, more confident 
> with their original semantics, which is also suggested by their names 
> ('b' standing for 'bold' and 'small'... for something small on the 
> screen or on paper). Instead, a new element would require the developer 
> to take some effort at least to learn about its existence, so he would 
> read that such element primary use is to indicate a different importance 
> of a piece of text, so that a non visual user agent can present it in an 
> appropriate manner, and a visual or print user agent can render it in 
> different ways.

The way authors use <b> and <small> now is pretty close to what we want 
anyway, so I don't think it's a huge problem.


> Ah, the default style could be slightly or very different from the 
> <small> one, i.e. the text could be surrounded by parentheses or 
> hyphens, regardless of the font size (and the new elements could be 
> designed such to accept just non-empty strings consisting of more than 
> one non-spacing character).

That doesn't seem helpful... the current rendering of <small> seems like 
exactly what we want.


> Let me reverse this approach: what should an assistive user agent do 
> with such a <b>M</b><small>E</small><b>S</b><small>S</small>? I think 
> that dealing with that word as normal text would be a more graceful 
> degradation than discarding it, and if we clearly state that <b> and 
> <small> have only typographical semantics, while different elements are 
> provided to differentiate the grade of emphasis of a phrase, an 
> assistive user agent could support a better behaviour, while any author 
> disregarding semantics would not cause any trouble (the <b> and <small> 
> wrapped alternating characters example may be unrealistic, but a 
> paragraph could actually start with a bold and bigger first letter using 
> <b> and <font> instead of style sheets).

What should an AT do with <em>M</em><strong>e</strong>s<em>s</em>? Why is 
this any different?


> I know, and agree with the basic reasons; however I think that deriving 
> an SGML version (i.e. by adding new entities and elements, as needed, to 
> an html 4 dtd) should not be very difficult, and could be worth the 
> effort (i.e. to gracefully degrade the presentation of a menu element 
> thought as a context menu, whose content should not be shown until a 
> right click happens - if the u.a. cannot handle it, not showing it at 
> all could be a reasonable behaviour).

Browsers have shown great disdain for SGML, and I see no reason to expect 
them to change.


> The derived sgml version should be aimed just for older browsers [...]

Older browsers don't support SGML either.


> Here it is me not understanding. I think that any reason to offset some 
> text from the surrounding one can be reduced to the different grade of 
> 'importance' the author gives it, in the same meaning as Smylers used in 
> his mails (that is, not the importance of the content, but the relevance 
> it gets as attention focus - he made the example of the English "small 
> print" idiom, and in another mail clarified that "It's less important in 
> the sense that it isn't the point of what the author wants users to have 
> conveyed to them; it's less important to the message.

I strongly disagree, and urge you to compare the examples in the spec for 
<em>, <strong>, <b>, <i>, and <small>, which show very different cases. 
They are not equivalent. Only <strong> indicates a change in importance.


On Sun, 30 Nov 2008, Calogero Alex Baldacchino wrote:
> 
> [...] an <activators> element [...]

I encourage you to look at the <command> element in HTML5. I'm waiting for 
implementations of that before looking at access keys.


On Wed, 26 Nov 2008, Calogero Alex Baldacchino wrote:
>
> Now I'll throw in an even crazier idea. Let's maintain everything as 
> is, and let's add two new elements with the semantics of 'outstanding' and 
> 'aside', which work as meta information, i.e. they have no default 
> style (or a default such as 'display:inline' and all the rest, but 
> aural properties, is inherited from the parent element) and ignore any 
> style directly set on them, but aural styles, so they are ignored by 
> any author not caring of them, but act as a shortcut for basic aural 
> behaviour for authors caring instead, but not willing to create a whole 
> aural sheet, and can be helpful for an assistive software, regardless 
> its support for aural sheets, since their basic semantics is used as a 
> hint on what to do despite any visual styling of the inner content (and 
> of any inner <small>, <b>, <strong>, and so on).

Experience with aural-specific markup has been quite negative, in that 
people end up using it when they think it's appropriate but it is not, and 
they end up making the experience significantly worse for screen reader 
users. Media-specific markup is bad regardless of the medium, it seems.


On Sun, 30 Nov 2008, Calogero Alex Baldacchino wrote:
> 
> Ok, let's define 'special' in a more correct manner. What should be a slight
> offset? What does 'outstanding for some reason' mean, in a less ambiguous
> definition? How should the offset be interpreted by the user agent? Is there
> any valid and exact metering for the offset? Since, by default, <b> and <i>
> have the same typographic semantics as <strong> and <em>, but we're giving
> them a different semantics in general, let's consider them from a non
> graphical point of view, i.e. that of a screenreader, or a speech engine in
> general, and let's disregard the state of the art for such softwares, since
> I'm going into a somewhat formal question outstanding from the mere aural
> context. How should our 'non-human reader' interpret a <b>/<i> element? Is it
> ok to tell it can be dealt with the same way as a <strong>/<em> element? Well,
> if they're the same thing (but we wish they're not), perhaps we don't need
> redundancy here. Is it fine to tell the <b>/<i> semantics can be compared to
> that of <strong>/<em> on a scale where the <b>/<i> element occupies an
> intermediate level? Ok, let's point this out so that, by default, the reading
> bot can meaningfully set its parameters at the mean point between spoken plain
> text and spoken emphasized/important text. Is it better to state they can be
> set at different levels on that scale for different purposes? Ok, let's define
> when it occurs and why and what's the exact level corresponding to every and
> each case (that is, let's add a specific attribute with a set of predefined
> values, but perhaps such would be accomplished better through aural
> properties, and anyway would lead to a nontrivial linguistic analysis). Is it
> better to say they're just different? Ok, but can we define what such should
> mean? what they are exactly? Does 'different' mean they're sometimes close
> together with <strong>/<em>, other times they're similar but closer to plain
> text, or at an intermediate level, while some other time they're something
> just different, non-important, non-emphasized, yet requiring a different
> inflection we can't evaluate before knowing the meaning of the text? If so,
> let's stop everyone! We're running into big troubles... at least on a formal
> pathway. The latter case defines a context-dependent semantic, which is good
> for a human being reading a text, because human beings use natural languages
> and natural languages are strongly context dependent; it might also be good for
> a sentient bot, which would have a certain degree of artificial intelligence
> enabling it to disambiguate the language by understanding the context; but
> that's not good for HTML. By definition, HTML is definitely a programming
> language, and a programming language has to be context-free, but it seems to
> me that <b> and <i> are context-free just in their typographic semantics (the
> chance to modify such semantics via style sheets doesn't matter, from this
> point of view, since that's a consistent redefinition). Unless we anchored
> their semantics to another well defined one, establishing a somewhat
> relationship to make their existence meaningful and not just redundant -- and
> I was suggesting the idea that if <strong> expresses 'strong importance', <b>
> could express a 'lesser importance than <strong> text' or a 'relevance as
> attention focus or content outlining, helpful to improve the message
> understanding or to recall its theme, usually rendered with bold text and
> possibly requiring a voice inflection different from normal prose but not so
> incisive as very important content' (as an article abstract or some keywords
> could be classified), and if <em> expresses 'strong emphasis', <i> could
> express 'lesser emphasis than <em> text' or 'some change in the message
> context or its precisation, with respect to either its meaning or its
> linguistic constraints, usually rendered with italicized text and a bit of
> voice inflection' (as a foreign language word or a taxonomy name could
> indicate).

Could I possibly encourage you to split your paragraphs into smaller 
paragraphs?


> In other words, I'm not concerning whether the actual semantics of <b> 
> and <i> is consistent with common uses of italicized and bold text, and 
> with their conventional definitions (human-understandable, but perhaps 
> not machine-friendly), but whether that's well defined (context-free) 
> with respect to a user agent capabilities to correctly interpret and 
> present them. Visually that's painless, but non visually (non 
> graphically) I'm quite feeling the need for a greater context-freedom 
> (at least binding them to some more precise semantics, with respect to 
> which to scale <b> and <i> semantics and make them more context-free).

I have to admit to having no idea what you are talking about here.


On Tue, 25 Nov 2008, Pentasis wrote:
> 
> Just because HTML5 redefines the element does not mean that the element 
> will suddenly be semantic.

The key is that the way we have defined <b>, <i>, and <small> is roughly 
in line with what authors do already anyway, as much as other tags are 
roughly in line with how they are used.


On Tue, 25 Nov 2008, Nils Dagsson Moskopp wrote:
> 
> So can't we just mark all presentational elements as obsolete in a 
> clear, consistent way, instead of trying to redefine them? Maybe put 
> them into a "presentational annex" of the spec, that defines rendering 
> of obsolete elements?

We will, for the actually obsolete elements.


> The thing I am concerned with is that if they are included like "normal" 
> (read: semantic) elements, authors will probably use them for new pages.

What's wrong with <b> as defined now?



There were a number of other e-mails in these threads that I haven't 
replied to here because they didn't propose changes to the spec. If there 
are any e-mails I should reply to, or any topics I should focus on, please 
let me know.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 17 December 2008 14:02:57 UTC