[whatwg] Element-related feedback from Ian Hickson on 2010-03-16 (public-whatwg-archive@w3.org from March 2010)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 16 Mar 2010 09:01:00 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.1003160015010.13402@ps20323.dreamhostps.com>
This e-mail is a reply to a number of e-mails on various topics relating 
to the more document-related elements of HTML.

On Mon, 2 Nov 2009, Elizabeth Castro wrote:
>
> In 4.4.11, it says
> 
> > Sectioning content elements are always considered subsections of their 
> > nearest ancestor element of sectioning content, regardless of what 
> > implied sections other headings may have created.
> 
> Does that line mean that a section element is *not* a subsection of the 
> nearest implied section?

Correct.


> So, if there is no other explicit sectioning content, as in the example 
> given, then what would the section element be a subsection of?

Hm, yes, this was unclear. I've tried to clarify it.


> I don't get why Thud ends up on an equal level as Quux and Bar. It seems 
> like as a section under h2 it should be a subsection of that Quux h2, 
> just as the implied Bar section is a subsection of the implied Foo 
> section.

The <body> in that example counts as sectioning content. This was not 
stated in the spec previously. I've now made this clear.

Thanks for catching this.


On Tue, 3 Nov 2009, tjeddo wrote:
> 
> What if a standard link type called "citation" was added to the HTML5 
> specification? For example,
> 
> <a href="#bibentry-jones" rel="citation">[Jones, p. 88]</a>
> 
> After reviewing all the other link types and their corresponding 
> definitions in current draft specification this seems like a consistent 
> addition.

I recommend adding it to the wiki and then working to get it widely 
adopted. You don't have to depend on the spec for rel values; the 
registry is a publicly editable wiki. :-)


On Fri, 1 Jan 2010, Jim Jewett wrote:
> 
>     Evil Lawyer:  So, when did you stop beating your wife?
>     Defendant:  Never!
> 
> "Evil Lawyer" and "Defendant" aren't pronounced.  Their meanings (and 
> silence) are deduced from English conventions about punctuation.  I 
> would prefer a semantic tag.

Why? What problem would a semantic tag solve? The default styling here 
seems to not need any particular element; the above is perfectly 
understandable as is as far as I can tell.


> I'm expecting [scripts] to do something like increase the font size or 
> change the background for lines *I* have to memorize for *my* character 
> [based on the semantic marked in the page identifying the character], or 
> for cue lines that I have to recognize.

Are there any examples of this in the wild? Since this is technically 
possible today, if it's a use case important enough that we should address 
it, it should be easy enough to find examples of this.

I'm very reluctant to provide features for hypothetical problems that 
don't stem from a real market need. (If we start solving such problems, we 
would fast find ourselves on the path to feature bloat.)


> > You're still not saying why you want this element. What would <attrib> 
> > be good for? What UI would it trigger? How would users or authors 
> > benefit?
> 
> I would expect it to be used in License checkers that some organizations 
> would deploy to ensure they aren't violating copyright.

Wouldn't the Work microdata vocabulary be a better solution to this 
problem?


> I would expect it to be used by some scrapers looking for stock photos.

I'm not sure what you mean. Wouldn't fingerprinting the photos be more 
effective?


> I would expect it to be used with custom CSS for some users, who are 
> really looking for a model or photographer rather than an existing 
> photograph.

I don't understand this case. Can you elaborate? Maybe an example of this 
use in the wild would help.


> > Why would it be wrong to have an element to style titles [for titles 
> > of works]?
> 
> Turning around your favorite question, what is the semantic value?

It provides a way to have appropriate default styling (italics, in the 
visual medium) for a typographic feature that is widely used, while 
allowing it to be easily restyled independent of other uses of italics. 
This is the same benefit <em>, <strong>, <mark>, etc, have.


On Thu, 5 Nov 2009, Brian Blakely wrote:
>
> I can only imagine the usage of <address/> will be utilized more 
> productively if its intuitive purpose (arbitrary contact/postal 
> addresses) were its actual function.  As our friends at HTML5 Doctor 
> illustrate, it is all too easy to jump to conclusions and use this 
> element incorrectly.
> 
> Perhaps a <contact/> element would be more suitable for 
> document-specific contact info?  Just a thought, off-hand.

In general, I agree. I seem to recall we studied this a few years ago and 
found that it turns out <address> is actually used correctly (per HTML4) 
quite a lot (especially in autogenerated documentation pages), which is 
why we left it as is.


On Mon, 16 Nov 2009, Philip Jägenstedt wrote:
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#parse-a-month-component
> 
> Is there a use case for machine-readable dates after 9999? I'm sure 
> HTML5 will have been obsoleted before it's meaningful to express 
> accurate times that far in the future. As existing similar formats use a 
> 4-digit year, adapting parsers for those is a lot easier if the HTML5 
> year format is exactly 4 figures. Also, it seems more likely that 
> >4-digit years will be typos than intended and useful as 
> machine-readable data. If there are no strong use cases for it, please 
> make it YYYY only.

Limiting formats arbitrarily seems short-sighted, even if we are talking 
about eight thousand years from now. In any case, parsing these dates is 
pretty trivial; I don't really see that there is much to be gained from 
using existing parsers. One is more likely to exactly match the spec's 
requirements if writing a parser from scratch.
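
As a rough illustration of how small a from-scratch parser can be, here is 
a sketch in Python of the month component. It assumes the rule is "four or 
more ASCII digits, a hyphen, two digits", with the year greater than zero 
and the month between 1 and 12; the function name and the tuple return 
value are made up for this example and do not come from the spec.

```python
import re

# Four or more ASCII digits, "-", exactly two ASCII digits.
# [0-9] is used instead of \d so that non-ASCII digits are rejected.
_MONTH_RE = re.compile(r"([0-9]{4,})-([0-9]{2})")


def parse_month_component(text):
    """Return (year, month), or None if text is not a valid month string.

    Hypothetical helper for illustration only.
    """
    match = _MONTH_RE.fullmatch(text)
    if match is None:
        return None
    year = int(match.group(1))
    month = int(match.group(2))
    if year <= 0 or not 1 <= month <= 12:
        return None
    return year, month
```

Years beyond 9999 cost nothing extra here: the only difference from a 
fixed YYYY format is `{4,}` in place of `{4}`.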


> http://www.whatwg.org/specs/web-apps/current-work/multipage/common-microsyntaxes.html#parse-a-date-or-time-string
> "10. If the date present and time present flags are both true, but position is
> beyond the end of input, then fail."
> 
> This seems to be a bug if you consider how '2009-11-16T' would be 
> parsed. The algorithm is supposed to return a date time or global date 
> and time, but '2009-11-16T' is valid as neither. The intent of this step 
> must be to make sure that T is always followed by a time, but it won't 
> work. Apart from being in the incorrect order (after time present may 
> have been set to false) it also checks for the end of input, which 
> doesn't make sense when the "in content" variant is used.

Actually it's checking for the presence of the timezone.

However, you're right that "2009-11-16T" was misparsed. There was an error 
in an earlier step: when no time was parsed, the "time present" flag 
shouldn't have been set to false; the whole algorithm should have failed 
instead. I've fixed this now. Thanks.
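
The corrected behaviour can be sketched roughly as follows. This is a 
simplified Python illustration, not the spec's algorithm: the regular 
expressions, the function name, and the string results are assumptions 
made for the example, and component ranges (month 13, hour 25, and so on) 
are not validated.

```python
import re

_DATE = re.compile(r"[0-9]{4,}-[0-9]{2}-[0-9]{2}")
_TIME = re.compile(r"[0-9]{2}:[0-9]{2}(?::[0-9]{2}(?:\.[0-9]+)?)?")
_ZONE = re.compile(r"Z|[+-][0-9]{2}:[0-9]{2}")


def parse_date_or_time(text):
    """Return 'date', 'time' or 'datetime'; None if parsing fails.

    Hypothetical helper for illustration only.
    """
    pos = 0
    date_present = False
    time_present = False

    match = _DATE.match(text, pos)
    if match:
        date_present = True
        pos = match.end()
        if pos < len(text) and text[pos] == "T":
            pos += 1
            time_present = True  # a "T" commits us to parsing a time
    else:
        time_present = True  # no date, so the input must be a time

    if time_present:
        match = _TIME.match(text, pos)
        if match is None:
            return None  # fail outright instead of clearing the flag
        pos = match.end()

    if date_present and time_present:
        match = _ZONE.match(text, pos)
        if match is None:
            return None  # a date with a time also needs a time-zone offset
        pos = match.end()

    if pos != len(text):
        return None  # trailing characters

    if date_present and time_present:
        return "datetime"
    return "date" if date_present else "time"
```

With this shape, "2009-11-16T" fails at the time step, and 
"2009-11-16T14:22" fails at the time-zone step.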


> Perhaps it would be possible to simply replace this algorithm with "try 
> parsing a global date and time, else try parsing a date, else try 
> parsing a time"? I haven't checked carefully if this is equivalent though.

The change I made seems simpler. :-)


> http://www.whatwg.org/specs/web-apps/current-work/multipage/rendering.html#the-time-element-0
> 
> "When the time binding applies to a time element, the element is 
> expected to render as if it contained text conveying the date (if 
> known), time (if known), and time-zone offset (if known) represented by 
> the element, in the fashion most convenient for the user."
> 
> This is very vague. Anything which tries to localize the date/time will 
> fail because guessing the language of web pages is hard. Hard-coding it 
> to English also wouldn't be very nice. What seems to make the most sense 
> is using the "best representation of the global date and time string" 
> and equivalents for just time and date that have to be defined. Still, 
> I'm not sure this is very useful, as the same rendering (but slightly 
> more flexible) could be accomplished by simply putting the date/time in 
> the content instead of in the attribute. As a bonus, that would degrade 
> gracefully. Unless I'm missing something, I suggest dropping the special 
> rendering requirements for <time> completely.

The idea is to render the date or time in the user's locale, not the 
page's, though I agree that in some cases that could be confusing.

Maybe we should leave the localising behaviour to author CSS and not do it 
automatically by default?


On Tue, 24 Nov 2009, David Bruant wrote:
> Tab Atkins Jr. wrote:
> > On Tue, Nov 24, 2009 at 11:07 PM, David Bruant
> > <bruant at enseirb-matmeca.fr> wrote:
> >> For the ASCII art use case, what is said about "an alternative 
> >> description" strongly reminds the alt attribute of the img element. 
> >> Perhaps ASCII art should be done inside of an <img> element. The 
> >> <img> element is probably the HTML element which has the closest 
> >> semantic of the ASCII artist intention.
> >
> > ASCII art is indeed semantically closest to <img> (it's just an image 
> > done in a different medium), but there's no way to actually use <img> 
> > as such.  <pre> is the second-closest thing if you're going to include 
> > such a thing.
>
> [...] It's true that currently, <img> elements are not intended to have 
> a content, but ASCII art, as images, is probably the best (if not only 
> ?) reason to allow text content in img elements, thus naturally allowing 
> the alt attribute which doesn't exist in the "second-closest" semantic 
> element.

On Tue, 24 Nov 2009, Tab Atkins Jr. wrote:
> 
> It's impossible at this point to make <img> elements take contents. 
> They're void elements in every single browser in existence.

On Tue, 24 Nov 2009, David Bruant wrote:
>
> I take this argument as a "pro" argument for two reasons :
> - <img> are void elements in every single browser, so, if this "status"
> changes in HTML5, they can all change the behavior of <img> element at
> the same time (which would be harder if some browser had already given a
> meaning to a <img> content)
> - web developers know that so far, <img> elements were void elements, so
> adding a content to <img> won't make the least retro-compatibility
> problem with what already exists.
> 
> As a consequence, I propose that :
> - the src attribute of the <img> element becomes optional.
> - content is allowed in the <img> element and rendered if the src
> attribute is not present.

On Wed, 25 Nov 2009, Markus Ernst wrote:
> 
> I checked the following code in Firefox 3.5 and Internet Explorer 8:
> 
> <img style="white-space:pre; font-family:monospace">hello
> preformatted
> world</img>
> 
> Both browsers treat the text as following the img element, rather than 
> being the contents of it. Neither applies the styling; IE additionally 
> displays a broken image placeholder before the text. This looks like a 
> backwards compatibility problem to me.

Yeah, this is basically a non-starter, as existing browsers do not handle 
this element in a way conducive to us making this change.


On Wed, 25 Nov 2009, Nikita Popov wrote:
>
> I think the idea of replacing the alt-attribute by the content of img is 
> very good. An image used as an <img> needs to be content-related and 
> therefore you often could place a descriptive text into it. This would 
> lead us to img being only a normal HTML-element with normal content, but 
> the ability of specifying an image by "src".

For this semantic we already have <object>. I don't think making <img> do 
this too would help much.


On Thu, 26 Nov 2009, Jeremy Keith wrote:
>
> In the section for the time element, the spec states:
> 
> "This element is intended as a way to encode modern dates and times in a 
> machine-readable way so that user agents can offer to add them to the 
> user's calendar."
> 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/text-level-semantics.html#the-time-element
> 
> It seems very, very restrictive to dictate one single use case for an 
> element. Specifying an example use case, I could understand, but a 
> single use case? Isn't that kind of like dictating a single use for an 
> API before it has even been released?

I've clarified the spec to make it clear that is merely one example use 
case and not intended to be exclusive.


On Fri, 27 Nov 2009, Henri Sivonen wrote:
> On Nov 26, 2009, at 18:50, Jeremy Keith wrote:
> 
> > "The following extract shows how an IM conversation log could be marked up.
> > <p> <time>14:22</time> <b>egof</b> I'm not that nerdy, I've only seen 30% of the star trek episodes"
> 
> What's the point of having the time semantically marked up in this 
> example? What kind of processing scenario would benefit?

Others have given use cases also, but the one I had in mind at the time of 
writing that example was localising the time from CSS, once CSS adds 
localisation of dates and times. (This has been discussed in the CSS 
working group for some time.)


On Thu, 10 Dec 2009, Hugh Guiney wrote:
> 
> I don't understand why the time element is allowed to specify an 
> arbitrary hour, but not an arbitrary month or year.

Keeping things simple, mostly. Baby steps!


> My own use case involves marking up years of publication for documents I 
> have created, to be displayed in an online resume that can be sorted by 
> date. I do not necessarily have the original timestamps for every file, 
> yet I can recall the years in which they were published. In this case, 
> the year "2005", for instance, is semantically distinct from the numeral 
> "2005", and though the difference can be inferred from context by a 
> human it can not by a machine, hence why things like <time>2005</time>, 
> or <time datetime="2005">4 years ago</time> would be useful here. But 
> under the current specification, these uses are invalid, meaning I'd 
> only be able to specify exact dates with meaningful language, as in 
> <time datetime="2005-01-01">2005</time>, and hack around it for inexact 
> dates with non-semantics like <span class="datetime">2005</span>.

It's not clear to me that there's any particular benefit to marking up 
years in this way. However, if it turns out that there's a lot of demand 
for this, I think it might make sense to add it in a future version.


On Thu, 10 Dec 2009, Tab Atkins Jr. wrote:
> 
> I agree with Meyer on the first one.  That's a useful case.  In 
> addition, <time>'s usefulness in Microdata is somewhat impaired by its 
> inability to mark up months or years.  These are by far the most common 
> 'fuzzy dates' that one would have to mark up for embedded metadata.  
> Their lack means that vocabularies need to structure themselves to take 
> two dates per real 'date', to allow such fuzziness, which means that 
> date *ranges* would need *four* dates specified. That's just silly.

Date ranges are also a type one could easily imagine adding in a future 
version.


On Tue, 5 Jan 2010, L. David Baron wrote:
> 
> It might be worth saying that [<br>] is equivalent to LINE SEPARATOR in
> terms of bidi processing, as HTML4 did:
>   # With respect to bidirectional formatting, the BR element should
>   # behave the same way the [ISO10646] LINE SEPARATOR character
>   # behaves in the bidirectional algorithm.
>      -- http://www.w3.org/TR/html4/struct/text.html#h-9.3.2.1
> 
> As I understand it, the bidi algorithm [1] has two parts:
>  * resolution, in which characters are assigned embedding levels
>  * reordering, in which the characters are reordered into their
>    left-to-right display order by, for each N decreasing from 63 to
>    1, reversing all contiguous runs of embedding level N or higher
> 
> The importance of being a line separator is that *resolution* is run on 
> paragraphs (so is run on units containing line separators in the 
> middle), but reordering is run on lines (so it is not run on units 
> containing line separators).
> 
> This means that characters on one side of a line separator can influence 
> the directionality of characters on the other side, but reordering can't 
> move them to the opposite side of the BR (i.e., across lines).

The spec says that <br> is equivalent to U+000A for rendering and defers 
to CSS and Unicode for rendering rules. Is that not sufficient?
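
For reference, the reordering step described in the quoted message can be 
sketched in Python as below. This is an illustration only: the function 
name is made up, the embedding levels are assumed to have been resolved 
already, and the loop starts from the highest level actually present on 
the line rather than from a fixed maximum, which gives the same result 
because absent levels have no runs to reverse.

```python
def reorder_line(chars, levels):
    """Return the characters of one line in left-to-right display order.

    chars and levels are parallel sequences; levels holds the resolved
    embedding level of each character. Hypothetical helper for illustration.
    """
    chars = list(chars)
    levels = list(levels)
    if not levels:
        return chars
    # For each level N from the highest down to 1, reverse every
    # contiguous run of characters whose level is N or higher.
    for n in range(max(levels), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= n:
                j = i
                while j < len(chars) and levels[j] >= n:
                    j += 1
                chars[i:j] = chars[i:j][::-1]
                levels[i:j] = levels[i:j][::-1]
                i = j
            else:
                i += 1
    return chars
```

Because the function only ever sees one line, nothing can be moved across 
a line separator, which is the property the quoted message points out.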


On Fri, 5 Feb 2010, Anne van Kesteren wrote:
>
> Legal documents often use various indicators for list items. E.g.
> 
>  a. ...
>  b. ...
>  c. ...
> 
> or
> 
>  1. ...
>  2. ...
>  3. ...
> 
> or
> 
>    I. ...
>   II. ...
>  III. ...
> 
> or
> 
>  A. ...
>  B. ...
>  C. ...
> 
> etc.
> 
> These indicators are part of the content and cannot be governed by style 
> sheets. End users having their own custom style sheets overwriting the 
> indicators with their own preference would be a problem, for instance.
> 
> I have seen at least one editor used that generates markup like this:
> 
>  <ul>
>   <li><span class="ol">a.</span> ...</li>
>   ...
> 
> to work around this. You can see this online here:
> 
>  http://regels-stadskanaal.nl/
> 
> I think it would be good if we either solved this problem natively or at 
> least gave some advice for people finding themselves in a similar 
> situation.

On Fri, 5 Feb 2010, Tab Atkins Jr. wrote:
>
> Since they are indeed part of the content, not a question of style, I'm 
> not seeing anything wrong with putting the marker directly in the 
> content.  Preferably you'd still want to use an <ol>, though, with 
> list-style:none on it.  User stylesheets can't generally turn on <ol> 
> markers, as a lot of sites use them or <ul>s as navigation and the like, 
> and completely restyle them in such a way that the website would be 
> rendered horribly if you turned the markers back on.
> 
> Some advice might be a good idea, just recommending still using <ol>, 
> turning off markers, and then putting the exact marker content into the 
> <li>s.

On Sat, 6 Feb 2010, Thomas Broyer wrote:
> 
> How about marking up the "list" with ARIA instead of <ol>/<li>?
> 
> Something like:
> <div role="list">
> <p role="listitem">a.
> <p role="listitem">b.
> </div>

On Sat, 6 Feb 2010, Tab Atkins Jr. wrote:
> 
> That would work too, but it doesn't seem to have any inherent advantage 
> over
> 
> <ol style=list-style:none;>
>   <li>a. foo
>   <li>b. foo
> </ol>

I think long-term the good solution is to have an attribute on <li> that 
overrides the marker entirely; something like:

   <ol>
    <li marker="a">foo1
    <li marker="b">foo2
    <li marker="d">foo3
    <li marker="e">foo4
    <li marker="e-bis">foo5
    <li marker="f">foo6
   </ol>

...or some such. However, we haven't yet gotten <ol reversed> implemented 
and proven, so it's probably a bit early to start adding features to <li> 
also. If we could get the CSS side sorted out before we add this, we could 
have a CSS-based transition when we finally add it, which would help a lot.


On Fri, 5 Feb 2010, Brian Campbell wrote:
> 
> The obsolete and non-conforming @type, along with the @value attribute 
> on <li>, can be used for this purpose:
> 
> <ol type=a>
>   <li value=1>...
> </ol>
> 
> Or, if you want to keep the type information together with the value:
> 
> <ol>
>   <li type=a value=1>...
> </ol>
> 
> Would it make sense to make this no longer obsolete and non-conforming, 
> as the list item type really is meaningful in many documents? Also, is 
> the behavior of @type currently documented anywhere in HTML5? While the 
> values that @type currently accepts are fairly limited (a, A, 1, i, I, 
> as far as I know), they could be extended to include all of the values 
> defined in CSS, with the old values deprecated.

It's tempting. In practice, though, when this kind of thing is usually 
used, it's because the paragraph numbering is critical to the meaning, and 
authors specifically don't want the labels to change when things in the 
list change. So I'm not sure type="" is quite the semantic we want here.


On Sat, 6 Feb 2010, Markus Ernst wrote:
> 
> This looks like part of a more general problem to me. There are more 
> situations where you want custom content in the place of list 
> indicators:
> 
> For example, in a CV you might want the years there:
> 
> 1977      ...
> 1978-1982 ...

On Sat, 6 Feb 2010, Tab Atkins Jr. wrote:
> 
> I think a <table> is perfectly appropriate here.  I used one in my
> resume.  It's completely justifiable as tabular data.

Indeed. Or <dl>.


On Sat, 6 Feb 2010, Markus Ernst wrote:
> 
> Or, very common in forms, a check box or radio button:
> 
> o ...
> o ...

On Sat, 6 Feb 2010, Tab Atkins Jr. wrote:
> 
> Simply putting the checkbox as the first content in the <li> works
> well for me there.  You can then suppress the list-style or not.

On Sat, 6 Feb 2010, Markus Ernst wrote:
> 
> No, as multiline label text will render as:
> 
> o ...
> .....
> 
> A (I think) simple task, as rendering a radio button and label as
> 
> o ...
>   ...
> 
> is a hard task to achieve (as any kind of styling that is done with a 
> combination of indents and tabulators in layout programs).
> 
> I admit that this is a styling issue (and already tried to post this to 
> the CSS group months ago, but found that this group does not accept gmx 
> mail addresses), mainly due to the lack of tabulators in HTML and CSS. 
> But it is in the same category as the problem Anne posted: A list-style 
> appearance with the list item being part of the content. If there were 
> tabulators, they would be used in Anne's use case, and there were no 
> problem.

On Sat, 6 Feb 2010, Tab Atkins Jr. wrote:
> 
> Ah, gotcha.  True, then.  This would have been easier with the 
> display:list-item-marker value, but that got dropped in favor of a 
> proper ::marker pseudoelement.  It's possible that this should be 
> revived and magicked into working appropriately, essentially moving the 
> item into the content of the ::marker.  Alternately, some magic by way 
> of the Generated and Replaced Content module could be employed. (I do 
> intend to try and revive that module.)
> 
> In any case, though, this does indeed seem to be a styling issue.  We 
> can move the styling-relevant parts of the discussion over to www-style.  
> Within the bounds of HTML, though, I think that the advice as I phrased 
> it is optimal - just use an <ol>, suppress the list-style, and put the 
> marker in the content.

The above seems like a styling issue.


On Sat, 6 Feb 2010, Markus Ernst wrote:
> 
> Third (this is a pure style problem though), sometimes you want just some 
> custom character there, such as an n-dash:
> 
> – ...
> – ...
> 
> While the third one can be achieved with a list-style-image, with the 
> downside that it will not be affected by changes of the text size, the 
> other examples need complex CSS trickery including floats, or layout 
> tables.

There are proposed solutions to let you change the list style marker in 
CSS, so I think that's not a problem we need to solve.


On Sat, 6 Feb 2010, David Bruant wrote:
> 
> One solution could be to use <style> element with "scoped" attribute to 
> define a style only for those lists. This way, embedding a document will 
> embed the style element. And if the styles within the <style> are 
> exhaustive enough, there is no risk of them being overridden by an end 
> user stylesheet, is there?
> 
> something like this :
> 
> <section id="mylegaldocument">
> <style scoped>
>     ol{}
>     li{}
> </style>
> 
> <!-- h1, blabla -->
> 
> <ol>
> <li>
> <li>
> <li>
> </ol>
> </section>

That still breaks down without CSS though.


On Mon, 1 Mar 2010, Phil Pickering wrote:
> 
> Previously in HTML 4.0 Strict and XHTML 1.0 Strict, any content inside
> the <blockquote> element had to be contained inside at least one <p>
> element.
> 
> In HTML 5, that requirement appears to have been deprecated as the
> following element validates successfully:
> 
> <blockquote>The dream behind the Web is of a common information space
> in which we communicate by sharing information. Its universality is
> essential: the fact that a hypertext link can point to anything, be it
> personal, local or global, be it draft or highly polished. There was a
> second part of the dream, too, dependent on the Web being so generally
> used that it became a realistic mirror (or in fact the primary
> embodiment) of the ways in which we work and play and socialize. That
> was that once the state of our interactions was on line, we could then
> use computers to help us analyze it, make sense of what we are doing,
> where we individually fit in, and how we can better work
> together.</blockquote>
> 
> The current HTML 5 specification uses the <p> element in the usage
> examples, but does not mention whether it is required.

The content model of <blockquote> defines whether it's required.


> For the sake of clarity, might I suggest that in the specification 
> document there is a usage example of a <blockquote> element where a 
> quote consisting of a single paragraph is included, but does not use the 
> <p> element to contain that quote.
>
> Or, maybe a one-line explanation stating that the previous requirement 
> of the <blockquote> element re. content being contained in a <p>, has 
> now been loosened up.

It's not just about <blockquote>, it's any flow element. But I've tweaked 
an example to show this.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 16 March 2010 02:01:00 UTC