[whatwg] Requests for new elements for comments

On Sun, 4 Sep 2011, Shaun Moss wrote:
> 
> I've joined this list to put forward the argument that there should be 
> elements for <comment> and <ad> included in the HTML5 spec.

We already have an element for comments and other self-contained document 
modules, namely, <article>. The spec in fact specifically calls out an 
<article> nested in another <article> as being, by definition, a comment 
on the outer <article>.
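
As a rough sketch (the heading, names, and text below are invented for
illustration, not taken from the spec), a blog post with one reader
comment could be marked up like this:

  <article>
   <h1>My first post</h1>
   <p>The post itself goes here.</p>
   <article>
    <footer>Comment by a reader</footer>
    <p>The comment text goes here.</p>
   </article>
  </article>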

For advertisements, I do not think it makes sense to add an element. In 
practice, it would likely not end up being used, since doing so would make 
it too easy to hide advertisements.

However, the <aside> element is a close fit for the semantic, so I would 
recommend using that.


> Please also let me know the process for submitting a formal proposal to 
> the WHATWG or the W3C about this.

This is it.


On Mon, 5 Sep 2011, Shaun Moss wrote:
> 
> A suggested semantic meaning for <comment>: /The content of this element 
> has been contributed by a website user in reference to an article, 
> discussion topic, status update, image, video, or some other item of 
> content./

This is essentially the semantic of a nested <article>.


On Mon, 5 Sep 2011, Shaun Moss wrote:
> > 
> > http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-article-element
> 
> Yes, but this is not semantic!!! Comments are not articles.

<article> isn't just for articles. That's the point.

Note that its name is irrelevant here. It could be called <pineapple> -- 
what matters is what it is defined to mean, not what its name is. And 
its definition is one that covers both articles and comments. They are 
both self-contained compositions.


> They are completely different.

Actually, they are remarkably similar. I think it's anachronistic to 
consider that the utterances of the site owner are in some way distinct 
from the utterances of the site readers. What makes them different?

On the contrary, on the Web there _is_ no difference. An article is just a 
comment that has been hoisted to a more prominent position.


> Comments can appear in reference to things that are not articles (such 
> as status updates), and therefore would not appear inside an <article> 
> tag - so how would the browser recognise them as comments?

Status updates are another example of something that would be appropriate 
in <article>.

On Mon, 5 Sep 2011, Shaun Moss wrote:
> 
> Please explain to me how it makes sense for a comment to stand on its 
> own.

Reddit is a great example of this.

Every comment on reddit has its own permalink and can be referenced alone.

Consider this page:

   http://www.reddit.com/r/webdev/comments/ito6v/html5_article_tag_for_ecommerce_products/

It starts with a comment by iamfuzzydunlop. That's the "topic" article, 
for lack of a better phrase. Below that are other comments in response, 
for example holizz says "An interesting question.". You can see that 
article -- er, comment -- on its own page:

   http://www.reddit.com/r/webdev/comments/ito6v/html5_article_tag_for_ecommerce_products/c26kn8g

Here it is standing on its own.


> To an HTML author, especially a newbie, an article *is* a newspaper 
> article, and this is entirely distinct from a user-submitted comment 
> related to the article. Semantics isn't just for robots, it's for 
> humans, too - a fact that seems to be frequently overlooked. Giving 
> elements obscure, unobvious meanings in the spec is a kludge, plain and 
> simple. For example, to the WHATWG and W3C the <b> tag now basically 
> means "different but not important". The <i>, <u> and <s> tags have 
> similarly gained bizarre new definitions. To everyone else on the 
> planet, however, <b> means bold, <i> means italic, <u> means underline 
> and <s> means strikethrough. This may come as a surprise, but 99.9% of 
> HTML authors don't read specs. They look at a tag, and think, now what 
> would I use that for? "Ok, so the <table> tag is for tables, right? I 
> guess <article> is for articles, then. Oh, it's for user-submitted 
> comments, too? <confusion>wtf?</confusion>"

Unfortunately, it is basically impossible for a single word -- or even a 
single letter -- to stand for a careful description of an element's 
semantics. While I agree that most authors won't read the specs, one can 
at least hope that authors will use quick-reference sheets that do include 
the basic definitions of the elements. Some authors might even refer to 
more comprehensive documentation such as:

   http://developers.whatwg.org/


> We don't need a whole slew of new tags. We need *one*: <comment>, and /maybe/
> one more: <comments>, to wrap them.

<article> covers a wide range of semantics:

 - forum posts
 - newspaper articles
 - magazine articles
 - books
 - blog posts
 - comment on a forum post
 - comment on a newspaper article
 - comment on a magazine article
 - comment on a blog post
 - an embeddable interactive widget
 - a post with a photograph on a social network
 - a comment on a photograph on a social network
 - a specification
 - an e-mail
 - a reply to an e-mail

Should each of these get its own element?



On Tue, 6 Sep 2011, Shaun Moss wrote:
> On 2011-09-05 11:13 PM, Benjamin Hawkes-Lewis wrote:
> > On Mon, Sep 5, 2011 at 9:55 AM, Shaun Moss <shaun at astromultimedia.com> 
> > wrote:
> > > Please explain to me how it makes sense for a comment to stand on 
> > > its own.
> > Works just as well as all those blog posts that are just commentary on 
> > something someone else has written. (And which often are syndicated as 
> > comments via pingback.)
> In which case they can (and should) be marked up as comments.

Should a newspaper article commenting on a politician's policies be marked 
up as a comment?

I don't understand what distinction you are making between "article" and 
"comment".


> Yes, well, forgive me for seeming anglophone-centric, but the fact remains
> that HTML as well as most other programming languages use English words

I think it's better to look at HTML as using mostly opaque strings that 
happen, for mnemonic purposes, to have similarities with English words.


> most of the people who use these languages speak English

I'm not sure this is true.


> and presumably they expect something to mean what it says.

It does. The word "article", for instance, has many meanings. The first 
dictionary I looked in defines "article", in part, as "a 
piece of writing included with others in a newspaper, magazine, or other 
publication", which is remarkably close to the spec definition, and does 
not, to me, distinguish between pieces of writing written by the site's 
owners or staff, and pieces of writing written by the site's readers.


> Otherwise we should just name our tags <dsfsdas> and <xcvxcv>.

Indeed we could. I think there is marginally more value in using mnemonic 
names, though.


> Sorry that it's difficult for you to think of concise names, but I 
> hardly think <comment> is ambiguous.

<comment> could mean an HTML comment (i.e. text not to be shown to the 
user), or it could mean an article commenting on another (the meaning you 
are advocating for), or it could mean a spoken utterance (similar to <q> 
but only for things said out loud by others, say), or it could mean that 
its contents are not intended to be objective (to be used in articles to 
indicate which parts are opinion and which are supposedly fact), etc.

I do not think this would be any clearer than <article>.


On Tue, 6 Sep 2011, Shaun Moss wrote:
> 
> Back to the main point of marking up comments, I offer youtube as an 
> example. http://www.youtube.com/watch?v=BRG5VNNUq_E
> 
> Here we have the item being commented on (the video) in a full-width 
> block, with the lower half of the page divided into two sections, 
> comments on the left. If user-submitted comments must be <article> tags 
> inside <article> tags, then virtually the whole page would have to be 
> inside an <article> tag

Yes. If the YouTube authors wanted to mark up these semantics, the current 
<div id="content"> element would be an <article>, and the current <li 
class="comment ..."> elements would be nested <article>s.


> The problem I am trying to solve is a perceived error in the HTML5 spec, 
> which specifies that comments should be marked up as articles inside 
> articles. I believe this to be an error for several reasons:
> 
> 1. Articles and comments are different, and should therefore use 
> different elements (otherwise the reference to marking up user-submitted 
> comments as articles within articles should be removed).

How are they different?


> 2. Comments are a unique type of content, since they are submitted by 
> users, not site developers or content managers.

What's the difference?

Say I wrote a post on someone else's blog, should it not be considered an 
article? (e.g. the WHATWG blog is open to anyone to post on; if you posted 
on it, should it not be an <article>?).

Say site owner A wrote an article to which user B replied with a comment, 
and site owner A then further comments on user B's reply. Should user B's 
comment be marked up as a comment but A's comment be marked up as an 
article? This seems like a very odd requirement to put on a CMS.

Should social networks only use the comment markup, because the "articles" 
posted on them are mainly not posted by the site's developers?


> 3. Robots and plugins can extract comments from web pages more easily if 
> they have their own element. Comments can then be more easily 
> syndicated, displayed, hidden, styled, etc.

I don't see how a separate element would make this meaningfully easier 
than nested elements.
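
For example, assuming a page marks up its comments as <article> elements
nested inside another <article> (an assumption about the page, not
something a script can verify), a tool could collect them with a single
selector. A minimal sketch:

  <script>
   // Run this after the document has been parsed.
   // Matches every <article> that has an <article> ancestor, which
   // includes replies nested inside other comments.
   var comments = document.querySelectorAll('article article');
   // A hypothetical <comment> element would need a similar one-liner:
   //   document.querySelectorAll('comment');
  </script>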


> 4. Comments often apply to things other than articles, such as blog 
> posts, forum topics, social network status updates, images, videos, 
> links, and other comments, which should not have to be marked up as 
> articles just so the comments can be marked up as articles within 
> articles.

All of the above are articles, IMHO, and as defined in the spec. (Many of 
them are explicitly listed as examples of such.)


> 5. Comments sometimes appear in a different region of the page than the 
> item that they are referencing, hence the markup for comments should not 
> have to be contained within the markup of the item.

Do you have any examples of this?


> I'm not trying to be a jerk. Please seriously consider what I am showing 
> you. I will happily write a proposal for a <cmnt> element for the W3C 
> and browser vendors, and submit it to you for (haha) comments.

I hope it is clear that your proposal is being carefully considered, even 
if so far I haven't agreed with you. :-)


On Tue, 6 Sep 2011, Jukka K. Korpela wrote:
> 
> We probably understand the words "self-contained" and "independently" 
> very differently then. I cannot see a typical comment as self-contained, 
> as it by definition implies the context created by the document being 
> commented on. So how could it be *independently* reused and syndicated?

Well, continuing the previous example of reddit, consider:

   http://www.reddit.com/r/bestof/

Every post there is an example of a comment that was independently reused 
and syndicated.


> A typical comment might be a bit more than "Me too!" or "I especially 
> like the second paragraph" or "Gruntmaster 6000 is the best!" But it's 
> seldom written to be self-contained or reusable independently (if at 
> all).

The same applies to articles themselves. Few are written in isolation. 
Consider this article, for instance:

   http://mashable.com/2012/01/25/obamas-state-of-the-union-where-was-the-tech/

It's just like the examples you give, just longer (and more negative). It 
still gets syndicated, e.g. on what I presume is a spam farm:

   http://youtubevideodownloaderonline.com/?p=1435


On Tue, 6 Sep 2011, Tab Atkins Jr. wrote:
> On Tue, Sep 6, 2011 at 8:28 AM, Jukka K. Korpela <jkorpela at cs.tut.fi> 
> wrote:
> > 6.9.2011 12:40, Benjamin Hawkes-Lewis wrote:
> >> b) Since a comment is just a "self-contained composition", it can be 
> >> marked up with <article> -- whether nested inside another <article> -- 
> >> or not.
> >
> > If comments are generally "self-contained compositions", what would be 
> > an example of a composition that is _not_ self-contained?
> 
> A <section> of an article, for example.

Indeed.


On Tue, 6 Sep 2011, Jukka K. Korpela wrote:
> 
> I see no reason why a section of an article could not be self-contained.

Authors are welcome to mark up whatever levels they want as self-contained 
or not. Typically, however, an author thinks of the whole article as 
syndicatable as a unit, but not of its individual paragraphs or sections.

(One might question whether it makes sense to talk about the serialised 
syndication of a long work, e.g. a book published chapter by chapter, as a 
series of articles or a single article with several sections; however, in 
practice I think we would see it marked up as both, depending on the 
intent of the specific publication.)


On Tue, 6 Sep 2011, Jukka K. Korpela wrote:
> 
> Which "user problem" in that sense does _any_ of the new elements in 
> HTML5 solve?

Mostly these new elements make authoring a bit easier.


> I could list down a few, but elements like <footer> do not solve any 
> user problems. Or author problems; introducing <footer>...</footer> just 
> as shorthand for <div class=footer>...</div> is worse than pointless - 
> especially since the latter actually works well, whereas <footer> needs 
> extra tricks even to get the styling going.

IMHO having a single element rather than having to use classes makes page 
markup significantly easier to maintain. Naturally, if you are happy with 
<div class="..."> for everything, you are welcome to continue doing that.


On Tue, 6 Sep 2011, Benjamin Hawkes-Lewis wrote:
> 
> <nav> and <article> can be used to allow users to skip navigation, move 
> to the next block of self-contained content, or move to the next content 
> of the page.
> 
> <section> allows authors to express heading levels beyond 6 (and thus 
> allow users to navigate by such headings), and more easily put content 
> from disparate sources together (making it less likely users will be 
> presented with an incorrect heading order).

Indeed.

On Tue, 6 Sep 2011, Kornel Lesiński wrote:
> 
> Even if we agree that a nested <article> can be a comment, there is no
> guarantee that every nested <article> is a comment.

What exactly do you mean by "comment"?

Every nested <article> is defined as being a post whose context is the 
parent <article>.

People can misuse any markup, of course, but I don't see why they'd misuse 
<article> any more or less than type="comment" or <comment>.


On Wed, 7 Sep 2011, Shaun Moss wrote:
> 
> - An article can stand alone, without comments, even if those comments 
> add content (e.g. PHP documentation). A comment, however, needs context

I challenge the premise of both of these statements.


> hence the addition of the 'for' attribute.

I haven't added this proposed for="" attribute to <article> yet, but once 
people start using <article>, we can reconsider this if it turns out to be 
a common need.


> You would not be able to take a comment such as "Yeah, I love that 
> video!", post it on a page by itself, and have it make sense. This is 
> what I understand "stand-alone" to mean.

Numerous examples of this have been shown in this thread, so it's clear to 
me that comments do sufficiently fall into the "standalone" category.


> Another useful feature of comments would be the ability to extract 
> conversations from web pages. One comment could be "for" an article, 
> video, link or whatever, but a /reply/ to that comment could be "for" 
> the previous comment. With the current spec, this would require placing 
> an article inside an article inside an article, and so on for however 
> many replies there are (consider our current email thread, for 
> instance). This is not beautiful or practical, in my opinion! It would 
> make nested tables look elegant.

I don't see why it's a problem. It's what people are doing now, just with 
<div> (or sometimes <li>) instead of <article>.
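
A reply to a comment is then just one more level of nesting. A sketch,
with made-up content:

  <article>
   <p>The original post.</p>
   <article>
    <p>A comment on the post.</p>
    <article>
     <p>A reply to that comment.</p>
    </article>
   </article>
  </article>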


> As yet another use case, comments are often marked up differently. 
> Consider this CSS:
> 
> /* this CSS applies to articles as well as comments */
> article {
>   background-color: white;
>   font-size: 1em;
> }
> 
> /* this CSS is for comments, and overrides the previous definition */
> article article {
>   background-color: silver;
>   font-size: 0.8em;
> }
> 
> vs. this:
> 
> article {
>   background-color: white;
>   font-size: 1em;
> }
> 
> cmnt {
>   background-color: silver;
>   font-size: 0.8em;
> }
> 
> Which would you prefer to code?

In practice I doubt a site's real CSS would be anywhere near as simple as 
either of these.


On Sun, 4 Sep 2011, Jukka K. Korpela wrote:
>
> 4.9.2011 23:27, Odin wrote:
> 
> > We already have a comment tag. It's listed in the article-element
> > section of the spec. Article within article is suggested to be a
> > comment:
> 
> Suggested, not defined.

"When article elements are nested, the inner article elements represent 
articles that are in principle related to the contents of the outer 
article."

That is intended to be a normative definition. (The word "represent" has 
normative meaning in the HTML spec.)


> If we assume that authors use elements as per the spec as currently 
> worded, you _cannot_ decide that an article inside an article is a 
> comment. Just as it might be. It could be anything "in principle related 
> to" the contents of the outer article.

I don't understand the distinction.

What's the difference between "comment" and "article in principle related 
to another article"?


On Tue, 13 Dec 2011, Nikhilesh Jasuja wrote:
> 
> My takeaway from these discussions has been:
>
> Use cases for the new element(s) include
>       1. Users being able to hide comments and comment areas. (I'd like to t
>       2. Easier syndication of both the comments and the parent <article>
>       (because parent is now unencumbered/uncorrupted by user comments)
>       3. A signal to search engines analogous to rel=nofollow ("Yes this
>       content is on my website but I can't attest to its quality")
>       4. Screen readers can navigate comments more easily..or skip them
>       altogether

All of these are equally well handled by nested <article> elements.


> The problems with using nested <article>s for comments are:
>       1. A nested <article> does not necessarily mean a user-generated
>       comment. So it's ambiguous.

I don't understand why this is the case. Can you elaborate?


>       2. For threaded conversations, there would be a lot of nesting.
>       Nesting in and of itself is not a bad thing but when trying to syndicate
>       the original (parent) <article>, this becomes difficult. A <cmnt
>       for="thearticle"> is more elegant.

In practice I think nesting is significantly easier than a for="" 
attribute. However, we could add for="" to <article> if it turns out to be 
necessary, so I don't think that's a huge problem.


>       3. A webmaster may want to structure markup in a way that makes
>       nesting difficult. e.g. <article id="thearticle">..</article><div
>       class="advert">..</div><div
>       id="relatedcontent">..</div><commentsarea><form><textarea>your opinions
>       here</textarea><button>Submit</button></form><cmnt
>       for="thearticle">BS!!</cmnt></commentsarea>. In such cases, forcing the
>       comments to be nested <article>s would require unnecessary CSS
>       calisthenics to make it look right.

This seems cleaner:

  <article id="thearticle">
   ...
   <aside class="advert">
    ...
   </aside>
   <aside id="relatedcontent">
    ...
   </aside>
   <section>
    <form>
     <textarea>your opinions here</textarea>
     <button>Submit</button>
    </form>
    <article>
     Lovely!!
    </article>
   </section>
  </article>


> What's the process for introducing new elements into the spec? It must 
> be non-trivial ..a new element is a pretty big deal. Do people discuss 
> on the mailing list, agree it must be done and then some people 
> volunteer to write the spec? I want to help (if the more knowledgeable 
> minds in the group agree these new elements are a good idea).

For details on the process, please see the FAQ:

   http://wiki.whatwg.org/wiki/FAQ

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 25 January 2012 14:26:31 UTC