Re: WYSIWYM editors

From: Mihai Sucan <mihai.sucan@gmail.com>
Date: Sun, 18 Mar 2007 14:31:19 +0200
To: public-html@w3.org
Message-ID: <op.tpdw2httmcpsjgr0b0dp@localhost.localdomain>

Very interesting thread.

On Sun, 18 Mar 2007 06:51:22 +0200, Marcos Caceres <m.caceres@qut.edu.au>
wrote:

> Maybe it's the programmers who are screwing up the markup as they are
> the ones writing the PHP, .Net, Rails, ColdFusion, etc that eventually
> comes out as HTML . Have you seen the nonsense markup that .Net's web
> controls produce? Seems to me programmers just as likely to be
> responsible for poor markup than anyone else.

No, actually, everyone screws up the markup:

a) the designer
b) the programmer
c) the libraries being used
d) the frameworks (ASP.NET, Ruby on Rails, etc.)
e) Content Management Systems
f) Web browsers

Each and every single one is responsible for the generated tag soup.

a) the designers

On Sun, 18 Mar 2007 06:51:22 +0200, Marcos Caceres <m.caceres@qut.edu.au>
wrote:

> As an indication, if you want to know what topics web designers are
> interested in go to a list apart [1] or look at CSS garden [2]. There
> you will find that yes, designers understand menus as list, and that
> probably a lot of designers understand HTML/CSS better than you think.

Those are expert designers. They do not represent the masses.

The definition of a web designer is very vague.

Blindfolded web designers (or extreme novices):
There are amateur web designers who do layouts in Ulead PhotoImpact or,
at best, in Photoshop. They use the provided tools to slice the layout and
export it to HTML. After that comes lots of guesswork when they create the
other pages using products similar to Dreamweaver. They change attribute
values and CSS properties, and add or remove tags, just to see what
happens, hoping the page will eventually show as they want in

"Real" web designers who work for local, regional, or even national
companies:
These create their layouts on schedule for the company they work for, most
likely in Photoshop (since it might be a requirement). Then they redo the
layout in Dreamweaver (or similar). Like the amateurs, they do lots of
guesswork when something doesn't work as they want, because they *still*
do not know *anything*. They combine CSS with table layouts.

Another type of web designer knows how to code and uses Dreamweaver (or
similar products) just for convenience; these designers almost know how to
code the layout manually. They still do some guesswork, but they are on
the right path.

And, finally, there are two types of experts: experts who like to write  
everything their own way, who try to write semantic code, CSS layout, etc.  
The other type of "experts" are those in big companies, which have a  
serious name, "experts" that still do table-CSS layouts and love using The  
Tools of The Future, the best productivity tools: Dreamweaver, Frontpage,  

b) the programmers

Programmers who know how to code want to write their own stuff, making
sure everything in their output is generated by *them*, not by some
library "magic". They like to know everything that happens.

And then there's the majority of programmers: give them a library which
can make them coffee and they'll be happy. They want to finish their
projects as fast as possible, with as little coding as possible, and they
don't care what the libraries do. "What are you complaining about? Does it
work or not?"

c) the libraries

By design, libraries offer more functionality than is needed within the  
given project, project page, etc.

Add to that: some libraries don't even generate proper code, and some of
them are poorly coded.

d) the frameworks

Like libraries, Ruby on Rails, ASP.NET and others provide lots of "magic":
"just look how easy it is to create a blog!". All of this is done without
developers being aware of *what* code is generated. They don't even look
at the HTML source unless it's absolutely necessary (something doesn't
*look*/work as they want).

e) the Content Management Systems are built on top of all the above. Now  
imagine how good they must be. The CMS is not the only one to be blamed  
for generating bad markup, because we also have...

f) the Web browsers, which themselves generate bad markup in designMode.

On Sun, 18 Mar 2007 05:29:30 +0200, Robert Accettura
<robert@accettura.com> wrote:

> Take the following as an example (it's not really a proposal): If the  
> markup
> were so that all tags were equal (think <span>,<a>,<img>,<input>, and a  
> handful
> of others as the only tags not depricated), and an attribute  
> ("datatype") were
> to describe the data in more casual terms of what it contains ("quote",
> "citation", "navigation", "paragraph", etc. it's much more intuitive.

I believe this is utopian. You cannot create a foolproof language. No
matter what the language, people *will* write bad code.

The fact that C/C++ code can't be compiled until you fix all compilation
errors doesn't mean programmers inherently write proper code.

There are pages which use very ugly PHP code.

On Sun, 18 Mar 2007 08:17:16 +0200, Robert Accettura
<robert@accettura.com> wrote:

> It's not that they can't (technically a child 8 years of age, can learn  
> it...
> it's not difficult) but they don't care to.  And nothing you say/do will  
> force
> them to change, other than require XML and completely break with the  
> slightest
> defect.

Correct, they do not care. It just works.™ Why bother more?

However, it's false to assume they'll suddenly write proper code if you
require the XML rules (proper nesting, closing tags, etc.). I've seen many
XHTML 1.0 Strict pages that were valid but, ironically, served as
text/html. The code was still a mess, even though it had proper tag
nesting, tag closing, etc.

What we can do:

1. proper frameworks can be developed
Frameworks that generate better markup.

2. server-side scripting languages with better support (HTML5 parsing,  
solid DOM support, etc)

I do not want, web developers in general do not want, to write their own  
DOM implementation, their own HTML5 parser, XHTML/HTML serializer, etc. We  
also don't want to write our own htmltidy-like tool.

The server-side scripting languages must push forward their tools for such  

3. better web browsers

Web browsers must have better tools for reporting CSS, JS, HTML errors,  
DOM Inspectors, CSS Inspectors, JS debuggers, etc. This is already  
happening and it's very good.

Web browsers must also generate better code in designMode. They need some
kind of htmltidy integrated within, so that when developers try to read
the generated HTML code, they automatically get clean code.

4. better content management systems
Once we have better web browsers, better frameworks and more tools, web  
developers will be able to code proper content management systems.

This is not to say that better content management systems can't be made
*now*. Actually, fairly good content management systems can be built right
now, but ... most likely it takes too much work for average Joe web

What we can't do:

1. We can't teach every single web designer about proper HTML/CSS. I
believe a designer is a designer; he/she must evolve in his/her own areas
of interest (application GUIs, web interfaces, digital art, 3D modelling,
digital painting, whatever). He/she should not go overboard and do it
all. (Jack of all trades, master of none.)

2. We can't create languages within which developers are constrained
enough to write proper code - that is, if you want the language to be used
by the

3. We can't teach users of various content management systems to not use  
ugly colours, bold, font changes, what-not. They don't need to learn this.

On Sun, 18 Mar 2007 08:17:16 +0200, Robert Accettura
<robert@accettura.com> wrote:

> Don't forget even Google's homepage uses things like <font>, unnecessary
> <table>, <b>... and they have more resources and pro's available than the
> majority of sites on the web.

Precisely. That's probably a very good example of "I don't care".

How can we force/teach others to write proper code if not even Google does  

Now, the problems, I'd say, go even further:

a) We also have teachers at universities who, by an (un)fortunate twist of  
destiny, end up teaching students about HTML, PHP and what-not.

I have several real-life examples of students who asked me about HTML,  
because they wanted to learn for their courses. I, of course, took the  
semantic-view approach, but teachers *require* students to write poor code.

b) Some web developers right now are so stubborn that they still use
tables for layouts. They are stubborn because they have arguments like
"thousands of dollars are being made in this industry by expert web
developers/designers who still write table-based layouts. You go write
your semantic code, I'll make some money". The point is, such people think
that we live in a world where money doesn't matter, and that we like to
write semantic code just for the beauty of doing so. They consider us

Regarding the initial question about WYSIWYM editors, I don't believe HTML
is to blame when editors generate poor markup. The editors are to blame,
and so are their users. Actually, the users carry most of the blame. Allow
me to explain why, based on some real examples. I wrote a PHP script which
tries to clean up a lot of the usual ugly markup. I use htmltidy, but I
didn't like that htmltidy tries too hard to keep the original intact. So,
my PHP script uses (*cough*) regex and PHP 5 DOM to clean up the following
(often encountered) problems:

1. Users don't even know how to use tables.

Given the following example:

<p>blah 1</p>
blah 2<br />
blah 3</td>
<p>boom 1</p>
boom 2<br />
boom 3</td>

My script generates:

<td>blah 1</td>
<td>boom 1</td>
<td>blah 2</td>
<td>boom 2</td>
<td>blah 3</td>
<td>boom 3</td>

Isn't the initial table *seriously* broken? Can you blame Word for that?
When I make a table in Word I don't do *that*. The user is to blame.

I must add that the example I provide here as the initial table is
actually quite clean; what I had to clean were tons of <font>, <b>, <dd>
tags within the same table. Nonetheless, the output is real output. (My
script simply ignores all fonts, colors, etc.)

You need artificial intelligence within the editor to warn users when
they create such a broken table.

2. Users like to insert lots of <br>s. Should editors disallow line breaks?

We have <br>s between paragraphs, tables, lists, at the start/end of  
paragraphs <p><br>The time has come<br></p>, etc. We have <br>s everywhere.
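
To make the <br> cleanup concrete, here is a minimal sketch in Python (it
is not the PHP script mentioned above, and the function name is made up
for illustration). It assumes the only goal is to drop <br> runs at the
very start or end of a paragraph; a real cleaner would work on a parsed
DOM rather than on regexes.

```python
import re

# One or more <br>, <br/> or <br /> tags, with any surrounding whitespace.
BR_RUN = r"(?:\s*<br\s*/?>\s*)+"


def strip_edge_breaks(html: str) -> str:
    """Remove <br> runs right after an opening <p> and right before </p>.

    Hypothetical helper for illustration only; breaks in the middle of a
    paragraph are left alone.
    """
    # Leading breaks: keep the opening tag (group 1), drop the breaks.
    html = re.sub(r"(<p\b[^>]*>)" + BR_RUN, r"\1", html, flags=re.IGNORECASE)
    # Trailing breaks: drop the breaks, keep the closing tag (group 1).
    html = re.sub(BR_RUN + r"(</p>)", r"\1", html, flags=re.IGNORECASE)
    return html


print(strip_edge_breaks("<p><br>The time has come<br></p>"))
# prints: <p>The time has come</p>
```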

3. Users don't even know what lists are.

It's quite often that I have to clean such markup:

Just to generate a single <ul>, with all the list items.

In such cases, the editor can do some clean up, but you can never be sure.  
The editor could provide a warning "should the two lists be merged?"
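
As a rough illustration of the merge step only: the sketch below assumes
the problem markup is a sequence of separate <ul> elements with nothing
but whitespace between them, which is a guess at the case described here.
It is written in Python with a made-up function name, not taken from the
PHP script.

```python
import re

# A closing </ul> immediately followed (whitespace aside) by a new <ul ...>.
ADJACENT_LISTS = re.compile(r"</ul>\s*<ul\b[^>]*>", re.IGNORECASE)


def merge_adjacent_lists(html: str) -> str:
    """Join consecutive <ul> elements into one list.

    Hypothetical helper: it removes the "</ul> <ul>" seam, so the items of
    the second list end up inside the first. Attributes on the second <ul>
    are discarded.
    """
    return ADJACENT_LISTS.sub("", html)


print(merge_adjacent_lists("<ul><li>one</li></ul>\n<ul><li>two</li></ul>"))
# prints: <ul><li>one</li><li>two</li></ul>
```

An editor could run the same check and ask before merging instead of
rewriting silently, since two adjacent lists are sometimes intentional.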

3a. Another example of "users don't even know what lists are".

Users simply don't allow the editors to automatically create semantic
lists based on the content. As is known, Microsoft Word and OpenOffice
Writer automatically generate (un)ordered lists for input like:

a) you
b) me
c) and the Web

However, almost every Word document I have to put on the Web contains the  
"a) you<br>b) me<br>c) and the Web". Was that a failure on the editor  
side? No, it's the user who had no clue.
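
A cleanup step for this case could look roughly like the Python sketch
below. The function name and the exact rule (consecutive lowercase letters
starting at "a", each followed by ")") are assumptions made for the
example; it is not the PHP script described above.

```python
import re

# A line such as "a) you": one lowercase letter, ")", then the item text.
ITEM = re.compile(r"^([a-z])\)\s+(.+)$")


def lettered_lines_to_list(fragment: str) -> str:
    """Turn "a) x<br>b) y<br>c) z" into an ordered list.

    Hypothetical helper: the fragment is returned unchanged unless every
    <br>-separated part is lettered a), b), c), ... in order.
    """
    parts = re.split(r"<br\s*/?>", fragment, flags=re.IGNORECASE)
    parts = [part.strip() for part in parts if part.strip()]
    items = []
    for index, part in enumerate(parts):
        match = ITEM.match(part)
        # Require consecutive letters starting at "a".
        if match is None or match.group(1) != chr(ord("a") + index):
            return fragment
        items.append(match.group(2))
    if len(items) < 2:
        return fragment
    # The lettering itself is presentation; CSS (list-style-type:
    # lower-alpha) can restore it on the <ol>.
    return "<ol>" + "".join(f"<li>{item}</li>" for item in items) + "</ol>"


print(lettered_lines_to_list("a) you<br>b) me<br>c) and the Web"))
# prints: <ol><li>you</li><li>me</li><li>and the Web</li></ol>
```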

4. Another general problem of markup code generated from Word documents is  
evidenced by the following code:

<p>Opera 9 comes loaded with </p> <p>the tools to keep you productive and  

As one can notice, the two paragraphs should be merged into one.

Again, you need AI and/or draconian restrictions in an editor to be able  
to disallow such human errors.

5. Headings are a complete mess. *If* they are used, they are chosen based  
on font size.

All of this only scratches the surface.

ROBO Design - We bring you the future
Received on Sunday, 18 March 2007 12:31:30 UTC
