HTML classes of products Re: Support Existing Content from Karl Dubost on 2007-05-01 (public-html@w3.org from May 2007)

From: Karl Dubost <karl@w3.org>
Date: Tue, 1 May 2007 11:19:07 +0900
To: tina@greytower.net
Cc: HTML WG <public-html@w3.org>
Message-Id: <745AACD1-C595-43ED-A07C-574A3C6AE99D@w3.org>
Hi Tina, and others,

On 1 May 2007, at 09:34, Tina Holmboe wrote:
>> But "valid" is exactly what someone chooses it to mean. Why is it
>> that omitting the </P> close tag is valid HTML, but omitting </PRE>
>
>   Yes. And, imnsho, the direction the HTML WG seems to be taking
>   includes making the /wrong/ decision about what is, and isn't valid.
>
>   HTML as a presentation language is not practical, for instance. We,
>   that is topic experts, do show up here to engage in the standards
>   process and you might show us the courtesy of listening.

It is what we can call a lively debate :) I discussed it this
morning (Japanese time) on IRC with Dave Hyatt, Maciej, and a few
others. I'm glad that you have joined the list; as you said, the HTML
WG is defined by its constituency. Diverse opinions from different
sources will enrich us if, and only if, we try to understand each
other. I think it is sometimes a question of culture and background,
and of point of view.

HTML is not the same beast depending on how you look at it. (My
characterizations might lack precision, but it would be cool if
someone wants to refine them on the Wiki, for example.)

# Web author (hand coding)

From the point of view of the author, HTML is a set of tags with a
clearly defined meaning (ex: 'q') or functional semantics (ex: 'a').
Sometimes the definitions given by previous specifications, books,
and tutorials lead to misunderstanding, and the tags are not properly
used. There are many categories of HTML hand coders, with different
capabilities and knowledge. Some authors see HTML just as a support
for CSS, for example, and do not care much about the meaning. Some
are very precise and are frustrated by the lack of defined elements.

# Web author (wysiwyg)

By far this is the most common author on the Web, and basically, they
do not know what HTML is at all. Most of these people use a form
where they enter simple text, sometimes enriched with a JavaScript
toolbar; some send HTML emails; some save their office documents as
web documents to be loaded by the CMS.

# CMS developer, scripting libraries.

HTML is a language that, in the best case, has some rules for nesting
tags and helps to put content on a web page. It is something for
putting bits of content coming from a database on the Web. It is very
rare that the semantics are understood or even cared about. It is
very rare to have a CMS that puts a quality process in the publishing
step. Their conception is more HTML fragment than document.
(Ironically, this speaks in favor of namespaces, but that is another
debate.)

# Web authoring Wysiwyg tool

HTML is a very difficult thing to implement. The specifications in
the past have not been written for them. They just have to produce a
document which respects the syntactic rules of the language, but not
much guidance is given on implementing the language at the UI level.
Right now, we have a tendency to define a lot more about how to
render and not that much about how to create.

# Web Visual Browser

From the point of view of a Web visual browser (and then its
developers), it is a blob of tags, most of the time not written very
well. They /have to parse/ it to give something mostly usable to a
random person on the Web.
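
As a rough illustration of that kind of tolerant parsing, here is a
minimal sketch using Python's standard html.parser module. It is not
how any real browser engine works; the class TextCollector and the
function salvage are hypothetical names made up for this example. The
point is only that badly written markup still yields usable text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect start tags and text without requiring well-formed input."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, whatever the author typed.
        self.tags.append(tag)

    def handle_data(self, data):
        # Keep only non-empty text chunks.
        if data.strip():
            self.text.append(data.strip())


def salvage(markup):
    """Return (start tags, text chunks) from possibly sloppy markup."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags, parser.text


# Unclosed <P>, mixed-case tag names, and a stray </i> with no opener.
tags, text = salvage("<P>first<p>second <B>bold</i> tail")
print(tags)
print(text)
```

The parser does not complain about the missing or mismatched end
tags; it simply reports what it sees, and the caller decides what to
do with it.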

# Assistive Browser

They see HTML as a powerful language for easily giving access to
content to people who had no access to it in the past. Giving a blind
person access to a paper book has a high cost; it becomes easy on the
Web. It is still difficult to implement a useful tool, though,
because not many Web authors and CMSs care about accessibility. So
the people using these browsers fill the gap when they can, by using
their own skills and intelligence.

# Web search

Strange world. It is not a uniform world. There are at least two big
sub-classes:

## Web search services (Yahoo!, MS Live, Google and Quaero(maybe) )

These need to parse web content, which is not only HTML and which is
mostly a few tags and a lot of content. They are interested in links!
and in some of the meaningful tags, but not that much.
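
To make the "links first" view concrete, here is a small sketch,
again assuming Python's standard library (html.parser and
urllib.parse). LinkExtractor and extract_links are hypothetical names
for this example, and no actual search service is claimed to work
this way. Everything except the href of 'a' elements is ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Keep only the targets of 'a' elements, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative references against the page address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(markup, base_url):
    """Return the absolute link targets found in the markup."""
    parser = LinkExtractor(base_url)
    parser.feed(markup)
    parser.close()
    return parser.links


page = '<p>See <a href="/QA/">the QA weblog</a> and <A HREF="People/karl/">Karl</a>.'
links = extract_links(page, "http://www.w3.org/")
print(links)
```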

## Web search engines (ht://Dig, Nutch, etc.)

More skilled and more powerful, they are used on corporate, academic,
and personal Web sites. They are crafted to index all kinds of
metadata and semantics. For them, HTML is a fully meaningful
language. It helps users on the Web get a more precise answer within
the context of a corporate site. Initiatives like explicit data
(RDFa, microformats), metadata in head, etc. are very important for
them. Some of these engines work on the desktop and are then a tool
for desktop users (Spotlight, for example).
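
For the "metadata in head" case, a sketch of what an indexer might
pick up, assuming Python's html.parser. MetaCollector and
head_metadata are hypothetical names, and the sample document is
invented for the example. Real engines such as ht://Dig or Nutch have
their own, far richer, machinery.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from meta elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        if name and content is not None:
            # Metadata names are compared case-insensitively here.
            self.meta[name.lower()] = content


def head_metadata(markup):
    """Return a dict of metadata names to their content values."""
    parser = MetaCollector()
    parser.feed(markup)
    parser.close()
    return parser.meta


sample = (
    '<head>'
    '<meta name="Author" content="Karl Dubost">'
    '<meta name="keywords" content="html, conformance">'
    '</head>'
)
metadata = head_metadata(sample)
print(metadata)
```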

# Validators, Conformance checkers, Helping tools

HTML is a set of rules and definitions that helps determine whether
a document is in contradiction with these rules. Some of the rules
can be checked easily and processed by a machine; others are a lot
more difficult.
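
The </P> versus </PRE> question quoted at the top is one of the
easily checked rules: in HTML 4.01 the end tag of P is optional and
the end tag of PRE is required. A toy check, assuming Python's
html.parser and a hand-picked set with a single element in it, could
look like the sketch below. EndTagChecker and missing_end_tags are
hypothetical names, and this is nowhere near a real validator (it
only counts tags and does not look at nesting).

```python
from html.parser import HTMLParser

# Assumption for this sketch: only PRE is tracked. A real checker
# would take the full list of required end tags from the DTD.
END_TAG_REQUIRED = {"pre"}


class EndTagChecker(HTMLParser):
    """Count start and end tags for elements whose end tag is required."""

    def __init__(self):
        super().__init__()
        self.open_counts = {tag: 0 for tag in END_TAG_REQUIRED}

    def handle_starttag(self, tag, attrs):
        if tag in self.open_counts:
            self.open_counts[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.open_counts:
            self.open_counts[tag] -= 1


def missing_end_tags(markup):
    """Return the tracked elements that were opened more often than closed."""
    checker = EndTagChecker()
    checker.feed(markup)
    checker.close()
    return sorted(tag for tag, count in checker.open_counts.items() if count > 0)


print(missing_end_tags("<P>one<P>two"))      # P may be left open
print(missing_end_tags("<PRE>some code"))    # PRE may not
```

Whether a 'q' really marks a quotation, on the other hand, is the
kind of rule a machine cannot check this way.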

# Other Specifications

HTML is a set of rules and syntactic constraints with defined
semantics that can be used by, or encapsulated in, another technology.



-- 
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
   QA Weblog - http://www.w3.org/QA/
      *** Be Strict To Be Cool ***
Received on Tuesday, 1 May 2007 02:19:36 UTC