Simplicity of Concept from Orion Adrian on 2004-02-27 (www-html@w3.org from February 2004)

From: Orion Adrian <oadrian@hotmail.com>
Date: Thu, 26 Feb 2004 19:20:15 -0500
To: www-html@w3.org
Message-ID: <BAY1-F127S7QdVryWQn000160ad@hotmail.com>
Consistency seems to be lacking in the (X)HTML spec in so many places that it
makes the spec hard to understand and use.  To be truly successful I believe
that the (X)HTML spec has to place a higher priority on consistency.

I find it amusing that HTML uses several elements for things they have no
business being used for (like using the dl element for an FAQ) and then
defines elements that have nearly the exact same semantic meaning as each
other (like h, title and caption).  While I fully understand the desire not
to produce elements for every type of semantic structure, because you'd just
end up with an unusable list, I don't understand making h, title and caption
separate structures.

I've heard the arguments that the h, title and caption elements have
different semantic meanings, but frankly, who cares?  Only the designers.
They are close enough.  I also believe that in terms of making the language
more usable, the benefit of merging elements that are very similar
outweighs their slight semantic differences.

And speaking of which, I think it's going to be increasingly important for
the W3C to look at HTML elements outside of the context of traditional
printed media.  Why should it matter that books have titles when web authors
aren't writing books?  Where do chapter headings come into play on a web
that rarely uses them?

For all I've heard about HTML being a semantic language, it really does a
poor job of recognizing when things are semantically the same.  The title or
name of something is not part of its content, it's metadata.  Sometimes
metadata gets printed, and should be, but let's not confuse "The Mating
Habits of Gerbils" with content.  It's an identifier to make the content easier to
comprehend and easier to find.  It is metadata.  So are section headings.  
They are simply the names of the blocks of content they are attached to.  An 
abstract is a fancy name for the summary of a work.  It's still a summary.  
Tables and images have captions, but really captions are just names used to 
more easily digest the content itself.  These same objects may have 
summaries too.  Multiple words for something doesn't mean multiple concepts.
These are the same semantic concepts.

Perhaps the W3C could run some usability tests.  Take a piece of the spec 
and try different names and structures.  See how designers do.  Use both 
existing authors and people currently unfamiliar with the spec.  Quantify 
the results.

Reduce the concepts used by HTML to the following:
* structure (section, list, table, etc.)
* metadata (title/name, summary, author, meta, etc.)
* presentational (style)
* pure semantic (abbr, em, etc.)
* relation linkage (link-like)

Then make sure that anything that can be reasonably put into each category 
goes there.  For a few of them I'm going to elaborate on what I feel goes 
where and why it goes there.
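
As a rough sketch of the reduction proposed above, the five categories can be
written down as a lookup table.  The element lists are only the examples named
in this message, and the ALIASES table (folding h and caption into title, and
abstract into summary) is my own illustrative assumption about how the merge
might be expressed, not anything from a spec.

```python
# Illustrative only: the categories and example elements come from the list
# above; CATEGORIES, ALIASES and categorize() are hypothetical names.
CATEGORIES = {
    "structure": {"section", "list", "table"},
    "metadata": {"title", "summary", "author", "meta"},
    "presentational": {"style"},
    "pure semantic": {"abbr", "em"},
    "relation linkage": {"link"},
}

# Different words for the same concept are folded into one name.
ALIASES = {"h": "title", "caption": "title", "abstract": "summary"}


def categorize(element):
    """Return the category for an element name, or None if it is unlisted."""
    name = ALIASES.get(element, element)
    for category, elements in CATEGORIES.items():
        if name in elements:
            return category
    return None


print(categorize("caption"))
print(categorize("table"))
```

Anything that returns None here is an element that still needs a home in one
of the five categories.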

Structural tags are merely a way to structure free-form data.  They don't
have semantic meaning themselves, but are rather an organizational tool.  
Tables and lists fall into this category as well as text sections.

Metadata includes things like title, summary, author and other
profile-specific metadata.  This allows for searching and indexing at a
level of detail not previously possible.

There needs to be only one presentational tag - style.  Do not use the link
tag.  The style element could work like the script element and allow a src
attribute, but it doesn't.  This is inconsistent.

Pure semantic elements like abbr and code give additional information about
an element that is useful for both the user agent and the designers.
However, blockcode and blockquote do not give any semantic information about
their contents beyond what code and quote already give.  These need to go.
The type of something is another form of metadata, but it's more fundamental
than things like title and is needed to give context to the elements below.

Relational linkage elements like link can allow users to specify the
relation that other content has to this content.  I believe it's important
to remove script and style from link's uses.  For one thing, using link this
way makes HTML harder to read and search.  If only style were used and not
link, then I could simply do a search for style elements.  With link elements
I have to hope that the author put the rel attribute first.  I could use
regular expressions, but that is a really poor solution.
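
To make the search problem concrete, here is a small sketch using Python's
standard html.parser module.  The sample markup and the StyleFinder class are
made up for illustration.  It compares a plain text search for
<link rel="stylesheet" with a real parse that checks rel wherever it appears.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample document: two linked style sheets (with rel in
# different positions), one unrelated link, and one inline style element.
SAMPLE = """
<head>
  <link rel="stylesheet" href="a.css">
  <link href="b.css" type="text/css" rel="stylesheet">
  <link rel="next" href="page2.html">
  <style>p { margin: 0 }</style>
</head>
"""

# Naive text search: only matches when rel happens to be the first attribute.
naive_hits = re.findall(r'<link rel="stylesheet"[^>]*>', SAMPLE)


class StyleFinder(HTMLParser):
    """Collects style sources regardless of attribute order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "style":
            self.sources.append("inline <style>")
        elif tag == "link":
            # rel is a space-separated list of link types.
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel_tokens:
                self.sources.append(attributes.get("href"))


finder = StyleFinder()
finder.feed(SAMPLE)

print(len(naive_hits))  # 1: the link with href first is missed
print(finder.sources)   # ['a.css', 'b.css', 'inline <style>']
```

The text search finds one of the two style sheets; finding both takes a
parser and knowledge of the rel values, which is exactly the extra work a
single style element would avoid.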

Another task is to take elements that are overloaded, like link, and break
them apart into more manageable pieces.  The meta, link and input tags (the
last now obsolete) should be broken apart, with their common usages getting
promoted to their own elements.  Title, summary and author are all good
examples of elements that could be extracted from meta.  Link has a slew of
similar usages dealing with things like chapter, section, next and
previous.
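
A minimal sketch of what that promotion could look like, again using Python's
standard html.parser.  The <author> and <summary> output elements are
hypothetical (they are the elements being proposed here, not existing ones),
and MetaPromoter and PROMOTED are names invented for this example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical set of meta names that would get their own elements.
PROMOTED = {"title", "summary", "author"}


class MetaPromoter(HTMLParser):
    """Rewrites <meta name=X content=Y> as <X>Y</X> for promoted names."""

    def __init__(self):
        super().__init__()
        self.output = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name in PROMOTED:
            self.output.append(f"<{name}>{escape(content)}</{name}>")
        else:
            # Anything not promoted stays a generic meta element.
            self.output.append(self.get_starttag_text())


promoter = MetaPromoter()
promoter.feed(
    '<meta name="author" content="Orion Adrian">'
    '<meta name="keywords" content="html">'
)
print(promoter.output)
```

The author entry comes out as its own element while keywords, which is not in
the promoted set, is left as a plain meta.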

Finally, link, meta, style, the various structural tags, the semantic tags,
and the semantic types for text (abbr, code, quote) can all be broken into
their own specifications.  I think that a lot of little standards that rely
on each other is a much better long-term approach than one monolithic
standard.  They also scale better.

Orion Adrian

Received on Thursday, 26 February 2004 20:00:47 UTC