Custom markup

This is very long; I apologize.  It's really bugging me though.

One thing I think about a lot (maybe I'm the only one) is
how old documents should map into the new XHTML world.

In many cases, print documents have typographical conventions
that reflect some underlying semantic order.  But there is no
HTML tag for most of this stuff.  This "semantic order" often
has very limited scope:  it only applies to a small range of
documents, perhaps even a single document.

Examples:

 - In a gossip column, names of celebrities are bold (just
   like TimBL's name on the W3C site's front page <wink/>)
 - Names of genera and species often appear in italics.
 - In a book, each chapter starts with a quote, which is
   offset from the text.
 - On a bus schedule, 7:42 means 7:42 A.M. if it is in
   plain type, 7:42 P.M. if it is in bold type.
 - In "The Elements of Style", there are often side-by-side
   comparisons of good and bad writing.

Computer-oriented documents are especially enthusiastic about
inventing semantic and typographical conventions for a very
narrow range of content.

 - In many RFCs, the words "MUST" and "REQUIRED" (and others),
   when used with a certain very specific meaning, appear in
   all caps.  This convention is rare elsewhere (though some W3C
   specs have used it).
 - A document containing a grammar often distinguishes
   typographically between "basic" rules, like the terminals,
   and more complex rules.
 - A document that talks about XML might make a presentational
   distinction between elements and attributes.

Suppose I'm translating a non-HTML document (like any of the
examples above) to XHTML.  My goals are:

 - Don't lose anything interesting in the translation.  That is,
   the document should not be confusing to the reader where the
   original was clear.  If it's all rendered according to user
   preferences, great, but if the user doesn't have any preferences
   it should look roughly like the original print.  Example:
   chapter quotes should still be obviously chapter quotes and
   not look like (or sound like) part of the body of the text.
 - Expand the document's audience.  I want to provide "nice",
   communicative semantic markup for anyone who's using the document
   for anything besides print or screen display.
 - Save time.  I won't put much effort into tagging and maintaining
   the document.

Stop a moment.  Are these reasonable goals?  I feel they are good
mid-range goals.  (Long-term, I want better ways to do all this.)

I think some HTML gurus don't like <div> and <span> and the class=
attribute, because even when they are used with semantic intent
(e.g. span class="celebrity"), they have no communicative value.
No one else knows the meaning of my special values for class=.

Håkon Wium Lie is, I think, of the opinion that XML should be
used primarily for communication.  A client should have a
built-in understanding of what <p> *means*.  An author should
not invent and use elements (<mytags:chapter-quote>) that the
intended clients don't really understand on a semantic level.
Even if I can, with a stylesheet, "teach" the client how to
present this invented element, I haven't done anything much
of value, compared to what XML can offer.

I can see that point of view.  But I still need to meet
goal #1 and get all my content live by next Tuesday.  So what
do you recommend?

Suppose I'm a doctor talking about blood chemistry.  I have
ideas to express, and HTML doesn't cover the whole range.  On
paper or in a word processor, I can use italics, some chemical
notation, a few tables, and maybe a drawing or two to express
my ideas.  People will understand.  Now: how do I do this with
HTML?

There are a few possibilities that I think are already being
considered.

 - Some stuff is good enough and distinct enough from XHTML that
   it would make a good XML-based standard on its own.  Examples:
   annotation, change tracking.
 - Some stuff is good enough and well enough in line with XHTML
   that it could be included in future versions of the XHTML spec.
   Examples: formally citing a work (<cite> sucks), or indicating
   document structure (<appendix>, <preface> ...).
 - Some stuff is widely useful enough that it deserves an XHTML
   module, perhaps apart from the XHTML standard proper but
   designed to plug in to XHTML.  (Examples might include <genus>,
   <species>, <codeblock>, etc.)

But some stuff is just document-specific no matter how you look at
it, and a lot of stuff that falls into one of the above categories
just isn't standardized yet.  How can I deal with all this?

 - I can invent classes and use the class= attribute, and attach
   a stylesheet.
 - I can invent XML elements and attach a stylesheet.
 - In some cases, HTML has something vaguely related.  I can just
   use that and hope the appearance is good enough.  (Example:
   <code> for a BNF production.)
 - I can use presentational markup to approximate how the ideas
   looked in print.  ("their latest album, <i>Bludgeoned with
   a Frozen Haddock</i>"; "W3C Director <strong>Tim
   Berners-Lee</strong>" - I know <strong> isn't strictly
   presentational, but can you imagine how that must sound when
   a voice browser reads it?)
 - I can leave off the markup: if it isn't supported by HTML, it
   can't be that important.
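
To make the first two options concrete, here is a rough sketch in
Python (standard library only).  The class value "celebrity", the
mytags prefix, and the example.org namespace URI are all made up for
illustration.  It marks up the same phrase both ways and checks what
a client that knows only XHTML would get out of each.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MY_NS = "http://example.org/mytags"  # hypothetical namespace, for illustration

# Option 1: a standard element plus an invented class value.
with_class = (
    f'<p xmlns="{XHTML_NS}">W3C Director '
    '<span class="celebrity">Tim Berners-Lee</span> spoke.</p>'
)

# Option 2: an invented element in its own namespace.
with_element = (
    f'<p xmlns="{XHTML_NS}" xmlns:mytags="{MY_NS}">W3C Director '
    '<mytags:celebrity>Tim Berners-Lee</mytags:celebrity> spoke.</p>'
)


def plain_text(markup):
    """Return the text a client sees if it ignores all markup."""
    return "".join(ET.fromstring(markup).itertext())


def children_in_xhtml(markup):
    """For each child of the root, report whether its tag is an XHTML tag."""
    root = ET.fromstring(markup)
    return [child.tag.startswith("{" + XHTML_NS + "}") for child in root]
```

Either way the words survive for a client that ignores the markup.
The difference is only whether the wrapper is an element the client
has heard of (span) or one it hasn't; in neither case does the client
learn what "celebrity" means.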

Perhaps another option would be a new standard, by which I could
define a new tag and teach the client a little bit about what it
means (quite apart from how it should be styled).  This reminds
me of Architectural Forms a bit.
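
I don't have a real proposal for this, but a minimal sketch of the
idea might look like the Python below.  Everything in it is
hypothetical: the BASE_FORMS table, the mytags namespace, and the
to_base_forms helper are invented here, and this is only loosely in
the spirit of Architectural Forms, not an implementation of them.
Each invented element declares which XHTML element it "behaves
like", so a client that has never seen it has something to fall
back on.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MY_NS = "http://example.org/mytags"  # hypothetical namespace, for illustration

# Hypothetical declarations: invented element -> XHTML element it behaves like.
BASE_FORMS = {
    f"{{{MY_NS}}}chapter-quote": "blockquote",
    f"{{{MY_NS}}}celebrity": "span",
}


def to_base_forms(root):
    """Rewrite invented elements in place as their declared XHTML base
    element, keeping the invented local name as a class value."""
    for elem in root.iter():
        base = BASE_FORMS.get(elem.tag)
        if base is not None:
            local_name = elem.tag.split("}", 1)[1]
            elem.tag = f"{{{XHTML_NS}}}{base}"
            elem.set("class", local_name)
    return root


doc = ET.fromstring(
    f'<div xmlns="{XHTML_NS}" xmlns:mytags="{MY_NS}">'
    "<mytags:chapter-quote>An epigraph goes here.</mytags:chapter-quote>"
    "</div>"
)
to_base_forms(doc)
quote = doc[0]  # now an XHTML blockquote with class="chapter-quote"
```

The client still doesn't know what a chapter quote is, but it does
know it is some kind of quotation, which is more than a bare span or
div tells it.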

A lot of effort has gone into making HTML more cleanly
extensible.  But the question remains:  how and when and why
should I go about actually extending it?

-- 
Jason Orendorff

Received on Tuesday, 11 April 2000 19:00:47 UTC