HTML, XML and "Idioms"

On Mon, 31 Jan 2000, Dan Connolly wrote:

> 	http://www.w3.org/TR/1998/NOTE-xh-19980511
> 
> In that meeting, we discussed a variety of hacks and kludges for
> dealing with XML stuff inside tag-soup-HTML; e.g.: ['blocks',
> 'sprinkles', and 'crud']

The 'Background' section talks about new "idioms" and their effects on
deployed software (i.e. forwards compatibility):

: 1. The idiom is ignored altogether. [...]
: 2. The enhanced functionality of the new idiom is ignored, but the
:    content is otherwise handled sensibly. [...]
: 3. The idiom is disruptive in deployed software. [...]

It's interesting to note that in the General Architecture[1], among
the three attributes that all element types can have, one is for a
quality called 'opacity' (with two values, 'transpar' and 'opaque'),
which specifies the element's essential hierarchic role.

[1] http://www.ornl.gov/sgml/wg8/document/n1920/html/clause-A.5.2.html

The basic problem with new "idioms" in HTML has been the general lack
of determinate opacity.  Instead, transparency (i.e. the element type
information is ignored, but the content is not) has been the default
treatment willy-nilly.  As the document goes on to note, in "General
Requirements",

:  * A method in HTML to declare that an unknown tag is significant
:    (versus the default "ignored" case), and whether the tag is empty
:    or not.
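
To make the "transparent by default" treatment concrete, here is a
minimal Python sketch.  It is only an illustration, not anything taken
from the Note: it uses the standard library's html.parser, and the
KNOWN set and the <PRICE> element are hypothetical.  The toy user
agent throws away the unknown element type but still exposes the
element's content:

```python
from html.parser import HTMLParser

# Hypothetical: the element types this toy user agent recognises.
KNOWN = {"p", "em"}


class TransparentAgent(HTMLParser):
    """Ignores unknown element types but keeps their content."""

    def __init__(self):
        super().__init__()
        self.rendered = []  # text the agent would display
        self.unknown = []   # element types it did not recognise

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag not in KNOWN:
            self.unknown.append(tag)  # type information is discarded

    def handle_data(self, data):
        self.rendered.append(data)    # content is exposed regardless


def render(markup):
    """Return (displayed text, list of unrecognised element types)."""
    agent = TransparentAgent()
    agent.feed(markup)
    agent.close()
    return "".join(agent.rendered), agent.unknown
```

Feeding it "<p>Cost: <price>42</price> only</p>" displays the text of
the unknown element inline, which is exactly the indeterminate
behaviour at issue: nothing in the markup says whether that was the
right thing to do.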

The method could have been two "global" attributes, that any new
"idiom" could exhibit (with appropriate defaulting):

  <!ATTLIST  idiom
      html   NAME     #IMPLIED
      -- a fallback element type in the HTML lexicon --
      htign  (htign|empty)  #IMPLIED 
      -- how the content if any should be treated in context --
      > 

E.g. a new "idiom" such as <TD html="p"> (with the *defined* defaults
applying to <TABLE> and <TR>) could have been introduced much more
robustly.  It may still not be too late.
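
As a rough sketch of how a deployed user agent might act on those two
attributes, consider the Python below.  Everything in it is an
assumption made for illustration: the HTML_LEXICON set, the resolve()
function, and in particular the reading of the values, namely that
htign="htign" means "skip the content", htign="empty" means "there is
no content and no end tag", and an absent htign falls back to the
familiar transparent treatment.

```python
# Hypothetical: element types a deployed user agent already understands.
HTML_LEXICON = {"p", "em", "pre"}


def resolve(element_type, attrs):
    """Return (effective_type, content_policy) for one start tag.

    effective_type is a known element type, or None if there is none.
    content_policy is one of:
      "show"   - process the content in context (transparent)
      "ignore" - skip the content (assumed meaning of htign="htign")
      "empty"  - the element has no content and no end tag
    """
    name = element_type.lower()
    if name in HTML_LEXICON:
        return name, "show"

    # Unknown element type: look for a declared fallback.
    fallback = attrs.get("html", "").lower()
    effective = fallback if fallback in HTML_LEXICON else None

    htign = attrs.get("htign", "").lower()
    if htign == "empty":
        policy = "empty"
    elif htign == "htign":
        policy = "ignore"
    else:
        policy = "show"  # assumed default: today's transparent treatment
    return effective, policy
```

Under these assumptions resolve("TD", {"html": "p"}) yields
("p", "show"), so an agent that has never heard of TD would treat the
cell as a paragraph instead of guessing.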

However, in "Possible Solutions", there is no mention of an
attribute-based method.  There are at least two straw proposals
mentioned, based on new *element types* (CONTAINER+LEAF, "wrappers"),
which IMHO fail to gauge the essential problem, if not actually beg
the question!

Also unmentioned is the possibility of using marked sections *inside*
new "idioms" to defeat the default treatment of "exposed" text: if
implementors want to "extend" HTML, they could at least beef up their
support of basic syntax!

> What I took away from that meeting was: if you really want to use
> the DOM and CSS with HTML, 

I've been noticing an increasing tendency to lump DOM and CSS together
as "things to use".  CSS has an arbitration mechanism to reconcile the
needs and preferences of both authors and readers.  DOM does not: it's
an interface only, entirely subservient to a program.  The notion that
DOM is for document authors "to use" perturbs me - just as I'm sure it
has every code-kiddie on the 'net scratching himself in unholy glee,
"Hee hee! I get to run *my* code on *your* machine, hee hee!"  Is this
the future?

> especically HTML with namespace-style extensions,

What about architectures?  The mappings involved are generic.
 
> you'll have to clean up your end tags and use XML.

End tags are just a matter of OMITTAG NO and IMMEDNET.  XML is another
issue entirely, IMHO.  In fact, it seems that XML is being used as a
convenient excuse to "justify" railroading HTML into a different usage
pattern.

But the real fact of the matter is that 'crud' is not causeless.  It
exists only because software is *willing* to countenance it.  The
salutary lesson available at the WDG validator[2] offers historical
evidence - if not proof - of the causal relation between software
"tolerance" and user habits (e.g. how many people forget those
end-quotes today?).

[2] http://www.htmlhelp.com/tools/validator/reasons.html

The beguiling idea, that HTML needs XML for new "idioms", has less
merit than a lot of people seem eager to give it, IMHO.



Arjun
-- 
Please do not Cc: followups.  I *am* subscribed to this list:)

Received on Tuesday, 1 February 2000 01:49:25 UTC