Re: The underappreciated merits of HTML

On Tue, Sep 4, 2012 at 8:12 PM, John Cowan <cowan@mercury.ccil.org> wrote:

> I received two emails today both referring to HTML, and it seemed to me
> that they only required a single answer, so I'm taking the unusual step
> of cross-posting to two unrelated lists.  Follow-ups will presumably
> land on whichever list you are on.
>
> On public-microxml, Uche Ogbuji wrote:
>
> > I am not so convinced that people will suddenly start using HTML as
> > their tag lingua franca in MicroXML.  If they did, they would more
> > likely just skip MicroXML altogether and stick to an HTML toolchain.
> > I think we can have human-readable documents in the vocab of choice in
> > MicroXML and then have them transformed to or dressed up as HTML at
> > the edges of the toolchain.  That's the predominant approach today.
> > There is very little use of XHTML, even XHTML5.  Data people use XML
> > assembled from their DBMS and fling it at XSLT.  Content people use
> > richer vocabularies (e.g. DITA, Docbook, etc.), or wizards that do the
> > same under the bonnet.
>
> On license-discuss, Larry Rosen wrote:
>
> > [C]onverting to plain text destroys information useful for human
> > beings to comprehend the license. It is like removing indentation and
> > line endings from source code. Please don't encourage old-fashioned
> > ways of representing licenses so they can't be easily read by the
> > only ones that matter: Human beings.  This is part of my existential
> > battle, including within Apache, to acknowledge that HTML allows for
> > a richer vocabulary of expression. Quit down-versioning our creative
> > works. :-)
>
> HTML as a format has suffered so dreadfully from its abuse that HTML as a
> vocabulary has, I believe, been downgraded as well.  As Uche says, people
> with a lot of documents to deal with tend to treat HTML as a pure output.
> It has become a fundamentally binary format, as uneditable as PDF and
> as opaque as Word 97 format, and I think that's really unfortunate.
>

It is certainly unfortunate.  But the browser boys have shown their ability
to screw up all sorts of Good Things, and HTML5 is just their latest bit of
boys-will-be-boys backyard demolition.  OK OK there are a few good things
about HTML5.  A few.


> This bias is so pervasive that once when I was working on an XML document
> format, I suggested the reuse of simple HTML element names like p,
> blockquote, em, strong, etc. on the grounds that they would be familiar
> to anyone working with the format.  This was immediately shot down by
> the rest of the team, on the grounds that the users would assume the
> document format was HTML and try to use it as such.
>
> However, they were so vehement about it that I think the unexpressed
> subtext was, "If it looks like HTML, the customers will treat us as
> HTML monkeys instead of document type designers.  We have to make it
> look different so they'll know it's Real XML."  Indeed, I take this
> opportunity to praise the DITA creators for having the courage to reuse
> HTML names in their document-oriented standard.
>
> Similarly, when I was working at Reuters Health, all our HTML output
> was in fact XHTML, so when people asked us for an XML format, I urged
> them to get the HTML and feed it into their XML toolchain.  "No, no,
> that's HTML; we want XML."  "It *is* XML, well-formed XML, all of it."
> "You don't understand.  We want XML, *not* HTML." ~~ /me grinds teeth ~~
>

They'd probably seen all sorts of horrors that purported to be XML (RSS
x.x, anyone?) and had just been punked so often that they had lost all
trust.  Even in you, though they should have known better.



> I think that one of the things MicroXML may be able to provide
> is a revitalization of HTML the vocabulary as a reasonable choice
> for the construction and maintenance of straightforward documents.
> It's really not so bad for writing simple uncomplicated documents like
> software licenses or W3C standards -- indeed, I wrote the XML Infoset
> Recommendation entirely in HTML.
>
> Of course, I'm the guy who put together the Itsy Bitsy Teeny
> Weeny Simple Hypertext DTD, so you'd expect me to say that.
> See http://www.ccil.org/~cowan/ibtwsh6.rnc (or .rng or .dtd).
>

So I would be solidly behind any efforts to encourage people to author
simple documents using HTML vocabulary in MicroXML.  IBTWSHDTD, besides
always having been my favorite name in all the markup world, is something I
think could flourish in a world where people believe in "Micro" things
again.  And of course it would be a good tool to consolidate behind some
small degree of semantic markup (<strong> rather than <b>, and so on).
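
To make that concrete, here is a minimal sketch of a document that uses only
HTML element names, with no DOCTYPE and no namespaces, read by a generic XML
parser.  The license text is hypothetical, and Python's standard ElementTree
parser merely stands in for a MicroXML toolchain; it is not a MicroXML
processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical license fragment using plain HTML element names.
# No DOCTYPE, no namespaces, nothing beyond elements and text.
DOC = """<html>
  <head><title>Sample License</title></head>
  <body>
    <p>You <strong>must</strong> retain this notice.</p>
    <blockquote><p>Provided <em>as is</em>.</p></blockquote>
  </body>
</html>"""

# A generic XML parser needs no HTML-specific parsing rules for this input.
root = ET.fromstring(DOC)

# Semantic markup such as <strong> is directly queryable.
strong_terms = [el.text for el in root.iter("strong")]
title = root.findtext("head/title")
```

The point is only that the vocabulary carries over unchanged; the conversion
to browser-ready HTML can still happen at the edge of the toolchain.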

That still doesn't take me far enough to support "<!DOCTYPE html>" though.
I think it's one thing to say "hey guys, why not just use the HTML vocab
rather than reinventing that wheel for content that's not too many steps
removed from presentation."  It's quite another to say "Hey put this bit of
cryptic fluff at the top of your documents so that browsers magically
behave themselves when they see it."  The whole DOCTYPE switching behavior
between quirks and standards mode is a hack, and one of the most awful
hacks ever.  It just doesn't feel right to complicate MicroXML to satisfy a
hack.  Gosh, to go further, given the history of HTML and the folks behind
HTML5, who is to say the nature of that hack won't change arbitrarily a
couple of years from now? Hardly a sane mule to which to yoke our cart.

And as Mike S pointed out, it is a major complication, even if the spec
just claims "Oh never mind what that syntactical appendix means, just spell
it exactly as we say."  That was what they tried with the whole "The
namespace is just a string, not a URI."  Well, people looked at it and heck
it looks like a URI, so sorry, how is it not a URI again?  And the result
was 3000-message W3C mailing lists on angels-on-the-head-of-a-pin, and
probably another 2000 on XML-DEV until Rick Jelliffe offered the Treaty of
Wulai, and then we got on the road to RDDL and all that and the mess is
still far, far, far away from being cleared up.  The point is that if it
even looks like a DTDecl, it will ultimately bring in a sizable portion of
the brain-baggage of DTDecls, whether we like it or not, whatever we may
say in the spec.
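
For anyone who missed the namespace wars, here is a small sketch of what
"just a string" means in practice.  The two example.org namespace names are
made up for illustration.  They differ only in the case of the host name, so
anyone reading them as URIs would call them the same resource, but a
namespace-aware parser compares them character by character.

```python
import xml.etree.ElementTree as ET

# Two hypothetical namespace names that differ only in host-name case.
A = '<p xmlns="http://example.org/ns"/>'
B = '<p xmlns="http://EXAMPLE.org/ns"/>'

# ElementTree reports qualified names as "{namespace-name}local-name".
tag_a = ET.fromstring(A).tag
tag_b = ET.fromstring(B).tag

# The names are compared as plain strings, so these are different elements.
same_element = tag_a == tag_b
```

Here same_element ends up False, which is exactly the gap between what the
spec says and what the syntax leads people to expect.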


-- 
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji

Received on Wednesday, 5 September 2012 03:27:21 UTC