- From: Uche Ogbuji <uche@ogbuji.net>
- Date: Tue, 4 Sep 2012 21:26:53 -0600
- To: public-microxml@w3.org
- Message-ID: <CAPJCua02RDBuN8OFbY+U3n=0E9b5AXCQ0WK2rYzaJThBe6MM=A@mail.gmail.com>
On Tue, Sep 4, 2012 at 8:12 PM, John Cowan <cowan@mercury.ccil.org> wrote:
> I received two emails today both referring to HTML, and it seemed to me
> that they only required a single answer, so I'm taking the unusual step
> of cross-posting to two unrelated lists. Follow-ups will presumably
> land on whichever list you are on.
>
> On public-microxml, Uche Ogbuji wrote:
>
> > I am not so convinced that people will suddenly start using HTML as
> > their tag lingua franca in MicroXML. If they did, they would more
> > likely just skip MicroXML altogether and stick to an HTML toolchain.
> > I think we can have human-readable documents in the vocab of choice in
> > MicroXML and then have them transformed to or dressed up as HTML at
> > the edges of the toolchain. That's the predominant approach today.
> > There is very little use of XHTML, even XHTML5. Data people use XML
> > assembled from their DBMS and fling it at XSLT. Content people use
> > richer vocabularies (e.g. DITA, Docbook, etc.), or wizards that do the
> > same under the bonnet.
>
> On license-discuss, Larry Rosen wrote:
>
> > [C]onverting to plain text destroys information useful for human
> > beings to comprehend the license. It is like removing indentation and
> > line endings from source code. Please don't encourage old-fashioned
> > ways of representing licenses so they can't be easily read by the
> > only ones that matter: Human beings. This is part of my existential
> > battle, including within Apache, to acknowledge that HTML allows for
> > a richer vocabulary of expression. Quit down-versioning our creative
> > works. :-)
>
> HTML as a format has suffered so dreadfully from its abuse that HTML as a
> vocabulary has, I believe, been downgraded as well. As Uche says, people
> with a lot of documents to deal with tend to treat HTML as a pure output.
> It has become a fundamentally binary format, as uneditable as PDF and
> as opaque as Word 97 format, and I think that's really unfortunate.
>

It is certainly unfortunate. But the browser boys have shown their ability to screw up all sorts of Good Things, and HTML5 is just their latest bit of boys-will-be-boys backyard demolition. OK OK there are a few good things about HTML5. A few.

> This bias is so pervasive that once when I was working on an XML document
> format, I suggested the reuse of simple HTML element names like p,
> blockquote, em, strong, etc. on the grounds that they would be familiar
> to anyone working with the format. This was immediately shot down by
> the rest of the team, on the grounds that the users would assume the
> document format was HTML and try to use it as such.
>
> However, they were so vehement about it that I think the unexpressed
> subtext was, "If it looks like HTML, the customers will treat us as
> HTML monkeys instead of document type designers. We have to make it
> look different so they'll know it's Real XML." Indeed, I take this
> opportunity to praise the DITA creators for having the courage to reuse
> HTML names in their document-oriented standard.
>
> Similarly, when I was working at Reuters Health, all our HTML output
> was in fact XHTML, so when people asked us for an XML format, I urged
> them to get the HTML and feed it into their XML toolchain. "No, no,
> that's HTML; we want XML." "It *is* XML, well-formed XML, all of it."
> "You don't understand. We want XML, *not* HTML." ~~ /me grinds teeth ~~
>

They'd probably seen all sorts of horrors that purported to be XML (RSS x.x, anyone?) and had just been punked so often that they had lost all trust. Even in you, though they should have known better.

> I think that one of the things MicroXML may be able to provide
> is a revitalization of HTML the vocabulary as a reasonable choice
> for the construction and maintenance of straightforward documents.
> It's really not so bad for writing simple uncomplicated documents like
> software licenses or W3C standards -- indeed, I wrote the XML Infoset
> Recommendation entirely in HTML.
>
> Of course, I'm the guy who put together the Itsy Bitsy Teeny
> Weeny Simple Hypertext DTD, so you'd expect me to say that.
> See http://www.ccil.org/~cowan/ibtwsh6.rnc (or .rng or .dtd).
>

So I would be solidly behind any efforts to encourage people to author simple documents using HTML vocabulary in MicroXML. IBTWSHDTD, besides always having been my favorite name in all the markup world, is something I think could flourish in a world where people believe in "Micro" things again. And of course it would be a good tool to consolidate behind some small degree of semantic markup (<strong> rather than <b>, and so on).

That still doesn't take me far enough to support "<!DOCTYPE html>" though. I think it's one thing to say "hey guys, why not just use the HTML vocab rather than reinventing that wheel for content that's not too many steps removed from presentation." It's quite another to say "Hey put this bit of cryptic fluff at the top of your documents so that browsers magically behave themselves when they see it." The whole DOCTYPE switching behavior between quirks and standards mode is a hack, and one of the most awful hacks ever. It just doesn't feel right to complicate MicroXML to satisfy a hack.

Gosh, to go further, given the history of HTML and the folks behind HTML5, who is to say the nature of that hack won't change arbitrarily a couple of years from now? Hardly a sane mule to which to yoke our cart.

And as Mike S pointed out, it is a major complication, even if the spec just claims "Oh never mind what that syntactical appendix means, just spell it exactly as we say." That was what they tried with the whole "The namespace is just a string, not a URL." Well, people looked at it and heck it looks like a URI, so sorry, how is it not a URI again?
And the result was 3000-message W3C mailing lists on angels-on-the-head-of-a-pin, and probably another 2000 on XML-DEV until Rick Jelliffe offered the Treaty of Wulai, and then we got on the road to RDDL and all that and the mess is still far, far, far away from being cleared up. The point is that if it even looks like a DTDecl, it will ultimately bring in a sizable portion of the brain-baggage of DTDecls, whether we like it or not, whatever we may say in the spec.

-- 
Uche Ogbuji
http://uche.ogbuji.net
Founding Partner, Zepheira
http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji
Received on Wednesday, 5 September 2012 03:27:21 UTC