- From: John Cowan <cowan@mercury.ccil.org>
- Date: Tue, 4 Sep 2012 22:12:16 -0400
- To: Uche Ogbuji <uche@ogbuji.net>, Larry Rosen <lrosen@rosenlaw.com>
- Cc: public-microxml@w3.org, license-discuss@opensource.org
I received two emails today both referring to HTML, and it seemed to me that they only required a single answer, so I'm taking the unusual step of cross-posting to two unrelated lists. Follow-ups will presumably land on whichever list you are on.

On public-microxml, Uche Ogbuji wrote:

> I am not so convinced that people will suddenly start using HTML as
> their tag lingua franca in MicroXML. If they did, they would more
> likely just skip MicroXML altogether and stick to an HTML toolchain.
> I think we can have human-readable documents in the vocab of choice in
> MicroXML and then have them transformed to or dressed up as HTML at
> the edges of the toolchain. That's the predominant approach today.
> There is very little use of XHTML, even XHTML5. Data people use XML
> assembled from their DBMS and fling it at XSLT. Content people use
> richer vocabularies (e.g. DITA, Docbook, etc.), or wizards that do the
> same under the bonnet.

On license-discuss, Larry Rosen wrote:

> [C]onverting to plain text destroys information useful for human
> beings to comprehend the license. It is like removing indentation and
> line endings from source code. Please don't encourage old-fashioned
> ways of representing licenses so they can't be easily read by the
> only ones that matter: Human beings. This is part of my existential
> battle, including within Apache, to acknowledge that HTML allows for
> a richer vocabulary of expression. Quit down-versioning our creative
> works. :-)

HTML as a format has suffered so dreadfully from its abuse that HTML as a vocabulary has, I believe, been downgraded as well. As Uche says, people with a lot of documents to deal with tend to treat HTML as a pure output. It has become a fundamentally binary format, as uneditable as PDF and as opaque as Word 97 format, and I think that's really unfortunate.
This bias is so pervasive that once, when I was working on an XML document format, I suggested the reuse of simple HTML element names like p, blockquote, em, strong, etc. on the grounds that they would be familiar to anyone working with the format. This was immediately shot down by the rest of the team, on the grounds that the users would assume the document format was HTML and try to use it as such. However, they were so vehement about it that I think the unexpressed subtext was, "If it looks like HTML, the customers will treat us as HTML monkeys instead of document type designers. We have to make it look different so they'll know it's Real XML."

Indeed, I take this opportunity to praise the DITA creators for having the courage to reuse HTML names in their document-oriented standard.

Similarly, when I was working at Reuters Health, all our HTML output was in fact XHTML, so when people asked us for an XML format, I urged them to get the HTML and feed it into their XML toolchain.

"No, no, that's HTML; we want XML."

"It *is* XML, well-formed XML, all of it."

"You don't understand. We want XML, *not* HTML."

~~ /me grinds teeth ~~

I think that one of the things MicroXML may be able to provide is a revitalization of HTML the vocabulary as a reasonable choice for the construction and maintenance of straightforward documents. It's really not so bad for writing simple, uncomplicated documents like software licenses or W3C standards -- indeed, I wrote the XML Infoset Recommendation entirely in HTML.

Of course, I'm the guy who put together the Itsy Bitsy Teeny Weeny Simple Hypertext DTD, so you'd expect me to say that. See http://www.ccil.org/~cowan/ibtwsh6.rnc (or .rng or .dtd).

-- 
There are three kinds of people in the world:   John Cowan
those who can count,                            cowan@ccil.org
and those who can't.
Received on Wednesday, 5 September 2012 02:12:42 UTC