- From: <JOrendorff@ixl.com>
- Date: Tue, 11 Apr 2000 18:59:49 -0400
- To: www-html@w3.org
This is very long; I apologize. It's really bugging me, though.

One thing I think about a lot (maybe I'm the only one) is how old documents should map into the new XHTML world. In many cases, print documents have typographical conventions that reflect some underlying semantic order. But there is no HTML tag for most of this stuff. This "semantic order" often has very limited scope: it applies only to a small range of documents, perhaps even a single document. Examples:

- In a gossip column, names of celebrities are bold (just like TimBL's name on the W3C site's front page <wink/>).
- Names of genera and species often appear in italics.
- In a book, each chapter starts with a quote, which is offset from the text.
- On a bus schedule, 7:42 means 7:42 A.M. if it is in plain type, 7:42 P.M. if it is in bold type.
- In "The Elements of Style", there are often side-by-side comparisons of good and bad writing.

Computer-oriented documents are especially enthusiastic about inventing semantic and typographical conventions for a very narrow range of content.

- In many RFCs, the words "MUST" and "REQUIRED" (and others), when used with a certain very specific meaning, appear in all caps. This convention is rare elsewhere (though some W3C specs have used it).
- A document containing a grammar often distinguishes typographically between "basic" rules, like the terminals, and more complex rules.
- A document that talks about XML might make a presentational distinction between elements and attributes.

Suppose I'm translating a non-HTML document (like any of the examples above) to XHTML. My goals are:

- Don't lose anything interesting in the translation. That is, the document should not be confusing to the reader where the original was clear. If it's all rendered according to user preferences, great, but if the user doesn't have any preferences, it should look roughly like the original print.
  Example: chapter quotes should still be obviously chapter quotes and not look like (or sound like) part of the body of the text.
- Expand the document's audience. I want to provide "nice", communicative semantic markup for anyone who's using the document for anything besides print or screen display.
- Save time. I won't put much effort into tagging and maintaining the document.

Stop a moment. Are these reasonable goals? I feel they are good mid-range goals. (Long-term, I want better ways to do all this.)

I think some HTML gurus don't like <div> and <span> and the class= attribute, because even when they are used with semantic intent (e.g. <span class="celebrity">), they have no communicative value. No one else knows the meaning of my special values for class=.

Håkon Wium Lie is, I think, of the opinion that XML should be used primarily for communication. A client should have a built-in understanding of what <p> *means*. An author should not invent and use elements (<mytags:chapter-quote>) that the intended clients don't really understand on a semantic level. Even if I can, with a stylesheet, "teach" the client how to present this invented element, I haven't done anything of much value compared to what XML can offer.

I can see that point of view. But I still need to meet my first goal and get all my content live by next Tuesday. So what do you recommend?

Suppose I'm a doctor talking about blood chemistry. I have ideas to express, and HTML doesn't cover the whole range. On paper or in a word processor, I can use italics, some chemical notation, a few tables, and maybe a drawing or two to express my ideas. People will understand. Now: how do I do this with HTML?

There are a few possibilities that I think are already being considered.

- Some stuff is good enough and distinct enough from XHTML that it would make a good XML-based standard on its own. Examples: annotation, change tracking.
- Some stuff is good enough and well enough in line with XHTML that it could be included in future versions of the XHTML spec. Examples: formally citing a work (<cite> sucks), or indicating document structure (<appendix>, <preface> ...).
- Some stuff is widely useful enough that it deserves an XHTML module, perhaps apart from the XHTML standard proper but designed to plug into XHTML. (Examples might include <genus>, <species>, <codeblock>, etc.)

But some stuff is just document-specific no matter how you look at it, and a lot of stuff that falls into one of the above categories just isn't standardized yet. How can I deal with all this?

- I can invent classes, use the class= attribute, and attach a stylesheet.
- I can invent XML elements and attach a stylesheet.
- In some cases, HTML has something vaguely related. I can just use that and hope the appearance is good enough. (<code> for a BNF production.)
- I can use presentational markup to approximate how the ideas looked in print. ("their latest album, <i>Bludgeoned with a Frozen Haddock</i>"; "W3C Director <strong>Tim Berners-Lee</strong>". I know <strong> isn't strictly presentational, but can you imagine how that must sound when a voice browser reads it?)
- I can leave off the markup: if it isn't supported by HTML, it can't be that important.

Perhaps another option would be a new standard by which I could define a new tag and teach the client a little bit about what it means (quite apart from how it should be styled). This reminds me a bit of Architectural Forms.

A lot of effort has gone into making HTML more cleanly extensible. But the question remains: how and when and why should I go about actually extending it?

-- Jason Orendorff
Received on Tuesday, 11 April 2000 19:00:47 UTC