- From: Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com>
- Date: Sun, 21 Jan 2007 23:06:32 +0000
Matthew Paul Thomas wrote:

> Citation is not something new, and there is no
> obvious reason for styling it differently on the Web.

Citations designed to work within the constraints of expensive print publishing and to enable manual retrieval from eccentric antiquarians and musty libraries are clearly not optimized for hypertext. So isn't there a prima facie case for evolution here?

> Second, as I demonstrated earlier, there is no clear boundary to decide
> whether you are actually citing a particular person, or just mentioning
> them.

Yes, this ambiguity exists because the word "cite" has two shades of meaning. Originally, to "cite" an authority meant to use them as a witness in a legal case, and hence it came to mean: "To quote (a passage, book, or author); gen[erally] with implication of adducing as an authority." But it also came to mean, more vaguely, "To call to mind; make mention of or reference to". (See OED Online, q.v. "cite", if you or your organization have a subscription.)

In terms of web functionality, I think HTML needs to provide at least the ability to:

1) Jump directly to a discussed work/authority (or, at worst, directions to a discussed work/authority) from a brief mention or detailed description of said work/authority.

2) Jump directly to the sources of a quotation or statement (or, at worst, directions to/discussion of the sources of a quotation or statement) from the quotation or statement, while still allowing the quotation or statement to contain hyperlinks itself.

3) List works discussed or used as references by a given web document. (Academics need to be able to track who is citing whom.)

Function 2 and therefore Function 3 clearly require something additional to <a>. With advanced URI syntaxes comparable to OpenURL, <a> can perhaps cover Function 1.
But seeing as nothing will force authors to use such URI syntaxes and as one may not exist that is fit for a given purpose, if we expect HTML documents to make sense when reserialized to deadtree, one also may need to include functionality to cope specifically with the print legacy.

For example, web documents tend to give less context for brief mentions than the print equivalent. Bloggers are notorious for poor link text like: "but what about <a href="http://www.example.com">this</a>?" In a print context, "this" would be replaced by the name of a work, and in a formal context usually backed up by a bibliography entry if not a note containing a fuller citation. Such link text also suffers /severely/ from link rot.

> And third, there is no benefit for the reader. It doesn't really make
> the text any easier to understand; and if the author's name is followed
> by a title that is also in italics, it may actually be harder to see
> which is the author and which is the work.

That's true.

> Most likely because it's a transcript. :-)

Looking at the Oxford Guide's text again, I misread it and you're entirely right. Sorry for introducing a red herring.

> The genius of HTML is that it gets authors to use many elements that
> are simultaneously presentational *and* semantic.

As far as I can tell, that aspect of HTML's genius is pretty theoretical at present. Right from the start, web designers have been engaged in a perpetual struggle against the default presentation of most semantic HTML elements.

In the early days, this struggle took the form of using presentational elements instead (<i>, <b>), using proprietary presentational features, and misusing semantic elements to achieve presentational effects, most famously heading elements for bigger text, <blockquote> for indent, <br> for leading and lists, and <table> for grid layout.
While many of these bad practices are still going strong, increased knowledge of and browser support for CSS has increasingly allowed web designers to treat all elements as mere hooks for applying styles of their choice. Miscommunication about the reasons to avoid old-style table layouts has led to the mass replacement of both semantic and presentational elements with divitis, even to the extent that newbies regularly attempt to mark up tabular data with <div>. Miscommunication about the reasons to prefer semantic elements to <i> and <b> has led to <em> and <strong> being misapplied to create italic and bold effects.

In a page designed today, I'd guess only the following default stylings are very likely to be preserved when semantic elements are used correctly:

1. <p> is block.
2. <blockquote> is indented.
3. <h1> to <h6> are block, bold, and of graduated size.
4. <em> and <strong> are inline and respectively italic and bold.
5. <ol> is block and its list items are numbered.
6. <code> is inline and monospace.

Note however that italic, bold, and numbering styles may all need to be replaced by something different in non-European languages. For example, in Japanese, <em>'s italic styling should be replaced by CSS2 box shading:

http://alistapart.textdrive.com/articles/worldgrowssmall

or better yet by special Asian CSS3 properties:

http://www.w3.org/International/questions/qa-css-lang

Hebrew doesn't even /have/ an italic.

That's not a roaring success if the idea was to match semantics with presentation automatically, although at least heading elements have made it far easier for screen readers to navigate longer documents.

Now we could perhaps save this idea if:

1) We make user agents' default styles handle internationalization of HTML elements properly.

2) We modify the idea somewhat and suggest that the genius of HTML when used with CSS is that its element set is typical of those components for which a typical page will need to use style hooks.
But even this would be problematic to sustain: where are the <banner>, <navigation>, <product>, <note>, <comment>, and <advert> elements?

With its (hypothecated) suggested default styles and broader element set, HTML5 seems to be improving on HTML4 in these respects. And in defence of HTML generally, I guess many of these problems are the result of misconceived, buggy, and broken tools, not symptomatic of the design of the language itself.

> Useful to readers *and* computers.

Until the robots take over, the purpose of markup useful to computers is that computers can make it useful to readers. For example, citation data can be extracted to reading lists, citations can be reformatted to suit the reader's preferences, quotations can be checked against their sources, and so forth.

--
Benjamin Hawkes-Lewis
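
The point about extracting citation data to reading lists could be illustrated with a rough sketch like the one below. It is not a proposal from this thread: it assumes, purely for illustration, that works are marked up with <cite> and that any link to the work sits inside the <cite>. The names ReadingListExtractor and reading_list are hypothetical, and only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser


class ReadingListExtractor(HTMLParser):
    """Collect (title, url) pairs from <cite> elements.

    Assumption for this sketch: a work is marked up as <cite>Title</cite>,
    optionally with an <a href="..."> inside the <cite> pointing at the work.
    """

    def __init__(self):
        super().__init__()
        self.entries = []   # finished (title, url) pairs
        self._depth = 0     # greater than 0 while inside a <cite>
        self._text = []     # text fragments of the current <cite>
        self._url = None    # href of a link found inside the current <cite>

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self._depth += 1
        elif tag == "a" and self._depth:
            self._url = dict(attrs).get("href")

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "cite" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace in the collected title text.
                title = " ".join("".join(self._text).split())
                self.entries.append((title, self._url))
                self._text, self._url = [], None


def reading_list(html):
    """Return the (title, url) pairs for every <cite> in the given HTML."""
    parser = ReadingListExtractor()
    parser.feed(html)
    parser.close()
    return parser.entries
```

Under these assumptions, a bare link such as "but what about <a href="http://www.example.com">this</a>?" contributes nothing to the list, because the extractor has no way to tell that "this" names a work. That is the practical cost of the poor link text discussed earlier.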
Received on Sunday, 21 January 2007 15:06:32 UTC