- From: Tantek Çelik <tantek@cs.stanford.edu>
- Date: Mon, 03 Apr 2000 15:24:12 -0700
- To: Jonny Axelsson <jonny@metastasis.net>, www-html@w3.org
From: Jonny Axelsson <jonny@metastasis.net>
Date: Mon, Apr 3, 2000, 1:55 PM

> At 20:05 03.04.00 +0200, Jan Roland Eriksson wrote:
>>Ok, I have read your archived post, and at the risk of "beating an
>>already dead animal" here, I have a different view...

I tend to agree with Jan as well.

Here are some of my tenets (working assumptions):

[1]. If something is described as "typographic" or "typographical", it is
likely to be presentational, rather than semantic or structural.
[2]. Typical word processors today generate text with presentational
styling, rather than valid structural markup.
[3]. HTML4 (even when including the deprecated bits) is sorely lacking in
its ability to represent typical word processor documents.
[4]. CSS1 provides most (but certainly not all) of what typical word
processor documents need.
[5]. CSS2 provides additional presentational features, but even then is
missing common word processor features such as columns.
[6]. CSS3 is expected to provide a great many more of such features.

And I'll use these statements in my arguments.

[A]. It has been clearly established by W3C Recommendations that B/"bold"
 a. should not be structural (discouraged in favor of style sheets)
    http://www.w3.org/TR/REC-html40/present/graphics.html#h-15.2.1
 b. is presentational (CSS font-weight property)
    http://www.w3.org/TR/REC-CSS1#font-weight
[B]. It has been clearly established by W3C Recommendations that I/"italic"
 a. should not be structural (discouraged in favor of style sheets)
    http://www.w3.org/TR/REC-html40/present/graphics.html#h-15.2.1
 b. is presentational (CSS font-style property)
    http://www.w3.org/TR/REC-CSS1#font-style
[C]. It has been clearly established by W3C Recommendations that
U/"underline"
 a. should not be structural (deprecated in HTML4)
    http://www.w3.org/TR/REC-html40/present/graphics.html#h-15.2.1
 b. is presentational
    http://www.w3.org/TR/REC-CSS1#text-decoration
[D]. Syntax and semantics are two different things.
Just because two tags have the same syntax doesn't mean they have the same
semantics, and the semantics are typically what is more important.
[E]. Automatic translation of presentational documents (such as typical word
processor documents) to/from XML documents is best done using inline styles
on the spans of text that are styled.

> These are some basic tenets of mine:
> 1. There are relatively clear typographical rules for when to use I[TALIC]
> (in languages using italic)
> 2. There are fewer and more inconsistent typographical rules for bold

Typographic rules apply to presentation only [1], and italic and bold are
recommended by the W3C to be presentational [AB].

> 3. Underline is primarily "poor man's italic" (from the age of the
> typewriter), but is also used for special effects (like hypertext)

Underline is recommended by the W3C to be presentational [C].

> 4. STRONG is made by false analogy with B[OLD]
> 5. Not every text document in the future will (necessarily) be XML,
> certainly most current documents are not XML

Most word processors today generate presentational text [2].

> I'll use these statements too in my arguments:
> A. Syntactically there is no difference between B, EM, Q, ACRONYM, SUB.
> They have the same content model.

Semantics are what matter [D].

> B. It is important to discern between representation and presentation. EM
> /represent/ an emphasis, it might be /presented/ using an italic font, or
> by having "/" on each side of the content.

Agreed.

> C. Non-HTML documents are semi-structured (as are HTML/XML documents).

Semi-structured might as well mean unstructured. This semi-structure is
typically ascertained by white space and styling, which can only be said to
be presentational, and certainly not necessarily structural [2].

> D. People are inconsistent coders. No matter how structured XML becomes,
> you can't avoid this.

Agreed. But it is much harder to code "tag soup" when your code must be
well formed.

> E. Automated translations to/from XML is desirable, and so is minimization
> of information loss in the process.

Agreed. But XML (or HTML) by itself contains neither the semantics (if any)
nor the presentation necessary to represent the typical word processor
documents being translated [23], and it requires style in order to minimize
the loss of presentation [456E].

>>Humans have 5 senses, but so far I have not seen any technical devices
>>that would let me "smell" or "taste" my way through a www document, so
>>that leaves us with three that can be (and are) used.
>
>>Frankly I don't know exactly what an "italic voice" or a "bold voice"
>>sounds like, but I do know about an "emphasized voice" as well as a
>>"strong voice".
>
> Catering for the senses is a presentational issue [AB].

True. Thus if an element only "works" or "makes sense" via one particular
sense (sight), then it is dependent on the senses and should be
presentational, not structural. This is the point Jan was making. "I" and
"B" are dependent on a particular sense (sight), and therefore are
presentational.

> It is exactly as
> easy to present I as EM in any medium [A].

False. "italic", being "typographic" and therefore "presentational" [1], has
no semantics outside the visual medium, and therefore has no sensible or
obvious presentation outside the visual medium, except to masquerade as
"emphasis" in those other media. So why not skip that extra step and just
call it "em"?

> But does EM represent the same
> as I and STRONG the same as B, ie. are they only "politically correct"
> aliases? I would say no, and if they were, one of these pairs should be
> removed from the standard as redundant.

They (B & I) were not removed, but they were explicitly distinguished and
discouraged [AaBa], while EM and STRONG were promoted as structural:
http://www.w3.org/TR/REC-html40/struct/text.html#h-9.2.1

> I is used to represent a half-dozen
> meanings [1],

Typographical typically means presentational [1].

> one of which is emphasis.

This is backwards. "italic" is one way of styling emphasis.

> Another one (in Norwegian) is that
> ship names should be in italic. So if you see "M/S Titanic" in italic, you
> can't tell if that is because it is emphasised or just because it is a
> ship.

Precisely: because most word processor documents are purely presentational
text, you can't tell the meaning behind the presentation [2].

> For this reason you might want to skip I altogether for this lack
> of precision, but a better approach is to use EM when appropriate, I
> otherwise.

A better approach is to avoid presentational, media-dependent tags and to
add new semantic tags instead, e.g. use <shiptitle> in your DTD for the
above example, and then style it as appropriate for the audience, e.g.

  shiptitle:lang(no) { font-style:italic; }
  ...
  <shiptitle>M/S Titanic</shiptitle>

> Even the catch-all is useful, and often at the limit of what
> authors can handle (if they don't understand when to use italic, they won't
> understand how to use any other markup) [1CD].

But where does it end? Do we replicate all presentational styling as
markup? Do you propose the FONT tag mess all over again?

>>I just feel that in markup we would all benefit from using element names
>>that makes sense for more than just those who can read things from paper
>>or a VDU.
>
> This I agree with. "Bold" and "Italic" is unfortunate when the presentation
> isn't bold or italic (as in aural style sheets, or when that isn't the
> preferred/possible presentation, most cell phones can't show italic). Then
> again (this is a cop-out) the elements aren't "BOLD" and "ITALIC", they are
> "B" and "I", and anyway that they are called bold and italic doesn't make
> it harder to use in non-visual context.

But it just continues to confuse authors who have been told that they
should try to separate their structure and style if at all possible. (I do
still believe that an author who wants to style an element inline using the
style attribute should be able to; at least that groups all the styling
between quotes into a single attribute, rather than sprinkling
presentational tags throughout the document.)

>>And relying on stylesheets (as in 'EM EM {...}' to replace STRONG) is
>>not the way to go. Stylesheets are _optional_ and a correct and
>>understandable presentation shall be possible without them.
>
> Apart from [B], this is an opportunity to explain [4]. First, on the
> presentational side, stylesheets (CSS, XSL, whatever) are the /only/ way to
> present an XML document. As for EM EM, in ordinary written text, italic is
> a toggle switch, italic inside italic (<i>...<i>text</i>...</i>) is
> presented as normal, you cannot present doubly italicized text as doubly
> italicized, but you can represent it that way. I would personally represent
> EM EM the same way as EM in a visual (and probably any other) browser, but
> representationally there is a difference. But a presentational shortcoming
> is no reason to add a representational kluge (STRONG).

A presentational kluge <B> is worse than a so-called representational kluge
<STRONG> [A].

>>Those who are thinking of publishing "well formed only" xml docs on the
>>www may want to contemplate a bit about that, before they go ahead.
>>There's no guarantee that every user agent out there will be stylesheet
>>capable. I know at least one right now that is not, it's called
>>AltaVista IMMIC :)
>
> You can do dangerous things with CSS, like thoughtless use of the content
> property or positioning. But in my opinion EM EM is cleaner coding than
> STRONG (then, in my opinion, EM is cleaner than EM EM, there is really no
> need for either EM EM or STRONG).

Yes, and authors should only ever use a single exclamation point (!), but
there are certainly plenty of examples of people using double exclamation
points (!!) or more. The reality is that there are more than just two
levels of "em"phasis (none or some), and allowing EM EM acknowledges that.

>>Basically, on the www, the "meaning" of element content in a document
>>shall not be carried in a "required" stylesheet presentation suggestion,
>>it's as simple as that.

Agreed. I think this is a very important point.

> Finally there is HTML/XML in a greater context. I have done many
> translations to HTML and a few translation from HTML. You cannot map from
> word processor italic to EM, but you can map to I, and an italic -> I ->
> italic translation is non-lossy. EM -> italic is lossy, but automatic,
> while italic -> EM is manual (and time consuming). [5CDE]

HTML4 *by itself* is very poor at representing even what simple
ten-year-old word processors can do [3]. Word processors are used today in
a presentational sense [2], so you should use CSS1 [4] and CSS2 [5], and
experiment with CSS3 [6], in order to translate documents to/from HTML/XML
with a minimum of loss [E].

Tantek

----------------------------------------------------------------------------
got HTML 4.0 + CSS 1.0 + PNG 1.0 ? http://www.microsoft.com/mac/ie/
Received on Monday, 3 April 2000 18:27:43 UTC