RE: storing info in XSL-FO: new issue? [was: Draft TAG Finding:...]

Hi Eliotte,

Eliotte said:
As it happens, when I read this message I had the Mercury News open 
in a browser window, so I looked at how they expressed headlines in 

class="headline">Pizza heaven</a>

..... other examples.....

This is even easier to reproduce in XSL-FO:

<fo:inline font-face="Times New Roman,Times,Serif" font-size="120%" 
font-weight="bold">Baseball Players' Union Sets Strike Date for Aug. 

Didier replies:
What this demonstrates is that HTML documents out there, in the real
world, are simply rendering documents that provide very little
semantic information. I guess this is on purpose, since these content
providers want to preserve their copyright. The harder they make their
content to process, the more they feel protected from free
riders. Simple business common sense and, as you know, business as
practiced today is not altruistic ;-)

Back in 1995, people started to use tables and other HTML features as
layout instructions. This behavior is probably induced by the implicit
visual rendering model the browsers possess. However, it is nonetheless
possible to state that a header is specified with an <H1> element and to
use CSS to attach a visual rendition to it through a set of properties.
This practice would preserve some semantics and would separate content
from presentation (at least in part). But, as you demonstrated in your
examples, web designers show incredible creativity in their use of
HTML elements mainly as visual rendering objects. Nonetheless, HTML
per se is not explicitly specified as a rendering language. SVG is,
VoiceXML is, etc.
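
A minimal sketch of that practice, for illustration only: the headline
text is borrowed from the Mercury News example quoted above and the font
values from the XSL-FO fragment; the paragraph text and the choice of an
embedded style sheet are assumptions, not taken from any real page.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Pizza heaven</title>
<style type="text/css">
  /* Presentation: the visual rendition is attached to the H1 element
     from outside the content, as a set of CSS properties. */
  h1 {
    font-family: "Times New Roman", Times, serif;
    font-size: 120%;
    font-weight: bold;
  }
</style>
</head>
<body>
<!-- Content: the markup only states that this is a top-level header. -->
<h1>Pizza heaven</h1>
<p>Article text goes here (placeholder).</p>
</body>
</html>
```

An agent that ignores the style sheet still learns that "Pizza heaven"
is a header, which is the part of the semantics this approach preserves.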

Just as we say that the web is based on an underlying architecture (i.e.
REST) even though a lot of people do not see the same reality and do not
design their sites on these principles, we can also say that
HTML is not a rendering language and that, if it is used as such, it is
because of the behavior of certain HTML interpreters named browsers.
Other agents, such as classification engines, would prefer the
document to contain more semantic information. The visual rendition
characteristics are not specified in the HTML specs; they are part of
certain concrete HTML interpreters (user agents, browsers). It's an
interpretation of HTML, not a usage based on the specs.

Eliotte said:
XSL-FO contains all the aural properties of CSS. It is no more 
limited to visual presentation than HTML is (which is to say, in 
practice, it's quite tied to visual layout). I understand the 
theoretical point that HTML does not have any official layout model, 
unlike XSL-FO and SVG. However, the implicit layout model enforced by 
Web browsers is so strong that it renders the point moot. HTML is a 
layout language, a less powerful one than XSL-FO to be sure, but 
still a layout language. DocBook it is not.

Didier replies:
Per usage yes, per design no. So, from an anthropological or social
point of view, you are right: a vast majority of people are using HTML
as a rendering language. Was it intended to be that? It is not explicitly
stated in the specs, so we can reasonably infer that it wasn't. You also
have to take into consideration that a tiny minority is using HTML for
its document semantics, maybe limited according to some judgments, but
it is still a valid document model with paragraphs, headers, etc. Obviously
a lot of entities not related to content have been added, but you can
stick to the basic constructs, and DTDs exist to help you do so. Maybe
you should speak of HTML not as a single object but more as a language
having several different dialects. It all depends on the dialect you are
referring to.

Eliotte said:
In 2002 anyone who thinks an H1 element really means anything other 
than "Make this a big, bold, block level element" is kidding 
themselves. The L in HTML stands for "Language". HTML evolves as all 
languages do. The meaning of its words is defined by its speakers. 
HTML has escaped the ivory tower of semantics, and been vulgarized as 
successful languages always are. The prescriptions of the W3C have 
about as much effect on HTML as the prescriptions of the Académie 
Française have on French (that is, little to none).

Didier replies:
From the social and anthropological points of view you are totally
right, especially if you are referring to the mainstream web content
designers. For them an HTML document is simply a document's visual (and
sometimes aural) layout. This also gives us a good clue to the reasons
why we do not yet see a semantic web ;-)

Didier PH Martin

Received on Monday, 19 August 2002 07:51:06 UTC