Re: Semantic Argument (Warning: Long Post) from Ian Hickson on 2004-11-13 (www-archive@w3.org from November 2004)

From: Ian Hickson <ian@hixie.ch>
Date: Sat, 13 Nov 2004 01:21:26 +0000 (UTC)
To: Doug Schepers <doug@schepers.cc>
Cc: www-archive@w3.org
Message-ID: <Pine.LNX.4.61.0411130006020.8631@dhalsim.dreamhost.com>

Since this isn't really about SVG per se, I've not cc'ed www-svg. Feel
free to post this or follow-ups to www-svg if you like.

On Thu, 11 Nov 2004, Doug Schepers wrote:
> 
> Naturally, I understand what I mean by "semantic," and I've defined it 
> several times. In short, as I see it in a Web context, semantic content 
> is that which is marked up consistently with tags and attributes from a 
> defined ontology within a particular domain.
>
> The audience for this content need not be universal. I care very little 
> about, for example, literary criticism or automobile manufacturing, but 
> those fields have their jargon and needs, and they care about it very 
> deeply. (In the case of lit-crit, perhaps a little too deeply).

On the Web, what I refer to as "semantics" is markup which the user's UA 
can interpret to handle the content in ways the author did not primarily 
intend. For example:

 * HTML's <h1> element is semantically rich on the Web because:

    1. It's part of a standard that will be implemented by
       Web browsers,
    2. It has a defined conceptual meaning that is not tied to
       any particular presentation,
    3. It can be handled any number of equally good ways -- an
       outliner could use it as a page header, a voice browser
       could render it aloud, a graphical UA can render it in 
       a large font, a search engine knows to give the element's
       contents more weight.

 * MathML's <cn> element is semantically rich on the Web because:

    1. It's part of a standard that will be implemented by
       Web browsers,
    2. It has a defined conceptual meaning that is not tied to
       any particular presentation,
    3. It can be handled any number of equally good ways -- a
       graphical browser can display it as part of an expression,
       a math calculator can actually compute the expression, a
       voice browser can know to read the number as a number and
       not a series of digits, etc.

 * SVG's <text> element is semantically poor on the Web because:

    2. It has no defined conceptual meaning beyond being a graphics 
       element consisting of text,
    3. Non-graphical UAs can only handle it in one way: a stream
       of text. They have no way of knowing what the context of
       the text is.

 * An element in a custom namespace bound to SVG using sXBL, or a custom 
   element that is styled using CSS, is semantically poor because:

    1. The element will not be natively implemented by any Web browsers,
       it will only be usable in the context of the sXBL binding or the 
       CSS stylesheet,
    2. It has no defined conceptual meaning that is not tied to the 
       binding or stylesheet,
    3. The elements can only be rendered by user agents that implement SVG 
       (for the binding case) or CSS (for the stylesheet case).
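
To make the MathML case above concrete, here is a minimal sketch in
Python (standard library only). The sample expression and the function
names evaluate() and speak() are made up for illustration, and only
<plus/> and <times/> are handled; a real UA would support far more.
The point is just that one piece of markup can feed two quite
different consumers:

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical sample document: Content MathML for 2 + (3 * 4).
SAMPLE = f"""
<math xmlns="{MATHML}">
  <apply><plus/>
    <cn>2</cn>
    <apply><times/><cn>3</cn><cn>4</cn></apply>
  </apply>
</math>
"""

WORDS = {"plus": "plus", "times": "times"}


def local(el):
    """Return the element's tag name without its namespace."""
    return el.tag.split("}", 1)[-1]


def evaluate(el):
    """A 'calculator' consumer: compute the value of the expression."""
    name = local(el)
    if name == "math":
        return evaluate(el[0])
    if name == "cn":
        return float(el.text.strip())
    if name == "apply":
        op = local(el[0])
        args = [evaluate(child) for child in el[1:]]
        if op == "plus":
            return sum(args)
        if op == "times":
            result = 1.0
            for arg in args:
                result *= arg
            return result
    raise ValueError(f"unsupported element: {name}")


def speak(el):
    """A 'voice' consumer: read numbers as numbers, operators as words."""
    name = local(el)
    if name == "math":
        return speak(el[0])
    if name == "cn":
        return el.text.strip()
    if name == "apply" and local(el[0]) in WORDS:
        joiner = f" {WORDS[local(el[0])]} "
        return joiner.join(speak(child) for child in el[1:])
    raise ValueError(f"unsupported element: {name}")


ROOT = ET.fromstring(SAMPLE.strip())
print(evaluate(ROOT))
print(speak(ROOT))
```

Neither function knows anything about how the expression would be
drawn on screen, which is what criterion 2 in the list is getting at.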
  

> | I don't really see that it is appropriate for the Web browser to
> | have built-in support for GIS analysis. I don't doubt that it
> | would be very useful in your domain, but there is a line to be
> | drawn at how much a Web browser needs to support.
> |
> | To give parallels with HTML -- HTML has support for a definition
> | list - <dl>/<dt>/<dd> -- which can let you write, for instance, a
> | glossary. That's great,
> 
> It is? It's better than nothing (maybe), but it's a far cry to call
> it "great".

Ok, "useful" then.


> | but dictionary and encyclopedia authors ask why doesn't HTML also
> | have support for saying that a word is a noun? Or an adjective?
> | Why is there no way to define the syllables of a word? Why is <dd>
> | only one level deep, allowing for multiple definitions but not
> | sub-definitions? Why is there no markup for highlighting the root
> | of the word, the pronunciation of a word, the etymology of a
> | word?
> 
> You claim elsewhere that one of your objections to textflow in SVG
> is that it loses the semantic value available in HTML, and yet here
> you admit that the HTML semantics are inadequate for all but the
> lowest common denominator of certain established print-legacy
> domains.

That's not at all what I said. The semantics afforded by HTML are
suitable for many millions, if not billions, of pages, such as FAQs,
documentation, home pages about people's cats, contact pages, and all
the other things people say to each other regularly.


> | The answer is that HTML -- like SVG! -- should be a generic
> | language, suitable for a wide range of fields, relatively simple
> | to implement. For more specific work, like writing a dictionary,
> | more specific languages should be used, which can then be
> | transformed into HTML when it comes to the final stage of showing
> | it to the user.
>
> But then you lose almost all the semantic value that's available! It
> does very little good to have a rich structure that is then stripped
> of all that structure in order to be presented. This goes directly
> against your stated goal of increasing semantic content.

The semantic value is that everyone else can interpret your content.
If we have a bazillion markup languages, one for each domain, then Web
browsers aren't going to support them all, and the accessibility of
the markup is lost.
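
As a rough sketch of the "specific language transformed into HTML"
step, assuming a made-up dictionary vocabulary (the <dictionary>,
<entry>, <word>, <pos> and <sense> element names are invented for this
example and come from no real standard), in Python:

```python
import xml.etree.ElementTree as ET

# Hypothetical domain-specific markup; the element names are invented.
SOURCE = """
<dictionary>
  <entry>
    <word>cat</word>
    <pos>noun</pos>
    <sense>A small domesticated feline.</sense>
    <sense>A person (slang).</sense>
  </entry>
</dictionary>
"""


def to_html(source):
    """Flatten the custom vocabulary into a generic HTML <dl>."""
    dl = ET.Element("dl")
    for entry in ET.fromstring(source.strip()).iter("entry"):
        dt = ET.SubElement(dl, "dt")
        dfn = ET.SubElement(dt, "dfn")
        dfn.text = entry.findtext("word")
        pos = entry.findtext("pos")
        if pos:
            # The part of speech survives only as plain text.
            dfn.tail = f" ({pos})"
        for sense in entry.iter("sense"):
            ET.SubElement(dl, "dd").text = sense.text
    return ET.tostring(dl, encoding="unicode")


HTML = to_html(SOURCE)
print(HTML)
```

The domain-specific detail is reduced to text, but the result uses
only <dl>, <dt>, <dfn> and <dd>, which any HTML UA can handle.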


> | I can't off hand think of any drawing that used multiline text
> | with automatic word-wrapping where that text would not be better
> | marked up using a semantic markup language. Maps rarely use
> | multiline text in my experience (mostly text on a path),
> | schematics typically have just labels, and when the labels expand
> | into multiline text that text would need rich semantic markup such
> | as HTML <var> elements.
> 
> Which rich semantic markup would that be? How useful is <var> in a
> larger context? Here are most of the logical or semantic tags in
> HTML: 'h1-6', 'p', 'em', 'strong', 'cite', 'blockquote', 'dfn',
> 'acronym', 'abbr', 'address', 'ins', 'del', 'samp', 'code', 'var',
> 'kbd', 'href', 'alt', 'longdesc', 'title', 'ul', 'ol', 'li', 'dl',
> 'dt', 'dd', 'table', 'tr', 'th', 'td', 'col', 'caption', 'label',
> 'legend', 'sub', 'sup'
> 
> I may be missing some, but not many. You might argue for the
> inclusion of form widgets, but those are so variable in function, as
> I mentioned before, that only the labels, legend, and submit button
> have any sort of reliably conveyable meaning. Of these 40-ish tags,
> many are structural, some are hyperspecialized (4 for computer
> terms), some are near-synonyms ('cite' vs. 'blockquote', 'dfn' vs.
> 'dl'), some are for editing ('ins', 'del'), many are loosely defined
> as far as content ('address', 'h1-6'), and others are misused
> (tables for layout, lists for navigation menus as well as shopping
> items). Add the fact that many are seldom used where they should be
> (how often does a shopping cart really use lists? How many acronyms
> are tagged?), and in a pragmatic sense, you don't have a very
> semantic Web.

It's good enough, though, for most of what people write. It's good
enough to make the text accessible -- to allow it to be analysed for
simple uses (document outlines, e.g.), to be rendered in different
media (visual, speech, braille, etc).
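
For instance, a document outline can be pulled out of ordinary HTML
with very little code. This is only a sketch in Python using the
standard html.parser module; the sample page and the names Outliner,
outline() and as_indented_text() are made up for illustration, and it
ignores everything except <h1> to <h6>:

```python
from html.parser import HTMLParser

# Hypothetical sample page.
PAGE = """
<h1>My Cat</h1>
<p>Some text about <em>Fluffy</em>.</p>
<h2>Food</h2>
<p>She likes fish.</p>
<h2>Toys</h2>
"""

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class Outliner(HTMLParser):
    """Collect a (level, text) pair for every heading element."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            text = "".join(self._text).strip()
            self.outline.append((self._level, text))
            self._level = None


def outline(html):
    """Return the list of (level, text) headings found in html."""
    parser = Outliner()
    parser.feed(html)
    parser.close()
    return parser.outline


def as_indented_text(items):
    """Render the outline as text, indented two spaces per level."""
    return "\n".join("  " * (level - 1) + text for level, text in items)


OUTLINE = outline(PAGE)
print(as_indented_text(OUTLINE))
```

The same (level, text) list could just as well drive a speech or
braille rendering; nothing in it depends on fonts or layout.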


> | You'll notice that I've been saying the same thing about XSL:FO as
> | I have been saying about SVG's proposed multiline text feature. It
> | violates AWWW and WCAG, has poor accessibility, and semantic
> | markup styled with CSS is a much better model which should have
> | been used instead.
> 
> Why is semantic markup styled with CSS better than semantic markup
> styled with SVG? I submit that it isn't.

It wouldn't be, if that's what you actually had. But it isn't. The
sXBL model, for instance, requires an <svg> root element. You can't
take an HTML document and "style it with SVG" the way things stand
now. (That was part of my technical comments, in fact.)

The biggest problem, though, is that you can write content purely in
SVG, without the semantics. That's the problem XSL:FO has too, and the
problem HTML 3.2 had with <font>, <br>, and <table border>. People in
general don't understand semantics, so if you let them write documents
using a presentational language, and if it will work in Web browsers,
they will. We side-stepped the problem with XSL:FO because browsers
never implemented it, but with SVG, browsers are most likely to end up
supporting it, and so we (the Web standards community) have to make sure
that SVG's design doesn't encourage authors to write documents purely
in SVG without the higher-level semantics ever existing.


> Just because it's not from the W3C doesn't mean it's unknown.

Granted -- ECMAScript is known, for instance.


> | Using a language that was well-known (e.g. one that was a W3C
> | Recommendation, such as XForms) would mean that the content had
> | semantics.
>
> That's the only criterion for semantics? That it's a W3C
> Recommendation? I know some people who have designed ontologies for
> medical records that would take issue with that analysis.

As far as the Web is concerned, at the end of the day, a language has
semantics if it is implemented in Web browsers. DocBook, for example,
as far as the Web is concerned, is pretty meaningless.


> Again, there are many cases (in fact, the overwhelming majority of
> them) in which domain-specific semantics have user bases of many
> thousands of people, and which are very well-defined, but which
> cannot currently be represented in a meaningful way on the Web (or
> on intranets, as it may be). sXBL doesn't add semantics, and nobody
> made that claim. It simply allows them to be represented.

Same as CSS, sure. And in limited environments with only a few
thousand people, or in intranets, that might be fine. But on the Web
you have to deal with over half a billion people.


> | The point is all of those ideas would re-use the existing text
> | layout model from SVG, wouldn't introduce an entire chapter's
> | worth of new features, wouldn't step on CSS's toes, and wouldn't
> | encourage the abuse of SVG for what should be semantic-level
> | (HTML) markup.
> 
> Let's let RDF and other ontological XML dialects do their job of
> providing rich semantics

RDF, at least in its current state, has had basically zero uptake in
the real world. RSS is the closest thing to a success that RDF has
had, and no RSS viewers are implemented in terms of RDF. In fact, most
RSS feeds aren't even well-formed XML. RSS has, in that sense, been as
(un)successful as XHTML1.
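
Well-formedness is easy to check, for what it's worth. A minimal
sketch in Python with the standard XML parser; the two feeds are
made-up samples and is_well_formed() is just a name chosen for this
example:

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Hypothetical feeds: the second has an unescaped ampersand, a common
# mistake that makes the whole document ill-formed.
GOOD_FEED = (
    "<rss version='2.0'><channel>"
    "<title>Cats &amp; dogs</title>"
    "</channel></rss>"
)
BAD_FEED = (
    "<rss version='2.0'><channel>"
    "<title>Cats & dogs</title>"
    "</channel></rss>"
)

print(is_well_formed(GOOD_FEED))
print(is_well_formed(BAD_FEED))
```

A strict XML processor has to reject the second feed outright, which
is why feed readers tend not to use one.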

In fact, it isn't clear that there is a demand from actual Web authors
for the detailed semantics that RDF can offer. Semantics are important
when they improve communication -- RDF's level of semantics is great
for inference-style data analysis, but it isn't clear that they
actually improve communication.


> and use HTML, CSS, and SVG to do their job of presenting them.

HTML isn't a presentation language -- it doesn't say how things should
be rendered.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Saturday, 13 November 2004 01:21:30 UTC