Re: XHTML2: Proposal for total separation of semantics from structure

Hello again

By "structure" I mean, mostly, the separation of a whole into parts.
Which means, yes, that I'm talking about a whole document, which
consists of parts, which might have subparts, etc., presumably with some
actual content at the last level.

As far as "structure" goes, this means simply an enumeration of the
parts, in such a way as to form a hierarchy. It does not say anything
about what each of those parts might actually mean.
It does not say if it is a paragraph, a piece of code, a lyric, the
abstract of a scientific paper, a movie, a picture, or whatever else.
It is simply a hierarchy. A tree.
"What each part means" is the semantics.

In short, we could have a document basically composed of just generic
containers (think <section>). Each of these containers would serve as
a hook for two basic attributes, one for semantics, one for
presentation (think role="" and style="" ).
Style, we all know. Role would be analogous, but with regard to semantics.
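
To make the idea concrete, here is a minimal sketch in Python. The markup and the role values in it ("article", "title", "abstract", and so on) are made up for illustration; they are not taken from any spec. It only shows that the tree can be walked as pure structure, with role and style hanging off each container as separate hooks.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: nothing but generic <section> containers.
# Meaning lives in role="", presentation in style="". All values are invented.
DOC = """
<section role="article">
  <section role="title" style="font-weight: bold">Kenya in figures</section>
  <section role="abstract">A short summary.</section>
  <section role="body">
    <section role="paragraph">First paragraph.</section>
    <section role="code" style="font-family: monospace">x = 1</section>
  </section>
</section>
"""

def outline(node, depth=0):
    """Return the bare hierarchy as (depth, role) pairs, one per container."""
    rows = [(depth, node.get("role"))]
    for child in node:
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(DOC)
structure = outline(root)

# Print the tree; the element name never changes, only the role does.
for depth, role in structure:
    print("  " * depth + "section role=" + str(role))
```

Dropping every role and style attribute would leave the same tree, which is the point: the hierarchy says nothing about what the parts mean.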

The definitions of what those particular pieces of actual content
really mean could be given, by some means, in some separate sheet if
necessary.

Btw, conceivably the same could be done for behaviours (think
handler="" , and yes, yet another separation).

Default values for common, generic meanings can already be known by
user agents; extra ones could be defined by content generators via
some small ad hoc language or mechanism.
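
As a rough sketch of how defaults and author-defined meanings could coexist, assume a user agent ships a small table of known roles and a content generator supplies a separate "semantics sheet". Both tables and the function name below are hypothetical, and plain Python dictionaries stand in for whatever small language would really be used.

```python
# Hypothetical meanings a user agent might know out of the box.
DEFAULT_ROLES = {
    "paragraph": "a block of running text",
    "code": "a fragment of source code",
}

# Hypothetical semantics sheet supplied by a content generator.
AUTHOR_ROLES = {
    "lyric": "the words of a song",
    "per-capita-income": "income per person for a country and year",
}

def resolve_roles(defaults, author_defined):
    """Return a new table with the author's roles layered over the defaults."""
    merged = dict(defaults)        # copy, so the built-in table is untouched
    merged.update(author_defined)  # author entries win on a name clash
    return merged

roles = resolve_roles(DEFAULT_ROLES, AUTHOR_ROLES)
print(roles["code"])
print(roles["lyric"])
```

Letting the author's entry win on a name clash is an assumption made here for simplicity; a real mechanism might forbid redefining the built-in roles.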



The reasons for separation would, IMO, run analogous to at least most
of the reasons behind the separation of content from presentation, a
la CSS.

For starters, nothing would be hardwired into the language.

But perhaps the most important aspect would be: are we paving the way
for the vision of the semantic web?
Are we even thinking about people like the ones who attempt to develop
things like topic maps?
What about semantically aware search engines?

The per-capita income of Kenya for 1988 is probably there, somewhere on the net.
Can you get it in just one search?

A search engine today would probably get a lot of garbage results.
And each year more and more content keeps pouring in, probably making
the problem worse.

Even if you got more or less decent results... could you (or even a
search engine, programmatically) extract just that microcontent? Or
would you need to load and read whole pages looking for the relevant two
lines in each?

If microcontent is not programmatically extractable, furthermore, then
I'd have to do some screen scraping to be able to reuse that content.
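
For contrast, here is a sketch of what extraction could look like if the microcontent carried its meaning in role attributes. The role names ("statistic", "country", "year", "per-capita-income") and the helper functions are assumptions for illustration, and the figure itself is deliberately left as a placeholder rather than a real number.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the role values are invented, the value is a placeholder.
PAGE = """
<section role="article">
  <section role="paragraph">Long introductory text about Kenya.</section>
  <section role="statistic">
    <section role="country">Kenya</section>
    <section role="year">1988</section>
    <section role="per-capita-income">(value as published)</section>
  </section>
  <section role="paragraph">More text.</section>
</section>
"""

def extract(root, role):
    """Return the text of every container that carries the given role."""
    return [el.text for el in root.iter("section") if el.get("role") == role]

def find_statistic(root, country, year):
    """Return the fields of the first matching statistic, or None."""
    for stat in root.iter("section"):
        if stat.get("role") != "statistic":
            continue
        # Map each child's role to its text, e.g. {"country": "Kenya", ...}.
        fields = {child.get("role"): child.text for child in stat}
        if fields.get("country") == country and fields.get("year") == year:
            return fields
    return None

root = ET.fromstring(PAGE)
hit = find_statistic(root, "Kenya", "1988")
print(hit["per-capita-income"] if hit else "not found")
```

No knowledge of the page layout is needed here, only of the roles, which is what would spare a search engine (or me) the screen scraping.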

Shouldn't we be providing ways to hook extensible meaning at all
levels (including the elemental one), in order to facilitate such a
thing?

Are we providing extensible means to mark up microcontent semantically?

XHTML is XML.
XML permits custom made tags.
Those are currently ignored, rendering-wise.
What about semantics-wise?
Are any search engines considering such things?
How should I code my little bit of data, if so?
Wouldn't it be better to separate all that?

<code>?
How about <javascriptcode>?
How about <javascriptcodeinthecontextofxxxcms>?
How about <javascriptcodeinthecontextofxxxcmsformanipulatingtemplates>?
Are we sure we want to hardwire things?
To add semantic functionality to the web probably requires whole
languages, with concepts such as "is-part-of", inheritance, and even more
complex relationships. And being such a vast thing, it likely requires
extensibility inherently.
If you ask me, I'd separate. In advance.
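
A small sketch of the inheritance part, assuming roles are declared outside the markup with each one naming its more general parent. The table and the function names are hypothetical, and the table is assumed to contain no cycles. A consumer that only understands "code" could still make sense of the more specific roles.

```python
# Hypothetical role hierarchy: each role points at its more general parent.
PARENT = {
    "javascriptcode": "code",
    "javascriptcodeinthecontextofxxxcms": "javascriptcode",
    "javascriptcodeinthecontextofxxxcmsformanipulatingtemplates":
        "javascriptcodeinthecontextofxxxcms",
}

def ancestors(role):
    """Return the role followed by every more general role it inherits from."""
    chain = [role]
    while chain[-1] in PARENT:  # stops at a role with no declared parent
        chain.append(PARENT[chain[-1]])
    return chain

def is_a(role, general):
    """True if role is general itself or inherits from it."""
    return general in ancestors(role)

print(ancestors("javascriptcodeinthecontextofxxxcms"))
print(is_a("javascriptcodeinthecontextofxxxcmsformanipulatingtemplates", "code"))
```

New, more specific roles can be added to the table without touching the markup language itself, which is the kind of extensibility I mean.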

As you see, this is not just a half-academic little problem anymore.
We are talking about hard functionality. Massive, too. No little
academic exquisite talk whatsoever.

All *that* discussion should probably be carried on elsewhere.
But if something is required at this level, at least hooks, I believe
we should provide them. And the time is now.

Sorry for the length; please do see my former, thread-kickstarting
post, and consider the restraint.
I do believe the subject is important.

Fernando Franco

Received on Thursday, 25 August 2005 10:45:12 UTC