[whatwg] Ghosts from the past and the semantic Web (somewhat related to the RDFa discussions)

This message relates to the ongoing RDFa discussion as a whole rather
than to any specific message, so it would be confusing to reply
directly to any one of those messages.

First of all, HTML is about structure. I want to make this clear
from the beginning, because trying to broaden the scope of the
language would only leave it unable to represent structure and
equally unable to represent whatever else it is asked to
represent. As long as HTML is kept as a structuring language, HTML
will be good at structuring.
Semantics, although it may be closely related to structure, is not
structure. Presentation is also closely related to structure, after
all, and it was once thought that it would make sense to integrate it
into the language. But then we saw the consequences (HTML 3.2), and it
became clear that presentation had to be taken out of the language.
Let us not make the same mistake again.

Don't get me wrong: there is a need for semantics on the Web. Things
like Yahoo OpenSearch, Google Answers, the size of the Microformats
community, and the fact that comments in HTML have been used to
express semantics not supported by other tools (the old Creative
Commons approach) are all evidence that we do need a mechanism to
deal with the semantics of web pages.
We have, however, some experience from the past: when the need to
control presentation arose, several ways to deal with it were
considered: presentational markup, CSS, and, later, XSL.
Presentational markup had serious issues: it stripped HTML of its
structural nature, and it didn't handle the task well enough.
CSS seems to have worked nicely: it moves the presentation away from
the markup (whether into external files or embedded in an isolated
<style> element), it uses a relatively simple syntax, and it provides
hooks to relate each part of the markup to its corresponding
presentation information.
XSL, while made for XML rather than HTML, is an example of a tool for
a similar task (styling and presentation), but using it for HTML
would be overkill.

I would like to encourage this community to learn from what has
already been done in the past, check what worked, and see why it
worked; then apply it to the problem at hand. If CSS worked for
presentation (and I really think it did; if anybody disagrees, please
share your opinion), then let's see what made it work:
First of all, and essentially, CSS was independent of HTML, although
the two were meant to be used together. I hope it is clear by now that
we need to deal with semantics from outside of HTML. RDF is an example
of a mechanism that is independent of HTML.
Next, CSS had a simple syntax, despite the size of its vocabulary:
once you understand the "selector { property: value; }" pattern, you
understand most of CSS syntax. RDF's XML format is quite verbose
and is not a good example of a simple syntax. But RDFa comes to the
rescue, providing an approach that simplifies the syntax.
Last, but not least, CSS was usable with HTML because there were
hooks between the two: the semantics of selectors are based on HTML's
structure (and, by extension, that of any other markup language). CSS
was, indeed, intended to describe the presentation of markup documents.
RDFa provides some hooks, but there is a gotcha: RDFa is not intended
to represent the semantics of a web document, but to embed those
semantics within the document. RDF just represents (semantic)
relationships between concepts, and RDFa puts that representation
inside the document.
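
To make the "independent, but hooked in through structure" idea a bit
more concrete, here is a minimal sketch in Python. Everything in it is
an assumption made for illustration (the DOCUMENT and STYLESHEET data,
the two supported selector forms, the function names); it ignores
specificity and nearly all of the real cascade, so it is a toy and not
how any browser works.

```python
# A document, reduced to the structure a selector can see (hypothetical data).
DOCUMENT = [
    {"tag": "h1", "classes": [], "text": "Title"},
    {"tag": "p", "classes": ["note"], "text": "A side remark."},
    {"tag": "p", "classes": [], "text": "Body text."},
]

# The "stylesheet" lives outside the document: (selector, declarations) pairs.
STYLESHEET = [
    ("p", {"color": "black"}),
    (".note", {"color": "gray", "font-style": "italic"}),
]


def matches(selector, element):
    """Support only two selector forms: 'tag' and '.class'."""
    if selector.startswith("."):
        return selector[1:] in element["classes"]
    return selector == element["tag"]


def computed_style(element, stylesheet):
    """Merge the declarations of every matching rule; later rules win.

    Simplification: real CSS also weighs specificity and origin.
    """
    style = {}
    for selector, declarations in stylesheet:
        if matches(selector, element):
            style.update(declarations)
    return style
```

The document carries no presentational data at all; the only hooks are
the tag names and classes it already has for structural reasons.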

RDFa only adds two or three attributes, compared to the bunch of new
presentational elements that HTML 3.2 added, so it might work; but I
don't think it is a good idea to mix the semantics into the HTML
pages.
Regarding one of the arguments about keeping the semantics within the
content, I'd say that the example
<span about="#jane" instanceof="foaf:Person" property="foaf:name">Jane</span >
<span about="#jane" property="foaf:loves" resource="#mac">hates</span >
<span about="#mac" instanceof="foaf:Person" property="foaf:name">Mac</span >
would be as silly as having something like
<span style="color: #FF00FF">This text is green.</span>.
It is not the task of a tool or language to be fool-proof: it is the
task of the user not to be a fool. In the same way that someone tests
pages in browsers to check that they are shown as expected, pages
should also be tested in the appropriate tools (any kind of
semantics-aware UA) to ensure that they convey the expected semantics;
and this applies wherever the semantic information is stored (i.e.
embedded in the document or in an externally referenced resource).
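
As a rough illustration of what such a test could look like, here is a
small Python sketch that reads the three spans above and lists the
statements a semantics-aware tool might see. It follows the reading
that the example intends (a "resource" attribute supplies the object,
otherwise the element text does). ToyExtractor and extract are
hypothetical names, and this is not a conforming RDFa processor: a real
one applies more detailed rules and may treat these attributes
differently.

```python
from html.parser import HTMLParser

# The three spans quoted in the message, unchanged apart from whitespace.
MARKUP = """
<span about="#jane" instanceof="foaf:Person" property="foaf:name">Jane</span>
<span about="#jane" property="foaf:loves" resource="#mac">hates</span>
<span about="#mac" instanceof="foaf:Person" property="foaf:name">Mac</span>
"""


class ToyExtractor(HTMLParser):
    """Collects (subject, predicate, object) tuples from flat, non-nested spans.

    Hypothetical helper: it ignores prefixes, nesting, and every other rule
    a real RDFa processor would apply.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._pending = None  # (subject, predicate) waiting for text content
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about")
        if subject is None:
            return
        if "instanceof" in a:
            self.triples.append((subject, "rdf:type", a["instanceof"]))
        if "property" in a:
            if "resource" in a:
                # The object is the referenced resource, not the visible text.
                self.triples.append((subject, a["property"], a["resource"]))
            else:
                # The object is the element's text, collected until the end tag.
                self._pending = (subject, a["property"])
                self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            subject, predicate = self._pending
            self.triples.append((subject, predicate, "".join(self._text).strip()))
            self._pending = None


def extract(markup):
    """Parse the markup and return the collected triples in document order."""
    parser = ToyExtractor()
    parser.feed(markup)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract(MARKUP):
        print(triple)
```

Under these assumptions the visible word "hates" never reaches the
extracted statements, while ("#jane", "foaf:loves", "#mac") does; a
check like this would reveal the mismatch in the same way that opening
the page in a browser reveals the wrong colour.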

In summary, I think RDFa might work, and it wouldn't be too bad a
solution, but I don't think it is the best approach either.

Regards,
Eduard Pascual
Software and Web developer.

Received on Wednesday, 27 August 2008 10:42:06 UTC