The war of the worlds: HTML vs. RDF

1. THE WAR OF THE WORLDS

The Semantic Web is based on the concept of being able to express in a
standard format relations between different data and different entities
(including real-world entities) [1]. Today this is mainly based on RDF.

The traditional web focuses on web-pages ranging from interchange of
static documents to dynamic applications [2]. Today this is mainly
based on HTML.

It is clear that there is a missing link between these two worlds:
most of the time, data is either included in documents
in ways that are unstructured and non-standardized,
or managed by applications in a proprietary/closed manner.

The risk is that the two worlds (semantic web and traditional web)
will compete instead of collaborating.
Today we are witnessing many discussions (not only on this list)
in which visionary supporters of the "unspecified wonders"
of the "semantic-web-that-will-be" are opposed by
pragmatic supporters of tradition, who believe in the
"unlimited evolutionism" of full-text search
and consider the RDF-izing of the world
a "titanic & impossible" enterprise :-).

But does this conflict have any reason to exist?
On both sides, we need to abandon
all preconceived positions.

I believe the two worlds are being developed as separate realities,
and this is a concrete problem that we have to resolve.

Today we have the opportunity to do so with HTML5.

2. LOWER THE BARRIER

It is clear that publishing simple web documents and applications
is easier than structuring information in a semantic manner,
but we must find ways to make this possible in a unified framework:

  documents + applications + semantics = HTML5

If we want the promises of the semantic web to become a reality,
we must lower the barrier to entry for generic users.

HTML5 certainly need not solve problems that we cannot foresee today.
But there is clearly a problem that HTML5 faces today:
there is no widespread use of semantic tools
because the barrier to using them is too high for users.
This is the main reason why the semantic web
is being developed as a world unto itself, still mainly academic.

Just as the original HTML enabled users to easily publish
hypertext documents, HTML5 must today allow users
to easily semantify their data, documents & applications.

At the moment, a user who wants to create/use
semantically structured information finds that browsers
do not natively offer a way to do that.

The user is forced to move through a "jungle" of tools
(without GUI or with poor usability), plugins and languages
that are not widespread standards.

This is exactly the situation faced by a user
who tried to create hypertext in 1990.

3. LINKS AND BEYOND

Just as the power of the traditional web lies in "hypertextual-links"
among documents identified by URLs, the power of the semantic web
lies in "semantic-links" between documents/data/entities identified
by URLs/URIs.

We must give users an easy way to create these semantic-links,
in a way that is as simple as creating classic hyperlinks.

Semantic-links could be collected by search engines (machines)
to enhance their functionality, and could be used in other automatic
processing.
But, first of all, they can be of great value to the browser's user (human)
if we find in HTML5 a standard way to visualize/interact with these
semantic-links.

We could define a "semantic-link" as a connection to
"semantically structured information" (embedded or in an external
resource) that is presented to the user in a fashion similar
(but not identical) to classic hyperlinks. A semantic-link
could be considered a sort of "semantic annotation"
enhancing the main content delivered to the user and
enabling further interaction with "linked data".

We absolutely need a "common minimum standard" for this,
although nothing will prevent the continued development of
additional or alternative ways of visualization/interaction
(via plugins, proprietary implementations in browsers,
new language versions).

4. OVERVIEW OF SCENARIOS AND USE CASES

With respect to use cases, we should certainly consider
all the use cases developed for RDFa [3], but also
those developed by the "Semantic Web Activity" [4];
others could be derived from each of the microformats [5]
or from the scenarios described by Adrian Holovaty in the article
"A fundamental way newspaper sites need to change" [6].

For example, it would be interesting to have a standard for
a) structuring, b) visualizing normally in the page (via CSS) and
c) interacting with/manipulating, via the browser,
the data present in "Wikipedia's Infobox" [7].
Another example could be a standard for visualizing the
"access doors" to semantically structured information "hidden" in the pages,
together with the "possible user's actions" (see "IE8 Activities" [8]).

Other interesting issues, in terms of user interface,
are raised by Alex Faaborg in the article
"User interface of microformat detection" [9],
by the fact that we need something more user-friendly
and standardized than "bookmarklets" [10], by
the fact that structured information can improve
features in scenarios raised by projects like Ubiquity [11],
and, last but not least, by some evaluations recently
presented by Ian Hickson on the WHATWG list [12].

5. TWO REAL PROBLEMS

I think it's good, first of all, to abstract from the individual use cases
described above and find a solution to two fundamental problems
that lie at the root of those use cases, two problems that, today,
have no solution in the current version of HTML:

I) User agents must allow users to see that there are "semantic-links"
(connections to semantically structured information)
in an HTML document/application. Consequently,
user agents must allow users to "follow" the semantic-link
(access/interact with the linked data, embedded or external),
and this primarily involves the ability to:
a) view the information
b) select the information
c) copy the information to the clipboard
d) drag and drop the information
e) send the information
to another web application
(or to OS applications)
selected by the user.
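
As a rough illustration of the very first step of problem I (noticing
that a page contains semantic-links at all), here is a minimal sketch in
Python. It assumes the hypothetical @semantic attribute proposed in
section 6, which is not part of any HTML specification; the class and
function names and the sample attribute values are invented for the
example.

```python
from html.parser import HTMLParser


class SemanticLinkFinder(HTMLParser):
    """Collects every element carrying the hypothetical @semantic attribute."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (tag name, raw @semantic value)

    def handle_starttag(self, tag, attrs):
        # html.parser delivers tag and attribute names in lower case.
        for name, value in attrs:
            if name == "semantic" and value:
                self.links.append((tag, value))


def find_semantic_links(html_text):
    """Return the (tag, @semantic value) pairs found in html_text."""
    finder = SemanticLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.links


# Sample page; the attribute values are illustrative assumptions only.
page = """
<p>Read the <A href="http://www.foo.com/"
   semantic="http://www.foo.com/desc.rdf">article</A>.</p>
<DIV semantic="property: title; about: #me;" class="note">Some text</DIV>
<p>No semantics here.</p>
"""

for tag, value in find_semantic_links(page):
    print(tag, "->", value)
```

A real user agent would then have to present these links to the user and
offer the view/select/copy/drag/send actions listed above; the sketch
only covers discovery.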

II) User agents must allow users to "semantically annotate"
an existing HTML document (insert a semantic-link and linked data),
and this primarily involves the ability to:
a) edit the document to insert semantically structured information
(starting from the existing text or from information
already structured in the edited portion of the page)
b) send the result of the editing
to another web application
(or to OS applications)
selected by the user.
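
To make step a) of problem II a little more concrete, here is a
deliberately naive sketch in Python of "annotating" existing text. It
wraps the first occurrence of a phrase in a SPAN carrying the
hypothetical @semantic attribute from section 6. The function name and
the choice of SPAN are my assumptions, and plain string search is used
only to keep the example short: it would also match text inside tags or
attributes, so a real editor would have to work on the parsed document.

```python
from html import escape


def annotate(html_text, phrase, semantic_value):
    """Wrap the first occurrence of phrase in a <span semantic="...">.

    Naive on purpose: works on the raw string, not on a parsed DOM.
    """
    index = html_text.find(phrase)
    if index == -1:
        return html_text  # phrase not present, nothing to annotate
    # Escape the value so it is safe inside a double-quoted attribute.
    attr = escape(semantic_value, quote=True)
    wrapped = f'<span semantic="{attr}">{phrase}</span>'
    return html_text[:index] + wrapped + html_text[index + len(phrase):]


original = "<p>Rome is a city.</p>"
annotated = annotate(original, "Rome", "http://www.foo.com/desc.rdf")
print(annotated)
```

The annotated string is what step b) would then send to the web
application (or OS application) chosen by the user.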

By solving the first problem, we will give *all* users
the possibility to access the semantic web in a normal browser
(a target impossible to achieve through
microformats & plugins alone, without an effective
standard incorporation in HTML).

By solving the second problem, we will give *all* users
(all interested users) the possibility
to access the semantic potential at a personal level
(for example, building an archive of personal semantic annotations)
and at a social level (for example, contributing to the collective effort to
"semantify" originally unstructured web resources).

6. SEARCHING FOR POSSIBLE SOLUTIONS

The first solution that we can think of
is a new attribute @semantic
(don't focus on its name), used like this:

   <A href=".." semantic=".." class="..">
   <DIV semantic=".." class="..">

In @semantic we can have:

a) the URL of a resource that semantically describes
the content (in RDF, RDFa, JSON, CSV), like this:

   semantic="http://www.foo.com/desc.rdf"

b) inline semantically structured information, in the manner of @style,
probably something like this (thinking of RDFa):

   semantic="property: ..; about: ..;"
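
To show how a consumer could tell the two forms apart, here is a minimal
sketch in Python. Everything in it is an assumption made for
illustration: that form a) is always an absolute http(s) URL, that form
b) is a list of "name: value" declarations separated by ";", and the
function name itself. None of this is specified anywhere.

```python
def parse_semantic(value):
    """Classify a hypothetical @semantic value.

    Returns ("reference", url) for form a) and
    ("inline", {name: value, ...}) for form b).
    """
    value = value.strip()
    # Assumption: form a) is an absolute http(s) URL.
    if value.startswith(("http://", "https://")):
        return ("reference", value)

    declarations = {}
    for chunk in value.split(";"):
        if not chunk.strip():
            continue  # tolerate the trailing ';'
        # Split on the first ':' only, so values may contain ':' themselves.
        name, sep, content = chunk.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in declaration {chunk!r}")
        declarations[name.strip()] = content.strip()
    return ("inline", declarations)


print(parse_semantic("http://www.foo.com/desc.rdf"))
# The declaration values below are invented for the example.
print(parse_semantic("property: dc:title; about: http://www.foo.com/doc;"))
```

Splitting on ";" means that a value containing ";" (some URLs do) would
be cut in two, so a real syntax would need an escaping rule, as CSS has.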

Furthermore, in the hypothesis of some sort of
"Cascading Semantics" (see for example cRDF [13]),
we could also think of creating a new element SEMANTIC
like this:

   <SEMANTIC Type=".."> ... </SEMANTIC>

to embed semantically structured information within the document,
in the manner of CSS, in several formats.

Naturally we need further investigation on *all points*.

But we probably need some new properties/elements,
because not all the problems described above can be solved simply
through a generic extension mechanism [14]
that makes it possible to insert RDFa in HTML.

A generic extension mechanism remains desirable
for other reasons (MathML, SVG, etc.), but we also need
something very different, set in the heart of HTML,
that makes it possible to bridge the gap between the two worlds
of the semantic web and the traditional web...
to make them become one.

[1] http://www.w3.org/2001/sw/
[2] http://dev.w3.org/html5/spec/Overview.html#scope
[3] http://www.w3.org/TR/xhtml-rdfa-scenarios/
[4] http://www.w3.org/2001/sw/sweo/public/UseCases/
[5] http://microformats.org/wiki/Main_Page
[6] http://www.holovaty.com/blog/archive/2006/09/06/0307
[7] http://en.wikipedia.org/wiki/Help:Infobox
[8] http://blogs.msdn.com/ie/archive/2008/03/06/activities-and-webslices-in-internet-explorer-8.aspx
[9] http://blog.mozilla.com/faaborg/2007/02/04/microformats-part-4-the-user-interface-of-microformat-detection/
[10] http://en.wikipedia.org/wiki/Bookmarklet
[11] http://ubiquity.mozilla.com/
[12] http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-December/018023.html
[13] http://www.xanthir.com/rdfa-vs-crdf.php
[14] http://www.w3.org/html/wg/tracker/issues/41

-- 
Giovanni Gentili - giovanni.gentili@gmail.com

Received on Friday, 9 January 2009 17:00:31 UTC