Re: RDF and Semantic (X)HTML

From: Sigfrid Lundberg, Lub NetLab <siglun@gungner.lub.lu.se>
Date: Tue, 24 Oct 2000 11:03:01 +0200 (MET DST)
To: "Sean B. Palmer" <sean@mysterylights.com>
cc: www-rdf-interest@w3.org, Dan Connolly <connolly@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>
Message-ID: <Pine.LNX.3.96.1001024102350.2041C-100000@gungner>

On Mon, 23 Oct 2000, Sean B. Palmer wrote:

> Hi Everyone,
> I notice that most of this RDF and (XML) Schema stuff (otherwise: semantic)
> is for 'describing' data, but many of the examples use "pure" data, and I
> was wondering what happens with more complex systems such as HTML websites?
> I notice that Dan Connolly has been playing around with some examples, and
> in particular RSS: but it looks like automatic generation of RDF from pages
> is closely linked to what that page *links* to, rather than what its
> content is.

First, metadata _is_ data. Take Dan's list of publications in RDF: as an
object, it is more a description of Dan and his professional life than a
description of his home page. The term "metadata" has become broader than
it used to be. Dan's interesting example is automatic transformation of
semantics already present in his pages, not automatic generation of data.
There is a fundamental distinction between the two.
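
To illustrate what "transformation of semantics already present" can mean,
here is a minimal sketch in Python. It is not Dan's actual method. It
assumes a page that already carries Dublin Core-style <meta> elements and
rel-typed links, and it only re-expresses them as subject/predicate/object
triples. The class name, function name, sample page and example.org URLs
are all hypothetical.

```python
from html.parser import HTMLParser


class MetaTripleExtractor(HTMLParser):
    """Re-express markup already in a page as (subject, predicate, object)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url  # the subject of every triple
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # <meta name="..." content="..."> becomes a literal-valued statement.
        if tag == "meta" and "name" in attrs and "content" in attrs:
            self.triples.append((self.page_url, attrs["name"], attrs["content"]))
        # <a rel="..." href="..."> becomes a statement pointing at a resource.
        elif tag == "a" and "rel" in attrs and "href" in attrs:
            self.triples.append((self.page_url, attrs["rel"], attrs["href"]))


def extract_triples(page_url, html_text):
    parser = MetaTripleExtractor(page_url)
    parser.feed(html_text)
    parser.close()
    return parser.triples


# Hypothetical sample page; nothing here is generated, only transformed.
SAMPLE = """<html><head>
<meta name="DC.creator" content="A. Author">
<meta name="DC.title" content="Publications">
</head><body>
<a rel="DC.relation" href="http://example.org/paper1">Paper 1</a>
</body></html>"""

if __name__ == "__main__":
    for triple in extract_triples("http://example.org/", SAMPLE):
        print(triple)
```

Nothing in the output is new information: every triple was put into the
page by its author, which is exactly what separates this from generation.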

> I know we have Dublin Core, but can anyone suggest a way of deriving a
> "full" semantic description of a complex website with ease (i.e. automatic
> generation)?

Automatic versus manual generation of data/metadata, and the costs and
benefits of each, are beyond the scope of RDF as well as of DC. The
former is about methods for defining the semantics of, and encoding,
(meta)data, and the latter is a particular set of semantics.

Automatically generating a summary of a text is computational linguistics,
as is extracting and normalizing keywords (using stemming and the like);
finding the category of a text is automatic classification [1,2]. Neither
RDF nor DC will help you with this.
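
As a rough illustration of keyword extraction with normalization, here is
a small Python sketch. It assumes a tiny hand-made stopword list and a
crude suffix-stripping rule standing in for a real stemmer such as
Porter's; both are placeholders of mine, and the function names are
hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list and suffix rules, far from complete.
STOPWORDS = {"the", "a", "of", "and", "is", "to", "in", "for", "with", "are"}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent stems, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(w) for w in words if w not in STOPWORDS]
    return [stem for stem, _ in Counter(stems).most_common(top_n)]


if __name__ == "__main__":
    text = "Indexing indexes and indexed documents: the index of documents"
    print(extract_keywords(text, top_n=2))
```

The resulting keywords could then be recorded in RDF using DC semantics,
but producing them is a separate problem from encoding them.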

You allude to the description of complex beasts like entire web sites...
This is an interesting question, which requires a set of semantics of its
own. The Dublin Core Initiative has a working group exploring these
issues [3]. You're welcome to join that development.

> What this means is that HTML will be semantic, rather than just lost chunks
> of data floating around. I would like to set up an entire site with full
> Semantic summaries/structure, but would appreciate if someone could help me
> on this point.

The major problem with automatic generation of such data when they are
encoded in RDF is to make it clear to the human end-users that there is a
qualitative difference between what has been generated by a machine and
what has been generated by human beings.



[1] http://www.lub.lu.se/desire/radar/reports/D3.2.3/f_3.html
[2] http://www.lub.lu.se/tk/demos/korg9905-class.html
[3] http://purl.org/dc/groups/collections.htm
