Re: RDF and Semantic (X)HTML from Sean B. Palmer on 2000-10-24 (www-rdf-interest@w3.org from October 2000)

From: Sean B. Palmer <sean@mysterylights.com>
Date: Tue, 24 Oct 2000 11:40:49 +0100
To: "Sigfrid Lundberg, Lub NetLab" <siglun@gungner.lub.lu.se>
Cc: <www-rdf-interest@w3.org>, "Dan Connolly" <connolly@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>
Message-ID: <002701c03da7$fe0be8c0$34dc93c3@z5n9x1>

Hi Sigfrid,

> First, metadata _is_ data.

And hence the phrase: data describing data. But then, do we need further
data to describe that, and so on: data describing data describing data
describing data describing data...

> For an object such as Dan's list of publications in
> RDF, it is more a description of Dan and his professional life than a
> description of his home page.

That stuff is interesting, but I'm referring to the profile of the W3C front
page (www.w3.org), and automatic (XSLT, I think) generations thereof.

> The term "metadata" has become broader than
> it used to be. Dan's interesting example is automatic transformation of
> semantics already present in his pages, not automatic generation of data.
> There is a fundamental distinction between the two.

Hmm... could you explain what you mean by that? Semantic data is still
data.

> Automatic or manual generation of data/metadata, and the costs and
> benefits of the two is beyond the scope of RDF as well as of DC. The
> former is about methods for defining semantics of and encoding (meta)data,
> and the latter is a particular set of semantics.

Well, most people use XSLT for transformations, but I was wondering how well
it holds up to that type of generation (XHTML to RDF). Using standard XSLT
stylesheets, could you automatically generate a site profile on the fly?
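The generation discussed here would normally be an XSLT stylesheet. As a rough
stand-in that runs on its own, here is a minimal Python sketch (standard library
only) of the same idea: read the <title> and a couple of <meta> fields from an
XHTML page and emit an RDF/XML description using Dublin Core element names. The
mapping from meta names to DC elements (description -> dc:description,
author -> dc:creator) is an assumption for illustration, not a standard profile.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the serialized output uses rdf: and dc:
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

# Assumed mapping from <meta name="..."> values to Dublin Core element names
META_TO_DC = {"description": "description", "author": "creator"}


def xhtml_to_rdf(xhtml: str, page_uri: str) -> str:
    """Build a minimal RDF/XML description of one XHTML page."""
    root = ET.fromstring(xhtml)
    ns = {"h": XHTML_NS}

    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": page_uri}
    )

    # <title> becomes dc:title
    title = root.find("h:head/h:title", ns)
    if title is not None and title.text:
        ET.SubElement(desc, f"{{{DC_NS}}}title").text = title.text.strip()

    # Selected <meta> elements become the corresponding DC elements
    for meta in root.findall("h:head/h:meta", ns):
        dc_name = META_TO_DC.get(meta.get("name", "").lower())
        content = meta.get("content")
        if dc_name and content:
            ET.SubElement(desc, f"{{{DC_NS}}}{dc_name}").text = content

    return ET.tostring(rdf, encoding="unicode")
```

Calling xhtml_to_rdf(page_source, "http://www.w3.org/") on each page of a site
and collecting the results would give a crude site profile. An XSLT version
would express the same mapping declaratively as template rules.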

> The automatic generation of a summary of a text is computational linguistics,
> as is extracting and normalizing keywords (using stemming and the like), and
> finding the category of a text is automatic classification [1,2]. Neither
> RDF nor DC will help you with this.

Yes, I realise you cannot automatically generate a summary of text. That is
down to the author. But you can classify the structure of a page, and
generically determine what its purpose is.
The problem is that HTML is really a notation for marking up documents
rather than a pure XML means of conveying information. It is mainly display-
and presentation-based, but that doesn't mean it *cannot* be semantically
described...
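As a rough illustration of classifying a page by its structure, here is a
hypothetical Python heuristic (standard library only) that counts element types
and guesses a coarse purpose for the page. The role names and thresholds (any
form, at least ten links outnumbering paragraphs two to one, headings plus
paragraphs) are invented for illustration and are not taken from any standard.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(xhtml: str) -> Counter:
    """Count elements by local name, ignoring any XML namespace."""
    root = ET.fromstring(xhtml)
    return Counter(el.tag.split("}")[-1] for el in root.iter())


def guess_page_role(xhtml: str) -> str:
    """Very rough, hypothetical guess at a page's purpose from its structure."""
    counts = tag_counts(xhtml)  # missing tags count as 0
    if counts["form"] > 0:
        return "form"  # page built around user input
    links, paragraphs = counts["a"], counts["p"]
    if links >= 10 and links > 2 * paragraphs:
        return "navigation"  # mostly links: an index or hub page
    if counts["h1"] + counts["h2"] > 0 and paragraphs > 0:
        return "article"  # headed prose
    return "unknown"
```

The guessed role could then be written out as one more property in a page's
RDF description, clearly marked as machine-generated.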

> You adhere to the description of complex beasts like entire web sites...
> This is an interesting question, which requires a set of semantics of its
> own. The Dublin Core Initiative has a work group exploring these
> issues [3]. You're welcome to join that development.

I may well do that. Thanks for the tip.

> > What this means is that HTML will be semantic, rather than just lost chunks
> > of data floating around. I would like to set up an entire site with full
> > Semantic summaries/structure, but would appreciate if someone could help me
> > on this point.
> The major problem with automatic generation of such data when they are
> encoded in RDF is to make it clear to the human end-users that there is a
> qualitative difference between what has been generated by a machine and
> what has been generated by human beings.

That is one of the main problems with the Semantic Web: it still relies on
humans to start it off and to tweak it. No system will ever be fully
automatic, which is why I am interested in applying the principles to human
(HTML) applications. I am trying, in effect, to Semanticize (probably not a
word...) (X)HTML. We may have an XML Schema version of XHTML Modularization
soon...
It may be that the Semantic Web is useless if it cannot competently describe
huge, complex data systems like the WWW. On the other hand, if it can, then
it could revolutionise the Web as we see it today.

Kindest Regards,
Sean B. Palmer
----------------------------------------------------
WAP Tech Info - http://www.waptechinfo.com/
Mysterylights.com - http://www.mysterylights.com/
XHTML Modularization Resource - http://xhtml.waptechinfo.com/modularization/
----------------------------------------------------
"The Internet; is that thing still around?" - Homer J. Simpson

Received on Tuesday, 24 October 2000 06:50:32 UTC