- From: Sean B. Palmer <sean@mysterylights.com>
- Date: Tue, 24 Oct 2000 11:40:49 +0100
- To: "Sigfrid Lundberg, Lub NetLab" <siglun@gungner.lub.lu.se>
- Cc: <www-rdf-interest@w3.org>, "Dan Connolly" <connolly@w3.org>, "Simon St.Laurent" <simonstl@simonstl.com>
Hi Sigfrid,

> First, metadata _is_ data.

And hence the phrase: data describing data. But then, do we need further data to describe that, and so on: data describing data describing data describing data describing data...

> For an object Dan's list of publications in
> RDF, it is more a description of Dan and his professional life than a
> description of his home page.

That stuff is interesting, but I'm referring to the profile of the W3C front page (www.w3.org), and automatic (XSLT, I think) generation thereof.

> The term "metadata" has become broader than
> it used to be. Dan's interesting example is automatic transformation of
> semantics already present in his pages, not automatic generation of data.
> There is a fundamental distinction between the two.

Hmmm... could you explain what you mean by that? Semantic data is still data.

> Automatic or manual generation of data/metadata, and the costs and
> benefits of the two is beyond the scope of RDF as well as of DC. The
> former is about methods for defining semantics of and encoding (meta)data,
> and the latter is a particular set of semantics.

Well, most people use XSLT for transformations, but I was wondering how it can hold up to that type of generation (XHTML to RDF). Using standard XSLT sheets you could automatically generate a site profile on the fly(?)

> The automatic generation of a summary of a text is computer linguistics,
> so is to extract and normalize keywords (using stemming and the like) and
> to find the category of a text is automatic classification [1,2]. Neither
> RDF, nor DC, will help you with this.

Yes, I realise you cannot automatically generate a summary of text. That is down to the author. But you can classify the structure of a page, and generically determine what its purpose is. The problem is that HTML is really a notation for marking up documents rather than a pure XML means of conveying information. It is mainly display and presentation based, but that doesn't mean it *cannot* be semantically described...

> You adhere to the description of complex beasts like entire web sites...
> This is an interesting question, which requires a set of semantics of its
> own. The Dublin Core Initiative has a work group exploring these
> issues [3]. You're welcome to join that development.

I may well do that. Thanks for the tip.

> > What this means is that HTML will be semantic, rather than just lost chunks
> > of data floating around. I would like to set up an entire site with full
> > Semantic summaries/structure, but would appreciate if someone could help me
> > on this point.

> The major problem with automatic generation of such data when they are
> encoded in RDF is to make it clear to the human end-users that there is a
> qualitative difference between what has been generated by a machine and
> what has been generated by human beings.

That is one of the main problems with the Semantic Web: it still relies upon humans to start it off, and tweak it. No system will ever be fully automatic, which is why I am interested in applying the principles to human (HTML) applications. Trying, in effect, to Semanticize (probably not a word...) (X)HTML. We may have an XML Schema version of Modularization soon...

It may be that the Semantic Web is useless if it cannot competently describe huge complex data systems, like the WWW. On the other hand, if it can, then it could revolutionise the Web as we see it today.

Kindest Regards,
Sean B. Palmer
----------------------------------------------------
WAP Tech Info - http://www.waptechinfo.com/
Mysterylights.com - http://www.mysterylights.com/
XHTML Modularization Resource - http://xhtml.waptechinfo.com/modularization/
----------------------------------------------------
"The Internet; is that thing still around?" - Homer J. Simpson
Received on Tuesday, 24 October 2000 06:50:32 UTC