- From: Danny Ayers <danny666@virgilio.it>
- Date: Tue, 23 Apr 2002 11:33:57 +0200
- To: "Joshua Allen" <joshuaa@microsoft.com>, <msabin@interx.com>, <www-rdf-interest@w3.org>
While I agree with a lot that has been said on this thread, there are a couple of points I must take issue with. The main issue for me is the underlying assumption that a semantic web needs the explicit assertion of metadata by content providers for it to work. A related issue, IMHO, is the idea of 'polluted' metadata.

The expectation that the great proportion of content providers will provide metadata is unrealistic, particularly when you consider the material already on the web. Can we only have a semantic web once Dreamweaver & FrontPage make it obligatory to insert RDF before ftp-ing?

Talking of 'polluted' metadata is about as useful as talking of the English language as polluted because of dialectal variation. If the English World is only those people that speak pure "Queen's English" then we're looking at a quaint handful of people around London.

The Semantic Web, in the sense of one in which logical inference can be made with the material on the web, certainly requires metadata that is at least consistent locally. Granted, people inserting metadata in their output will be an aid to this, especially if they stick precisely to agreed schemas. This should certainly be encouraged, especially for automatically-produced content where such conformance is easier (per page) to implement.

The 'what is an identifier' etc discussion is rather angels-on-pinheads. What we have in the wild is a great mass of information, full of semantic hooks: identifiers in the form of URLs. These may not be URIs in a form we might prefer, but Pandora's box has already been opened. A proportion of these identifiers will have explicit metadata associated with them, but even this is likely to be 'polluted'.

The world is largely analog, but digital computers are still useful with real-world data because we can extract discrete approximations. The web is a semantic continuum, so why shouldn't that be digitised?
I would suggest that to provide the metadata to feed a Semantic Web, we need to look more to other techniques in the (somewhat taboo) machine learning domain. For humans to interface with the SW, decent NLU is desirable, and this same technology can be used to generate metadata - yes, through scraping and statistical/neural text analysis. There are a lot more sources of data that could go into the mix as well, like browser behaviour analysis. The existence of URLs within the dataset gives this an awful lot more potential than single-document analysis.

What I'm talking about is systems like Google, but instead of producing material for immediate human consumption, producing metadata for machines. The metadata generated by one such system may be completely at odds with that generated by another, but this can be sorted out at the logical layer, using the same methods that would for example lead us to trust the opinion of expert A over that of expert B (I've had my wrists slapped too many times to mention putting fuzzy/statistical/neural techniques on this layer).

So what I'm basically saying is that the web is and will continue to be 'polluted', so any systems that don't take this into account risk excluding a large proportion of available information, and that the extraction of implicit metadata can significantly help circumvent the lack of explicit metadata.

Oh yes, and that at the end of the day, when definitions of identifiers have been agreed on universally within the RDF community, the world outside will by and large ignore those definitions.

Cheers,
Danny.

---
Danny Ayers
<stuff> http://www.isacat.net </stuff>

>-----Original Message-----
>From: www-rdf-interest-request@w3.org
>[mailto:www-rdf-interest-request@w3.org]On Behalf Of Joshua Allen
>Sent: 23 April 2002 06:38
>To: msabin@interx.com; www-rdf-interest@w3.org
>Subject: RE: Documents, Cars, Hills, and Valleys
>
>
>> There already _are_ thousands of such assertions.
>> Either people are
>
>> Well, this is the status quo, and the prospects of changing it strike
>> me as fairly slim. So if you're right that this renders metadata
>> useless, we may as well pack up and go home.
>
>Now you see my point. The status quo is that there are a few people
>publishing assertions that very few other people ever use, and are
>impossible to aggregate globally in any meaningful way.
>
>In other words, the status quo is that we do NOT have a semantic web; we
>have a bunch of people rolling their own hypercard systems and claiming
>that they are building a world-wide-web.
>
>In 1989, you could have argued that "there are thousands of hypertext
>pages that use hyperlinks which are only meaningful within context of
>their particular system -- this is the status quo, and dreaming about
>universal identifiers so that all hyperlink systems interoperate is a
>pipe-dream, bub."
>But this was as wrong about the WWW then as it is about the "semantic
>web" now. A true semantic "web" uses universal identifiers, period.
>Saying that there are lots of fragmented systems that use identifiers
>which are not truly universal is not the same as saying that a system
>which *does* use universal identifiers is not possible or desirable.
>
>Hypercard didn't stop the WWW from being deployed -- in fact the WWW
>made closed-world hypertext systems seem rather insignificant in short
>order. Maybe closed-world semantic systems are interesting to you, but
>I believe that a semantic web has potential to make the "status quo"
>insignificant.
>
>> largely untroubled by ambiguity, or, in practice, ambiguity isn't the
>> disastrous problem you're making it out to be.
>
>In practice, there is no semantic web yet. And in practice, people
>using identifiers in gratuitously ambiguous ways will never be a part of
>a global semantic web. We all agree that these people will probably be
>able to do interesting things with their polluted metadata, and perhaps
>even build bridges to the global semantic web through lots of manual
>conversion. But that's about as relevant to "the semantic web" as
>hypercard was to the WWW.
>
Received on Tuesday, 23 April 2002 05:39:21 UTC