- From: Seth Russell <seth@robustai.net>
- Date: Sun, 2 Sep 2001 05:47:42 -0700
- To: "Charles McCathieNevile" <charles@w3.org>
- Cc: "William Loughborough" <love26@gorge.net>, <www-rdf-interest@w3.org>, "Sean B. Palmer" <sean@mysterylights.com>
From: "Charles McCathieNevile" <charles@w3.org>
> The scraper looks for a profile to know how to deal with a page. If it can't
> find one in the page, then it could look for metadata that has been provided
> (we don't have a standard mechanism for that yet, and if we did it would
> likely be by dereferencing the supplied profile URI). After that it is stuck.

Not at all. Don't forget that the scraper is an independent agent; it can do
whatever it pleases. Not only that, but it doesn't even need to follow
standards ... it can use whatever strategies for finding metadata that it
wants. But the hierarchy I suggested below seems to be quite natural:

> 1) is profile provided by author on page? ... use that one
> 2) is profile provided by surfer? (via cookie) ... use that one
> 3) is profile provided by groupOf(surfer)? ... use that one
> 4) is profile specified for profileFn(document type)? ... use that one
> 5) otherwise default to the most general profile (strategy)

> What I am suggesting is that as we are starting to develop multiple tools
> for scraping different information from one page, we should think about how
> to use the profile so it works to support this. I am not convinced that a
> space-separated list of URIs is a good solution, although it has the
> benefit of not requiring out-of-line information.

I think the problem here is what authors are gonna get off their duff about
and actually put on their pages. It's the old degenerative cycle of what's
gonna come first, the chicken or the egg. In this case the scrapers are the
egg and the pages are the chickens.

... may the best scrapers win ... and propagate the sem web wikis ....

Seth Russell
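The fallback hierarchy above can be sketched as a simple first-match chain. This is only an illustration of the order the post suggests, not a real API: the function name, parameter names, and the default profile URI are all made up for the sketch.

```python
# Hypothetical sketch of the profile-fallback hierarchy from the post.
# The candidate sources (page, surfer cookie, surfer's group, document
# type) come from the post; every identifier here is illustrative.

def resolve_profile(page_profile=None, surfer_profile=None,
                    group_profile=None, doctype_profile=None,
                    default_profile="http://example.org/most-general-profile"):
    """Return the first profile URI found, in the suggested priority order."""
    for candidate in (page_profile,      # 1) supplied by the author on the page
                      surfer_profile,    # 2) supplied by the surfer (via cookie)
                      group_profile,     # 3) supplied by groupOf(surfer)
                      doctype_profile):  # 4) registered via profileFn(doc type)
        if candidate is not None:
            return candidate
    return default_profile               # 5) fall back to the most general one
```

For step 1, the author-supplied profile would typically be the `profile` attribute on the page's `<head>` element; the scraper passes whatever it found (or `None`) into the chain.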
Received on Sunday, 2 September 2001 08:48:40 UTC