- From: Tim Chartrand <tim@cs.byu.edu>
- Date: Wed, 30 Jan 2002 15:38:45 -0700
- To: <www-rdf-interest@w3.org>
Hi,

I'm new to the RDF arena, but I would like to find a research topic in this area for my master's thesis. My advisor and I have an idea, and we want to run it by some other people who know what's going on in this area. Here's the idea:

Background: Our research group (BYU's Data Extraction Group) has done a lot of work on the automatic extraction of data from semistructured or unstructured data sources (mainly web pages). We do this by first defining a domain-dependent extraction ontology that describes the target schema of the data, along with some keyword and regular-expression matching rules. Then we can take a web page with data in that domain and automatically extract the data into a database.

Where RDF comes into it: I'm thinking we could build a tool that takes an RDF Schema and semi-automatically turns it into a data extraction ontology (itself also an RDF Schema). The tool would then use that ontology to automatically extract data from web pages. Finally, it would structure the extracted data as RDF that could be inserted into the header of the web page or kept in a repository somewhere.

The idea is that the Semantic Web (SW) may someday be prevalent enough that lots of data will be machine readable by design (i.e., not just thrown out on the web in HTML for human consumption). Since that is clearly not the case yet, we'd like to help it along a little by automating the conversion from human-readable to machine-readable form.

Please comment on this idea. Specifically:
- Is anyone else doing anything similar?
- Would this be a useful tool/technology?
- Do you like it?
- Our main concern is whether RDF is really meant to be used to describe data in general. I know that it has a fairly rich way of creating conceptual models (schemas), but most of the examples that are prevalent on the web give me the impression that RDF is meant to be used for metadata rather than for the data itself.
- Any other thoughts you have about this idea.
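To make the proposed extraction step a little more concrete, here is a minimal sketch in Python of keyword and regular-expression rules applied to a page, with the matches written out as RDF (N-Triples). Everything in it is hypothetical: the car-ad domain, the `http://example.org/` URIs, and the helper names `page_text`, `extract`, and `to_ntriples` are invented for illustration, and the "ontology" is a plain dictionary rather than an RDF Schema. It is a toy stand-in, not the group's actual extraction system.

```python
import re
from html.parser import HTMLParser

# Hypothetical namespace for an invented "car ad" domain.
EXAMPLE_NS = "http://example.org/cars#"

# Hypothetical extraction ontology: each RDF property is paired with context
# keywords (an empty list means "always try") and a value-matching regex.
ONTOLOGY = {
    EXAMPLE_NS + "make": {"keywords": [], "pattern": r"\b(?:Honda|Ford|Toyota)\b"},
    EXAMPLE_NS + "year": {"keywords": [], "pattern": r"\b(?:19|20)\d{2}\b"},
    EXAMPLE_NS + "mileage": {"keywords": ["miles"], "pattern": r"\b[\d,]+(?=\s*miles)"},
    EXAMPLE_NS + "price": {"keywords": ["$"], "pattern": r"\$[\d,]+"},
}


class _TextCollector(HTMLParser):
    """Collects the text content of an HTML page, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_text(html: str) -> str:
    """Return the visible text of an HTML page as one string."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def extract(html: str, ontology: dict) -> dict:
    """Apply each property's keyword and regex rule; keep the first match."""
    text = page_text(html)
    lowered = text.lower()
    values = {}
    for prop, rule in ontology.items():
        # Crude context check: skip the property if none of its keywords appear.
        if rule["keywords"] and not any(k in lowered for k in rule["keywords"]):
            continue
        match = re.search(rule["pattern"], text)
        if match:
            values[prop] = match.group(0)
    return values


def to_ntriples(subject: str, values: dict) -> str:
    """Serialize extracted values as N-Triples with plain string literals."""
    lines = []
    for prop, value in sorted(values.items()):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        lines.append(f'<{subject}> <{prop}> "{escaped}" .')
    return "\n".join(lines)


# Hypothetical input page used for the usage example.
AD_HTML = "<html><body><p>1998 Honda Civic, 120,000 miles, asking $4,500.</p></body></html>"

if __name__ == "__main__":
    print(to_ntriples("http://example.org/ads/1", extract(AD_HTML, ONTOLOGY)))
```

On the sample ad this yields one triple each for make, mileage, price, and year. A real extraction ontology would need far richer context rules than "keyword appears somewhere on the page", but the overall shape (schema-driven rules in, RDF statements out) is the one the proposal describes.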
Received on Wednesday, 30 January 2002 17:38:15 UTC