- From: Aredridel <aredridel@nbtsc.org>
- Date: Wed, 10 Sep 2003 23:53:43 -0600
- To: www-rdf-interest@w3.org
- Message-Id: <1063259623.14400.12.camel@mizar.nbtsc.org>
The following is an article I wrote for some friends, to help them understand RDF after coming from a relational database point of view. It might be a little inaccurate in places, but the concepts are fairly sound.

Feel free to print, post and/or modify: I'm putting this into the public domain. Attribute it to me (aredridel@nbtsc.org) if you like, but that's not a requirement either.

Ari

------------------------------------------------------------------------

A quick intro to the RDF data model.

Traditionally, the box that everyone shoehorns their data into is the relational database. For an example, consider two tables:

restaurants:
    name (a string, and also our primary key)
    approximate price (an integer)
    cuisine (a string, or perhaps an enumerated type)

menu:
    restaurant (referencing restaurants.name as a foreign key)
    item (string; restaurant+item is the primary key)
    cost (numeric)

It's a 1:N relation, where each restaurant has many items on the menu.

Some sample data (formatted for human readers):

restaurants:
    "Joe's", "$7", "American"
    "Wing's", "$12", "Chinese"

menu:
    "Joe's", "Sloppy Joe", "$5"
    "Joe's", "Soup", "$4"
    "Joe's", "Bottomless Coffee", "$2"
    "Wing's", "General's Chicken", "$11"
    "Wing's", "Braised Shrimp", "$12"
    "Wing's", "Tea", "$1"

There are a number of limits to this model.

First, the schema is machine readable, but not machine reasonable. As far as any computer is concerned, the data is completely arbitrary. SQL and most relational systems don't let you easily constrain the data in any column, and asking restaurant-specific questions is out of the scope of the query language entirely. The field names are simple strings.

Second, there are namespace problems: consider trying to merge another database with a similar field -- say the "price" field is an enumerated type like "Reasonable", "Expensive", "Downright cheap" in one database, and an approximate per-meal figure in the other. You'd have a hard time merging the data.
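
To make the namespace problem concrete, here is a minimal Python sketch. The second table (db_b) and its row are invented for illustration; only db_a mirrors the restaurants table above, and naive_merge is a made-up helper, not part of any database API. Both tables have a column called "price", and a merge that trusts the column name ends up mixing two unrelated meanings.

```python
# Two hypothetical restaurant tables that both have a "price" column.
# db_a mirrors the sample data above; db_b is invented for illustration.
db_a = {"Joe's": {"price": 7, "cuisine": "American"}}
db_b = {"Jane's": {"price": "Downright cheap", "cuisine": "American"}}


def naive_merge(a, b):
    """Merge two tables keyed on restaurant name, trusting column names."""
    merged = dict(a)
    merged.update(b)
    return merged


merged = naive_merge(db_a, db_b)

# The "price" column now holds both integers and labels, so a numeric
# question such as "which places cost under $10?" no longer makes sense.
price_types = sorted({type(row["price"]).__name__ for row in merged.values()})
print(price_types)
```

Nothing in either table says which kind of "price" is meant, so the merge cannot detect the clash; a person has to notice it.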
Real world examples are far worse. In practice, one just doesn't merge data unless it merges well (or, depending on the application, perfectly).

Third, if each item has many, many possible fields, and most are not known or available, you store "nil" for each undefined field. Storage requirements (and indexing requirements) end up much higher than the amount of real data would suggest.

Fourth, SQL is not reflective. Storing data about data and querying it in a relational manner is usually not possible, and certainly not portable: the query "find all tables containing columns that contain numerical data" is impossible in most relational systems, and ugly in the few where it is possible.

RDF solves each of these to varying degrees.

On the surface, RDF is an abstraction of the traditional SQL relational model into a more mathematically and logically pure rendition. It is also a set of standards for interoperation of databases and knowledge systems with the web as we know it and will know it.

RDF organizes the data into "statements", each of which has a subject, a predicate (or assertion) and an object. Each of these is represented either by a URI (Uniform Resource /Identifier/), which is a unique name formatted like a URL and arranged in a hierarchical namespace, or by a literal string value (literals appear only as objects).

The same data as shown above might look like this in RDF triples:

    http://restaurantlist.org/Joes http://www.chefmoz.org/syntax#costsAbout "$7"
    http://restaurantlist.org/Joes http://www.chefmoz.org/syntax#servesCuisine http://www.chefmoz.org/terms/AmericanFood

and so on.

For the sake of simplicity, let's define some aliases:

    foodpred: means "http://www.chefmoz.org/syntax#"
    foodterms: means "http://www.chefmoz.org/terms/"
    joesmenu: means "http://www.joes.com/menu/items/"

All that's fairly normal XML namespace stuff, which RDF uses as well. It's purely for notational convenience, even in XML.

So now, the full data:

    http://restaurantlist.org/Joes foodpred:costsAbout "$7".
    http://restaurantlist.org/Joes foodpred:servesCuisine foodterms:AmericanFood.
    http://restaurantlist.org/Joes foodpred:hasMenuItem joesmenu:SloppyJoes.
    http://restaurantlist.org/Joes foodpred:hasMenuItem joesmenu:BottomlessCoffee.
    http://restaurantlist.org/Joes foodpred:hasMenuItem joesmenu:Soup.
    joesmenu:Soup foodpred:costs "$4".
    joesmenu:SloppyJoes foodpred:costs "$5".
    joesmenu:BottomlessCoffee foodpred:costs "$2".

    http://restaurantlist.org/Wings foodpred:costsAbout "$12".
    http://restaurantlist.org/Wings foodpred:servesCuisine foodterms:Chinese.
    http://restaurantlist.org/Wings foodpred:hasMenuItem http://wings.cn/items#GeneralsChicken.
    http://restaurantlist.org/Wings foodpred:hasMenuItem http://wings.cn/items#BraisedShrimp.
    http://restaurantlist.org/Wings foodpred:hasMenuItem http://wings.cn/items#Tea.
    http://wings.cn/items#Tea foodpred:costs "$1".
    http://wings.cn/items#GeneralsChicken foodpred:costs "$11".
    http://wings.cn/items#BraisedShrimp foodpred:costs "$12".

    http://restaurantlist.org/Joes _:isCalled "Joe's Diner".
    http://restaurantlist.org/Wings _:isCalled "Wing's Chinese Food".

That's a lot to digest, but think of it in English: it's a paragraph that says: "Joe's costs about $7 per person; they serve Sloppy Joes, Bottomless Coffee and Soup. The soup costs $4. The sloppy joes cost $5. The coffee costs $2" and so on for the Chinese food too.

The advantage here is that these are machine-parseable statements of fact (or at least assertions of fact).

One big clincher here is in the schema language: the schema is RDF as well. It's a series of statements about predicates (the middle column) and the types of the subjects and objects (first and third columns):
    http://restaurantlist.org/Joes rdf:type http://xmlns.com/wordnet/1.6/Restaurant
    http://restaurantlist.org/Wings rdf:type http://xmlns.com/wordnet/1.6/Restaurant
    http://wings.cn/items#BraisedShrimp rdf:type http://xmlns.com/wordnet/1.6/EthnicFood
    joesmenu:BottomlessCoffee rdf:type http://xmlns.com/wordnet/1.6/HotBeverage

And now, any RDF consuming app may be able to draw some basic inferences about the restaurants and what they sell, if it's been taught the wordnet vocabulary. The wordnet vocabulary has been exported as RDF, with assertions like:

    Mutt isKindOf Dog
    Dog isSubclassOf Canine
    Canine isSubclassOf Mammal
    Mammal isSubclassOf Vertebrate
    Vertebrate isSubclassOf Animal

and so on. From the simple assertions that Wing's is a restaurant and that Braised Shrimp is an ethnic food, Wing's can now be found by anyone looking for ethnic food -- or for any superclass of ethnic food.

Now, in the example above, I invented a URI for both Wing's and Joe's. That's not good form, but it's more obvious to explain. All that's needed is a unique ID for that thing -- like the primary key in a relational database. A better thing might be:

    _:123141 instead of http://restaurantlist.org/Joes
and
    _:654123 instead of http://restaurantlist.org/Wings

This seems to remove a bit of useful info, but it also takes restaurantlist.org out of the equation, and who says theirs is the official URI for Wing's anyway? There may be many candidates or none. You could use http://wings.cn/, but a name registered by someone else makes your database depend on them not going global and becoming http://wings.com/. Perhaps this is your personal restaurant list, and depending on restaurantlist.org to be "the" place to list restaurants is a little too shaky: in fifty years, will they still be the place on the net to go find food? Or maybe this is Jane's mom-and-pop diner, and they're not net-savvy yet.
This isn't a problem: all one has to do is add another statement:

    _:654123 hasHomepage http://www.wings.cn/
    _:654123 isOwnedBy urn:usssn:123-56-1234
    _:123141 isOwnedBy urn:usEIN:13976-45-12
    urn:usEIN:13976-45-12 hasHomepage http://restaurantsinc.biz

Now, one can query for "Wings" by homepage: you can ask "find me the menu for the restaurant whose homepage is http://www.wings.cn". That's not the same as asking "find me the menu for http://wings.cn", since these are URIs, not URLs -- they're just names, not links.

A more obvious case is Jane's diner: "find me the menu of the restaurant called 'Eat at Jane's' that is located in Tallahassee, FL" -- a good thing to be able to ask, since Jane's has no URL, and so making an intelligent and unique URI that's globally meaningful is not going to happen.

These are called "blank nodes" -- blank but unique spots in the information space. You can talk about them, but there's no single handle to get ahold of them by -- just like in the real world. One person knows it as "the restaurant on the corner of Main and Third", another knows it as "the restaurant Jane runs", and another knows it by the name they filed on their liquor license application.

There's a query language for all this (well, several -- one is called RDQL and another is Squish, shown here):

    SELECT ?name, ?price
    FROM restaurants.nt
    WHERE
        (?x rdf::type wn::Restaurant)
        (?x fp::costsAbout ?price)
        (?x _::isCalled ?name)
        (?x fp::hasMenuItem ?c)
        (?c rdf::type wn::EthnicFood)
    USING
        rdf for http://www.w3.org/1999/02/22-rdf-syntax-ns#
        fp for http://www.chefmoz.org/syntax#
        wn for http://xmlns.com/wordnet/1.6/
        _ for <>

The results?

    "Wing's Chinese Food" "$12"

I'm in the mood for Chinese, so let's go eat at Wing's.

<end>
Received on Thursday, 11 September 2003 01:54:17 UTC