A quick intro to the RDF data model.

The following is an article I wrote for some friends, to help them
understand RDF after coming from a relational database point of view. 
It might be a little inaccurate in places, but the concepts are fairly
sound.

Feel free to print, post and/or modify: I'm putting this into the public
domain.  Attribute it to me (aredridel@nbtsc.org) if you like, but
that's not a requirement either.

Ari

------------------------------------------------------------------------

                 A quick intro to the RDF data model.

Traditionally, the box that everyone shoehorns their data into is the
relational database.

For example, consider two tables:

restaurants:
name (a string, and also our primary key)
approximate price (an integer)
cuisine (a string, or perhaps an enumerated type)

menu:
restaurant (referencing restaurants.name as a foreign key)
item (string; restaurant+item is the primary key)
cost (numeric)

It's a 1:N relation, where each restaurant has many items on the menu.

Some sample data (formatted for human readers):

restaurants:
"Joe's", "$7", "American"
"Wing's", "$12", "Chinese"

menu:
"Joe's", "Sloppy Joe", "$5"
"Joe's", "Soup", "$4"
"Joe's", "Bottomless Coffee", "$2"
"Wing's", "General's Chicken", "$11"
"Wing's", "Braised Shrimp", "$12"
"Wing's", "Tea", "$1"

There are a number of limits to this model.

First, the schema is machine-readable, but not machine-reasonable: as
far as any computer is concerned, the data is completely arbitrary.  SQL
and most relational systems don't let you easily constrain the data in
any column, and asking restaurant-specific questions is entirely outside
the scope of the query language.  The field names are just strings, with
no meaning a program can look up.

Second, there are namespace problems.  Consider trying to merge another
database with a similar field: say the "price" field is an enumerated
type like "Reasonable", "Expensive", "Downright cheap" in one database,
and an approximate per-meal figure in the other.  You'd have a hard time
merging the data, and real-world examples are far worse.  In practice,
data that doesn't merge well (or perfectly, depending on the need) just
doesn't get merged.

Third, if there are many, many possible attributes for each item, and
most of them are unknown or unavailable, you still have to store a NULL
for every undefined field.  Storage requirements, and the indexing
needed to keep queries efficient, grow much larger as a result.

Fourth, SQL is not reflective.  Storing data about data and querying it
in a relational manner is usually not possible, and certainly not
portable: the query "find all tables containing columns that contain
numerical data" is impossible in most relational systems, and ugly in
the few where it is possible.

RDF solves each of these to varying degrees.

On the surface, RDF is an abstraction of the traditional SQL relational
model into a more mathematically and logically pure rendition.  It is
also a set of standards for making databases and knowledge systems
interoperate with the web as we know it, and as we will know it.

RDF organizes the data into "statements", each of which has a subject, a
predicate (the property or relationship being asserted) and an object.
Subjects and predicates are URIs (Uniform Resource /Identifiers/: unique
names formatted like URLs and arranged in a hierarchical namespace);
objects can be URIs too, or literal values such as strings.
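As a rough illustration of that shape, here is a minimal Python sketch
of a statement as a three-field record.  The class names URI, Literal
and Statement are hypothetical (they are not any RDF library's API), and
the only rule enforced is the one just described: subject and predicate
must be URIs, while the object may be a URI or a literal.  Blank nodes,
covered further on, are left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    """A globally unique name, e.g. http://restaurantlist.org/Joes."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain literal value, e.g. the string "$7"."""
    value: str


@dataclass(frozen=True)
class Statement:
    subject: URI
    predicate: URI
    object: Union[URI, Literal]

    def __post_init__(self):
        # Subjects and predicates must be URIs; only the object may be a literal.
        if not isinstance(self.subject, URI) or not isinstance(self.predicate, URI):
            raise TypeError("subject and predicate must be URIs")
        if not isinstance(self.object, (URI, Literal)):
            raise TypeError("object must be a URI or a literal")


joes = URI("http://restaurantlist.org/Joes")
s1 = Statement(joes, URI("http://www.chefmoz.org/syntax#costsAbout"), Literal("$7"))
s2 = Statement(joes, URI("http://www.chefmoz.org/syntax#servesCuisine"),
               URI("http://www.chefmoz.org/terms/AmericanFood"))
print(s1.object)  # Literal(value='$7')
```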

The same data as shown above might look like this in RDF triples:

  http://restaurantlist.org/Joes
      http://www.chefmoz.org/syntax#costsAbout
      "$7"
  http://restaurantlist.org/Joes
      http://www.chefmoz.org/syntax#servesCuisine
      http://www.chefmoz.org/terms/AmericanFood

and so on.  For the sake of simplicity, let's define some aliases:

  foodpred: means "http://www.chefmoz.org/syntax#"
  foodterms: means "http://www.chefmoz.org/terms/"
  joesmenu: means "http://www.joes.com/menu/items/"

All that's fairly normal XML namespace stuff, which RDF uses as well.
It's purely for notational convenience, even in XML.
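The alias expansion really is just string substitution.  A minimal
Python sketch, assuming the three aliases above and nothing else;
PREFIXES and expand are hypothetical names, and real serializations
(N3, Turtle, RDF/XML) have their own rules for writing prefixed and full
names that this ignores.

```python
# Prefix table copied from the aliases above.
PREFIXES = {
    "foodpred": "http://www.chefmoz.org/syntax#",
    "foodterms": "http://www.chefmoz.org/terms/",
    "joesmenu": "http://www.joes.com/menu/items/",
}


def expand(name: str) -> str:
    """Expand a prefixed name like 'foodpred:costs' into a full URI.

    Names whose prefix is not in the table (including full URIs such as
    'http://...') are returned unchanged.
    """
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return name


print(expand("foodpred:costs"))             # http://www.chefmoz.org/syntax#costs
print(expand("joesmenu:Soup"))              # http://www.joes.com/menu/items/Soup
print(expand("http://wings.cn/items#Tea"))  # http://wings.cn/items#Tea (unchanged)
```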

So now, the full data:

  http://restaurantlist.org/Joes foodpred:costsAbout "$7".
  http://restaurantlist.org/Joes foodpred:servesCuisine 
       foodterms:AmericanFood.
  http://restaurantlist.org/Joes foodpred:hasMenuItem
       joesmenu:SloppyJoes.
  http://restaurantlist.org/Joes foodpred:hasMenuItem
       joesmenu:BottomlessCoffee.
  http://restaurantlist.org/Joes foodpred:hasMenuItem joesmenu:Soup.

  joesmenu:Soup foodpred:costs "$4".
  joesmenu:SloppyJoes foodpred:costs "$5".
  joesmenu:BottomlessCoffee foodpred:costs "$2".
  
  http://restaurantlist.org/Wings foodpred:costsAbout "$12".
  http://restaurantlist.org/Wings foodpred:servesCuisine
       foodterms:Chinese.
  http://restaurantlist.org/Wings foodpred:hasMenuItem
       http://wings.cn/items#GeneralsChicken.
  http://restaurantlist.org/Wings foodpred:hasMenuItem
       http://wings.cn/items#BraisedShrimp.
  http://restaurantlist.org/Wings foodpred:hasMenuItem
       http://wings.cn/items#Tea.

  http://wings.cn/items#Tea foodpred:costs "$1".
  http://wings.cn/items#GeneralsChicken foodpred:costs "$11".
  http://wings.cn/items#BraisedShrimp foodpred:costs "$12".

  http://restaurantlist.org/Joes _:isCalled "Joe's Diner".
  http://restaurantlist.org/Wings _:isCalled "Wing's Chinese Food".
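Producing these statements from the tables at the top is fairly
mechanical: the restaurant's key column picks the subject URI, the other
columns become predicates, and each menu row contributes a hasMenuItem
link plus a costs statement.  A minimal Python sketch for Joe's rows
only, assuming hand-picked URIs (the lookup tables RESTAURANT_URI,
ITEM_URI and CUISINE_URI and the function rows_to_triples are
hypothetical), with prefixed names kept as plain strings:

```python
# Rows as they appear in the relational tables above (Joe's only).
restaurant_row = ("Joe's", "$7", "American")
menu_rows = [
    ("Joe's", "Sloppy Joe", "$5"),
    ("Joe's", "Soup", "$4"),
    ("Joe's", "Bottomless Coffee", "$2"),
]

# Hypothetical URI choices: the article invents these names by hand.
RESTAURANT_URI = {"Joe's": "http://restaurantlist.org/Joes"}
ITEM_URI = {
    "Sloppy Joe": "joesmenu:SloppyJoes",
    "Soup": "joesmenu:Soup",
    "Bottomless Coffee": "joesmenu:BottomlessCoffee",
}
CUISINE_URI = {"American": "foodterms:AmericanFood"}


def rows_to_triples(restaurant_row, menu_rows):
    name, price, cuisine = restaurant_row
    subject = RESTAURANT_URI[name]  # the primary key becomes the subject
    # One triple per remaining column of the restaurants row...
    triples = [
        (subject, "foodpred:costsAbout", price),
        (subject, "foodpred:servesCuisine", CUISINE_URI[cuisine]),
    ]
    # ...and two per menu row: the 1:N link, then the item's own price.
    for restaurant, item, cost in menu_rows:
        item_uri = ITEM_URI[item]
        triples.append((RESTAURANT_URI[restaurant], "foodpred:hasMenuItem", item_uri))
        triples.append((item_uri, "foodpred:costs", cost))
    return triples


for t in rows_to_triples(restaurant_row, menu_rows):
    print(t)
```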

That's a lot to digest, but think of it in English: it's a paragraph
that says:

  "Joe's costs about $7 per person; they serve Sloppy Joes, Bottomless
   Coffee and Soup.  The soup costs $4.  The sloppy joes cost $5.  The
   coffee costs $2."

and so on for the Chinese food too.

The advantage here is that these are machine-parseable statements of
fact (or at least assertions of fact).
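Because every fact has the same three-part shape, a program can ask
questions by matching patterns against the statements.  A minimal Python
sketch, assuming triples are stored as tuples of strings and None acts
as a wildcard; match is a hypothetical helper, not part of any
particular RDF toolkit:

```python
# A few of the statements above, with prefixed names kept as plain strings.
TRIPLES = {
    ("http://restaurantlist.org/Joes", "foodpred:hasMenuItem", "joesmenu:SloppyJoes"),
    ("http://restaurantlist.org/Joes", "foodpred:hasMenuItem", "joesmenu:BottomlessCoffee"),
    ("http://restaurantlist.org/Joes", "foodpred:hasMenuItem", "joesmenu:Soup"),
    ("joesmenu:Soup", "foodpred:costs", "$4"),
    ("joesmenu:SloppyJoes", "foodpred:costs", "$5"),
    ("joesmenu:BottomlessCoffee", "foodpred:costs", "$2"),
}


def match(triples, s=None, p=None, o=None):
    """Yield every triple that agrees with the non-None positions of the pattern."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# "What does Joe's serve, and what does each item cost?"
for _, _, item in sorted(match(TRIPLES, "http://restaurantlist.org/Joes",
                               "foodpred:hasMenuItem")):
    for _, _, cost in match(TRIPLES, item, "foodpred:costs"):
        print(item, cost)
```

Asking "what costs $4?" is the same call with the pattern turned
around: match(TRIPLES, None, "foodpred:costs", "$4").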

One big clincher here is in the schema language: The schema is RDF as
well.  It's a series of statements about predicates (the middle column)
and the types of the subjects and objects (first and third columns).

  http://restaurantlist.org/Joes rdf:type 
       http://xmlns.com/wordnet/1.6/Restaurant
  http://restaurantlist.org/Wings rdf:type
       http://xmlns.com/wordnet/1.6/Restaurant
  http://wings.cn/items#BraisedShrimp rdf:type
       http://xmlns.com/wordnet/1.6/EthnicFood
  joesmenu:BottomlessCoffee rdf:type
       http://xmlns.com/wordnet/1.6/HotBeverage

And now, any RDF-consuming app may be able to make some basic inferences
about the restaurants and what they sell, if it's been taught the
WordNet vocabulary.  WordNet has been exported as RDF, with assertions
like:

  Mutt isKindOf Dog
  Dog isSubclassOf Canine
  Canine isSubclassOf Mammal
  Mammal isSubclassOf Vertebrate
  Vertebrate isSubclassOf Animal

and so on.  From the simple assertions that Wing's is a restaurant, and
that Braised Shrimp (on its menu) is ethnic food, Wing's can now be
found by anyone looking for ethnic food, or for any broader category
that ethnic food falls under.
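That kind of inference is essentially a transitive closure: keep
following isSubclassOf links upward until nothing new turns up.  A
minimal Python sketch over the illustrative Mutt/Dog statements above;
the predicate names are the article's informal ones (standard RDF Schema
calls the subclass relation rdfs:subClassOf), and superclasses and is_a
are hypothetical helpers:

```python
from collections import deque

STATEMENTS = [
    ("Mutt", "isKindOf", "Dog"),
    ("Dog", "isSubclassOf", "Canine"),
    ("Canine", "isSubclassOf", "Mammal"),
    ("Mammal", "isSubclassOf", "Vertebrate"),
    ("Vertebrate", "isSubclassOf", "Animal"),
]


def superclasses(cls, statements):
    """Return every class reachable from cls via isSubclassOf (breadth-first)."""
    parents = {}
    for s, p, o in statements:
        if p == "isSubclassOf":
            parents.setdefault(s, []).append(o)
    seen, queue = [], deque([cls])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:  # also guards against cycles in the hierarchy
                seen.append(parent)
                queue.append(parent)
    return seen


def is_a(thing, cls, statements):
    """True if thing isKindOf cls directly, or isKindOf some subclass of cls."""
    direct = [o for s, p, o in statements if s == thing and p == "isKindOf"]
    return any(d == cls or cls in superclasses(d, statements) for d in direct)


print(superclasses("Dog", STATEMENTS))     # ['Canine', 'Mammal', 'Vertebrate', 'Animal']
print(is_a("Mutt", "Animal", STATEMENTS))  # True
```

Swap in EthnicFood and its WordNet superclasses, and the same lookup is
what would let Wing's turn up in a search for a broader category of
food.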

Now, in the example above, I invented a URI for both Wing's and Joe's.
That's not good form, but it makes the explanation more obvious.  All
that's needed is a unique ID for each thing, like the primary key in a
relational database.  A better choice might be:

   _:123141  instead of http://restaurantlist.org/Joes

and

   _:654123 instead of http://restaurantlist.org/Wings

This seems to remove a bit of useful information, but it also takes
restaurantlist.org out of the equation, and who says they're the
official URI for Wing's anyway?  There may be many candidates, or none.
There's http://wings.cn/, but since that name is registered by someone
else, your database would depend on Wing's never going global and
becoming http://wings.com/.  Perhaps this is your personal restaurant
list, and depending on restaurantlist.org to be "the" place to list
restaurants is a little too shaky: in fifty years, will they still be
the place on the net to go find food?  Or maybe this is Jane's
mom-and-pop diner, and they're not net-savvy yet.  None of this is a
problem: all one has to do is add a few more statements:

   _:654123 hasHomepage http://www.wings.cn/
   _:654123 isOwnedBy urn:usssn:123-56-1234
   _:123141 isOwnedBy urn:usEIN:13976-45-12
   urn:usEIN:13976-45-12 hasHomepage http://restaurantsinc.biz

Now, one can look up Wing's by homepage: you can ask "find me the menu
for the restaurant whose homepage is http://www.wings.cn/".  That's not
the same as asking "find me the menu for http://wings.cn/", since these
are URIs, not URLs: they're just names, not links.  A more obvious case
is Jane's diner: "find me the menu of the restaurant called "Eat at
Jane's" that is located in Tallahassee, FL".  That's a good thing to be
able to ask, since Jane's has no URL, so an intelligent, unique and
globally meaningful URI for it is not going to happen.
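Answering "the menu of the restaurant whose homepage is
http://www.wings.cn/" is a two-step lookup: find whatever node has that
homepage, then follow its hasMenuItem links.  A minimal Python sketch,
assuming the blank-node version of the Wing's data (with _:654123
standing in for http://restaurantlist.org/Wings) stored as plain tuples;
menu_by_homepage is a hypothetical helper.  Note that the code never
needs the blank node's label to mean anything: it only has to be the
same label in every statement.

```python
TRIPLES = [
    ("_:654123", "hasHomepage", "http://www.wings.cn/"),
    ("_:654123", "foodpred:hasMenuItem", "http://wings.cn/items#GeneralsChicken"),
    ("_:654123", "foodpred:hasMenuItem", "http://wings.cn/items#BraisedShrimp"),
    ("_:654123", "foodpred:hasMenuItem", "http://wings.cn/items#Tea"),
    ("_:123141", "isOwnedBy", "urn:usEIN:13976-45-12"),
]


def menu_by_homepage(triples, homepage):
    # Step 1: every node (blank or not) that claims this homepage.
    restaurants = {s for s, p, o in triples if p == "hasHomepage" and o == homepage}
    # Step 2: the menu items attached to those nodes.
    return sorted(o for s, p, o in triples
                  if s in restaurants and p == "foodpred:hasMenuItem")


print(menu_by_homepage(TRIPLES, "http://www.wings.cn/"))
```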

These are called "blank nodes": blank but unique spots in the
information space.  You can talk about them, but there's no single
handle to get hold of them by, just like in the real world.  One person
knows it as "the restaurant on the corner of Main and Third", another
knows it as "the restaurant Jane runs", and another knows it by the name
filed on its liquor license application.

There's a query language for all this (well, several: one is called
RDQL, and another is Squish, shown here):

SELECT ?name, ?price
        FROM restaurants.nt
        WHERE
        (?x rdf::type wn::Restaurant)
        (?x fp::costsAbout ?price)
        (?x _::isCalled ?name)
        (?x fp::hasMenuItem ?c)
        (?c rdf::type wn::EthnicFood)
        USING
        rdf for http://www.w3.org/1999/02/22-rdf-syntax-ns#
        fp for http://www.chefmoz.org/syntax#
        wn for http://xmlns.com/wordnet/1.6/
        _ for <>

The results?

    "Wing's Chinese Food" "$12"

I'm in the mood for Chinese, so let's go eat at Wing's.
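Under the hood, a query like the Squish one above is a conjunction of
triple patterns: each pattern narrows a set of variable bindings, and
the bindings that survive every pattern are the answers.  The Python
sketch below shows that evaluation idea with a naive nested-loop join.
It is not how any particular RDQL or Squish engine works.  The data is
trimmed to the statements the query touches, the ethnic-food class is
assumed to be spelled wn:EthnicFood (as in the rdf:type statement for
Braised Shrimp), and unify and solve are hypothetical names.

```python
JOES = "http://restaurantlist.org/Joes"
WINGS = "http://restaurantlist.org/Wings"
SHRIMP = "http://wings.cn/items#BraisedShrimp"

TRIPLES = [
    (JOES, "rdf:type", "wn:Restaurant"),
    (JOES, "fp:costsAbout", "$7"),
    (JOES, "_:isCalled", "Joe's Diner"),
    (JOES, "fp:hasMenuItem", "joesmenu:BottomlessCoffee"),
    ("joesmenu:BottomlessCoffee", "rdf:type", "wn:HotBeverage"),
    (WINGS, "rdf:type", "wn:Restaurant"),
    (WINGS, "fp:costsAbout", "$12"),
    (WINGS, "_:isCalled", "Wing's Chinese Food"),
    (WINGS, "fp:hasMenuItem", SHRIMP),
    (WINGS, "fp:hasMenuItem", "http://wings.cn/items#Tea"),
    (SHRIMP, "rdf:type", "wn:EthnicFood"),
]

# The WHERE clause; strings starting with "?" are variables.
PATTERNS = [
    ("?x", "rdf:type", "wn:Restaurant"),
    ("?x", "fp:costsAbout", "?price"),
    ("?x", "_:isCalled", "?name"),
    ("?x", "fp:hasMenuItem", "?c"),
    ("?c", "rdf:type", "wn:EthnicFood"),
]


def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant term does not match
    return new


def solve(patterns, triples):
    """Apply each pattern in turn, keeping only bindings consistent with all of them."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


for b in solve(PATTERNS, TRIPLES):
    print(b["?name"], b["?price"])  # Wing's Chinese Food $12
```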

<end>

Received on Thursday, 11 September 2003 01:54:17 UTC