RDF Database Tables from McBride, Brian on 2000-05-11 (www-rdf-interest@w3.org from May 2000)

From: McBride, Brian <bwm@hplb.hpl.hp.com>
Date: Thu, 11 May 2000 20:14:04 +0100
To: "RDF Interest (E-mail)" <www-rdf-interest@w3.org>
Message-ID: <5E13A1874524D411A876006008CD059F239231@0-mail-1.hpl.hp.com>
I got a few requests to see the tables structures
I'm using in my proto database, so here they are.  Comments
and suggestions are welcome.

Caveats:

	1) I'm no database expert; wrote my first line of SQL
         when I started this.  

	2) I haven't finished the implementation yet; in 
	   particular haven't done delete, container support
	   or reification.  So there is more to learn.

		
		sql = "CREATE TABLE RDFRESOURCE"
			+ "("
			+   "Id INTEGER not null primary key,"
			+   "NS INTEGER not null,"
			+   "RoName varchar(255)"
			+ ")";

The resource table holds all resources.  Id is an internal
identifier field.  NS is a pointer to an entry in the namespace
table giving the namespace for this resource.  RoName should be
called 'localname' and is the local name component of the Qname.


		sql = "CREATE TABLE RDFNameSpace"
			+ "("
			+   "Id INTEGER not null primary key,"
			+   "NsName varchar(255)"
			+ ")";

The namespace table.
                               
		sql = "CREATE TABLE RDFLiteral"
			+ "("
			+   "Id INTEGER not null primary key,"
			+   "VAL varchar (4000)"
			+ ")";

Table of literals.  4000 character limit is enough for current purposes.
                
		
		sql = "CREATE TABLE RDFStatement"
			+ "("
			+   "Id INTEGER not null primary key,"
			+   "Subject INTEGER not null,"
			+   "Predicate INTEGER not null,"
			+   "ObjResource INTEGER not null,"
                  +   "ObjLiteral INTEGER not null,"
			+   "Res CHAR(1) not null"
			+ ")";

Statement table.  Originally had a single object field which
would have the ID of either an object or a literal.  A
complicated JOIN expression is used to enumerate a list of
statements and was behaving unexpectedly.  My inexperience
with SQL probably.  This works and feels more 'correct'.  Res is
a flag to say whether the object is a resource or literal.
                
                sql = "CREATE TABLE RDFModel"
                        + "("
                        +   "ModelId INTEGER not null,"
                        +   "Statement INTEGER not null,"
                        +   "Asserted CHAR(1) not null,"
                        +   "Reified CHAR(1) not null,"
                        +   "primary key(ModelId, Statement)"
                        + ")";

The database can hold multiple models.  This table keeps a list of 
the statements in each model.  Originally this table was combined with
the statement table but that didn't work when it came to
implementing set operations. 

Asserted is a flag which says this statement is asserted in this model.
Reified is a flag which says this statement is reified in this model.
The latter is a hook for future implementation.  Reification isn't 
implemented so this approach is untested.

Each model is a resource and has an entry in the resource table.
The ModelId field is the Id of that resource.  Thus
statements can be made about the model.  There is a class for models
and it is possible to list the schemas that need to be loaded when
validating a model.
                
                sql = "CREATE OR REPLACE VIEW RootModel"
                     +  " AS SELECT UNIQUE Id, Subject, Predicate,
ObjResource, ObjLiteral, Res, Asserted, Reified"
                     +    " FROM RDFModel, RDFStatement"
                     +    " WHERE RDFModel.Statement = RDFStatement.Id";

This creates a view of an artificial model which contains all the statements
in the database, whatever 'actual' model they are in.

Views are heavily used.  Every model is a view which is a query onto either
another model view or the RootModel view.  So every time any operation in
the Stanford API which creates a model is called, e.g. any query, a new view
is created in the database.  This must lead to some bizarre queries being
fed to the database query engine, and I'm relying on its query optimiser to
sort those out.  There is also a problem with stale views being left around
in the database should an application crash.

		sql = "CREATE TABLE RDFKEYS"
			+ "("
			+   "TableName char(10) not null primary key,"
			+   "Key INTEGER not null"
			+ ")";
		sql = "INSERT INTO RDFKEYS (TableName, Key)
VALUES('Resource', 0)";
		sql = "INSERT INTO RDFKEYS (TableName, Key)
VALUES('NameSpace', 0)";

A key generation table.  Probably could use a sequencer, but they seem to be
a bit database specific and I started out at least with the intent not to
be. Only really need one generator though.

There was a question about schemas.  Right now, schemas are held as models
and can be fed into the schema validator.  No attempt has been made to use a
schema to define a more specific database structure.

Brian McBride
HPLabs
Received on Thursday, 11 May 2000 15:14:17 UTC