Automatically inferring schemata

Hi all,

As part of the teleconference on Thursday, we talked about automatically
generating RDF statements from various sources (RDBMS tables and Java Beans,
to name two).

Part of the question for us in DSpace is how much effort to expend on
characterizing the relationships. A low level of effort means that adding a
triple store is much cheaper.

As Ralph pointed out, you can always capture the lowest level of message
reception: a Bean with classname org.dspace.core.Publication has property
title with value "My Publication". The same Bean also has the field
submittingUser with id 12. All of these things are true, follow as
corollaries from their environment, and hence can be generated
automatically.
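
As a rough sketch of this lowest level (in Python rather than Java, purely
for brevity), the statements can be produced by walking an object's
properties. The Publication class here is a hypothetical stand-in for the
real Bean, and the id value 2, the "classname" predicate and the
"classname#id" subject format are assumptions made only for illustration:

```python
class Publication:
    """Hypothetical stand-in for the org.dspace.core.Publication Bean."""

    def __init__(self, id, title, submitting_user):
        self.id = id
        self.title = title
        self.submittingUser = submitting_user


def infer_statements(bean, class_name):
    """Emit one (subject, predicate, object) triple per property.

    Nothing here knows what a Publication is; every statement follows
    from the shape of the object alone.
    """
    subject = f"{class_name}#{bean.id}"  # assumed subject naming scheme
    triples = [(subject, "classname", class_name)]
    for name, value in sorted(vars(bean).items()):
        triples.append((subject, name, value))
    return triples


statements = infer_statements(
    Publication(2, "My Publication", 12), "org.dspace.core.Publication"
)
for statement in statements:
    print(statement)
```

Note that submittingUser comes out as the bare value 12: at this level
nothing says that 12 identifies a user.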

At the other end of the spectrum, you could explicitly write logic to
generate the RDF statements for everything in the system you cared about,
but this would not be automatic. For example: when I see a Publication Bean,
I will generate the following statements: a Publication has id 2; that
publication has the title "My Publication"; that publication was submitted
by the user with id 12; and so forth.

We're hoping to live somewhere in the middle, where most relationships can
be inferred from the program design (e.g., the Publication with a title
property), but those which cannot be inferred are explicitly characterized.
As Brian McBride commented, a natural way to do this would be a lookup-table
approach. So we might have Publication has title "My Publication"
automatically inferred, but the relationship between Publication and User
would be annotated as "submittedBy".
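
One possible shape for such a lookup table, sketched in Python. The
ANNOTATIONS table, the function names and the "Class#id" naming scheme are
all hypothetical; the point is only that a property keeps its own name as
the predicate unless the table says otherwise:

```python
# Hypothetical lookup table: (class name, property name) -> predicate.
# Only relationships that cannot be inferred need an entry.
ANNOTATIONS = {
    ("Publication", "submittingUser"): "submittedBy",
}


def predicate_for(class_name, property_name):
    """Use the annotated predicate if one exists, else the property name."""
    return ANNOTATIONS.get((class_name, property_name), property_name)


def describe(class_name, identifier, properties):
    """Build (subject, predicate, object) triples for one object."""
    subject = f"{class_name}#{identifier}"
    return [
        (subject, predicate_for(class_name, name), value)
        for name, value in properties.items()
    ]


triples = describe(
    "Publication", 2, {"title": "My Publication", "submittingUser": "User#12"}
)
for triple in triples:
    print(triple)
```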

I've been using Bean examples above; here are some example relationships
from a SQL table structure:

Completely automatic: A row of the USER table has unique Id 12 (inferred by
virtue of being a PRIMARY KEY); the row with id 12 has FIRSTNAME column with
value "Jason"; the row of the COLLECTION2PUBLICATION table with id 54
references (via a FOREIGN KEY) a row in the COLLECTION table with id 3 and
references a row in the PUBLICATION table with id 200.
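
A minimal sketch of the completely automatic case, using Python's sqlite3
module and SQLite's schema PRAGMAs as a stand-in for whatever RDBMS is
actually in use. The column names COLLECTION_ID and PUBLICATION_ID, the
"id" and "references" predicates, and the "TABLE#id" naming scheme are
assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USER (ID INTEGER PRIMARY KEY, FIRSTNAME TEXT);
    CREATE TABLE COLLECTION (ID INTEGER PRIMARY KEY);
    CREATE TABLE PUBLICATION (ID INTEGER PRIMARY KEY);
    CREATE TABLE COLLECTION2PUBLICATION (
        ID INTEGER PRIMARY KEY,
        COLLECTION_ID INTEGER REFERENCES COLLECTION(ID),
        PUBLICATION_ID INTEGER REFERENCES PUBLICATION(ID)
    );
    INSERT INTO USER VALUES (12, 'Jason');
    INSERT INTO COLLECTION VALUES (3);
    INSERT INTO PUBLICATION VALUES (200);
    INSERT INTO COLLECTION2PUBLICATION VALUES (54, 3, 200);
""")


def row_statements(conn, table):
    """Derive triples for every row of a table from schema metadata alone."""
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    pk = next(col[1] for col in columns if col[5])
    # foreign_key_list rows: (id, seq, table, from, to, ...)
    fks = {
        fk[3]: fk[2]
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")')
    }
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    names = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(names, row))
        subject = f"{table}#{record[pk]}"
        for name, value in record.items():
            if name == pk:
                triples.append((subject, "id", value))
            elif name in fks:
                triples.append((subject, "references", f"{fks[name]}#{value}"))
            else:
                triples.append((subject, name, value))
    return triples


user_triples = row_statements(conn, "USER")
mapping_triples = row_statements(conn, "COLLECTION2PUBLICATION")
```

Everything emitted here says only "references"; nothing yet says what the
reference means.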

Custom: For all rows in the COLLECTION2PUBLICATION table, generate the
following relationship: PUBLICATION is-part-of COLLECTION.

Mostly-inferred (table-driven): For any mapping table which contains two
foreign keys, assume a relationship between the two things referenced. For
the COLLECTION2PUBLICATION table, the relationship is "is-part-of" and goes
from PUBLICATION to COLLECTION. (You could also have an inverse
relationship from COLLECTION to PUBLICATION with the relationship
"contains").

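The table-driven case might look something like the following Python
sketch. The RELATIONSHIPS table, the function name and the column names
COLLECTION_ID and PUBLICATION_ID are hypothetical; the foreign-key
information is passed in by hand here, but would normally come from the
database's own metadata:

```python
# Hypothetical lookup table, one entry per mapping table:
# (subject table, predicate, object table, inverse predicate or None).
RELATIONSHIPS = {
    "COLLECTION2PUBLICATION": (
        "PUBLICATION", "is-part-of", "COLLECTION", "contains",
    ),
}


def mapping_statements(table, foreign_keys, rows):
    """Turn mapping-table rows into named relationships.

    foreign_keys maps column name -> referenced table; rows are dicts.
    """
    if len(foreign_keys) != 2:
        raise ValueError(f"{table} is not a two-way mapping table")
    column_for = {target: column for column, target in foreign_keys.items()}
    subject_table, predicate, object_table, inverse = RELATIONSHIPS[table]
    triples = []
    for row in rows:
        subject = f"{subject_table}#{row[column_for[subject_table]]}"
        obj = f"{object_table}#{row[column_for[object_table]]}"
        triples.append((subject, predicate, obj))
        if inverse is not None:
            triples.append((obj, inverse, subject))
    return triples


triples = mapping_statements(
    "COLLECTION2PUBLICATION",
    {"COLLECTION_ID": "COLLECTION", "PUBLICATION_ID": "PUBLICATION"},
    [{"ID": 54, "COLLECTION_ID": 3, "PUBLICATION_ID": 200}],
)
for triple in triples:
    print(triple)
```

The only hand-written part is the single RELATIONSHIPS entry; which rows
exist and which tables they point at is still inferred.
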
Hoping this is clear,

Peter

Received on Thursday, 31 May 2001 21:32:20 UTC