- From: Simon Spero <ses@unc.edu>
- Date: Fri, 26 Jun 2009 13:42:26 -0400
- To: Dan Brickley <danbri@danbri.org>
- Cc: SKOS <public-esw-thes@w3.org>
- Message-ID: <1af06bde0906261042t543c47d2kb5564913a162173f@mail.gmail.com>
On Fri, Jun 26, 2009 at 9:56 AM, Dan Brickley <danbri@danbri.org> wrote: > > Hi folks > > While SKOS is an application of RDF, and makes use of various RDF features, > I'd like to ask if any of your have built (or know about) plain SQL > implementations of SKOS. This is to help advise on some work that is based > around elaboration of a simpler SQL taxonomy structure. I would like some > examples to show that are SKOS-friendly, SKOS-centric or SKOS-based ... but > which doesn't make the complete jump to full unconstrained tables of RDF > triples. Anything around? Dan - SQL isn't really ideal, but this approach kinda sorta works: it is hard to keep things clustered, and there can be a lot of database overhead for short rows. --- Four tables - Concept, Label, Relationship, and Note *Concept* For plain SKOS, Concept just carries a unique concept ID; approaches that require subclasses of Concept can extend this table. CREATE TABLE Concept ( id int PRIMARY KEY ); *Label* Label is used to store the labels associated with a concept. Every label is marked with a flag to indicate the language it is encoded in, and the kind of label it is - for example, preferred, alternate, and historical). There are two uniqueness constraints that we need to enforce. One is the restriction that a concept can only have one preferred label per concept per language; the other must ensure that a given string is not used as label for more than one concept, *unless* the label is no longer current. In postgresql, these constraints can be implemented using partial indexes. Because of the uniqueness contraints, Label should not be used for classification codes. Keeping all the labels in a single table makes it easier to search for a label when the type is not known. The best order to arrange this table depends on how the database is likely to be used; if a typical use involves displaying an entry (thus retrieving all the labels for a concept, it's better to order the table by concept; if a more typical use is displaying a tree of preferred labels, then it is better to order the table by kind. In both situations, labels should ideally be output in as close to the order in which they will be accessed as possible (breadth-first is probably better than depth first). CREATE TABLE Label ( concept_id int REFERENCES Concept(id) ON DELETE CASCADE NOT NULL, kind char(1) NOT NULL, -- Preferred, Alternate, Historical lang text NOT NULL, label text NOT NULL ); CREATE INDEX Label_label_idx ON Label (label); CREATE INDEX Label_concept_id_label_type_idx ON Label (concept_id,kind); CREATE UNIQUE INDEX Label_preferred_label_idx ON Label (concept_id,lang) WHERE kind = 'P'; CREATE UNIQUE INDEX Label_not_historic_idx ON Label (label,lang) WHERE kind <> 'H'; *Relationship* Relationship contains three elements; a source, a destination, and a flag indicating the kind of relationship being marked This reminds me of something - can't thing what. An entry should be created in each direction, with the table clustered by source concept id; this groups together all the relationships for a given concept. CREATE TABLE SemanticRelationship ( src int REFERENCES Concept(id) ON DELETE CASCADE NOT NULL, dst int REFERENCES Concept(id) ON DELETE CASCADE NOT NULL, kind char(1) NOT NULL, -- Broader,Narrower,Related, UNIQUE (kind,src,dst) ); CREATE INDEX SemanticRelationship_src_idx ON SemanticRelationship (src); *Note* Note is used to carry all of the plain text notes associated with a concept. All kinds of notes are stored in the same table, allowing them to be searched using a single index. CREATE TABLE Note ( id int PRIMARY KEY, concept_id int REFERENCES Concept(id) ON DELETE CASCADE NOT NULL, kind char(1) NOT NULL, -- definition, scope note, etc. lang TEXT NOT NULL, note TEXT NOT NULL ); CREATE INDEX Note_concept_id ON Note (concept_id); CREATE INDEX Note_note_idx ON Note (note);
Received on Friday, 26 June 2009 17:43:07 UTC